On March 10, 2020, a new deployment of the user interface application, combined with high memory consumption on some frontend servers, exhausted the memory available on those servers. During that time, our monitoring system detected an increase in 502 errors and the engineering team took action.
Timeline:
Mar 10, 11:40 CET
An increase in 502 errors and in memory consumption on some frontend servers raises alerts in our monitoring system.
Mar 10, 11:42 CET
The issue is detected during the deployment of a new release, and the deployment is stopped. The affected servers are restored and the investigation starts.
Mar 10, 12:14 CET
A new deployment is executed on a reduced number of servers.
Mar 10, 12:29 CET
The deployment is extended to the rest of the servers.
Mar 10, 12:30 CET
The service is stable and no more 502 response codes are detected.
Root Cause:
Abnormal memory consumption affecting some frontend servers during a deployment.
Preventive Actions:
Increased the resilience of the frontend layer
Improved the deployment process to avoid service disruptions during new releases
Posted Mar 13, 2020 - 13:49 CET
Resolved
This incident has been resolved.
Posted Mar 10, 2020 - 12:30 CET
Investigating
Our operations team is working to identify the root cause and implement a solution.