On May 8th 2024 the Service Management API, ActiveDocs Proxy and Echo API suffered an outage for 41 minutes.
Timeline:
May 8, 2023 12:58 UTC: Update of a production cluster
May 8, 2023 13:17 UTC: Alerts start firing for the internal API, Echo API and ActiveDocs Proxy.
May 8, 2023 13:58 UTC: Root cause identified and solution deployed
May 8, 2023 14:00 UTC: Service is restored
Root Cause:
Unexpected behaviour of the network controller managing load balancers recreated some resources with invalid configuration.
Preventative Actions:
The network controller will be replaced with one that should behave as expected.
Posted May 09, 2024 - 17:31 CEST
Resolved
This incident has been resolved.
The incident cause was a misconfiguration in our load balancing layer, we are currently investigating what triggered that unexpected change. We will write a postmortem once we have all the information available.
Posted May 07, 2024 - 15:59 CEST
Investigating
Our operations team is working to identify the root cause and implement a solution.