On November 8th of 2021 a very high rate of 500 HTTP responses was detected in the Service Management API frontend affecting 90% of the traffic. The root cause was tracked down to an intermediate proxy layer between the frontend instances and the sharded storage layer. Remediation actions were undertaken in this proxy layer to restore the service.
Nov 8, 2021 09:55 UTC - high rate of HTTP 5XX errors starts (90% of traffic).
Nov 8, 2021 10:33 UTC - back to 100% traffic OK.