Red Hat 3scale Status - Service Management API

Service Management API - Service disruption

Incident Report for Red Hat 3scale

Postmortem

On November 8th of 2021 a very high rate of 500 HTTP responses was detected in the Service Management API frontend affecting 90% of the traffic. The root cause was tracked down to an intermediate proxy layer between the frontend instances and the sharded storage layer. Remediation actions were undertaken in this proxy layer to restore the service.

Timeline:

Nov 8, 2021 09:55 UTC - high rate of HTTP 5XX errors starts (90% of traffic).
Nov 8, 2021 10:33 UTC - back to 100% traffic OK.

Root cause:

Unexpected error on an intermediate proxy layer between Service Management API fronted instances and sharded storage layer.

Posted Nov 08, 2021 - 16:21 CET

Resolved

This incident has been resolved.

Several incidents were opened and closed during the service disruption due to the fact that our status page is updated automatically based on HTTP probe results in order to give users quick feedback on the status of the service. As a 10% of the requests were still being correctly processed, some probes were successful during the incident, causing the status page incidents to be resolved.

A postmortem will be published soon.

Posted Nov 08, 2021 - 11:33 CET

Investigating

Our operations team is working to identify the root cause and implement a solution.

Posted Nov 08, 2021 - 10:55 CET

This incident affected: Service Management API.