We are currently experiencing an elevated level of API errors in our 3scale Management API
Incident Report for Red Hat 3scale
Postmortem

On Jan. 24, 2019, the Admin Portal UI, the Developer Portal UI, and the Account Management, Analytics, and Billing APIs were completely unavailable for a total of 12 minutes, split across two windows of 9 and 3 minutes. The Service Management API, which handles authorization of API traffic, was not affected in any way. The outage was caused by a database maintenance operation that was not expected to impact any customer.

Timeline:

Jan. 24, 10:07:11 UTC - queries started failing because they referenced a deleted column

Jan. 24, 10:16:11 UTC - everything was back to normal

Jan. 24, 16:18:11 UTC - queries again started failing because they referenced a deleted column

Jan. 24, 16:21:11 UTC - everything was back to normal

Service impact:

No impact on the Service Management API, i.e. no impact on 3scale SLAs; authorization of API traffic continued as normal

Jan. 24, 10:07:11 UTC - Admin Portal UI, Developer Portal UI and 3scale APIs not accepting requests - 5XX response code returned

Jan. 24, 10:16:11 UTC - Admin Portal UI, Developer Portal UI and 3scale APIs back to 100% traffic after 9 minutes

Jan. 24, 16:18:11 UTC - Admin Portal UI, Developer Portal UI and 3scale APIs not accepting requests - 5XX response code returned

Jan. 24, 16:21:11 UTC - Admin Portal UI, Developer Portal UI and 3scale APIs back to 100% traffic after 3 minutes
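The durations above follow directly from the timestamps. As a quick check, the short Python sketch below recomputes both windows and their sum; the variable and function names are illustrative only and are not part of any 3scale tooling.

```python
from datetime import datetime

FMT = "%Y-%m-%d %H:%M:%S"

# (start, end) of each outage window in UTC, copied from the timeline above.
WINDOWS = [
    ("2019-01-24 10:07:11", "2019-01-24 10:16:11"),
    ("2019-01-24 16:18:11", "2019-01-24 16:21:11"),
]


def window_minutes(start, end):
    """Length of one outage window in minutes."""
    delta = datetime.strptime(end, FMT) - datetime.strptime(start, FMT)
    return delta.total_seconds() / 60


durations = [window_minutes(start, end) for start, end in WINDOWS]
total = sum(durations)
print(durations, total)
```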

Root Cause

During a database schema migration, running application processes that still had the old schema cached attempted to use a column that was being deleted, causing some queries to the database to fail.
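This failure mode can be reproduced in miniature. The sketch below is only an illustration under stated assumptions: it uses Python and an in-memory SQLite database rather than the actual 3scale stack, and the table accounts, the column legacy_flag, and the CachedModel class are all hypothetical. It shows a process that reads the column list once at startup, keeps building queries from that cached list after the column is removed, and recovers only once the cache is refreshed.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accounts (id INTEGER PRIMARY KEY, name TEXT, legacy_flag INTEGER)"
)
conn.execute("INSERT INTO accounts (name, legacy_flag) VALUES ('acme', 1)")


def load_columns(conn, table):
    """Read the current column names of a table from the database."""
    return [row[1] for row in conn.execute(f"PRAGMA table_info({table})")]


class CachedModel:
    """Hypothetical model that caches the column list when it is created."""

    def __init__(self, conn, table):
        self.conn = conn
        self.table = table
        self.columns = load_columns(conn, table)  # cached once, at "process start"

    def refresh(self):
        self.columns = load_columns(self.conn, self.table)

    def all(self):
        sql = f"SELECT {', '.join(self.columns)} FROM {self.table}"
        return self.conn.execute(sql).fetchall()


def drop_legacy_flag(conn):
    """Simulated migration: rebuild the table without legacy_flag."""
    conn.execute("CREATE TABLE accounts_new (id INTEGER PRIMARY KEY, name TEXT)")
    conn.execute("INSERT INTO accounts_new (id, name) SELECT id, name FROM accounts")
    conn.execute("DROP TABLE accounts")
    conn.execute("ALTER TABLE accounts_new RENAME TO accounts")


model = CachedModel(conn, "accounts")
rows_before = model.all()

drop_legacy_flag(conn)

try:
    model.all()  # still selects legacy_flag from the stale cache
    stale_query_failed = False
except sqlite3.OperationalError:
    stale_query_failed = True

model.refresh()  # comparable to restarting the process
rows_after = model.all()
print(rows_before, stale_query_failed, rows_after)
```

In this toy setup the stale query fails until refresh() is called, which is the same shape as the errors described above ending once the running processes stopped using the cached column.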

Preventative Actions

Ensure that deleted columns are not cached in running processes.

Catch such migrations during development and write a procedure describing how to perform them properly.
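One possible way to catch such migrations during development is an automated check that flags any column drop the application has not been prepared for. The Python sketch below is an assumption about how such a check could look, not the procedure 3scale adopted: the IGNORED_COLUMNS registry, the unsafe_drops function, and the table and column names are all hypothetical. It assumes a two-step convention in which a release that stops using the column is deployed first, and the column is dropped only in a later migration.

```python
import re

# Hypothetical registry: columns that the deployed application already ignores,
# so dropping them cannot break a running process.
IGNORED_COLUMNS = {("accounts", "legacy_flag")}

# Matches statements of the form: ALTER TABLE <table> DROP COLUMN <column>
DROP_RE = re.compile(
    r"ALTER\s+TABLE\s+(\w+)\s+DROP\s+COLUMN\s+(\w+)", re.IGNORECASE
)


def unsafe_drops(migration_sql, ignored=IGNORED_COLUMNS):
    """Return (table, column) pairs that are dropped without being ignored first."""
    found = [(t.lower(), c.lower()) for t, c in DROP_RE.findall(migration_sql)]
    return [pair for pair in found if pair not in ignored]


safe = "ALTER TABLE accounts DROP COLUMN legacy_flag;"
risky = "ALTER TABLE invoices DROP COLUMN old_total;"

print(unsafe_drops(safe))
print(unsafe_drops(risky))
```

A check like this could run in continuous integration and fail the build when the returned list is not empty, so the problem surfaces at review time rather than in production.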

Posted Feb 01, 2019 - 22:10 CET

Resolved
This incident has been resolved.
Posted Jan 24, 2019 - 11:31 CET
Monitoring
A fix has been implemented and we are monitoring the results.
Posted Jan 24, 2019 - 11:26 CET
Identified
Our operations team is working to identify the root cause and implement a solution.
Posted Jan 24, 2019 - 11:24 CET
This incident affected: Account Management API, Analytics API, Billing API.