On January 24th, 2019, the Admin Portal UI, the Developer Portal UI, and the APIs for Account Management, Analytics, and Billing were completely unavailable for a total of 12 minutes, split across two separate windows of 9 and 3 minutes. The Service Management API, which provides the auth services for API traffic, was not affected in any way. The outage was caused by database maintenance that was not expected to impact any customer.
Timeline:
Jan. 24, 10:07:11 UTC - queries were using a deleted column
Jan. 24, 10:16:11 UTC - everything was back to normal
Jan. 24, 16:18:11 UTC - queries were using a deleted column
Jan. 24, 16:21:11 UTC - everything was back to normal
Service impact:
No impact on the Service Management API, i.e. no impact on 3scale SLAs; auth services for API traffic continued as normal.
Jan. 24, 10:07:11 UTC - Admin Portal UI, Developer Portal UI and 3scale APIs not accepting requests - 5XX response code returned
Jan. 24, 10:16:11 UTC - Admin Portal UI, Developer Portal UI and 3scale APIs back to 100% traffic after 9 minutes
Jan. 24, 16:18:11 UTC - Admin Portal UI, Developer Portal UI and 3scale APIs not accepting requests - 5XX response code returned
Jan. 24, 16:21:11 UTC - Admin Portal UI, Developer Portal UI and 3scale APIs back to 100% traffic after 3 minutes
Root cause:
During a database schema migration, some running application processes still held the old schema in their cache and kept issuing queries that referenced a column being deleted, which caused those queries to fail.
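The failure mode described above can be reproduced in miniature. The following is only a sketch under stated assumptions, not the actual application code: the `Worker` class, the `accounts` table, and the `legacy_flag` column are hypothetical names, and an in-memory SQLite database stands in for the production database. The column is dropped by rebuilding the table so the sketch does not depend on a particular SQLite version.

```python
import sqlite3


class Worker:
    """Hypothetical application process that caches the column list at startup."""

    def __init__(self, conn, table):
        self.conn = conn
        self.table = table
        self.refresh_columns()

    def refresh_columns(self):
        # PRAGMA table_info returns one row per column; index 1 is the column name.
        rows = self.conn.execute(f"PRAGMA table_info({self.table})").fetchall()
        self.columns = [row[1] for row in rows]

    def insert(self, **values):
        # The statement is built from the cached column list, not the live schema.
        cols = ", ".join(self.columns)
        marks = ", ".join("?" for _ in self.columns)
        params = [values.get(col) for col in self.columns]
        self.conn.execute(
            f"INSERT INTO {self.table} ({cols}) VALUES ({marks})", params
        )


def drop_legacy_flag(conn):
    # Stand-in for the migration: rebuild the table without legacy_flag.
    conn.executescript(
        """
        CREATE TABLE accounts_new (id INTEGER PRIMARY KEY, name TEXT);
        INSERT INTO accounts_new (id, name) SELECT id, name FROM accounts;
        DROP TABLE accounts;
        ALTER TABLE accounts_new RENAME TO accounts;
        """
    )


def simulate():
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE accounts (id INTEGER PRIMARY KEY, name TEXT, legacy_flag INTEGER)"
    )
    worker = Worker(conn, "accounts")  # caches id, name, legacy_flag
    worker.insert(name="before")

    drop_legacy_flag(conn)  # schema changes under the running worker

    try:
        worker.insert(name="during")  # still references legacy_flag
        stale_failed = False
    except sqlite3.OperationalError:
        stale_failed = True

    worker.refresh_columns()  # reloading the schema clears the stale cache
    worker.insert(name="after")

    names = [row[0] for row in conn.execute("SELECT name FROM accounts ORDER BY id")]
    return stale_failed, names
```

In this sketch the write issued with the stale column list fails, and the worker recovers only after it reloads the schema, which mirrors the recovery seen in both outage windows.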
Corrective actions:
Ensure that running processes no longer cache or reference a column before it is deleted.
Catch such migrations at development time and document a procedure for performing them safely.
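One way to catch such migrations at development time is an automated check that flags any migration that appears to drop a column, so that it gets routed through the documented procedure. The sketch below is illustrative only: the function name `find_column_drops` and the patterns are assumptions (raw SQL `DROP COLUMN` and Rails-style `remove_column` calls), and they would need to be adapted to the migration tooling actually in use.

```python
import re

# Patterns treated as "this line drops a column". These are assumptions for
# illustration and should be adjusted to the real migration format.
DROP_PATTERNS = [
    re.compile(r"\bDROP\s+COLUMN\b", re.IGNORECASE),  # raw SQL
    re.compile(r"\bremove_columns?\b"),               # Rails-style helper calls
]


def find_column_drops(migration_text):
    """Return (line_number, line) pairs for lines that appear to drop a column."""
    hits = []
    for number, line in enumerate(migration_text.splitlines(), start=1):
        if any(pattern.search(line) for pattern in DROP_PATTERNS):
            hits.append((number, line.strip()))
    return hits
```

A check like this could run in continuous integration and fail the build, or require an explicit sign-off, whenever the returned list is not empty.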