Service Management API Errors

Incident Report for Red Hat 3scale

Postmortem

On July 24, 2017, we experienced an elevated number of errors in our Service Management API. The outage was caused by an automated database failover which has failed during a maintenance task, corrupting the data on one of the shards in our database cluster.

To reestablish the service our team had to restore the database from a backup file. Due to the size of the backup file, the restore operation has taken several minutes to be accomplished.

After the backup restoration, the database has been successfully restored, and all the errors in our Service Management API have been resolved.

Posted Jul 24, 2017 - 21:06 CEST

Resolved

The incident has been resolved.

Posted Jul 24, 2017 - 19:36 CEST

Monitoring

The team has restored the affected service and we are monitoring the situation

Posted Jul 24, 2017 - 19:14 CEST

Identified

The issue has been identified, and we are working on a fix.

Posted Jul 24, 2017 - 19:06 CEST

Investigating

We are investigating an elevated number of errors in our Service Management API.

Posted Jul 24, 2017 - 18:54 CEST