Outage on some Ada and Babbage models
Incident Report for OpenAI
Postmortem

Summary:

On July 6, 2022 22:17 PDT (July 7 05:17 UTC), the following models had partial or complete outages:

  • text-similarity-ada-001
  • text-search-babbage-doc-001
  • content-filter-alpha
  • text-search-babbage-query-001
  • text-search-ada-doc-001

Models were unavailable for one and a half hours. This was caused by an unanticipated maintenance event performed by Azure which took offline VMs that these models were served. Models were reconfigured to depend on VMs that were not taken offline by the maintenance event.

Root Cause:

A misconfiguration in our Azure activity log alerts caused this scheduled maintenance to occur without prior notice on our end.

Future Mitigations:

Evaluate the activity log notification configurations for our critical resources to ensure notifications are received in a timely manner.

Posted Jul 11, 2022 - 18:03 PDT

Resolved
All models have recovered and are operational. Thank you for your patience.
Posted Jul 07, 2022 - 01:12 PDT
Monitoring
We’ve moved affected models to other regions, and are seeing broad recovery. We are continuing to monitor while we investigate root cause.
Posted Jul 06, 2022 - 23:56 PDT
Update
We have lost the ability to communicate with virtual machines in one of the regions in which we operate. We are mitigating by moving capacity to other regions. We are starting to see recovery on some affected models.
Posted Jul 06, 2022 - 23:44 PDT
Identified
We have identified an issue affecting some Ada and Babbage models and are working on a remediation.
Posted Jul 06, 2022 - 23:07 PDT
This incident affected: API.