OpenAI

Full outage on all API endpoints
Resolved

At 3:28pm PDT on Mar 16, 2023, a feature flag caused unexpected failures on a critical service. Despite a steady rollout from 1% -> 50% -> 100%, the failure was not identified until critical systems went offline. As a result, APIs across all of OpenAI’s products experienced 7 minutes of downtime before the cause was identified, reverted, and service was restored. We will be increasing the observability of such changes to ensure we catch these problems before they fully roll out, and make root causes clearer to reduce recovery time.

Thu, Mar 16, 2023, 10:30 PM
(2 years ago)
·
Affected components

No components marked as affected

Updates

Resolved

At 3:28pm PDT on Mar 16, 2023, a feature flag caused unexpected failures on a critical service. Despite a steady rollout from 1% -> 50% -> 100%, the failure was not identified until critical systems went offline. As a result, APIs across all of OpenAI’s products experienced 7 minutes of downtime before the cause was identified, reverted, and service was restored. We will be increasing the observability of such changes to ensure we catch these problems before they fully roll out, and make root causes clearer to reduce recovery time.

Thu, Mar 16, 2023, 10:30 PM
Powered by

Availability metrics are reported at an aggregate level across all tiers, models and error types. Individual customer availability may vary depending on their subscription tier as well as the specific model and API features in use.