Increased latencies, failures in text-davinci-002

Resolved · Partial outage

We’ve published a write-up of this incident.


Affected components

Sep 6, 2022, 10:35 PM

11:23 PM

Updates

Write-up published

Read it here

Resolved

An internal configuration error caused text-davinci-002 and code-davinci-002 to receive unanticipated load starting at 21:03 UTC on 2022-09-06. The sudden increase in workload led to increased failure rates and longer response times, negatively impacting customer experience. We reverted the misconfiguration and rebalanced the load.

To prevent this from happening again, we are working on improving the test coverage for regressions in this area and have improved the rollout logic and alerting to catch issues earlier.

Thu, Sep 8, 2022, 12:32 AM

Resolved

The system continues to operate as expected. We are marking this incident as resolved.

Wed, Sep 7, 2022, 01:40 AM (22 hours earlier)

Monitoring

Latencies appear to have returned to normal. We will continue to monitor.

Tue, Sep 6, 2022, 11:23 PM (2 hours earlier)

Identified

Our mitigation rollout appears to be working: latencies are returning to normal and failure rates have dropped.

Tue, Sep 6, 2022, 11:10 PM (13 minutes earlier)

Investigating

We have been seeing increased latencies across all models since around 2pm, which led to increased load and errors in text-davinci-002 by around 3pm. We have a potential mitigation that we are going to try.

Tue, Sep 6, 2022, 09:00 PM (2 hours earlier)

Availability metrics are reported at an aggregate level across all tiers, models and error types. Individual customer availability may vary depending on their subscription tier as well as the specific model and API features in use.