An internal configuration error caused text-davinci-002 and code-davinci-002 to receive unanticipated load starting at 21:03 UTC on 2022-09-06. The sudden increase in load raised failure rates and response times, degrading the customer experience. We reverted the misconfiguration and rebalanced the load.
To prevent a recurrence, we are improving test coverage for regressions in this area, and we have already improved the rollout logic and alerting to catch issues earlier.
Posted Sep 07, 2022 - 17:54 PDT
The system continues to operate as expected. We are marking this incident as resolved.
Posted Sep 06, 2022 - 18:40 PDT
Latencies appear to have returned to normal. We will continue to monitor.
Posted Sep 06, 2022 - 16:23 PDT
Our mitigation rollout appears to be working: latencies are returning to normal and failure rates have dropped.
Posted Sep 06, 2022 - 16:10 PDT
We have been experiencing increased latencies across all models since around 2pm PDT, which led to increased load and errors in text-davinci-002 by around 3pm PDT. We have a potential mitigation that we are about to try.