On Wednesday, May 12, 2023, from approximately 11:45am to 12:10pm, the GPT-3.5, GPT-3.5 Turbo, and GPT-4 models were unavailable because an incorrect change was deployed to our safety classifier configuration. Most customers started experiencing errors at 11:55am. After we detected an elevated error rate, we quickly rolled back the configuration change, which restored service.
We have since fixed our tooling to catch these errors before they reach production. We have also added alerting that detects these errors more quickly, so that we can roll back faster. Lastly, we have a project already in progress to deploy changes incrementally, so that we can detect and revert errors while a change is running on only a small percentage of traffic. That project will be operational within the quarter.
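
As an illustration only, an incremental deployment of this kind could look like the following Python sketch. It is not our actual deployment tooling: the function name `staged_rollout`, the callbacks, the stage percentages, and the 1% error threshold are all hypothetical, chosen to show the idea of checking the error rate at each traffic stage and reverting before the change reaches everyone.

```python
from typing import Callable, Sequence


def staged_rollout(
    apply_at: Callable[[int], None],
    error_rate: Callable[[], float],
    rollback: Callable[[], None],
    stages: Sequence[int] = (1, 5, 25, 100),  # hypothetical traffic percentages
    max_error_rate: float = 0.01,             # hypothetical 1% threshold
) -> bool:
    """Apply a change stage by stage; revert if errors exceed the threshold.

    Returns True if the change reached the final stage, False if it was
    rolled back.
    """
    for percent in stages:
        # Expose the change to this percentage of traffic.
        apply_at(percent)
        # Check the observed error rate before widening the rollout.
        if error_rate() > max_error_rate:
            rollback()
            return False
    return True
```

With stages like these, a bad configuration that raises the error rate would be reverted while serving only the first small slice of traffic, rather than after a full deployment. A real system would also wait and collect enough requests at each stage before reading the error rate.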