Last week, a cascading set of failures, driven in part by historically high load and unexpected upstream interruptions, led to degraded performance on our API. Not all customers were affected, but some observed increased latencies when making completion requests, in some cases leading to timeouts. Some customers, particularly those using fine-tuned models, also observed HTTP 429 errors with a message that the requested model was still being loaded. And in some instances, requests were dropped with HTTP 503 errors.
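For callers, transient errors like these are typically best handled with retries and exponential backoff rather than failing immediately. Below is a minimal sketch of that pattern; the `make_request` callable and its `(status, body)` return shape are hypothetical stand-ins for whatever HTTP client you use, not part of our API.

```python
import random
import time

def with_retries(make_request, max_attempts=5, base_delay=1.0):
    """Retry a request on transient HTTP 429/503 errors, backing off
    exponentially with jitter between attempts.

    `make_request` is a zero-argument callable returning a
    (status_code, body) tuple -- a placeholder for a real HTTP call.
    """
    for attempt in range(max_attempts):
        status, body = make_request()
        if status not in (429, 503):
            return status, body
        # Exponential backoff with jitter: 1x, 2x, 4x, ... the base
        # delay, randomized to avoid synchronized retry storms.
        delay = base_delay * (2 ** attempt) * (0.5 + random.random() / 2)
        time.sleep(delay)
    return status, body

# Usage: simulate a request that hits 429 twice, then succeeds.
responses = iter([(429, "model loading"), (429, "model loading"), (200, "ok")])
status, body = with_retries(lambda: next(responses), base_delay=0.01)
```

The jitter matters: if many clients retry on the same fixed schedule after an outage, their retries arrive in synchronized waves and can prolong the overload.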
We have taken immediate steps to resolve these issues. We have also made new investments, and prioritized existing ones, to keep these issues from recurring even as our request volume continues to grow. We have fixed several newly identified bugs in our system, made ourselves more resilient to upstream failures from our cloud provider, and improved the scaling of historically fixed capacity in our stack to adapt to increased load.
Latency and reliability are our team's highest priorities. We deeply apologize for the interruptions and degraded service.