On March 20, 2025, between 4:41 PM and 5:12 PM PT, our chat completions endpoint returned an elevated rate of HTTP 431 (Request Header Fields Too Large) errors in a single cluster. Requests routed to the impacted cluster saw an average error rate of 70%, while the global error rate was approximately 8%. Customers routed through other clusters were not affected.
The root cause was a faulty code deployment that was enabled only in the affected cluster. Upon identifying the issue, we evacuated traffic from the impacted cluster, and full service was restored by 5:12 PM PT. The faulty code has since been reverted, and we have added monitoring to help prevent similar incidents in the future.