All Systems Operational
API: Operational
99.97% uptime over the past 90 days
Playground Site: Operational
99.98% uptime over the past 90 days
DALL·E: Operational
99.96% uptime over the past 90 days
Past Incidents
Jun 26, 2022

No incidents reported today.

Jun 25, 2022

No incidents reported.

Jun 24, 2022
Resolved - All models are operational. Thank you for your patience.
Jun 24, 08:25 PDT
Update - Babbage is now stable. We're investigating 1-2 remaining issues with our fine-tuned curie models and a few lesser-used engines.
Jun 24, 08:11 PDT
Update - At this time davinci fine-tuned models should be back to normal. We're investigating an issue with our babbage engine.
Jun 24, 07:30 PDT
Update - We have brought back our original cluster and are bringing traffic back. As of this post, latency and error rates for davinci fine-tuned models should be normalizing.
Jun 24, 06:46 PDT
Update - Davinci fine-tuned models are coming back up but are seeing increased latency. We are continuing to work to resolve this outage.
Jun 24, 05:24 PDT
Update - We do not yet have a resolution for this incident, but we are working with our upstream partners for support. Users of davinci fine-tuned models are still advised to use text-davinci-002 for the time being.
Jun 24, 04:14 PDT
Investigating - Fine-tuned Davinci model inference is still degraded. We are exploring alternate theories as to what is causing the very high latency on these models. Given the set of root causes that have already been ruled out, this unfortunately indicates that a much more extensive investigation will be needed to fully remediate fine-tuned Davinci model performance.

We suggest using the text-davinci-002 model as a temporary backup while we work to restore fine-tuned Davinci. The text-davinci-002 model is fully operational and can approach the capability of fine-tuned Davinci models for many applications.

All other public production models are operating nominally and we have restored the original cluster that had an outage.

Jun 24, 02:38 PDT
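The fallback suggested in the update above can be sketched in Python. This is a hypothetical illustration, not official guidance: `call_model` stands in for whatever client call your application makes, and the fine-tuned model name is a placeholder.

```python
# Hypothetical fallback sketch: try the fine-tuned davinci model first,
# and fall back to text-davinci-002 if the request errors out.
# FINE_TUNED_MODEL is a placeholder name; `call_model` is not a real
# OpenAI SDK function -- substitute your own client call.

FINE_TUNED_MODEL = "davinci:ft-your-org-2022-06-01"  # placeholder
BACKUP_MODEL = "text-davinci-002"

def complete_with_fallback(prompt, call_model):
    """Try the fine-tuned model; on any failure, retry with the backup."""
    try:
        return call_model(FINE_TUNED_MODEL, prompt)
    except Exception:
        # Fine-tuned davinci is degraded; text-davinci-002 remains healthy.
        return call_model(BACKUP_MODEL, prompt)
```

A wrapper like this lets applications ride out the degradation without code changes at every call site.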
Update - Fine-tuned curie model inference has returned to normal.

Fine-tuned davinci model inference is still in a degraded state.

Jun 24, 01:10 PDT
Monitoring - We are seeing error rates drop on curie fine-tuned models as well as davinci fine-tuned models. We're actively monitoring the situation.
Jun 24, 00:26 PDT
Identified - We are continuing to address health issues with fine-tuned curie and fine-tuned davinci models.

In addition to the aforementioned model-loading issues, we are experiencing capacity limits while we restore the cluster that went down.

All other models are operational.

Jun 23, 23:45 PDT
Monitoring - We believe we have found a stable arrangement of our infrastructure. All models are responding to requests; however, fine-tuned davinci and fine-tuned curie have elevated rates of 429s and 499s.

The fine-tuned davinci and fine-tuned curie model errors are due to customer model weights taking a long time to load. Normally these weights are heavily cached; however, due to these cluster rearrangements, those caches need to be restored. The sudden influx of requests to restore those caches is causing slowdowns upstream at our storage accounts. We expect the error rates to steadily decline, but recovery may take longer than normal due to these bottlenecks.

Jun 23, 23:03 PDT
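While the caches warm up, clients hitting the elevated 429s described above would typically retry with exponential backoff. A minimal sketch, assuming a generic request callable (the `RateLimitedError` exception is a stand-in for an HTTP 429 response, not a real SDK type):

```python
import time

class RateLimitedError(Exception):
    """Stand-in for an HTTP 429 response; not a real SDK exception."""

def with_backoff(request, max_retries=5, base_delay=1.0, sleep=time.sleep):
    """Retry a rate-limited request, doubling the delay after each 429."""
    for attempt in range(max_retries):
        try:
            return request()
        except RateLimitedError:
            if attempt == max_retries - 1:
                raise  # give up after the final attempt
            sleep(base_delay * (2 ** attempt))
```

Backing off rather than retrying immediately also avoids adding load to the storage accounts that are refilling the weight caches.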
Update - We are continuing to move infrastructure around in our operational clusters to ensure all models are performing optimally with the resources we have. We are much closer to a stable configuration, but are still re-allocating resources to better bring down error rates.

Some fine-tuned curie models are the most heavily affected right now as we continue to move resources around.

Jun 23, 22:15 PDT
Update - We have now moved all models from the broken cluster to new clusters; however, we are still suffering from some warmup and capacity issues.

Fine-tuned davinci and curie models are warming up. Their performance should improve over time and the rates of 429s and 499s should steadily decrease.

We're also experiencing capacity issues with the Codex davinci and cushman engines. We are actively working to fix these; until then, those engines will have degraded performance.

Jun 23, 21:39 PDT
Identified - One of our clusters has suffered a major communication outage within Kubernetes. This has affected the models hosted in that cluster.

This includes the following models:
- Inference for fine-tuned davinci and curie models
- Codex: code-davinci-001 and code-cushman-001
- Legacy curie, babbage, and ada
- Embeddings models

We are actively working to migrate most of these models to a functioning cluster. Affected models should be coming online as this happens.

Due to capacity constraints, we unfortunately expect to see some temporary performance and latency degradations in other models as we move infrastructure around.

Jun 23, 21:14 PDT
Update - We are currently in a state of degraded performance for most engines. We are still working to recover.
Jun 23, 20:43 PDT
Update - We know the source of the outage and are working to mitigate.
Jun 23, 20:10 PDT
Investigating - One of our clusters has had an outage affecting some engines. We are investigating.
Jun 23, 19:02 PDT
Jun 23, 2022
Jun 22, 2022

No incidents reported.

Jun 21, 2022
Resolved - A fix has been implemented by our service provider. Playground & DALL-E are now accessible.
Jun 21, 00:38 PDT
Monitoring - Playground & DALL-E websites are currently unavailable due to an outage from our authentication service provider. We're currently monitoring the situation.
Jun 21, 00:01 PDT
Identified - We have identified the cause: the Playground & DALL-E websites are unavailable due to an outage at our authentication service provider.
Jun 21, 00:01 PDT
Investigating - Investigating reports of Playground not being accessible.
Jun 21, 00:00 PDT
Jun 20, 2022
Resolved - Serving of babbage models is now healthy. The cause appears to have been a sporadic failure. We will continue to investigate the root cause and future mitigations.
Jun 20, 10:06 PDT
Monitoring - The babbage models appear to have recovered; we are monitoring and continuing to investigate the root cause.
Jun 20, 09:52 PDT
Investigating - Base babbage models are currently down. This appears to affect base babbage usage, not fine-tuned babbage models.
Jun 20, 09:34 PDT
Jun 19, 2022

No incidents reported.

Jun 18, 2022

No incidents reported.

Jun 17, 2022

No incidents reported.

Jun 16, 2022

No incidents reported.

Jun 15, 2022

No incidents reported.

Jun 14, 2022

No incidents reported.

Jun 13, 2022

No incidents reported.

Jun 12, 2022

No incidents reported.