Root cause
In November 2025, OpenAI split a monolithic service into smaller services, each responsible for fewer endpoints. During the migration, an environment variable required for publishing API Platform audit logs was not set correctly for one service. As a result, audit log events were dropped for the endpoints migrated to the new service. The issue was not immediately detected by OpenAI’s alerting because a second, unrelated issue in the audit log publishing path disrupted monitoring, and at the time we did not have alerting on reductions in audit log publishing volume. We discovered the issue when it was raised in a customer report and promptly took action.
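A misconfiguration like this can be made loud at startup rather than silent at publish time. As a minimal sketch, assuming hypothetical variable and function names (this is not OpenAI's actual service code), a producer can validate its required configuration before accepting traffic:

```python
import os

# Hypothetical variable names; the real configuration is not public.
REQUIRED_ENV_VARS = [
    "AUDIT_LOG_KAFKA_BROKERS",  # bootstrap servers for the audit log cluster
    "AUDIT_LOG_KAFKA_TOPIC",    # topic that audit events are published to
]

def validate_audit_log_config(env=os.environ) -> dict:
    """Fail fast if any variable needed for audit log publishing is unset.

    Resolving the values once at startup, instead of reading the
    environment at publish time, means a misconfigured service crashes
    where deploy tooling notices, rather than silently dropping events.
    """
    missing = [name for name in REQUIRED_ENV_VARS if not env.get(name)]
    if missing:
        raise RuntimeError(
            f"audit log publishing misconfigured; missing: {', '.join(missing)}"
        )
    return {name: env[name] for name in REQUIRED_ENV_VARS}
```

A check of this shape turns a silent data-loss bug into a failed deploy.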
Incident response
On February 19, 2026, at 16:47 PST, OpenAI opened an incident after receiving reports from impacted customers. Engineers found logs indicating failures to write to the Kafka topic for API audit logs, identified the root cause as a missing Kafka environment variable, and promptly created a fix PR. At 19:04 PST the same day, the fix was deployed, and engineers confirmed that audit logs were publishing correctly going forward.
Following the immediate fix, the on-call engineers identified the affected audit log event types and began investigating mechanisms to recover the data from the affected period. On February 20, 2026, OpenAI reached out to Azure, but the Azure team confirmed that detailed database changelogs were unavailable. OpenAI also looked into its internal CDC pipelines, but these did not retain complete historical changelogs. On February 22, 2026, OpenAI proceeded with backfills based on existing data sources to partially recover the missing data.
Data recovery status
We are restoring as many of the affected audit log entries as possible, and if you are an affected customer, you will receive a notification once the recovery is complete. Our investigation identified the following impacted event types:
api_key.created
api_key.updated
api_key.deleted
service_account.created
service_account.updated
service_account.deleted
invite.sent
invite.accepted
invite.deleted
organization.updated
project.created
project.updated
project.deleted
user.added
user.updated
user.deleted
rate_limit.updated
rate_limit.deleted
However, certain data elements across the impacted event types listed above cannot be restored, because the underlying data is not available in our retained data stores:
Events are missing the actor field, which contains the following:
api_key (user, tracking_id, service_account)
session (user, ip_address, user_agent, ja3, ja4, ip_address_details)
These details were not logged to other data sources and are lost.
For a subset of event types, only the latest update per resource could be reconstructed, due to the lack of detailed historical changelogs. This affects the following event types:
api_key.updated
service_account.updated
organization.updated
project.updated
user.updated
rate_limit.updated
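The one-update-per-resource limitation follows from how such a backfill has to work: with only each resource's current state available, exactly one synthetic event can be emitted per resource, and intermediate updates in the gap are unrecoverable. A minimal sketch, with hypothetical field and function names (not OpenAI's actual recovery tooling):

```python
from dataclasses import dataclass

@dataclass
class ResourceState:
    """Current state of a resource (e.g. an API key) in the primary database."""
    resource_id: str
    updated_at: str  # ISO 8601 timestamp of the most recent modification

def backfill_latest_update_events(event_type: str, resources: list) -> list:
    """Synthesize one `*.updated` audit event per resource from current state.

    With no historical changelog, only the most recent update per
    resource can be reconstructed; earlier updates during the gap
    left no trace in the retained data stores.
    """
    return [
        {
            "type": event_type,
            "resource_id": state.resource_id,
            "effective_at": state.updated_at,
            "reconstructed": True,  # mark backfilled events as synthetic
        }
        for state in resources
    ]
```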
Our response and what we’ve changed
We implemented and deployed a fix to restore audit log publishing.
We fixed the underlying failure mode so that audit log publishing errors surface and emit error metrics across all code paths.
We added redundant monitoring in the log collection system to detect unexpected drops in event volume by event type.
We are adding the `actor` details to our database objects as a second source for data recovery.
We validated that audit log producing services are logging correctly.
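The redundant volume monitoring described above can be sketched as a per-event-type comparison against a trailing baseline. The threshold, data shapes, and names here are illustrative assumptions, not the production alerting logic:

```python
from collections import defaultdict

# Illustrative threshold: flag a type whose volume falls below 50% of baseline.
DROP_THRESHOLD = 0.5

def detect_volume_drops(baseline: dict, current: dict,
                        threshold: float = DROP_THRESHOLD) -> list:
    """Flag event types whose observed volume dropped below the baseline.

    `baseline` maps event type -> expected count for the window (e.g. the
    same hour on previous days); `current` maps event type -> observed
    count.  Types that fell below `threshold * baseline`, including types
    that disappeared entirely, are returned for alerting.
    """
    observed = defaultdict(int, current)  # missing types count as zero
    return sorted(
        etype
        for etype, expected in baseline.items()
        if expected > 0 and observed[etype] < threshold * expected
    )
```

Checking per event type (rather than total volume) is what catches a failure confined to one service's endpoints, as in this incident.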
What this means for you
If you were affected by this incident, some audit logs from the period of impact may be incomplete, which could adversely affect investigative or compliance workflows. Consider using your own internal identity and change-management logs as a secondary source for investigations spanning this period.
We take immense care in being custodians of your work and data. We remain resolute in protecting your information and communicating transparently when issues arise. Thank you for your continued trust in us.