Resolved
The Fine-tuning service is running normally.
Monitoring
Error rates have stabilized and Fine-tuning API endpoints are operating normally again. Jobs are being processed as expected. We will continue to monitor this service closely for elevated error rates.
Monitoring
We are experiencing a temporary increase in error rates from the /v1/fine_tuning API, including job creation, job listing, and event listing. We expect these errors to subside by 12:45pm PT.
Monitoring
Fine-tuning jobs are being processed again, though the service is still experiencing elevated error rates.
Identified
The Fine-tuning API is currently online and accepting job creation requests, but there is a delay in job processing. Jobs will remain queued for the time being.
Investigating
Another spike in errors just occurred, preventing jobs from being created. A mitigation is being pushed.
Monitoring
A fix has been deployed and the service is operating normally. We are continuing to monitor this service for errors.
Investigating
We are experiencing elevated error rates (500 responses) on the POST /v1/fine_tuning/jobs endpoint.