Degraded performance across endpoints / models
Incident Report for Cohere
Resolved
The incident is now fully resolved, and we won't need to schedule a maintenance window for the database scale-up.
Impact of downtime:
* A period of 2 hours with higher latencies.
* An average of 10% of requests timed out.
* /classify was hit hardest, with 80% of requests failing.
Posted Apr 16, 2024 - 15:22 EDT
Monitoring
A fix has been implemented. Error rates and response latencies have been back to normal since 2:10 PM.
Posted Apr 16, 2024 - 14:53 EDT
Identified
We have identified an issue with the database related to increased pressure on the system. A subset of requests has experienced high latency starting at 12:05 PM. We have identified the root cause and are deploying mitigations until we can schedule a larger maintenance window for the fix.
Posted Apr 16, 2024 - 12:25 EDT
This incident affected: Endpoints (chat), Models (command-r-plus, command-r, command, command-light, embed-english-v2.0, embed-english-light-v2.0, embed-english-v3.0, embed-multilingual-v3.0, embed-multilingual-light-v3.0, rerank-english-v2.0, rerank-multilingual-v2.0, rerank-english-v3.0, rerank-multilingual-v3.0), Coral showcase, and Dashboard.