Service Incident, August 15th 2016: Access issues on Pod 5

17:10 UTC | 10:10 PT
The access issues are now resolved and Zendesk services are restored.

16:24 UTC | 09:24 PT
Performance continues to improve, but our investigation is ongoing.


The root cause investigation determined that the translation micro-service hosts in Pod 5 were expected to run with 4 resource workers, but each host had only 2 workers configured. This led to severe resource starvation at the peak of weekday traffic and caused a backlog of requests to build up. The issue resolved when the resource-starved hosts recovered on their own as request traffic trended down about 30 minutes later. To prevent this issue from recurring, we have increased resource memory for our translation micro-service hosts in Pod 5.
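
As a rough illustration of how an under-provisioned worker count turns a traffic peak into a request backlog that clears only once traffic drops, here is a minimal sketch. The function name, traffic figures, and per-worker throughput are all hypothetical and are not taken from our production systems; only the 2-versus-4 worker counts come from the summary above.

```python
def backlog_over_time(arrivals, workers, per_worker_rate):
    """Return the queued-request count at the end of each interval.

    All inputs are illustrative assumptions, not measured values.
    """
    capacity = workers * per_worker_rate  # requests served per interval
    backlog = 0
    history = []
    for arrived in arrivals:
        # Unserved requests carry over; the queue cannot go below zero.
        backlog = max(0, backlog + arrived - capacity)
        history.append(backlog)
    return history


# Hypothetical traffic: ramps up to a weekday peak, then trends down.
traffic = [15, 30, 35, 30, 15, 5, 5]

print(backlog_over_time(traffic, workers=2, per_worker_rate=10))
# [0, 10, 25, 35, 30, 15, 0]
print(backlog_over_time(traffic, workers=4, per_worker_rate=10))
# [0, 0, 0, 0, 0, 0, 0]
```

With 2 workers, the backlog in this toy model grows through the peak and drains only after arrivals fall below capacity, which mirrors the self-recovery described above. With 4 workers, the same traffic never queues.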


For current system status information about your Zendesk, check out our system status page. During an incident, you can also receive status updates by following @ZendeskOps on Twitter. The summary of our post-mortem investigation is usually posted here a few days after the incident has ended. If you have additional questions about this incident, please log a ticket with us.