Service Incident March 24th 2016

17:31 UTC | 10:31 PT
We have resolved the service issues impacting some customers. A post-mortem will follow.

16:43 UTC | 09:43 PT
Performance is improving for impacted customers. We continue to monitor to ensure stability.

16:12 UTC | 09:12 PT
We continue working toward resolution for service issues impacting some customers.

15:37 UTC | 08:37 PT
We have narrowed the incident cause and are working on a solution for impacted customers.

14:53 UTC | 07:53 PT
Our operations team continues to investigate the service issues affecting some customers.


Two major service disruptions of the same nature occurred this week. The first occurred on March 23 and is recorded here; the second occurred on March 24 and is described below. Both events impacted customers served from our east coast data center.

The Zendesk engineering team has identified a preliminary root cause: a suspected issue with the firmware version on a network device, which affected the availability of the core network supporting the east coast data center.

As a troubleshooting measure, we performed a controlled failover of our primary load balancer. The failover provided relief and services were restored.

Following this recovery effort, we engaged our hardware vendor. In parallel, we began reviewing all recent changes to our load balancers and focused on a recently deployed firmware upgrade. This upgrade had been implemented successfully under our standard change review process, and the upgraded version had been in production for the previous 3 weeks. On March 24, we decided to downgrade the firmware to its prior version. Since we performed the firmware downgrade and recovered from failover, the service has remained stable.

We have provided our vendor with log output for forensic analysis, and the case is currently under investigation.


Please subscribe to this article for regular updates until the issue is resolved. If you don't already follow our Twitter feed, we encourage you to do so to get the most current information about any service issues. We also record all site outages on our system status page, where you can see the past 12 months of service uptime. If you have questions about this issue, please open a ticket with us by sending a note to