Service Incident August 26th 2015

SUMMARY

As of August 26th 19:59 GMT, we are working to resolve the following service incident:

Some customers hosted in our West Coast data center are experiencing issues with Zendesk Apps.

16:40 PDT / 00:40 GMT

We continue to investigate slow responses from some apps in our West Coast data center.

These issues are intermittent and are not impacting all users. If you are observing performance degradation with an app in your Zendesk, please report it to support.

18:40 PDT / 02:40 GMT

App issues have subsided, though we continue to monitor closely. Post-mortem to follow.

POST-MORTEM 

This incident was caused by an issue in an app, which led to an increased number of requests for agents using the play/next functionality in Zendesk. These requests exceeded the capacity of our proxy. Because our rate limits are Zendesk-wide, the problem also impacted other customers using other apps, though the errors were mostly sporadic.

Even after we set up additional hardware for the proxy, its capacity was still being exhausted. We therefore asked customers using the app in question to disable it and have their agents refresh their browsers. However, the volume of requests decreased only slightly, possibly due to a bug in the apps framework that allows apps to make requests even after they have been destroyed or disabled. We then worked to pinpoint the issue with the app, after which we requested permission to modify and update it. This resolved the problem.
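
For illustration only, here is a minimal Python sketch of the kind of guard that would stop an app from sending requests once it has been destroyed or disabled. This is not the apps framework's actual code: `AppHandle`, `request`, `destroy`, and the injected `send` function are all hypothetical names, and the sketch assumes every request an app makes passes through a single handle.

```python
class AppHandle:
    """Hypothetical wrapper through which an app sends all of its requests."""

    def __init__(self, app_id, send):
        self.app_id = app_id
        self._send = send      # injected transport function (assumption)
        self._active = True

    def request(self, path):
        # Refuse to send anything once the app has been destroyed or disabled,
        # even if a leftover timer or callback still holds this handle.
        if not self._active:
            raise RuntimeError(
                f"app {self.app_id} is destroyed; request to {path} blocked"
            )
        return self._send(self.app_id, path)

    def destroy(self):
        # Mark the handle inactive; later request() calls are rejected.
        self._active = False
```

The point of the design is that the check lives in the shared request path rather than in each app, so a misbehaving app cannot bypass it.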

To prevent recurrences, we are going to fix the bugs in the apps framework that contributed to this incident (apps still being able to make requests after they have been destroyed). We are also looking at changing our approach to rate limiting so that a single app cannot overload the proxies.
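
One way to keep a single app from exhausting shared capacity is to rate limit per app rather than Zendesk-wide. The Python sketch below shows a token bucket keyed by app ID. It is an assumption-laden illustration, not the approach we have chosen: `PerAppRateLimiter`, `allow`, and the rate and burst values are hypothetical.

```python
import time


class PerAppRateLimiter:
    """Token-bucket limiter with one independent bucket per app (hypothetical)."""

    def __init__(self, rate_per_sec, burst, clock=time.monotonic):
        self.rate = rate_per_sec   # tokens refilled per second, per app
        self.burst = burst         # maximum tokens a bucket can hold
        self._clock = clock        # injectable clock, useful for testing
        self._buckets = {}         # app_id -> (tokens, last_timestamp)

    def allow(self, app_id):
        now = self._clock()
        # A new app starts with a full bucket.
        tokens, last = self._buckets.get(app_id, (self.burst, now))
        # Refill according to elapsed time, capped at the burst size.
        tokens = min(self.burst, tokens + (now - last) * self.rate)
        if tokens >= 1:
            self._buckets[app_id] = (tokens - 1, now)
            return True
        self._buckets[app_id] = (tokens, now)
        return False
```

Because each app draws from its own bucket, a noisy app is throttled once its bucket is empty while requests from other apps continue to be allowed.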