Service Incident August 27th 2015

SUMMARY

As of August 27th 19:00 GMT, we are working to resolve the following service incident:

Some customers hosted in our East Coast data center are experiencing issues with application performance.

12:58 PDT / 19:58 GMT

The investigation into this service disruption is ongoing.  

14:28 PDT / 21:28 GMT

Performance has stabilized, although we continue to monitor closely.  Post-mortem to follow.

POST-MORTEM

This incident affected our data centers on the US East Coast and resulted in a widespread outage affecting multiple services. An application failure caused a sudden, very large spike in tickets submitted to one customer's Zendesk account: over 1 million tickets were created in a 48-hour period. This influx of tickets and requests overwhelmed our firewall.

Our Operations team failed over the firewall to rule out possible hardware issues, and service began to show signs of recovery. However, we were again inundated with requests, which caused the firewall to fail a second time.

To fully resolve the issue, we needed to change the routing of the affected customer's traffic to ease the pressure on the firewall. After the routing change propagated, the system quickly stabilized.

We're currently looking into connection monitoring so that we can identify upticks in customer requests more quickly, and we are also investigating firewall bottlenecks.
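
For illustration only, the kind of per-customer uptick detection described above could be sketched as follows. This is a hypothetical example, not a description of the actual monitoring: the class name `SpikeDetector`, the fixed-window counting scheme, and the thresholds (`history`, `factor`, `min_count`) are all assumptions chosen for the sketch.

```python
from collections import defaultdict, deque


class SpikeDetector:
    """Hypothetical sketch: flag customers whose request count in the
    current window far exceeds their own recent average."""

    def __init__(self, history=5, factor=5.0, min_count=100):
        self.history = history      # past windows used as the baseline (assumed)
        self.factor = factor        # multiple of the baseline that counts as a spike (assumed)
        self.min_count = min_count  # ignore customers with very little traffic (assumed)
        self.current = defaultdict(int)
        self.past = defaultdict(lambda: deque(maxlen=history))

    def record(self, customer_id, n=1):
        """Count n requests for a customer in the current window."""
        self.current[customer_id] += n

    def close_window(self):
        """End the current window and return the customers that spiked in it."""
        spiking = []
        for customer_id in set(self.current) | set(self.past):
            count = self.current.get(customer_id, 0)
            past = self.past[customer_id]
            # Only judge customers once a full baseline has been collected.
            if len(past) == self.history:
                baseline = sum(past) / len(past)
                if count >= self.min_count and count > self.factor * baseline:
                    spiking.append(customer_id)
            past.append(count)
        self.current.clear()
        return sorted(spiking)
```

A caller would invoke `record()` for each incoming request and `close_window()` on a timer (for example, once a minute). Comparing each customer against its own baseline, rather than against a global threshold, means a customer that is always busy is not flagged, while a normally quiet customer that suddenly sends 50 times its usual volume is.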