Service Disruption on October 14th, 2015

SUMMARY

As of 14:49 GMT+1 we are working to resolve the following service incident:
We are investigating some issues affecting the search functionality on Zendesk for some customers. More information to come.

Update as of 15:35 GMT+1:
Our search functionality is now stable. We are monitoring it to ensure full service is restored.

POST-MORTEM 

On October 14th, we started seeing a significant degradation in the performance of the search service in one of our data centres. This increased search latencies, which led to search queries returning zero results for many accounts in that data centre. During the incident, the percentage of affected search requests fluctuated in line with search latencies.

A badly configured Zendesk account in that data centre triggered a mail loop which doubled the size of multiple tickets every few minutes. Eventually, this led to tickets so large that the indexer responsible for this server crashed repeatedly, causing the same ticket data to be indexed over and over. This in turn led to excessive merges on the search side. To mitigate this, we restarted the search service repeatedly in response to timeout alerts.
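To give a sense of how quickly a ticket that doubles in size can become unmanageable, here is a minimal arithmetic sketch. The starting size, the size limit, and the doubling interval are illustrative assumptions, not figures from this incident, and `doublings_until` is a hypothetical helper.

```python
def doublings_until(start_bytes: int, limit_bytes: int) -> int:
    """Return how many doublings it takes for start_bytes to reach limit_bytes."""
    if start_bytes <= 0:
        raise ValueError("start_bytes must be positive")
    count = 0
    size = start_bytes
    while size < limit_bytes:
        size *= 2
        count += 1
    return count


# All three values below are assumptions chosen for illustration only.
START_BYTES = 10 * 1024            # assumed initial ticket size: 10 KiB
LIMIT_BYTES = 100 * 1024 * 1024    # assumed problematic ticket size: 100 MiB
INTERVAL_MINUTES = 5               # assumed time between doublings

n = doublings_until(START_BYTES, LIMIT_BYTES)
print(f"{n} doublings, roughly {n * INTERVAL_MINUTES} minutes")
# prints "14 doublings, roughly 70 minutes"
```

Under these assumed numbers, a ticket would grow by a factor of about ten thousand in just over an hour, which is consistent with how a mail loop of this kind can overwhelm an indexer quickly.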

Note that this incident caused two further service notifications to be posted on October 15th and October 16th, as we continued to receive reports of service degradation. The root cause was finally fixed on October 20th, when we deleted the ticket containing the mail loop.