22:30 UTC | 4:30 PT
The errors we saw with some Insights accounts are now resolved. Data may take a few hours to fully reload.
21:30 UTC | 2:30 PT
We are seeing significant improvement after our last update, and some projects are syncing again. We are on target for full resolution in 1-2 hrs.
20:13 UTC | 1:13 PT
The initial fix didn't resolve the Insights syncing issue for all accounts. We are working with our vendor to resolve it for the remaining accounts. Estimated 2-3 hrs for full restoration. We will update again here in 1 hour.
18:52 UTC | 11:52 PT
We are working to confirm the fix in affected customers' accounts. We will update once we have confirmed the fix or have further progress to report.
18:26 UTC | 11:26 PT
We're working with our analytics partner to roll out a possible fix to alleviate the syncing issues facing some customers.
18:05 UTC | 11:05 PT
We are actively engaged and working with our partner to resolve syncing issues affecting some customers' Insights projects.
This post-mortem summary applies to incidents occurring on July 16 and 18, 2016.
On July 17th, 2016, after a planned GoodData platform release, we started to see intermittent network connection failures between Zendesk and GoodData as well as an unusual reduction in network throughput. GoodData reports that during the incident they tried several remedial actions to address the problem, including reverting some deployed components and multiple reconfigurations and rebalancing of the network gateway resources. Despite these actions, the GoodData team was unable to identify the cause of the connection failures. Instead, the connection and performance issues subsided for no clear reason. When the release was redeployed with the same configuration, no network issues were found, even after significant additional testing and monitoring. GoodData has not ruled out the possibility that the cause was external to its release.
Absent solid evidence of the cause, we have identified improvements to our staging environment so that it more closely matches production and increases the likelihood that issues will be found prior to production release. The Zendesk and GoodData teams are also working to tighten notification and escalation processes to improve communication and reduce time to resolution.
FOR MORE INFORMATION
For current system status information about your Zendesk, check out our system status page. During an incident, you can also receive status updates by following @ZendeskOps on Twitter. The summary of our post-mortem investigation is usually posted here a few days after the incident has ended. If you have additional questions about this incident, please log a ticket with us.