Increased Latency
Incident Report for hull
Postmortem

Issue

At approximately 7:30 UTC on 2019-07-12, the data ingestion pipeline experienced a spike in incoming data which led to increased latency in processing. Users may have experienced delays of up to 2 hours before data became visible in the dashboard and could be sent to other services.

Our engineering team mitigated the issue at 9:30 UTC and latencies returned to normal. However, this first mitigation proved insufficient: starting at 14:00 UTC, latencies again exceeded 30 minutes, increasing to over 90 minutes by 16:00 UTC.

Resolution

At 17:15 UTC, the problematic data traffic had been isolated and ingestion latencies returned to normal levels. We monitored the situation until 22:00 UTC to ensure the applied resolution was sufficient.

Future Mitigation Plans

Over the past few months, we have been working on an overhaul of our ingestion pipeline with better isolation to prevent problematic traffic patterns from affecting other parts of the service. We apologize for any inconvenience this may have caused.

Posted Jul 15, 2019 - 11:52 EDT

Resolved
At approximately 7:30 UTC on 2019-07-12, the data ingestion pipeline experienced a spike in incoming data which led to increased latency in processing. Users may have experienced delays of up to 2 hours before data became visible in the dashboard and could be sent to other services.
Posted Jul 12, 2019 - 02:30 EDT