On Tuesday, January 15, the Hull data pipeline stopped processing messages and the segment builder was unavailable for 58 minutes, between 22:11 and 23:09 UTC. Data ingestion was not affected by this outage, and no data loss should have occurred.
This incident was caused by the failure of an Elasticsearch cluster used for segmentation in the Hull data pipeline. The cluster itself was brought down by a StackOverflowException triggered by a problematic query. This is a known issue in the version of Lucene used by that Elasticsearch cluster.
Our engineering team is working on a mitigation plan to protect this Elasticsearch cluster against that particular issue.