Partial outage of flow and BGP ingest

Incident Report for Kentik SaaS EMEA Cluster

Postmortem

ROOT CAUSE

30% of our ingest servers restarted concurrently, causing ingest load balancing issues and resource constraints.

RESOLUTION

Kentik Operations allowed the servers to come back online, monitored the automated restart of all Kentik software, and tracked the reestablishment of BGP sessions with any peerings that flapped.

Additional ingest capacity will be brought online by 2022-10-14 to help mitigate scenarios of this type in the future.

Posted Oct 07, 2022 - 16:30 UTC

Resolved

This incident has been resolved.
Posted Sep 16, 2022 - 00:19 UTC

Monitoring

BGP ingest has returned to full operation.
Posted Sep 15, 2022 - 22:47 UTC

Update

Flow ingest has returned to full operation.
Posted Sep 15, 2022 - 22:12 UTC

Identified

Some BGP sessions will bounce, and a limited amount of flow data may be lost.
Posted Sep 15, 2022 - 21:25 UTC
This incident affected: BGP (BGP Peering and Enrichment) and Flow Ingest.