(Resolved) 2017-03-21 Degraded Enterprise Agent Connectivity

Last updated: Mon Dec 24 11:39:48 GMT 2018

Beginning at approximately 13:50 UTC on March 21st, some Enterprise Agents were unable to check in with their designated agent collector, preventing Agents from committing collected data to the ThousandEyes platform.

Our operations team was alerted, and the issue was resolved by 14:32 UTC. The root cause was an agent collector whose file system had entered a read-only state.

Timeline follows:

1350 UTC: some agent commits begin failing
1412 UTC: initial alerts dispatched, ops team members paged
1415 UTC: page acknowledgement
1430 UTC: one agent collector is identified as having a read-only file system
1432 UTC: affected agent collector removed from load balancing pool
1445 UTC: all agents are able to connect to the collector
1530 UTC: affected server is shut down and repaired, then added back into the load balancing pool
1545 UTC: connections rebalanced across collectors