The ThousandEyes platform experienced delays in delivering all forms of Notifications (email, Webhooks, and Integrations) for Alert Rules between 2018-08-06 21:30 UTC and 2018-08-07 03:00 UTC.
Affected scope
All Notifications for Alert Rules were delayed during the period of the issue.
Root Cause
Notifications were created normally but were held in a queue instead of being dispatched immediately. The failure was caused by a restart of the queueing service that was performed outside of our configuration management system, so the service started without its correct configuration.
Our internal monitoring of queued notifications did trigger an alert. However, because this alert had been noisy in the past, it was routed to a lower-priority channel. We will address the alert's sensitivity to ensure this situation is not repeated.
Status
The issue has been resolved.
Event Timeline
2018-08-07 03:00 UTC: Issue resolved. Notification dispatch services were restored and queued notifications were delivered.
2018-08-06 21:30 UTC: Notification dispatch was delayed and notifications were held in the queue.