How Alerts work

Last updated: Sat Aug 05 00:01:59 GMT 2017

The ThousandEyes platform allows customers to configure highly customizable Alert Rules and assign them to tests, in order to highlight or be notified of events of interest.  For customers who want simplicity in alert configuration and management, the ThousandEyes platform ships with default Alert Rules configured and enabled for each test.

Notifications

Alert notifications are delivered via email, Webhooks or PagerDuty integration. Email notifications are sent to the alert recipients defined in the Alert Rule; recipients are configured in the Alert Rule's Notifications tab. When multiple alerts are raised simultaneously, their data is grouped into a single email notification. An alert remains active for as long as your Alert Rule criteria are met (after any threshold is reached; see the Rounds (met) and Rounds (total) settings below), but the email notification is sent only once. Alerts can optionally be configured to send another email once the alert has cleared.

Webhooks integration permits users to send JSON-formatted alert information to a webhooks-enabled server via HTTP. The information can then be programmatically processed and subsequent actions taken automatically. For more information on configuring ThousandEyes Alerts with webhooks, refer to the ThousandEyes Knowledge Base article Using Webhooks.
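
As a rough illustration of the receiving side, the sketch below parses a JSON alert body and reduces it to a one-line summary that a script could log or forward. The field names (`eventType`, `alert`, `ruleName`, `testName`) and the sample values are placeholders chosen for this example, not the documented payload schema; refer to Using Webhooks for the actual format.

```python
import json


def summarize_alert(body: bytes) -> str:
    """Turn a JSON alert notification body into a one-line summary.

    All field names used here are assumptions made for illustration.
    """
    payload = json.loads(body.decode("utf-8"))
    alert = payload.get("alert", {})
    event = payload.get("eventType", "UNKNOWN")
    rule = alert.get("ruleName", "<unnamed rule>")
    test = alert.get("testName", "<unnamed test>")
    return f"{event}: rule '{rule}' on test '{test}'"


# Hypothetical request body, as a webhooks-enabled server might receive it.
sample = json.dumps({
    "eventType": "EXAMPLE_ALERT_EVENT",
    "alert": {"ruleName": "Default HTTP Alert Rule", "testName": "Example test"},
}).encode("utf-8")

print(summarize_alert(sample))
```

Using `.get()` with defaults keeps the handler from failing when a field is absent, which is a reasonable precaution while the real payload shape is still being confirmed.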

PagerDuty integration allows you to create an Escalation Policy in your PagerDuty service which sets rules for notification destinations, repeat notifications and other actions, when ThousandEyes sends notification to PagerDuty after an alert is raised. For more information on configuring ThousandEyes Alerts with PagerDuty, refer to the ThousandEyes Knowledge Base article PagerDuty Integration.

The Slack and HipChat integrations provide a new way to consume alert and agent notifications, alongside email, PagerDuty, and Webhooks. You can customize your Web, Network, BGP, DNS, Voice Alerts and then choose which alert rules will send notifications to the Slack channel and/or HipChat room of your choice.

Display

In addition to notification mechanisms, alerts can be viewed on the Alerts page (https://app.thousandeyes.com/alerts/list/).  The Alerts View page is divided into two tabs:

1. Active Alerts: List of Alerts currently active for any test within your Account Group.
2. Alerts History: List of Alerts raised by tests on your Account Group, shown chronologically on a timeline, along with a table below for more details.

Active Alerts

Active_Alerts.png

The Active Alerts tab shows all Alerts currently "active" under your Account Group. The tab auto-refreshes every two minutes, similar to the ThousandEyes Dashboard.

  1. Search for alerts.  Text entered into the search box is matched against the following fields: Alert ID, Alert Rule Name, Alert Type, Test ID, Test Name, Test Type or Status. Press the return/enter key to execute the search and refine the table below. To filter alerts by more than one criterion, click either the All or Any link to specify whether the table rows must match all (AND) or any (OR) of the selected criteria.
  2. Alert Status: 
    • A red colored box indicates that the Alert Rule is currently active for that Test. 
    • A green colored box indicates that the Alert was recently cleared for the Test. A cleared Alert will be shown under the Alerts History tab.
    • A grey colored box indicates that the Alert Rule was disabled for that Test.
  3. Alert Rule Name: Name of the Alert Rule currently active. Expand an Alert Rule for more detailed information by Agent, BGP monitor, Start/End Time, Metrics at Alert Start, Metrics at Alert End and Duration for which the Alert was active.
  4. Test Name: Name of the Test for which the Alert Rule is currently active
  5. Alert ID: When gathering details for an Alert via ThousandEyes API, use the Alert ID to reference a particular Alert.

 

Alert History

Alert_History.png

The Alerts History tab tabulates triggered Alerts which are currently in the "cleared" or "inactive" state, or are "disabled". To interact with the Alerts History tab, use the following controls:

  1. Date and Time slider: Set the date endpoints to view Alerts active during that timespan.  Click and drag either the start or end bar to the desired date.  Your selection will update the From and To date and time fields automatically.
  2. Date and Time selector: The From and To fields allow manual input of the date and time endpoints to display Alerts active at that time.  Clicking in the date field will both allow manual entry of dates and display a clickable calendar to select a date. Click on the calendar arrows to navigate in the current view (default is the month view). To change to a view of months in a year or a range of years, click the current title (month, year or year range) at the top-middle of the calendar.  The view will cycle to the next timeframe: month -> months -> years.
  3. Search for alerts.  Text entered into the search box is matched against the following fields: Alert ID, Alert Rule Name, Alert Type, Test ID, Test Name, Test Type or Status. Press the return/enter key to execute the search and refine the table below. To filter alerts by more than one criterion, click either the All or Any link to specify whether the table rows must match all (AND) or any (OR) of the selected criteria.
  4. Alert Rule Name: Expand an Alert Rule for more detailed information by Agent, BGP monitor, Start/End Time, Metrics at Alert Start, Metrics at Alert End and Duration for which the Alert was active.
  5. Test Name: Name of the Test for which the Alert Rule was triggered.
  6. Duration: Length of time for which the Alert Rule was active for that test.
  7. Alert ID: When gathering details for an Alert via ThousandEyes API, use the Alert ID to reference a particular Alert.

Assignment to tests

Once you have created an Alert Rule, it can be assigned to any test which has the Enable box checked on the test configuration page.  By default, each test has the rule "Default <test type> Rule" assigned to it, with your account's email address configured as the recipient for email notification. To add or remove rules, click the pull-down menu below the Enable box, and select or deselect rules.  To create a new rule, click the Edit Alert Rules link to access the Add New Alert Rules page, and create your rule.  You will then return to the test configuration page, and use the pull-down menu to assign your new rule to the test.

Rule configuration

Each rule has a name, a series of tests against which it is enabled, a scope of locations to which the Alert Rule applies, Boolean criteria defining the alert conditions, and the number of locations from which the alert conditions must be met in order to trigger an alert.  The rule also can include a notification mechanism, such as a list of email recipients (recipients need not be users of ThousandEyes in order to receive email notifications), a PagerDuty Service or one or more Webhooks.

The image below displays the configuration options of a new Alert Rule.

21458889-2-Add_New_Alert_Rule.png

  1. Layer: The test layer type for this Alert Rule.  Selecting a Layer will display the Alert Types for that Layer.
  2. Alert Type: The type of Alert that this Alert Rule will be.  Choosing an Alert Type value will display configuration options specific for this Alert Type. Additionally, the Compatible Test Types box will display Test Types to which this Alert Rule can be assigned.
  3. Rule Name: An alphanumeric string naming this Alert Rule.

Settings tab

  1. Tests: Select tests to which this Alert Rule is assigned.  You may choose to configure no tests with this Alert Rule, and assign it to tests at a later time.
  2. Monitors, Countries, Agents: This selector will display either "Monitors" for a Routing Layer Alert Rule, "Countries" for a DNS+ Layer Alert Rule or "Agents" for all other Alert Rules.  The selector has one of three values:
    • All: This Alert Rule applies to all Agents or Monitors for a test to which this Alert Rule is assigned. 
    • All except:  This Alert Rule applies to all Agents or Monitors for a test to which this Alert Rule is assigned, except for the Agents or Monitors specified in the selector that will appear when the "All except" value is chosen.
    • Specific: This Alert Rule applies only to specific Agents or Monitors for a test to which this Alert Rule is assigned. The Agents or Monitors are specified in the selector that will appear when "Specific" value is chosen.
  3. Threshold: Specify the threshold value for locations (Agents, Monitors or Countries, depending on rule type) that must meet the alert conditions in order to trigger this Alert Rule. This value will be either a number of Agents/Monitors/Countries, or a percentage of Agents/Monitors/Countries, as specified in the next setting.

    NOTE: When a percentage of Agents, Monitors or Countries is used, and the percentage results in a non-whole number threshold value of actual Agents, Monitors or Countries, the fractional part of the value is significant.  For example, when an Alert Rule with a threshold of 25% of all Agents is applied to 13 Agents, the threshold is 3.25 Agents. This threshold will require 4 Agents to meet the alert criteria in order to trigger the Alert Rule.
     
  4. Threshold units: Select either Agent, Monitor or Country, or percentage of Agents, Monitors or Countries.
  5. All or Any: Select "All" to require that all of the following alert conditions be met in order to trigger this Alert Rule, or "Any" to trigger the Alert Rule when any one of the following conditions is met.
  6. Rounds (met): Select the number of test rounds, out of a total number of rounds, in which the following alert condition(s) must be met in order to trigger the Alert Rule.  See the Rounds (total) entry below.
  7. Rounds (total): Select the total number of test rounds in which the Rounds (met) selection is evaluated.  For example, if Rounds (met) = 2 and Rounds (total) = 3 then for every three rounds, the Alert Rule will trigger if the condition(s) were met twice.
  8. Metric: Select a test metric for this condition.
  9. Operators: The following operators are used in rule configuration:

≥, ≤, <, >, is, is not, is in, is not in, is not empty, is incomplete, is present

The units and operators available for each type of alert are shown in the table below.

  10. Threshold: The value that the Metric setting will be compared against, using the chosen operator.  Note that some operators do not have a Value field.
  11. Add/Delete: Click the + or - icon to add or delete alert criteria to this Alert Rule.  Criteria can be nested for some types of Alert Rule.
  12. Compatible Test Types: Test types to which this Alert Rule can be assigned.
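
The interaction of Rounds (met), Rounds (total) and the All/Any selector can be sketched in a few lines. This is an illustrative model only: it assumes the platform evaluates a sliding window over the most recent rounds and treats Rounds (met) as a minimum, neither of which is stated above. The names `make_round_tracker` and `combine_conditions` are hypothetical.

```python
from collections import deque


def make_round_tracker(rounds_met: int, rounds_total: int):
    """Return a function that records one round and reports whether the rule fires.

    Assumption: a sliding window over the last `rounds_total` rounds.
    """
    window = deque(maxlen=rounds_total)

    def record(condition_met: bool) -> bool:
        window.append(condition_met)
        return sum(window) >= rounds_met

    return record


def combine_conditions(results, mode: str) -> bool:
    """Apply the All/Any selector to per-condition results for one round."""
    return all(results) if mode == "All" else any(results)


# Rounds (met) = 2, Rounds (total) = 3, as in the example above.
record = make_round_tracker(2, 3)
history = [record(met) for met in [True, False, True, False, False]]
print(history)

print(combine_conditions([True, False], "All"))
print(combine_conditions([True, False], "Any"))
```

In this model the rule fires in the third round, once two of the last three rounds have met the conditions, and stops firing as soon as fewer than two rounds in the window qualify.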

NOTE: Agents displaying a Local Agent Problems message on a test results page are excluded from alert calculations.
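
The percentage arithmetic described in the NOTE under Threshold amounts to rounding up to the next whole location. The sketch below reproduces the 25%-of-13 example; `required_locations` is a hypothetical helper name, and exact fractions are used so that binary floating-point error cannot push a whole-number result up by one.

```python
import math
from fractions import Fraction


def required_locations(threshold_percent, location_count: int) -> int:
    """Smallest whole number of locations that satisfies a percentage threshold."""
    exact = Fraction(str(threshold_percent)) * location_count / 100
    return math.ceil(exact)


print(required_locations(25, 13))  # 25% of 13 Agents is 3.25, so 4 Agents are required
print(required_locations(25, 12))  # 25% of 12 Agents is exactly 3
```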

DNS Server Alert Rules

DNS Server Tests differ from other ThousandEyes tests in that multiple servers can be explicitly targeted in a single test.  As a result, DNS Server Alert Rules are evaluated on a per-server basis; each server in the DNS Servers field of the test configuration will have the Alert Conditions evaluated separately from all other servers in the DNS Servers field. For example, consider an Alert Rule that has the following Alert Conditions:

21458889-1-Alerts-DNS_Server.ds.png

When assigned to a DNS Server test with two servers configured as the targets, each server will be evaluated separately against the above Alert Condition.  To trigger the Alert Rule, at least four Agents must receive an Error against the same DNS server.  The Alert Rule would not be triggered if, for example, three Agents received an Error when testing the first DNS server and a fourth Agent received an Error when testing the second DNS server.
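
The per-server evaluation can be pictured as counting erroring Agents separately for each server rather than across the whole test. The following is a minimal sketch of that idea, with a hypothetical function name and made-up Agent and server names; it is not the platform's implementation.

```python
from collections import Counter


def servers_meeting_threshold(error_pairs, min_agents: int):
    """Return the servers for which at least `min_agents` distinct Agents saw an Error.

    `error_pairs` is an iterable of (agent, server) tuples, one per observed Error.
    """
    unique_pairs = set(error_pairs)  # count each Agent once per server
    agents_per_server = Counter(server for _agent, server in unique_pairs)
    return sorted(s for s, n in agents_per_server.items() if n >= min_agents)


# Three Agents fail against the first server, a fourth against the second.
split_errors = [
    ("agent-1", "ns1.example.com"),
    ("agent-2", "ns1.example.com"),
    ("agent-3", "ns1.example.com"),
    ("agent-4", "ns2.example.com"),
]
# Four Agents fail against the same server.
same_server_errors = [(f"agent-{i}", "ns1.example.com") for i in range(1, 5)]

print(servers_meeting_threshold(split_errors, 4))
print(servers_meeting_threshold(same_server_errors, 4))
```

The first call returns an empty list because no single server reaches four Agents, which matches the case described above where the Alert Rule is not triggered.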

BGP Alert Rules

A BGP Alert Rule can be applied to a Routing Layer BGP test, or to a different Layer type that provides the BGP Route Visualization View. It is important to note that some Alert Rule conditions can be applied differently depending on which type of test the rule is assigned to.  For example, a BGP test has only a single target prefix which will be evaluated against the Alert Conditions.  If the "Covered Prefixes" box is checked, any covered prefixes found are not evaluated against the Alert Conditions except the explicit "Covered Prefix" condition.

In contrast, a non-BGP test type can have one or more targets. DNS Server tests can explicitly test multiple DNS servers.  An Agent to Server target's domain name can resolve to multiple server IP addresses.  When creating the BGP Path Visualization, the Prefix selector will show these multiple target prefixes, and evaluate each prefix against any BGP Alert Rules assigned to the test.  Thus, prefixes which would be considered covered prefixes under a BGP test and not evaluated by the Alert Rule (unless by a "Covered Prefix" condition) are evaluated when assigned to the non-BGP test.  Similarly, the "Covered Prefix" condition does not have any relevance when assigned to a non-BGP test.
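
For readers unfamiliar with the term, the sketch below shows one way to test whether a prefix is covered by a target prefix, using Python's standard `ipaddress` module. It assumes that a covered prefix is a more-specific prefix contained in the test's target prefix; the helper name and the documentation-range addresses are illustrative only.

```python
import ipaddress


def is_covered_prefix(candidate: str, target: str) -> bool:
    """True if `candidate` is a strictly more-specific prefix inside `target`."""
    cand = ipaddress.ip_network(candidate)
    tgt = ipaddress.ip_network(target)
    # Compare versions first: subnet_of() raises TypeError for mixed IPv4/IPv6.
    return cand.version == tgt.version and cand != tgt and cand.subnet_of(tgt)


print(is_covered_prefix("192.0.2.128/25", "192.0.2.0/24"))
print(is_covered_prefix("198.51.100.0/24", "192.0.2.0/24"))
print(is_covered_prefix("192.0.2.0/24", "192.0.2.0/24"))
```

Note that `subnet_of` requires Python 3.7 or later.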

Notifications tab

In addition to presenting the Alert in the app.thousandeyes.com UI, the ThousandEyes platform can deliver notifications of alerts through a number of services. The image below displays the Notifications configuration options of a new Alert Rule.

AlertNotifications.png

  1. Send emails to: A list of addresses to which an alert email will be sent when the Alert Rule is first triggered.  Addressees need not be users of the ThousandEyes platform.
  2. Edit emails: Click this link to add email addresses to the Notifications address book.
  3. Send an email: Check this box to send an email when the Alert Rule is no longer active.
  4. Add/Remove Message: Enter text to be added to the body of the Alert Rule's email notification.
  5. Webhooks: Configure a Webhooks service for Alert Rule notification. See the ThousandEyes Knowledge Base article Using Webhooks for more details.
  6. Integrations: Configure PagerDuty policy, Slack, or HipChat integration for Alert Rule notification. See the ThousandEyes Knowledge Base articles PagerDuty Integration and Slack and HipChat Integration for more details.

Note: Alerts will be active as long as your Alert Rule criteria are met, but any configured email notification will only occur once.

Available Operators, Metrics and units

The following table shows a list of test types which are available in the ThousandEyes platform, along with a list of metrics and operators that are measurable and alertable.   

Test Layer | Alert type | Metric | Operators | Units
Network | End-to-End (Server), End-to-End (Agent) | Packet loss | ≤, ≥ | %
Network | End-to-End (Server), End-to-End (Agent) | Latency [1] | ≤, ≥ | ms
Network | End-to-End (Server), End-to-End (Agent) | Jitter | ≤, ≥ | ms
Network | End-to-End (Server), End-to-End (Agent) | Error | is present | n/a
Network | End-to-End (Agent) | Throughput | ≤, ≥ | Kbps
Network | End-to-End (Server) | Available Bandwidth | ≤, ≥ | Mbps
Network | End-to-End (Server) | Capacity | ≤, ≥ | Mbps
Network | Path Trace | Delay | ≤, ≥ | ms
Network | Path Trace | IP Address [3] | in, not in | IP address or prefix
Network | Path Trace | ASN [3] | in, not in | List of ASNs
Network | Path Trace | rDNS [3] | in, not in | exact hostname or wildcard-based match to domain
Network | Path Trace | MPLS Label [3] | is empty, is not empty |
Network | Path Trace | DSCP [3] | is, is not | DSCP value selected from list
Network | Path Trace | Server IP | in, not in | IP address, prefix
Network | Path Trace | Server MSS | <, > | bytes
Network | Path Trace | Path MTU | <, > | bytes
Network | Path Trace | Path Length | <, > | hops
Network | Path Trace | Trace | is incomplete | n/a
DNS | Server, Trace, DNSSEC | Error | is present | n/a
DNS | Server | Resolution time | ≤, ≥ | ms
DNS | Server, Trace | Mapping | is not in | quoted <comma-separated list of mappings>
DNS+ | Server Latency, Domain | Resolution Time | ≤, ≥ | ms
DNS+ | Domain | Availability | ≤, ≥ | %
DNS+ | Domain | Mapping | is not in | quoted <comma-separated list of mappings>
Web | HTTP Server | Response code | is | any error (≥ http/400 or no response); ok (http/200); redirect (http/300)
Web | HTTP Server | Response Header | matches, does not match | POSIX Extended Regular Expression Syntax
Web | HTTP Server | DNS time | ≤, ≥ | ms
Web | HTTP Server | Connect time | ≤, ≥ | ms
Web | HTTP Server | SSL negotiation time | ≤, ≥ | ms
Web | HTTP Server | Wait time | ≤, ≥ | ms
Web | HTTP Server | Receive time | ≤, ≥ | ms
Web | HTTP Server | Response time [1] | ≤, ≥ | ms
Web | HTTP Server | Total Fetch Time | ≤, ≥ | ms
Web | HTTP Server | Throughput | ≤, ≥ | kBps
Web | HTTP Server | Error type | is, is not | DNS, Connect, SSL, Send, Receive, Content, HTTP, Any
Web | HTTP Server | Client SSL Alert Code | is, is not | SSL error type, e.g. Unexpected message (10), Bad Certificate (42)
Web | HTTP Server | Server SSL Alert Code | is, is not | SSL error type, e.g. Unexpected message (10), Bad Certificate (42)
Web | Page Load | Page load | is incomplete | n/a
Web | Page Load | Response time | ≤, ≥ | ms
Web | Page Load | DOM load time | ≤, ≥ | ms
Web | Page Load | Page load time [1] | ≤, ≥ | ms
Web | Page Load | Error Count | ≤, ≥ | #
Web | Page Load | Domain Name [2] | is in, is not in | quoted <comma-separated list of mappings>
Web | Page Load | Total Fetch Time [2] | ≤, ≥ | ms
Web | Page Load | Blocked Time [2] | ≤, ≥ | ms
Web | Page Load | DNS Time [2] | ≤, ≥ | ms
Web | Page Load | Connect Time [2] | ≤, ≥ | ms
Web | Page Load | Send Time [2] | ≤, ≥ | ms
Web | Page Load | Wait Time [2] | ≤, ≥ | ms
Web | Page Load | Receive Time [2] | ≤, ≥ | ms
Web | Page Load | SSL Negotiation Time [2] | ≤, ≥ | ms
Web | Page Load | Component Load [2] | is incomplete | n/a
Web | Transaction | Error | is present | n/a
Web | Transaction | Duration | ≤, ≥ | ms
Routing | BGP | Reachability | <, > | %
Routing | BGP | Path Changes | <, > | n/a
Routing | BGP | Origin ASN | is in, is not in | comma-separated list of ASNs
Routing | BGP | Next Hop ASN | is in, is not in | comma-separated list of ASNs
Routing | BGP | Prefix | is in, is not in | comma-separated list of covered prefixes
Routing | BGP | Covered Prefix [4] | exists, is in, is not in | comma-separated list of sub-prefixes
Voice | RTP Stream | MOS | ≤, ≥ | #
Voice | RTP Stream | Packet loss | ≤, ≥ | %
Voice | RTP Stream | Discards | ≤, ≥ | %
Voice | RTP Stream | DSCP | is, is not | DSCP values, e.g. Best Effort (0), Expedited Forwarding (46)
Voice | RTP Stream | Latency | ≤, ≥ | ms
Voice | RTP Stream | Packet Delay Variation | ≤, ≥ | ms
Voice | RTP Stream | Error | is, is not | present

  1. For three special cases (Web/Page Load, Web/HTTP Server, Network/Latency) the "Auto" option is available, which lets the rule determine for itself when the alert is raised. The Auto option will compare the value to a weighted average of all historical data from that test. If the value is greater than the expected value by 1.5 standard deviations, then the alert is raised.
  2. These cases are configurable under the primary "Any Component" alert condition in Page Load Tests. You can select any/all sub-conditions to be met in order to trigger the alert for "Any Component".
  3. These cases are configurable under the "Any Hop", "Last Hop", or "Hop #" entries in Path Trace alert rules.  Select any/all sub-conditions to be met in order to trigger the alert.
  4. Only BGP Routing tests provide Covered Prefix data.  Do not assign a BGP Alert Rule with a Covered Prefix metric to a non-BGP test type that has BGP Path Visualization measurements enabled.  For non-BGP test types, use an Alert Rule that does not include the Covered Prefix metric, and if needed create a separate BGP test and a separate Alert Rule with the Covered Prefix metric.
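
The comparison behind the "Auto" option in note 1 can be sketched as follows. The weighting scheme is not specified above, so this example takes the weights as a parameter and defaults to equal weights; that default, the function name `auto_alert` and the sample figures are assumptions made for illustration and may differ from what the platform does.

```python
import math


def auto_alert(value, history, weights=None, k=1.5):
    """True if `value` exceeds the weighted mean of `history` by more than k standard deviations."""
    if weights is None:
        weights = [1.0] * len(history)  # assumption: equal weighting
    total = sum(weights)
    mean = sum(w * x for w, x in zip(weights, history)) / total
    variance = sum(w * (x - mean) ** 2 for w, x in zip(weights, history)) / total
    return value > mean + k * math.sqrt(variance)


# Hypothetical latency history in ms: mean 100, standard deviation about 7.07.
history = [100, 110, 90, 105, 95]
print(auto_alert(112, history))  # above 100 + 1.5 * 7.07, so the alert is raised
print(auto_alert(108, history))  # below that level, so no alert
```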

Each metric from the table above is defined in the ThousandEyes Knowledge Base article ThousandMetrics: what do your results mean?


Default alerting rules

Default Alert Rules are defined according to the following list.  Within the Account Group, Default Alert Rules can be changed by any user having a role with the View alert rules and Edit alert rules permissions, such as the built-in Account Admin or Organization Admin roles.  Default rules can be configured with zero or more alert rules representing the default alert rule for each type.

Name | Criteria | Minimum Locations
Default Network Alert Rule | Packet loss ≥ 20% | 2 locations
Default DNS Trace Alert Rule | Error is present | 2 locations
Default DNS Server Alert Rule | Error is present | 2 locations
Default DNSSEC Alert Rule | Error is present | 2 locations
Default DNS+ Domain Alert Rule | Availability ≤ 90% and Reference Availability ≥ 90% | 2 countries
Default DNS+ Server Alert Rule | Resolution time ≥ 100ms | 1 country
Default HTTP Alert Rule | Error type is any | 2 locations
Default Page Load Alert Rule | Page load is incomplete | 2 locations
Default Transaction Alert Rule | Error is present | 2 locations
Default BGP Alert Rule | Reachability < 100% | 2 locations
Default Voice Alert Rule | Error is present | 1 location
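
To make the "Criteria" and "Minimum Locations" columns concrete, the sketch below models the Default Network Alert Rule (packet loss ≥ 20% from at least 2 locations) for a single test round. The function name, Agent names and loss figures are hypothetical, and the model ignores the rounds settings described earlier.

```python
def default_network_rule_triggered(loss_by_agent, loss_threshold=20.0, min_locations=2):
    """True if enough locations report packet loss at or above the threshold."""
    breaching = [agent for agent, loss in loss_by_agent.items() if loss >= loss_threshold]
    return len(breaching) >= min_locations


# Packet loss (%) per Agent for one round; values are made up.
one_bad_location = {"London": 35.0, "Tokyo": 0.0, "Chicago": 5.0}
two_bad_locations = {"London": 35.0, "Tokyo": 20.0, "Chicago": 5.0}

print(default_network_rule_triggered(one_bad_location))
print(default_network_rule_triggered(two_bad_locations))
```

Only the second round meets the rule, because two locations are at or above 20% packet loss.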