Creating and Editing Alert Rules

You can configure alert rules for different types of tests. This article will walk you through the common parts of alert rule configuration, plus those parts that are unique to each kind of test.

Every alert rule has the following common characteristics:

  • A name.

  • A series of tests against which it is enabled.

  • A scope of alert triggers (such as agents or monitors) to which the alert rule applies (with the exception of Endpoint Agent scheduled tests).

  • Criteria defining the alert conditions.

  • The number of alert triggers that the alert conditions must meet in order to activate an alert.

Alert rules also include a notification mechanism, such as a list of email recipients (recipients do not need to be users of ThousandEyes in order to receive email notifications), a PagerDuty service or one or more webhooks.

Each alert rule assigned to a test is evaluated independently. For tests with multiple alert rules assigned, any alert can be triggered when alert conditions are met. A test with multiple alert rules assigned to it can show zero, one, or multiple triggered alerts depending on what alert criteria were met during a single test pass.
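
The independent evaluation described above can be pictured with a small sketch. The AlertRule class, its fields, and the metric names are hypothetical and chosen only for illustration; they model the common characteristics listed earlier and are not the ThousandEyes API.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class AlertRule:
    # Hypothetical model of an alert rule; field names are illustrative only.
    name: str
    tests: List[str]
    condition: Callable[[Dict[str, float]], bool]  # criteria applied to one agent's metrics
    min_triggers: int = 1
    recipients: List[str] = field(default_factory=list)


def triggered_rules(rules, test_name, results_by_agent):
    """Return the names of the rules that fire for one test pass.

    Each rule is checked on its own, so the result may hold zero,
    one, or several names.
    """
    fired = []
    for rule in rules:
        if test_name not in rule.tests:
            continue
        hits = sum(1 for metrics in results_by_agent.values() if rule.condition(metrics))
        if hits >= rule.min_triggers:
            fired.append(rule.name)
    return fired


rules = [
    AlertRule("High loss", ["web-check"], lambda m: m["loss_pct"] >= 10, min_triggers=2),
    AlertRule("High latency", ["web-check"], lambda m: m["latency_ms"] >= 200),
]
results = {
    "Atlanta": {"loss_pct": 12.0, "latency_ms": 250.0},
    "Ashburn": {"loss_pct": 15.0, "latency_ms": 80.0},
}
print(triggered_rules(rules, "web-check", results))  # ['High loss', 'High latency']
```

In this pass both rules fire because each one is checked against the same results separately; removing either rule would not change the outcome of the other.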

Adding a New Alert Rule

To create a new alert rule, click Alerts > Alert Rules. The Alert Rules page opens.

From the tabs at the top of the page, select the desired alert source:

  • Cloud and Enterprise Agents

  • Endpoint Agents

  • BGP Routing

  • Devices

  • Internet Insights

Then click Add New Alert Rule. The Add New Alert Rule panel opens. The image below shows the panel that opens for Cloud and Enterprise Agents.

Configuring the Alert Rule

Every new alert rule panel, whatever the alert source, opens with three sections. The top section is where you choose the type of alert you wish to configure and give it a name. The middle and bottom sections make up the Settings tab, where you specify the alert triggers (middle section) and alert conditions (bottom section). For information about the Notifications tab, see Alert Notifications.

Naming the Alert Rule

In the top section of the panel for each new alert, you will find:

  • Alert Type: Select the test layer for this alert rule.

  • Compatible Test Types (for Cloud, Enterprise, and Endpoint Agents only): As you select the test layer in the Alert Type field, the dropdown field to the right displays the test types to which this alert rule can be assigned.

  • Rule Name: Specify a name for the alert rule.

Selecting the Alert Triggers

The middle and bottom sections of the panel make up the Settings tab. The middle section is where you configure your alert triggers (such as agents, monitors, or catalog providers). The fields in this section vary depending on the alert source and type, as set out below.

Cloud and Enterprise Agents

  • Direction (only for Network: Agent to Agent and Network: Path Trace tests): Choose the direction in which the alert triggers: Source-to-Target, Target-to-Source, or Both (Agent to Agent) or Either (Path Trace).

  • Tests: A dropdown menu listing all the tests set up in your account group. Select one or more tests to assign them to this alert rule.

  • Agents: Select the agents to which you will assign this alert rule. The options are:

    • All agents: All agents will be assigned this alert rule.

    • All agents except: All agents will be assigned this alert rule except for the ones selected.

    • Specific agents: Only the selected agents will be assigned to this alert rule.

      Note: Selecting All agents except or Specific agents opens another dropdown menu where you can select the agents you do or don't want to alert on.

  • Severity: Choose from Info, Minor, Major, and Critical.
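
As a rough illustration of how the three Agents options narrow the scope of a rule, the sketch below resolves each option to a set of agents. The function name and the mode labels ("all", "all_except", "specific") are assumptions made for this example and do not come from the product.

```python
def agents_in_scope(all_agents, mode, selected=()):
    """Resolve the Agents setting to the set of agents the rule applies to.

    mode is one of "all", "all_except", or "specific" (hypothetical labels
    for the three options described above).
    """
    all_agents = set(all_agents)
    selected = set(selected)
    if mode == "all":
        return all_agents
    if mode == "all_except":
        return all_agents - selected
    if mode == "specific":
        # Only agents that exist in the account group can be selected.
        return all_agents & selected
    raise ValueError(f"unknown mode: {mode!r}")


agents = ["Atlanta", "Ashburn", "San Francisco"]
print(sorted(agents_in_scope(agents, "all_except", ["Ashburn"])))  # ['Atlanta', 'San Francisco']
```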

Endpoint Agents

Real User Tests

  • Agents: Select the agents to which you will assign this alert rule. The options are:

    • All agents: All Endpoint Agents belonging to the account group will be assigned this alert rule.

    • Specific agents: Only the selected Endpoint Agents will be assigned to this alert rule.

    • Agent labels: Only the Endpoint Agents with the specified label will be assigned to this alert rule.

      Note: Selecting Specific agents or Agent labels opens another dropdown menu where you can select the agents or labels you want to alert on.

  • Visited Sites: Select the sites for which this alert will be triggered. The options are:

    • Any visited site: Any site within the monitored domain set that a user visits will be assigned to this alert rule.

    • Specific visited sites: Only the selected visited sites will be assigned to this alert rule. If you select this option, a dropdown menu appears where you can select from a number of suggested domains or type in a custom domain.

  • Severity: Choose from Info, Minor, Major, and Critical.

Scheduled Tests

  • Tests: A dropdown menu listing all the compatible Endpoint tests set up in your account group. Select one or more tests to assign them to this alert rule.

  • Severity: Choose from Info, Minor, Major, and Critical.

BGP Routing

  • Tests: A dropdown menu listing all the tests set up in your account group. Select one or more tests to assign them to this alert rule.

  • Prefix Length: A dropdown menu allowing you to specify the prefix length range for both IPv4 and IPv6. The range defaults to 16 to 32 for IPv4 and 32 to 128 for IPv6.

  • Monitors: Select the monitors to which you will assign this alert rule. The options are:

    • All monitors: All monitors will be assigned this alert rule.

    • All monitors except: All monitors will be assigned this alert rule except for the ones selected.

    • Specific monitors: Only the selected monitors will be assigned to this alert rule.

      Note: Selecting All monitors except or Specific monitors will open another dropdown menu where you can select the monitors you do or don't want to alert on.

  • Severity: Choose from Info, Minor, Major, and Critical.
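
The Prefix Length setting can be thought of as a filter on which prefixes a BGP rule evaluates. The sketch below uses Python's standard ipaddress module and the default ranges quoted above; the function name and the dictionary layout are assumptions for illustration, not the product's implementation.

```python
import ipaddress

# Default ranges quoted in the Prefix Length description above,
# keyed by IP version (4 or 6).
DEFAULT_PREFIX_LENGTHS = {4: (16, 32), 6: (32, 128)}


def prefix_in_scope(prefix, ranges=DEFAULT_PREFIX_LENGTHS):
    """Return True if the prefix length falls inside the configured range."""
    network = ipaddress.ip_network(prefix, strict=False)
    low, high = ranges[network.version]
    return low <= network.prefixlen <= high


print(prefix_in_scope("203.0.113.0/24"))  # True
print(prefix_in_scope("10.0.0.0/8"))      # False
print(prefix_in_scope("2001:db8::/32"))   # True
```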

Devices

  • Devices (for the Device alert type only): A dropdown menu listing all the monitored devices set up in your account group. Select one or more devices to assign them to this alert rule.

  • Interfaces (for the Interface alert type only): A dropdown menu listing all the monitored interfaces set up in your account group. Select one or more interfaces to assign them to this alert rule.

Internet Insights

  • Affected Tests: Select the affected tests to which you will assign this alert rule. The options are:

    • Any: Any affected tests will be assigned this alert rule.

    • Specific: Only the selected affected tests will be assigned to this alert rule. If you select this option, a dropdown menu will appear where you can select the affected tests you want to alert on.

  • Catalog Providers: Select the catalog providers to which you will assign this alert rule. The options are:

    • Any: Any catalog providers will be assigned this alert rule.

    • Specific: Only the selected catalog providers will be assigned to this alert rule. If you select this option, a dropdown menu will appear where you can select the catalog providers you want to alert on.

  • Severity: Choose from Info, Minor, Major, and Critical.

Setting the Alert Conditions

As with the alert triggers, the alert conditions vary depending on alert type, but also test type. First, we'll explain how to apply alert conditions to alert triggers (the first line under Alert Conditions in the image below); these are called global alert conditions. Then we'll explain how to set the alert conditions themselves (those items with a "-/+" next to them in the image below); these are called location alert conditions. For more information about global and location alert conditions see Global and Location Alert Conditions.

Setting Global Alert Conditions

The global alert condition specifies how your location alert conditions are applied to your alert triggers: how many location alert conditions must be met, the number or percentage of alert triggers that must meet them, and in how many test rounds, before an alert is activated. Except for Internet Insights, all alert rules have similar options for configuring global alert conditions (Internet Insights is automatically configured to have any or all conditions met by the outage network within 5 minutes). The following list explains each configurable field, reading the global condition from left to right.

  • All/Any: Select All when all the specified location alert conditions must be met (AND) or select Any when any one of the specified location alert conditions must be met (OR).

When only one location alert condition is specified, the system defaults to "All" conditions. You must add at least one other location alert condition to see the dropdown options.

  • any of/the same (for Cloud and Enterprise Agents and BGP Routing only): Select any of if you want an alert activated when any set of alert triggers (agents or monitors) meets the alert condition(s) in consecutive rounds. Select the same if you want an alert activated only if the same set of alert triggers meets the alert condition(s) in consecutive rounds. Selecting the same is referred to as using "sticky triggers".

    Sticky triggers: For example, an alert rule is configured for the same agent to trip a specified threshold in three consecutive rounds. If the Atlanta Cloud Agent trips the rule in round one, the Ashburn Cloud Agent trips it in round two, and the San Francisco Cloud Agent trips it in round three, an alert would not be activated. Either Atlanta, Ashburn, or San Francisco would need to trip the rule in three consecutive rounds to activate an alert.

  • Threshold value (not applicable to Devices): Specify the threshold value for alert triggers that must meet the alert conditions in order to trigger this alert rule. This value will be either a number of alert triggers or a percentage of alert triggers, as specified in the next setting.

    Note: When a percentage of alert triggers is used, and the percentage results in a non-whole number threshold value of actual alert triggers, the fractional part of the value is significant. For example, when an alert rule with a threshold of 25% of all agents is applied to 13 agents, the threshold is 3.25 agents. This threshold will require 4 agents to meet the alert criteria in order to activate the alert rule.

  • Threshold units: Select either the alert trigger, or the percentage thereof. The options for each alert rule are:

    • Cloud and Enterprise Agents (all test types): agent(s) or % of agents.

    • Endpoint Agent Real User Tests - Application test type: agent(s) or % of agents.

    • Endpoint Agent Real User Tests - Endpoint test type: visited site(s) or % of visited sites.

    • For Endpoint Scheduled Tests (all test types), you select a threshold value for both number and percentage of Endpoint Agents.

    • BGP Routing: monitor(s) or % of monitors.

    • For Devices, the threshold unit is part of the location alert condition, where the options are: Any interface or Any interface matching.

  • Rounds (met): Select the number of test rounds that the subsequent location alert condition(s) must meet out of a total number of rounds in order to activate the alert rule. See also the Rounds (total) entry below.

  • Rounds (total): Select the total number of test rounds against which the Rounds (met) selection is evaluated. For example, if Rounds (met) = 2 and Rounds (total) = 3, then for every three rounds, the alert rule will activate if the condition(s) were met twice.
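
The fields above can be combined into a minimal sketch of how a global alert condition might be evaluated. It rests on several assumptions: percentage thresholds round up (as in the 25% of 13 agents example), the rounds are taken as the most recent Rounds (total) rounds, and "the same" means each counted trigger met the conditions in enough of those rounds. The function and parameter names are hypothetical.

```python
import math
from collections import Counter


def required_triggers(threshold, unit, total_triggers):
    """Convert a threshold to a whole number of triggers.

    A percentage that yields a fractional count is rounded up.
    """
    if unit == "percent":
        return math.ceil(threshold * total_triggers / 100)
    return int(threshold)


def rule_active(rounds, required, met, total, sticky=False):
    """Decide whether the rule is active.

    rounds: list of sets, oldest first; each set holds the triggers
            (agents or monitors) that met the location conditions.
    required: number of triggers needed (see required_triggers).
    met, total: the Rounds (met) and Rounds (total) settings.
    sticky: True for "the same", False for "any of".
    """
    window = rounds[-total:]
    if sticky:
        # Count, per trigger, the rounds in which it met the conditions.
        counts = Counter(trigger for r in window for trigger in r)
        return sum(1 for c in counts.values() if c >= met) >= required
    # "any of": a round counts if enough triggers, whichever they are, met the conditions.
    return sum(1 for r in window if len(r) >= required) >= met


print(required_triggers(25, "percent", 13))  # 4

rotating = [{"Atlanta"}, {"Ashburn"}, {"San Francisco"}]
print(rule_active(rotating, required=1, met=3, total=3, sticky=False))  # True
print(rule_active(rotating, required=1, met=3, total=3, sticky=True))   # False
```

The last two lines reproduce the sticky triggers example: three different agents tripping the rule in three rounds activate an "any of" rule but not a "the same" rule.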

Setting Location Alert Conditions

Location alert conditions are where you set the specific metrics on which an alert becomes active. You can set any number of metrics for an alert, though bear in mind that when All conditions must be met, each additional metric makes it less likely that the alert will activate. Location alert conditions are configured by choosing at least one metric (the test characteristic against which you're measuring change) and one operator (the type of measure). Depending on the metric, other configurable options include threshold values and units. Reading left to right, location alert conditions include the following configurable fields:

  • Metric: Select a test metric for this condition.

  • Operators: Select an operator for this condition. There are many operators to choose from, some of which are self-explanatory. Below is a selection with more explanation. For a full list of metrics, operators, and units, see the table under Available Metrics, Operators, and Units.

    • >, <, ≥, ≤ : Numerical comparisons for greater than, less than, greater than or equal to, and less than or equal to. Available for all numerical (integer only) measures, such as packet loss percentage on network layer tests, or error count on page load tests.

    • is, is not: Non-numeric comparison for values that are not continuous ranges (e.g., HTTP response codes) or that are a fixed string value, such as the error type (e.g., "DNS", "Connect", "SSL"). Also, when suffixed with "empty", determines whether a metric has a value or has no value.

    • in, not in: Numeric or string comparison to a list of values. For example, a BGP routing rule compares a test metric's AS number (integer) to a list of one or more AS numbers to determine if the test metric is found or not found in the list. Use a wildcard * when matching against word spaces. For example, "10 * aspmx3.googlemail.com."

    • is incomplete: Determines whether a test completed the operations for a given metric. For example, this metric can be used to determine whether a path trace reached its destination, or a page load test fully loaded a page.

    • is present: Used when an error condition is present.

    • matches, does not match: Determines whether the POSIX regular expression in the alert rule is found within the string produced by the test metric (i.e., a substring will produce a match). For example, an alert rule for the Error metric of an HTTP server test whose alert condition uses the regular expression "certificate\s*\w*:" will alert when the test's Error Details text is "SSL certificate problem: certificate has expired", because the regular expression matches the substring "certificate problem:". The operators available per type of alert rule are also shown in the table below.

  • Threshold: The value that the metric setting will be compared against, using the chosen operator. Note that some operators do not have a value field.

  • Unit: Often, the unit is fixed once an operator is chosen, such as threshold value, %, ms, or kbps, but sometimes you can choose the unit, such as for dynamic baselines or for device interface thresholds.

  • Add/Delete: Click the + or - icon to add or delete location alert criteria to this alert rule. Criteria can be nested for some types of alert rule.
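
To make the metric, operator, and threshold fields concrete, here is a minimal sketch of evaluating location alert conditions against one set of results. Python's re module stands in for the product's regular expression engine, which may differ in detail, and the function names and metric keys are assumptions made for this example.

```python
import operator
import re

# A subset of the operators described above, mapped to Python comparisons.
OPERATORS = {
    ">": operator.gt,
    "<": operator.lt,
    "≥": operator.ge,
    "≤": operator.le,
    "is": operator.eq,
    "is not": operator.ne,
    "in": lambda value, items: value in items,
    "not in": lambda value, items: value not in items,
    # Substring search, so a match anywhere in the text counts.
    "matches": lambda text, pattern: re.search(pattern, text) is not None,
    "does not match": lambda text, pattern: re.search(pattern, text) is None,
}


def condition_met(metrics, metric, op, threshold):
    """Compare one metric value against a threshold using the chosen operator."""
    return OPERATORS[op](metrics[metric], threshold)


def conditions_met(metrics, conditions, mode="all"):
    """Combine several (metric, operator, threshold) conditions with All (AND) or Any (OR)."""
    results = (condition_met(metrics, *c) for c in conditions)
    return all(results) if mode == "all" else any(results)


metrics = {
    "packet_loss": 12,
    "origin_asn": 64500,
    "error": "SSL certificate problem: certificate has expired",
}
print(condition_met(metrics, "error", "matches", r"certificate\s*\w*:"))  # True
print(condition_met(metrics, "origin_asn", "in", [64496, 64497]))         # False
```

The first call reproduces the example from the matches operator: the pattern is found inside the longer error text, so the condition is met.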

Available Metrics, Operators, and Units

The following table shows a list of test types which are available in the ThousandEyes platform, and the test metrics and operators.

Test Layer | Alert Type | Metric | Operators | Units
--- | --- | --- | --- | ---
Network | End-to-End (Server), End-to-End (Agent) | Packet loss | ≤, ≥ | %
Network | End-to-End (Server), End-to-End (Agent) | Latency [1] | ≤, ≥ | ms
Network | End-to-End (Server), End-to-End (Agent) | Jitter | ≤, ≥ | ms
Network | End-to-End (Server), End-to-End (Agent) | Error | is present, matches, does not match | n/a
Network | End-to-End (Agent) | Throughput | ≤, ≥ | Kbps
Network | End-to-End (Server) | Available Bandwidth | ≤, ≥ | Mbps
Network | End-to-End (Server) | Capacity | ≤, ≥ | Mbps
Network | Path Trace | Delay | ≤, ≥ | ms
Network | Path Trace | IP Address [2] | in, not in | IP address or prefix
Network | Path Trace | ASN [2] | in, not in | List of ASNs
Network | Path Trace | rDNS [2] | in, not in | exact hostname or wildcard-based match to domain
Network | Path Trace | MPLS Label [2] | is, is not | empty
Network | Path Trace | DSCP [2] | is, is not | DSCP value selected from list
Network | Path Trace | Server IP | in, not in | IP address, prefix
Network | Path Trace | Server MSS | <, > | bytes
Network | Path Trace | Path MTU | <, > | bytes
Network | Path Trace | Path Length | <, > | hops
Network | Path Trace | Trace is incomplete | n/a |
DNS | Server, Trace, DNSSEC | Error | is present, matches, does not match | n/a
DNS | Server | Resolution time | ≤, ≥ | ms
DNS | Server, Trace | Mapping | in, not in | quoted <comma-separated list of mappings> use * when matching against word spaces. For example, "10*aspmx3.googlemail.com."
Web | HTTP Server | Response code | is | any error (≥ http/400 or no response) ok (http/200) redirect (http/300
Web | HTTP Server | Response Header | matches, does not match |
Web | HTTP Server | DNS time | ≤, ≥ | ms
Web | HTTP Server | Connect time | ≤, ≥ | ms
Web | HTTP Server | SSL negotiation time | ≤, ≥ | ms
Web | HTTP Server | Wait time | ≤, ≥ | ms
Web | HTTP Server | Receive time | ≤, ≥ | ms
Web | HTTP Server | Response time [1] | ≤, ≥ | ms
Web | HTTP Server | Total Fetch Time | ≤, ≥ | ms
Web | HTTP Server | Throughput | ≤, ≥ | kBps
Web | HTTP Server | Error | is present, matches, does not match | n/a
Web | HTTP Server | Error type | is, is not | DNS, Connect, SSL, Send, Receive, Content, HTTP, Any
Web | HTTP Server | Client SSL Alert Code | is, is not | SSL error type. E.g., Unexpected message (10), Bad Certificate (42)
Web | HTTP Server | Server SSL Alert Code | is, is not | SSL error type. E.g., Unexpected message (10), Bad Certificate (42)
Web | Page Load | Page load | is incomplete | n/a
Web | Page Load | Response time | ≤, ≥ | ms
Web | Page Load | DOM load time | ≤, ≥ | ms
Web | Page Load | Page load time [1] | ≤, ≥ | ms
Web | Page Load | Error Count | ≤, ≥ | #
Web | Page Load | Domain Name [3] | in, not in | quoted <comma-separated list of mappings>
Web | Page Load | Total Fetch Time [3] | ≤, ≥ | ms
Web | Page Load | Blocked Time [3] | ≤, ≥ | ms
Web | Page Load | DNS Time [3] | ≤, ≥ | ms
Web | Page Load | Connect Time [3] | ≤, ≥ | ms
Web | Page Load | Send Time [3] | ≤, ≥ | ms
Web | Page Load | Wait Time [3] | ≤, ≥ | ms
Web | Page Load | Receive Time [3] | ≤, ≥ | ms
Web | Page Load | SSL Negotiation Time [3] | ≤, ≥ | ms
Web | Page Load | Component Load [3] | is incomplete | n/a
Web | Transaction (Classic) | Error | is present | n/a
Web | Transaction (Classic) | Transaction Time | ≤, ≥ | ms
Web | Transaction (Classic) | Completion | ≤, ≥ | %
Web | Transaction (Classic) | Steps Completed | ≤, ≥, is | #
Web | Transaction (Classic) | Any Steps meets | any, all | of the following conditions: Step Duration
Web | Transaction (Classic) | Step # meets | any, all | of the following conditions: Step Duration
Web | Transaction (Classic) | Any Page meets | any, all | of the following conditions: Page Duration
Web | Transaction (Classic) | Page # meets | any, all | of the following conditions: Step Duration
Web | Transaction | Page | URL, Host, Page # |
Web | Transaction | Page/Any Page > Page Load Time | ≤, ≥ | ms
Web | Transaction | Page/Any Page > Page Load Error | is present, matches |
Web | Transaction | Page/Any Page > Response Time | ≤, ≥ | ms
Web | Transaction | Page/Any Page > DOM Load Time | ≤, ≥ | ms
Web | Transaction | Marker (name) | exact textual matching, case-sensitive | n/a
Web | Transaction | Marker (presence) | is present, is not present | n/a
Web | Transaction | Marker (duration) | ≤, ≥ | ms
Web | Transaction | Assert Error | is present, matches, does not match |
Web | Transaction | Transaction Time | ≤, ≥ | ms
Web | Transaction | Transaction Completion | is finished, has error, has internal error, timed out | n/a
Web | Transaction | Error | is present, matches, does not match |
Routing | BGP | Reachability | <, > | %
Routing | BGP | Path Changes | <, > | n/a
Routing | BGP | Origin ASN | in, not in | comma-separated list of ASNs
Routing | BGP | Next Hop ASN | in, not in | comma-separated list of ASNs
Routing | BGP | Prefix | in, not in | comma-separated list of covered prefixes
Routing | BGP | Covered Prefix [4] | exists, in, not in | comma-separated list of sub-prefixes
Voice | RTP Stream | Error | is present, matches, does not match | n/a
Voice | RTP Stream | MOS | ≤, ≥ | #
Voice | RTP Stream | Packet loss | ≤, ≥ | %
Voice | RTP Stream | Discards | ≤, ≥ | %
Voice | RTP Stream | DSCP | is, is not | DSCP Values. E.g., Best Effort (0), Expedited Forwarding (46)
Voice | RTP Stream | Latency | ≤, ≥ | ms
Voice | RTP Stream | Packet Delay Variation | ≤, ≥ | ms
Device | Device | Interface name | matches, doesn't match |
Device | Device | Interface type | n/a |
Device | Device | Exclude interfaces | n/a |
Device | Device | IP address | matches | IP address, range, or prefix
Device | Device | Throughput | either, in, out ≥, >, ≤, < | Mbps, %
Device | Device | Discards | either, in, out ≥, >, ≤, < | pps, %
Device | Device | Errors | either, in, out ≥, >, ≤, < | pps, %
Device | Device | Discards and Errors | either, in, out ≥, >, ≤, < | pps, %
Device | Device | Operational Status | offline, online |
Device | Device | Admin Status | Disabled, Enabled |
Device | Device | State | Unchanged, Changed |

Bracketed numbers in the Metric column refer to the numbered notes below.

  1. For some metrics, dynamic baselines can be configured. For more information, see Dynamic Baselines.

  2. These metrics are configurable under the "Any Hop", "Last Hop", or "Hop #" entries in path trace alert rules. Select "Any" or "All" for multiple sub-conditions.

  3. These metrics are accessed under the "Any Component" alert condition in page load tests. Select "Any" or "All" for multiple sub-conditions.

  4. Only BGP routing tests provide covered prefix data. Do not assign a BGP alert rule with a covered prefix metric to a non-BGP test type that has BGP path visualization measurements enabled. For non-BGP test types, use an alert rule that does not include the covered prefix metric and, if needed, create a separate BGP test and a separate alert rule with the covered prefix metric.

For Cloud and Enterprise Agent tests, each metric from the table above is defined in the article ThousandMetrics: What Do Your Results Mean? For Endpoint Agent tests, metrics are defined at Data Collected by the Endpoint Agent. For device tests, metrics are defined at Device Discovery Results.

Editing an Alert Rule

Editing an alert rule follows the same configuration steps as adding a new alert rule, as set out above. The only difference is that to edit an alert rule, you click an existing alert rule instead of clicking the Add New Alert Rule button. A panel appears with the current alert rule configuration; you can then change any of the field settings to your desired configuration.

Reducing Noise in Alerts

If an alert rule generates more notifications than your operations require, you can adjust the alert condition thresholds.

  1. Go to Alerts > Alert Rules.

  2. Select the name of the alert rule that you want to adjust.

  3. On the Settings tab, in the Alert Conditions section, review the current thresholds.

  4. Make changes to these settings to reduce the frequency of alerts, according to your requirements.

To reduce noise more effectively, consider using dynamic baselines in your alert configuration instead of static thresholds. To learn more about dynamic baselines, see Dynamic Baselines.

After you adjust a noisy alert to meet your service-level expectations, the alert should begin to clear. An active alert that clears is moved to the Alert History tab. To view cleared alerts, go to Alerts > Alert List > Alerts History.

DNS Server Alert Rules

DNS server tests differ from other ThousandEyes tests in that multiple servers can be explicitly targeted in a single test. As a result, DNS server alert rules are evaluated on a per-server basis. That is, for each server in the DNS Servers field of the test configuration, the alert conditions are evaluated separately from all other servers in the DNS Servers field. For example, consider an alert rule that has the following alert conditions:

When assigned to a DNS server test with two servers configured as the targets, each server will be evaluated independently against the above alert condition. To activate the alert rule, at least four agents must receive an error against the same DNS server. The alert rule would not be triggered if, for example, three agents received an error when testing the first DNS server and a fourth agent received an error when testing the second DNS server.
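
The per-server evaluation can be sketched as follows. The threshold of four agents comes from the example above; the function name, the server names, and the shape of the input are assumptions for illustration.

```python
def servers_in_alert(errors_by_server, min_agents=4):
    """Return the DNS servers for which enough agents saw an error.

    errors_by_server maps each DNS server to the set of agents that
    received an error against it. Servers are never pooled together.
    """
    return [
        server
        for server, agents in errors_by_server.items()
        if len(agents) >= min_agents
    ]


# Three agents fail against the first server and a fourth against the second:
# no single server reaches the threshold, so the rule stays inactive.
split = {
    "ns1.example.com": {"Atlanta", "Ashburn", "Chicago"},
    "ns2.example.com": {"San Francisco"},
}
print(servers_in_alert(split))  # []

# Four agents fail against the same server: the rule activates for that server.
same = {
    "ns1.example.com": {"Atlanta", "Ashburn", "Chicago", "San Francisco"},
    "ns2.example.com": set(),
}
print(servers_in_alert(same))  # ['ns1.example.com']
```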

BGP Alert Rules

A BGP alert rule can be applied to a routing-layer BGP test, or to a test of a different layer that provides the BGP route visualization view. It is important to note that some alert rule conditions can be applied differently depending on which type of test the rule is assigned to. For example, a BGP test has only a single target prefix that will be evaluated against the alert conditions. If the Covered Prefixes box is checked, any covered prefixes found are not evaluated against the alert conditions, except by the explicit Covered Prefix condition.

In contrast, a non-BGP test type can have one or more targets. DNS server tests can explicitly test multiple DNS servers. An agent-to-server test target's domain name can resolve to multiple servers' IP addresses. When creating the BGP path visualization, the Prefix selector shows these multiple target prefixes, and evaluates each prefix against any BGP alert rules assigned to the test. Thus, prefixes that would be considered covered prefixes under a BGP test and not evaluated by the alert rule (unless by a Covered Prefix condition) are evaluated when assigned to the non-BGP test. Similarly, the Covered Prefix condition does not have any relevance when assigned to a non-BGP test.

BGP alert rules have a parameter named Prefix Length, which is used to determine the length of prefixes evaluated by the rule. The Prefix Length can be individually configured for IPv4 and IPv6 protocols.

The default BGP alert rule will activate when 10% of monitors have less than 100% reachability.
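
As a worked illustration of the default rule, the sketch below checks a single round of results. It assumes that the 10% threshold rounds up to a whole number of monitors, as described for percentage thresholds earlier in this article, and it ignores the rounds settings. The function name and the monitor names are hypothetical.

```python
import math


def default_bgp_rule_active(reachability_by_monitor, percent_of_monitors=10):
    """Check one round: do enough monitors report less than 100% reachability?"""
    total = len(reachability_by_monitor)
    # A fractional monitor count rounds up; at least one monitor is always required.
    needed = max(math.ceil(percent_of_monitors * total / 100), 1)
    below = sum(1 for value in reachability_by_monitor.values() if value < 100)
    return below >= needed


# Twenty monitors, so 10% means two monitors must see reduced reachability.
monitors = {f"monitor-{i}": 100.0 for i in range(20)}
monitors["monitor-0"] = 85.0
print(default_bgp_rule_active(monitors))  # False

monitors["monitor-1"] = 0.0
print(default_bgp_rule_active(monitors))  # True
```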
