ORL (Observability Reference Language) alerts are used to monitor data from various datasets and trigger notifications when specific conditions are met. ORL alerts are defined by a set of properties that specify the characteristics and behavior of the alert. They are based on ORL queries, which are used to retrieve and analyze the data. This allows you to monitor your systems and services and be notified when there are issues or anomalies that require attention.
Note that an alert can only be set for queries that include calculations. It is not possible to set an alert for a query that does not have any calculations.
Sample Alert Spec
Here’s a sample ORL spec that uses all of the supported settings for defining alerts in Baselime. Use it to get started creating your own alerts.
# Define a new alert called "lambda-timeout-alarm"
lambda-timeout-alarm:
  # This is an alert type
  type: alert
  properties:
    # A description of what this alert is for
    description: The prod-vortex-function Lambda function has exceeded 10 timeouts over the past 15 minutes.
    # Whether this alert is enabled or not
    enabled: true
    parameters:
      # The query to run to check for the condition that triggers the alert
      query: !ref lambda-timeout-query
      # How often to run the query
      frequency: 5mins
      # The time window for the query
      window: 15mins
      # The threshold that triggers the alert
      threshold: "> 10"
    channels:
      # A Slack channel to send the alert to
      - type: slack
        targets:
          - 'dynamodb-alerts'
      # A webhook URL to send the alert to
      - type: webhook
        url: https://example.com/alerts
      # An email address or list of addresses to send the alert to
      - type: email
        targets:
          - 'email@example.com'
          - 'firstname.lastname@example.org'
      # A PagerDuty service key to use for sending the alert to PagerDuty
      - type: pagerduty
        serviceKey: 'abc123'
        # The action to take in PagerDuty when the alert is triggered
        eventAction: 'trigger'
        # The name of the client/application that the alert is associated with
        client: 'Example App'
        # A URL for the client/application
        clientUrl: 'http://example.com'
ORL alerts have a set of properties that define the alert's characteristics and behavior.
The description of the ORL alert is a string that provides more information about the alert. It can include details about the data being monitored, the conditions or thresholds being checked, and any other relevant information.
description: This alert triggers when the average request latency exceeds 500ms for more than 5 minutes.
The enabled property is a boolean that specifies whether the ORL alert is currently active. If set to true, the alert is triggered when its conditions or thresholds are met. If set to false, the alert is disabled and does not trigger.
The parameters of an ORL alert define the query to use, the frequency at which the query is run, the window of time over which the query's results are analyzed, and the threshold or condition that triggers the alert.
The query parameter is a reference to an ORL query that defines the data to be monitored for the alert. It is specified as a string in the format !ref query_id, where query_id is the id of the ORL query.
query: !ref request-latency
The frequency parameter is a string that specifies how often the alert is checked. It follows the format number time_unit, where number is a positive integer and time_unit is a unit of time, such as the mins in 5mins or the h in 1h used in the examples in this document.
The frequency can also be defined as a cron expression, following the AWS Cron Reference. For example:
15 10 * * ? *: 10:15 AM (UTC) every day
0 18 ? * MON-FRI *: 6:00 PM Monday through Friday
0 8 1 * ? *: 8:00 AM on the first day of the month
0/10 * ? * MON-FRI *: Every 10 min on weekdays
0/5 8-17 ? * MON-FRI *: Every 5 minutes between 8:00 AM and 5:55 PM weekdays
0 9 ? * 2#1 *: 9:00 AM on the first Monday of each month
The alert is checked at the specified interval, and if the conditions are met, the alert is triggered.
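As a minimal sketch of a cron-based schedule, the fragment below places one of the cron expressions listed above in an alert's parameters. The query id request-latency, the window, and the threshold value are illustrative placeholders. Quoting the expression is an assumption made so that YAML reads it as a single string; the exact form ORL expects is not confirmed here.

```yaml
parameters:
  # Placeholder query id, used only for illustration
  query: !ref request-latency
  # Every 10 minutes on weekdays (AWS cron syntax); quoting is an assumption
  frequency: '0/10 * ? * MON-FRI *'
  # The window uses the number time_unit format, not a cron expression
  window: 15mins
  # Illustrative threshold value
  threshold: '> 500'
```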
The window parameter is a string that specifies the time window to consider for the alert. It follows the same number time_unit format as the frequency parameter, but cannot be defined as a cron expression.
The alert is only triggered if the conditions are met within the specified time window.
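The relationship between frequency and window can be sketched with the fragment below, which reuses the values from the sample spec. It assumes each run evaluates the full window ending at the time of the run. Under that assumption, a 5-minute frequency with a 15-minute window produces overlapping windows, so a single spike may be counted in up to three consecutive evaluations. The query id and threshold value are placeholders.

```yaml
parameters:
  # Placeholder query id, used only for illustration
  query: !ref request-latency
  # The query runs every 5 minutes
  frequency: 5mins
  # Each run looks at the previous 15 minutes (assumed to end at run time)
  window: 15mins
  # Illustrative threshold value
  threshold: '> 500'
```

Setting the window equal to the frequency would, under the same assumption, give back-to-back windows that do not overlap.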
The threshold parameter is a string that specifies the threshold at which the alert is triggered. It is a value that includes the comparison operator and the number to compare against, such as the "> 10" used in the sample spec above.
The threshold is compared to the result of the first calculation in the query of the alert. If the result meets the specified condition, the alert is triggered.
The following comparison operators are supported:
!=: Does not equal
>: Greater than
>=: Greater than or equal to
<: Less than
<=: Less than or equal to
threshold: < 5
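When writing thresholds in a YAML spec, quoting the value is a safe habit, as the sample spec does with "> 10". An unquoted value that starts with > is read by YAML as a folded block scalar indicator instead of a plain string, which typically causes a parse error. The fragment below is a sketch with a placeholder query id and an illustrative value.

```yaml
parameters:
  # Placeholder query id, used only for illustration
  query: !ref request-latency
  frequency: 5mins
  window: 15mins
  # Quoted so YAML does not treat the leading ">" as a block scalar indicator
  threshold: '>= 500'
```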
The channels property is an array of objects that specify the channels to send the alert to. ORL supports the following types of channels:
slack: Sends the alert to a Slack channel
email: Sends the alert to one or more email addresses
pagerduty: Triggers a PagerDuty incident
webhook: Sends the alert to a custom webhook URL
Each channel type has its own set of properties that define the behavior of the channel.
The slack channel type sends the alert to a Slack channel. It has the following properties:
targets: An array of strings that specify the Slack channels to send the alert to. Each string should be the name of a Slack channel (e.g. general).
channels:
  - type: slack
    targets:
      - 'alerts'
      - 'errors'
Note that it is necessary to install the Baselime Slack app and follow the Slack onboarding to get alerts on Slack.
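The email channel type sends the alert to one or more email addresses. It has the following properties:
targets: An array of strings that specify the email addresses to send the alert to. Each string should be a valid email address.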
channels:
  - type: email
    targets:
      - 'email@example.com'
      - 'firstname.lastname@example.org'
pagerduty [Coming Soon]
The pagerduty channel type triggers a PagerDuty incident. It has the following properties:
serviceKey: A string that specifies the PagerDuty service key to use for the incident. This key is used to identify the PagerDuty service that the incident should be created in.
eventAction: A string that specifies the action to take when creating the PagerDuty incident. Valid values are trigger (default) and resolve.
client: A string that specifies the name of the client that the incident should be associated with. This is optional and can be used to provide context for the incident.
clientUrl: A string that specifies the URL of the client that the incident should be associated with. This is optional and can be used to provide context for the incident.
channels:
  - type: pagerduty
    serviceKey: 'abc123'
    eventAction: 'trigger'
    client: 'Example App'
    clientUrl: 'http://example.com'
The webhook channel type sends the alert to a custom webhook URL. It has the following properties:
url: A string that specifies the URL to send the alert to.
method: A string that specifies the HTTP method to use when sending the alert. Valid values are POST (default) and GET.
headers: An object that specifies the headers to include in the request.
body: A string or object that specifies the body of the request. If a string is provided, it will be sent as-is. If an object is provided, it will be serialized as JSON and sent as the request body. (Coming Soon)
channels:
  - type: webhook
    url: 'http://example.com/webhook'
    method: 'POST'
    headers:
      'Content-Type': 'application/json'
    body:
      message: 'This is an alert from ORL' # Coming soon
Example ORL Alerts
Here are example ORL alerts that combine all of the above properties.
DynamoDB ConsumedWriteCapacityUnits Alert
This alert is triggered when the ConsumedWriteCapacityUnits metric for a DynamoDB table exceeds a specified threshold over a specified time window.
- The alert is set to run every 15 minutes and check the metric over the past hour.
- If the ConsumedWriteCapacityUnits exceed 5 over the past hour, the alert is triggered.
- The alert is sent to a Slack channel called dynamodb-alerts.
dynamodb-capacity-alarm:
  type: alert
  properties:
    description: >
      The average consumed write capacity for the DynamoDB tables has
      exceeded 5 units over the past hour.
    enabled: true
    parameters:
      query: !ref dynamodb-capacity-query
      frequency: 15mins
      window: 1h
      threshold: '> 5'
    channels:
      - type: slack
        targets:
          - 'dynamodb-alerts'
Lambda Timeout Alarm
This alert checks the number of invocations that have timed out for Lambda functions in the service, and triggers if the count exceeds 10 over the past 15 minutes. The check runs every 5 minutes, and when the alert triggers, a notification is sent to a custom webhook URL.
lambda-timeout-alarm:
  type: alert
  properties:
    description: The prod-vortex-function Lambda function has exceeded 10 timeouts over the past 15 minutes.
    enabled: true
    parameters:
      query: !ref lambda-timeout-query
      frequency: 5mins
      window: 15mins
      threshold: "> 10"
    channels:
      - type: webhook
        url: https://example.com/alerts