Alerts



Alerts notify your team when AI agent metrics exceed defined thresholds. VoltOps evaluates alert conditions every minute and triggers notifications through configured channels.

Alert Components

An alert consists of:

  • Metric: What to monitor (error rate, latency)
  • Condition: Threshold and condition type (count or percent)
  • Time Window: Evaluation period (5, 15, 30, or 60 minutes)
  • Filters: Scope the alert to specific traces
  • Channels: Where to send notifications (webhook, Slack)
  • Cooldown: Minimum time between notifications

Creating an Alert

Navigate to the Alerts page in VoltOps and click "Create Alert".

Selecting a Metric

| Metric | Description |
| --- | --- |
| Errored Runs | Counts traces with status: error or error_count > 0 |
| Latency | Calculates average trace duration in the time window |

Condition Types

For error rate alerts:

  • Count: Trigger when error count exceeds N runs
  • Percent: Trigger when error percentage exceeds N%

For latency alerts:

  • Trigger when average latency exceeds N seconds
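For example, with 200 traces in the window and 12 errors, a count condition with a threshold of 10 triggers, while a percent condition with a threshold of 10 does not (12 / 200 = 6%). The sketch below illustrates the difference, assuming a simplified trace shape; it is illustrative only, not VoltOps internals.

```typescript
// Illustrative only: how count vs. percent error-rate conditions evaluate a
// window of traces. The Trace shape and helper names are assumptions, not the VoltOps API.
interface Trace {
  status: "error" | "success" | "in_progress";
}

// Count condition: trigger when the number of errored runs exceeds N.
function exceedsCount(traces: Trace[], n: number): boolean {
  return traces.filter((t) => t.status === "error").length > n;
}

// Percent condition: trigger when errored runs exceed N% of all runs in the window.
function exceedsPercent(traces: Trace[], n: number): boolean {
  if (traces.length === 0) return false;
  const errors = traces.filter((t) => t.status === "error").length;
  return (errors / traces.length) * 100 > n;
}
```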

Time Windows

Available windows: 5, 15, 30, or 60 minutes. The alert evaluates all traces within this rolling window.

Cooldown Period

After an alert triggers, VoltOps waits for the cooldown period before sending another notification. Available options: 5, 15, 30, 60, or 120 minutes.

Filters

Filters narrow the scope of an alert to specific traces. Multiple filters are combined with AND logic.

Available Filter Fields

| Field | Type | Operators | Description |
| --- | --- | --- | --- |
| Status | select | eq, neq | Trace status: error, success, in_progress |
| Latency (ms) | number | gt, lt | Trace duration in milliseconds |
| Model | text | eq, neq | LLM model name |
| User ID | text | eq, neq | User identifier from trace |
| Input | text | contains | Trace input content |
| Output | text | contains | Trace output content |
| Error Message | text | eq, contains | Error message from failed traces |
| Agent/Workflow Name | text | eq, neq, contains | Name of the root span |
| Entity Type | select | eq, neq | agent or workflow |
| Metadata | text | eq, neq, contains | Custom metadata key-value pairs |

Filter Operators

| Operator | Description |
| --- | --- |
| eq | Equals |
| neq | Not equals |
| gt | Greater than |
| lt | Less than |
| contains | Contains substring (case-insensitive) |
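
As an illustration of how filters compose, the sketch below scopes an alert to errored production traces slower than five seconds. The object shape is an assumption for readability, not the VoltOps alert schema; configure the equivalent filters in the VoltOps UI.

```typescript
// Hypothetical filter shape for illustration; not the actual VoltOps alert schema.
type FilterOperator = "eq" | "neq" | "gt" | "lt" | "contains";

interface AlertFilter {
  field: string; // e.g. "Status", "Latency (ms)", or a metadata key
  operator: FilterOperator;
  value: string | number;
}

// Multiple filters are combined with AND logic:
// errored traces, slower than 5000 ms, from the production environment.
const filters: AlertFilter[] = [
  { field: "Status", operator: "eq", value: "error" },
  { field: "Latency (ms)", operator: "gt", value: 5000 },
  { field: "metadata.environment", operator: "eq", value: "production" },
];
```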

Metadata Filters

Filter by custom metadata fields using dot notation:

  • metadata.environment - Direct metadata key
  • metadata.context.region - Nested context key
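
For example, if your instrumentation attaches metadata like the object below (a hypothetical example), metadata.environment resolves to "production" and metadata.context.region resolves to "eu-west-1".

```typescript
// Hypothetical metadata attached to a trace by your own instrumentation.
const metadata = {
  environment: "production", // matched by metadata.environment
  context: {
    region: "eu-west-1",     // matched by metadata.context.region
  },
};
```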

Notification Channels

Webhook

Send HTTP POST requests to any URL when an alert triggers.

Configuration:

  • URL: Endpoint to receive the webhook
  • Headers: Optional HTTP headers (JSON format)
  • Body: Optional custom payload (JSON format)

Default Payload:

```json
{
  "alert_id": "uuid",
  "alert_name": "High Error Rate",
  "metric": "error_rate",
  "value": 15.5,
  "threshold": 10,
  "timestamp": "2024-01-15T10:30:00Z"
}
```
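
A minimal receiver sketch for this payload, assuming a Node.js runtime. The X-Alert-Token check stands in for a shared secret you could configure yourself under Headers; it is not something VoltOps requires.

```typescript
import { createServer } from "node:http";

// Minimal receiver for the default payload shown above (illustrative sketch).
const server = createServer((req, res) => {
  if (req.method !== "POST" || req.url !== "/voltops-alerts") {
    res.statusCode = 404;
    res.end();
    return;
  }
  // Example shared-secret check; send the matching header via the Headers option.
  if (req.headers["x-alert-token"] !== process.env.ALERT_TOKEN) {
    res.statusCode = 401;
    res.end();
    return;
  }

  let body = "";
  req.setEncoding("utf8");
  req.on("data", (chunk) => (body += chunk));
  req.on("end", () => {
    const payload = JSON.parse(body);

    // "Send Test Notification" adds is_test: true; skip routing for test payloads.
    if (!payload.is_test) {
      console.log(
        `Alert ${payload.alert_name}: ${payload.metric} = ${payload.value} ` +
          `(threshold ${payload.threshold}) at ${payload.timestamp}`
      );
      // Forward to your paging or ticketing system here.
    }
    res.statusCode = 200;
    res.end("ok");
  });
});

server.listen(3000);
```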

Slack

Send notifications to a Slack channel using Incoming Webhooks.

Setup:

  1. Create an Incoming Webhook in your Slack workspace (see Slack's Incoming Webhooks documentation)
  2. Copy the webhook URL (format: https://hooks.slack.com/services/...)
  3. Paste the URL in the Slack channel configuration

Slack notifications include:

  • Alert name and metric
  • Current value and threshold
  • Link to view the incident
  • Link to view a sample trace (when available)

Testing Notifications

Click "Send Test Notification" to verify your channel configuration. Test webhooks include an is_test: true field.

Incidents

When an alert triggers, VoltOps creates an incident. Incidents track the lifecycle of an alert from trigger to resolution.

Incident Statuses

| Status | Description |
| --- | --- |
| Open | Alert triggered, awaiting response |
| Acknowledged | Team member is investigating |
| Snoozed | Temporarily muted until a specified time |
| Resolved | Issue addressed, incident closed |

Incident Workflow

  1. Alert triggers → Incident created with status open
  2. Team member takes ownership → Status changes to acknowledged
  3. Investigation complete → Status changes to resolved

Alternatively:

  • Snooze: Temporarily silence the incident. When the snooze period expires, the incident reopens if conditions still trigger.
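
The lifecycle above can be pictured as a small state machine. The sketch below is illustrative only; VoltOps manages these transitions for you from the incident view.

```typescript
// Illustrative state machine for the incident lifecycle; the allowed transitions
// are an assumption based on the workflow described above.
type IncidentStatus = "open" | "acknowledged" | "snoozed" | "resolved";

const transitions: Record<IncidentStatus, IncidentStatus[]> = {
  open: ["acknowledged", "snoozed", "resolved"],
  acknowledged: ["snoozed", "resolved"],
  snoozed: ["open"], // reopens when the snooze expires if conditions still trigger
  resolved: [],      // closed
};

function canTransition(from: IncidentStatus, to: IncidentStatus): boolean {
  return transitions[from].includes(to);
}
```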

Incident Details

Each incident contains:

  • Payload: Metric value at trigger time, threshold, and sample trace ID
  • Assignee: Team member responsible for resolution
  • Notes: Comments added during investigation
  • Timestamps: Triggered at, resolved at

Dashboard

The Alerts dashboard displays:

| Metric | Description |
| --- | --- |
| Total Incidents | Number of incidents in the selected period |
| Active Incidents | Currently open, acknowledged, or snoozed incidents |
| Avg Resolve Time | Mean time from trigger to resolution |
| Sparkline | Daily incident counts over the period |

Toggle between 7-day and 30-day views using the period selector.

Alert Evaluation

VoltOps runs a scheduled job every minute that:

  1. Queries traces within each alert's time window
  2. Applies the configured filters
  3. Calculates the metric value
  4. Compares against the threshold
  5. Creates an incident if triggered and no open incident exists
  6. Sends notifications respecting the cooldown period

If an incident is snoozed and the snooze period expires, the incident reopens and notifications resume.
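
The sketch below condenses one evaluation pass for a single alert into code. Every name and shape in it is hypothetical; it mirrors the steps above rather than the actual VoltOps implementation.

```typescript
// Hypothetical sketch of one evaluation pass; not the VoltOps implementation.
interface Incident {
  status: "open" | "acknowledged" | "snoozed" | "resolved";
  lastNotifiedAt: number; // epoch milliseconds
}

// Steps 1-3 (query, filter, compute) and the persistence/notification side effects
// are injected here so the control flow of steps 4-6 stays visible.
interface EvaluationContext {
  queryMetricValue: (windowMinutes: number) => Promise<number>;
  findOpenIncident: () => Promise<Incident | undefined>;
  createIncident: (value: number) => Promise<Incident>;
  notifyChannels: (value: number) => Promise<void>;
}

async function evaluateAlert(
  alert: { threshold: number; windowMinutes: number; cooldownMinutes: number },
  ctx: EvaluationContext,
  now: number = Date.now()
): Promise<void> {
  // Steps 1-3: traces in the rolling window are queried, filtered, and reduced to one value.
  const value = await ctx.queryMetricValue(alert.windowMinutes);

  // Step 4: compare against the threshold.
  if (value <= alert.threshold) return;

  // Step 5: create an incident only if no open incident exists for this alert.
  const incident = (await ctx.findOpenIncident()) ?? (await ctx.createIncident(value));

  // Step 6: notify, respecting the cooldown since the last notification.
  if (now - incident.lastNotifiedAt >= alert.cooldownMinutes * 60_000) {
    await ctx.notifyChannels(value);
  }
}
```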
