Evaluation Datasets

Datasets are collections of test cases used to evaluate agent performance. Each dataset item contains an input prompt, an optional expected output, and metadata to help organize and analyze results.

Dataset Structure

Datasets follow a consistent JSON structure whether stored locally or in VoltOps:

{
  "name": "customer-support-qa",
  "description": "Customer support question-answer pairs",
  "tags": ["support", "qa", "production"],
  "metadata": {
    "version": "1.0.0",
    "created": "2025-01-10"
  },
  "data": [
    {
      "name": "refund-policy",
      "input": "What is your refund policy?",
      "expected": "We offer a 30-day money-back guarantee...",
      "extra": {
        "category": "policies",
        "difficulty": "easy"
      }
    }
  ]
}

Field Descriptions

Field	Type	Required	Description
`name`	string	Yes	Unique identifier for the dataset
`description`	string	No	Human-readable description
`tags`	string[]	No	Labels for filtering and organization
`metadata`	object	No	Additional structured data
`data`	array	Yes	Collection of dataset items

Dataset Item Structure

Each item in the data array contains:

Field	Type	Required	Description
`name`	string	No	Item identifier for tracking
`input`	any	Yes	Input to the agent (string, object, or array)
`expected`	any	No	Expected output for comparison
`extra`	object	No	Additional context or metadata

Creating Datasets

JSON Files

Store datasets as JSON files in .voltagent/datasets/:

{
  "name": "math-problems",
  "description": "Basic arithmetic problems",
  "data": [
    {
      "input": "What is 15 + 27?",
      "expected": "42"
    },
    {
      "input": {
        "operation": "multiply",
        "a": 7,
        "b": 8
      },
      "expected": 56
    }
  ]
}

Inline Datasets

Datasets can also be defined inline within experiment files:

const inlineDataset = {
  items: [
    { input: "Hello", expected: "Hi there" },
    { input: "Goodbye", expected: "See you later" },
    { input: "How are you?", expected: "I'm doing well, thanks!" },
  ],
};

CLI Commands

Push Dataset to VoltOps

Upload a local dataset file to VoltOps:

voltagent eval dataset push --name math-problems

Options:

Flag	Description	Default
`--name <name>`	Dataset name (required)	-
`--file <path>`	Path to JSON file	`.voltagent/datasets/<name>.json`

Environment Variables:

VOLTAGENT_DATASET_NAME - Default dataset name
VOLTAGENT_API_URL - VoltOps API endpoint
VOLTAGENT_PUBLIC_KEY - Authentication key
VOLTAGENT_SECRET_KEY - Authentication secret

Example:

# Push custom file path
voltagent eval dataset push --name production-qa --file ./data/qa-pairs.json

# Use environment variable for name
export VOLTAGENT_DATASET_NAME=production-qa
voltagent eval dataset push

Pull Dataset from VoltOps

Download a dataset version from VoltOps:

voltagent eval dataset pull --name math-problems

Options:

Flag	Description	Default
`--name <name>`	Dataset name	Interactive prompt
`--id <id>`	Dataset ID (overrides name)	-
`--version <id>`	Version ID	Latest version
`--output <path>`	Output file path	`.voltagent/datasets/<name>.json`
`--overwrite`	Replace existing file	false
`--page-size <n>`	Items per API request	200

Interactive Mode:

When no dataset is specified, the CLI presents an interactive menu:

voltagent eval dataset pull

? Select a dataset to pull
❯ customer-support (5 versions)
  math-problems (3 versions)
  product-catalog (1 version)

? Select a version to pull for customer-support
❯ v3 • 150 items — Production dataset
  v2 • 100 items
  v1 • 50 items — Initial version

File Conflict Resolution:

When the target file exists:

? Local file already exists. Choose how to proceed:
❯ Overwrite existing file
  Save as new file (math-problems-remote.json)
  Cancel

VoltOps Console

The VoltOps Console provides a web interface for dataset management at https://console.voltagent.dev/evals/datasets.

Creating Datasets

Click Create Dataset
Enter dataset name and description
Add tags for organization
Submit to create an empty dataset

Adding Items

Single Item:

Open a dataset
Click Add Item
Enter JSON for input and expected fields
Optionally add labels and metadata
Save the item

Bulk Import:

Click Import Items
Paste JSON array of items:

[
  {
    label: "test-1",
    input: "What is 2+2?",
    expected: "4",
    extra: { category: "math" },
  },
  {
    label: "test-2",
    input: "What is the capital of France?",
    expected: "Paris",
    extra: { category: "geography" },
  },
];

Version Management

Datasets automatically version when items change:

Each modification creates a new version
Versions are numbered sequentially (v1, v2, v3)
Previous versions remain immutable
Experiments reference specific versions

Working with Dataset Items

Input Formats

Input fields accept any JSON-serializable value:

// String input
{ input: "Translate 'hello' to Spanish" }

// Object input
{ input: { text: "Hello", targetLang: "es" } }

// Array input
{ input: ["item1", "item2", "item3"] }

// Complex nested structure
{
  input: {
    messages: [
      { role: "user", content: "Hi" },
      { role: "assistant", content: "Hello" }
    ],
    context: { userId: "123", sessionId: "abc" }
  }
}

Expected Output Patterns

Expected values are compared by scorers:

// Exact string match
{ expected: "The answer is 42" }

// Numeric comparison
{ expected: 3.14159 }

// Structured data
{ expected: { status: "success", result: 100 } }

// Partial matching with extra metadata
{
  expected: "Paris",
  extra: {
    acceptableAnswers: ["Paris", "Paris, France"],
    scoreThreshold: 0.8
  }
}

Using Extra Metadata

The extra field provides context without affecting scoring:

{
  input: "Summarize this article",
  expected: "Key points of the article...",
  extra: {
    articleLength: 500,
    domain: "technology",
    tags: ["ai", "machine-learning"],
    sourceUrl: "https://example.com/article",
    testPriority: "high"
  }
}

Dataset Registration

// register-datasets.ts
import { registerExperimentDataset } from "@voltagent/evals";
import mathDatasetJson from "./.voltagent/datasets/math-problems.json";
import qaDatasetJson from "./data/qa-pairs.json";

// Register a JSON dataset
registerExperimentDataset({
  name: "math-problems",
  items: mathDatasetJson.data,
});

// Register another JSON dataset
registerExperimentDataset({
  name: "qa-pairs",
  items: qaDatasetJson.data,
});

// Register with VoltOps integration
registerExperimentDataset({
  name: "production-qa",
  descriptor: {
    id: "dataset_abc123",
    versionId: "version_xyz789",
  },
});

// Register with async data loader
registerExperimentDataset({
  name: "dynamic-dataset",
  resolver: async ({ limit, signal }) => {
    // Load data from API, database, etc.
    const response = await fetch("https://api.example.com/test-data", { signal });
    const data = await response.json();
    return {
      items: limit ? data.items.slice(0, limit) : data.items,
      total: data.total,
    };
  },
});

Advanced Dataset Features

Async Dataset Resolvers

Load datasets dynamically with resolver functions:

registerExperimentDataset({
  name: "api-dataset",
  resolver: async ({ limit, signal }) => {
    // Parameters:
    // - limit: Maximum items requested
    // - signal: AbortSignal for cancellation

    const items = await loadFromDatabase(limit);

    return {
      items, // Array or AsyncIterable of items
      total: await getTotalCount(), // Optional total hint
      dataset: {
        // Optional metadata
        id: "db-dataset-1",
        name: "Database Dataset",
        metadata: { source: "postgres" },
      },
    };
  },
});

Evaluation Datasets

Dataset Structure​

Field Descriptions​

Dataset Item Structure​

Creating Datasets​

JSON Files​

Inline Datasets​

CLI Commands​

Push Dataset to VoltOps​

Pull Dataset from VoltOps​

VoltOps Console​

Creating Datasets​

Adding Items​

Version Management​

Working with Dataset Items​

Input Formats​

Expected Output Patterns​

Using Extra Metadata​

Dataset Registration​

Advanced Dataset Features​

Async Dataset Resolvers​

Table of Contents

Dataset Structure

Field Descriptions

Dataset Item Structure

Creating Datasets

JSON Files

Inline Datasets

CLI Commands

Push Dataset to VoltOps

Pull Dataset from VoltOps

VoltOps Console

Creating Datasets

Adding Items

Version Management

Working with Dataset Items

Input Formats

Expected Output Patterns

Using Extra Metadata

Dataset Registration

Advanced Dataset Features

Async Dataset Resolvers