Build Your Own Retriever

Want to connect your AI agent to your own database, API, or files? Here's how to build a custom retriever in 5 minutes.

The Pattern (Copy & Paste)

Every retriever follows the same simple pattern:

import { BaseRetriever } from "@voltagent/core";

class MyRetriever extends BaseRetriever {
  async retrieve(input, options) {
    // 1. Get the user's question
    const question = typeof input === "string" ? input : input[input.length - 1].content;

    // 2. Search your data source
    const results = await this.searchMyData(question);

    // 3. Return formatted results
    return results.join("\n\n");
  }

  async searchMyData(query) {
    // Replace this with your actual search logic
    return ["Sample result 1", "Sample result 2"];
  }
}
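Step 1 of the pattern reads `input[input.length - 1].content` directly, which only works when the last message carries plain string content. The sketch below shows one way to make that step more defensive. `extractQuery` is a hypothetical helper, not part of `@voltagent/core`, and the multi-part shape (`{ type: "text", text }`) is an assumption about how structured message content might look.

```javascript
// Hypothetical helper (not part of @voltagent/core): normalize retriever
// input into a plain search string.
function extractQuery(input) {
  if (typeof input === "string") return input.trim();
  if (!Array.isArray(input) || input.length === 0) return "";

  // Same choice as the pattern above: use the last message in the array.
  const content = input[input.length - 1]?.content;
  if (typeof content === "string") return content.trim();

  // Assumption: multi-part content is an array of { type: "text", text } parts.
  if (Array.isArray(content)) {
    return content
      .filter((part) => part && part.type === "text" && typeof part.text === "string")
      .map((part) => part.text)
      .join(" ")
      .trim();
  }

  // Unknown shape: return an empty query instead of throwing.
  return "";
}
```

Inside `retrieve`, the first line would then become `const question = extractQuery(input);`, and an empty string can be treated as "nothing to search for".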

Real Examples

Search Local Files

import { BaseRetriever } from "@voltagent/core";
import fs from "fs";
import path from "path";

class FileRetriever extends BaseRetriever {
  constructor(docsPath = "./docs") {
    super({
      toolName: "search_files",
      toolDescription: "Search through local documentation files",
    });
    this.docsPath = docsPath;
  }

  async retrieve(input, options) {
    const query = typeof input === "string" ? input : input[input.length - 1].content;

    // Read all .md files
    const files = fs.readdirSync(this.docsPath).filter((file) => file.endsWith(".md"));

    const results = [];
    for (const file of files) {
      const content = fs.readFileSync(path.join(this.docsPath, file), "utf8");
      if (content.toLowerCase().includes(query.toLowerCase())) {
        results.push(`File: ${file}\n${content.slice(0, 500)}...`);
      }
    }

    return results.length > 0 ? results.join("\n\n---\n\n") : "No relevant files found.";
  }
}
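The file example returns the first 500 characters of each matching file, so the passage that actually matched may be cut off. One possible refinement is to return the text surrounding the match instead. `snippetAround` and its `radius` parameter are hypothetical names introduced here for illustration.

```javascript
// Hypothetical helper: return up to `radius` characters on each side of the
// first case-insensitive match of `query` in `content`, or null if no match.
// Note: offsets come from the lowercased text, which has the same length as
// the original for typical ASCII and Latin content.
function snippetAround(content, query, radius = 200) {
  const index = content.toLowerCase().indexOf(query.toLowerCase());
  if (index === -1) return null;

  const start = Math.max(0, index - radius);
  const end = Math.min(content.length, index + query.length + radius);

  // Mark truncation only on the sides that were actually cut.
  const prefix = start > 0 ? "..." : "";
  const suffix = end < content.length ? "..." : "";
  return prefix + content.slice(start, end) + suffix;
}
```

In `FileRetriever.retrieve`, `content.slice(0, 500)` could be swapped for `snippetAround(content, query)`. The `includes` check already guarantees a match, so the helper would not return `null` there.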

Search PostgreSQL Database

import { BaseRetriever } from "@voltagent/core";
import { Pool } from "pg";

class PostgreSQLRetriever extends BaseRetriever {
  constructor(connectionString) {
    super({
      toolName: "search_database",
      toolDescription: "Search the company knowledge database",
    });
    this.pool = new Pool({ connectionString });
  }

  async retrieve(input, options) {
    const query = typeof input === "string" ? input : input[input.length - 1].content;

    // Search using PostgreSQL full-text search
    const result = await this.pool.query(
      `
      SELECT title, content, ts_rank(search_vector, plainto_tsquery($1)) as rank
      FROM documents
      WHERE search_vector @@ plainto_tsquery($1)
      ORDER BY rank DESC
      LIMIT 5
      `,
      [query]
    );

    return result.rows.map((row) => `${row.title}: ${row.content}`).join("\n\n---\n\n");
  }
}
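The query above expects the `documents` table to have a `search_vector` column of type `tsvector`. If your table does not have one yet, a setup step along these lines could create it. This is a sketch under assumptions: `title` and `content` are text columns, the server is PostgreSQL 12 or newer (needed for generated columns), and `ensureSearchSchema` and the index name are hypothetical.

```javascript
// Assumed schema: a `documents` table with text columns `title` and `content`.
// Generated columns require PostgreSQL 12 or newer.
const SEARCH_SCHEMA_STATEMENTS = [
  `ALTER TABLE documents
     ADD COLUMN IF NOT EXISTS search_vector tsvector
     GENERATED ALWAYS AS (
       to_tsvector('english', coalesce(title, '') || ' ' || coalesce(content, ''))
     ) STORED`,
  `CREATE INDEX IF NOT EXISTS documents_search_vector_idx
     ON documents USING GIN (search_vector)`,
];

// `pool` is anything with a pg-style `query(sql)` method, such as a pg Pool.
async function ensureSearchSchema(pool) {
  // Run the statements in order: the index depends on the column.
  for (const statement of SEARCH_SCHEMA_STATEMENTS) {
    await pool.query(statement);
  }
}
```

The column is built with the `'english'` configuration, while `plainto_tsquery($1)` in the retriever uses the server's `default_text_search_config`. If those differ, passing the configuration explicitly, as in `plainto_tsquery('english', $1)`, keeps indexing and querying consistent.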

Call External API

import { BaseRetriever } from "@voltagent/core";

class APIRetriever extends BaseRetriever {
  constructor(apiKey) {
    super({
      toolName: "search_api",
      toolDescription: "Search external knowledge API",
    });
    this.apiKey = apiKey;
  }

  async retrieve(input, options) {
    const query = typeof input === "string" ? input : input[input.length - 1].content;

    const response = await fetch("https://api.example.com/search", {
      method: "POST",
      headers: {
        Authorization: `Bearer ${this.apiKey}`,
        "Content-Type": "application/json",
      },
      body: JSON.stringify({ query, limit: 5 }),
    });

    // fetch does not reject on HTTP errors, so check the status explicitly
    if (!response.ok) {
      throw new Error(`Search API request failed with status ${response.status}`);
    }

    const data = await response.json();

    return data.results.map((item) => `${item.title}: ${item.summary}`).join("\n\n---\n\n");
  }
}

Track Sources (Optional)

Want to show users where the information came from? Use userContext to track sources:

class SourceTrackingRetriever extends BaseRetriever {
  async retrieve(input, options) {
    const query = typeof input === "string" ? input : input[input.length - 1].content;

    // Your search logic here
    const results = await this.searchData(query);

    // Save sources for later reference (options may be omitted by the caller)
    if (options?.userContext && results.length > 0) {
      const sources = results.map((r) => ({
        title: r.title,
        url: r.url,
        score: r.score,
      }));

      options.userContext.set("sources", sources);
    }

    return results.map((r) => r.content).join("\n\n");
  }
}

// Use it (assumes `agent` was created with this retriever)
const response = await agent.generateText("How do I deploy?");
console.log("Answer:", response.text);

// Check what sources were used
const sources = response.userContext?.get("sources");
sources?.forEach((s) => console.log(`Source: ${s.title} (${s.url})`));

Why track sources?

  • Show users where info came from
  • Debug what the retriever found
  • Compliance and audit trails
  • Better user experience
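To show the tracked sources to users, they usually need to be deduplicated and ordered first. The sketch below is one way to do that. `formatSources` is a hypothetical helper, and it assumes each source has the `{ title, url, score }` shape used in the example above.

```javascript
// Hypothetical helper: turn tracked sources into a short citation list.
// Assumes each source looks like { title, url, score }.
function formatSources(sources = []) {
  const bestByUrl = new Map();
  for (const source of sources) {
    const existing = bestByUrl.get(source.url);
    // Keep only the highest-scoring entry per URL.
    if (!existing || (source.score ?? 0) > (existing.score ?? 0)) {
      bestByUrl.set(source.url, source);
    }
  }

  // Highest score first, numbered for display.
  return [...bestByUrl.values()]
    .sort((a, b) => (b.score ?? 0) - (a.score ?? 0))
    .map((s, i) => `[${i + 1}] ${s.title} (${s.url})`)
    .join("\n");
}
```

With the example above, `formatSources(response.userContext?.get("sources"))` would produce a numbered list that can be appended to the answer, or an empty string when nothing was tracked.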

How to Use Your Retriever

You can use your retriever in two ways:

Option 1: Always Search

import { Agent } from "@voltagent/core";

const agent = new Agent({
  name: "Support Bot",
  retriever: new MyRetriever(), // Searches before every response
  // ... other config
});

When to use: Support bots and Q&A systems where every response needs retrieved context.

Option 2: Search When Needed

const retriever = new MyRetriever({
  toolName: "search_docs",
  toolDescription: "Search company documentation",
});

const agent = new Agent({
  name: "Smart Assistant",
  tools: [retriever.tool], // LLM decides when to search
  // ... other config
});

When to use: General assistants where the LLM should decide when a search is worth making.

Quick Decision Guide

| Use Case | Method | Why |
| --- | --- | --- |
| Support bot | agent.retriever | Always needs context |
| Q&A system | agent.retriever | Every question needs search |
| General assistant | agent.tools | Let LLM decide when to search |
| Multi-tool agent | agent.tools | Mix with other tools |

Pro Tips

Format your results clearly:

// ❌ Hard to parse
return results.join(" ");

// ✅ Easy to parse
return results.map((r) => `Source: ${r.title}\n${r.content}`).join("\n\n---\n\n");
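Clear formatting also benefits from a size limit, since everything the retriever returns ends up in the model's context. The sketch below combines the separator style above with a character budget. `formatResults` and `maxChars` are hypothetical names, the default of 4000 is an arbitrary placeholder, and each result is assumed to look like `{ title, content }`.

```javascript
// Hypothetical helper: format results with clear separators and stop once a
// character budget is reached. Assumes each result looks like { title, content }.
function formatResults(results, maxChars = 4000) {
  const separator = "\n\n---\n\n";
  const blocks = [];
  let used = 0;

  for (const r of results) {
    const block = `Source: ${r.title}\n${r.content}`;
    const cost = block.length + (blocks.length > 0 ? separator.length : 0);
    if (used + cost > maxChars) {
      // Keep at least a truncated first result rather than returning nothing.
      if (blocks.length === 0) blocks.push(block.slice(0, maxChars));
      break;
    }
    blocks.push(block);
    used += cost;
  }

  return blocks.length > 0 ? blocks.join(separator) : "No results found.";
}
```

Because results are added in order, this works best when the search already returns them ranked by relevance.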

Handle errors gracefully:

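async retrieve(input, options) {
  // Extract the query first: input may be a string or an array of messages
  const query = typeof input === "string" ? input : input[input.length - 1].content;

  try {
    const results = await this.searchData(query);
    return results.length > 0 ? results.join("\n\n") : "No results found.";
  } catch (error) {
    console.error("Search failed:", error);
    return "Search temporarily unavailable.";
  }
}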

Make it fast:

// Add a timeout so a slow search cannot stall the response
const results = await Promise.race([
  this.searchData(query),
  new Promise((_, reject) => setTimeout(() => reject(new Error("Timeout")), 5000)),
]);
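The inline `Promise.race` leaves its timer pending even when the search finishes first. A small reusable wrapper can clear it. `withTimeout` is a hypothetical helper name, not something provided by `@voltagent/core`.

```javascript
// Hypothetical helper: reject if `promise` does not settle within `ms`
// milliseconds, and always clear the timer once the race is decided.
function withTimeout(promise, ms, message = "Timeout") {
  let timer;
  const timeout = new Promise((_, reject) => {
    timer = setTimeout(() => reject(new Error(message)), ms);
  });
  return Promise.race([promise, timeout]).finally(() => clearTimeout(timer));
}
```

Usage would look like `const results = await withTimeout(this.searchData(query), 5000);`. Note that this only stops waiting. It does not cancel the underlying search, so a data source that supports cancellation should also be given its own abort mechanism.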

Good tool descriptions:

// ❌ Vague
toolDescription: "Searches stuff";

// ✅ Specific
toolDescription: "Search company documentation, policies, and FAQ. Use when user asks about company procedures, benefits, or policies.";
