Retries and Fallbacks
VoltAgent supports per-call retries and ordered model fallback lists. Retries are handled by VoltAgent with exponential backoff. After retries are exhausted for a model, VoltAgent attempts the next model in the list.
Configure Retries
You can set a default retry count at the agent level, then override it per model or per call.
import { Agent } from "@voltagent/core";

// Agent-level default: each model call may be retried up to 3 times.
const agent = new Agent({
  name: "Support",
  instructions: "Resolve tickets quickly.",
  model: "openai/gpt-4o-mini",
  maxRetries: 3,
});

// Per-call override: disable retries for this call only.
await agent.generateText("Summarize this ticket", { maxRetries: 0 });
- maxRetries is the number of retry attempts for a single model call (total attempts = maxRetries + 1).
- Priority order: per-call maxRetries > per-model maxRetries > agent maxRetries > default 3.
- maxRetries: 0 disables retries for that call.
- Retry delay uses exponential backoff: 1s, 2s, 4s, 8s (max 10s).
- Errors with isRetryable: false skip retries; fallback can still run.
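The rules above can be sketched as a few small functions. This is an illustration only, not VoltAgent's internal code: the helper names are hypothetical, and the sketch assumes that any retry after the fourth waits the 10s maximum.

```typescript
// Hypothetical helpers that mirror the documented rules; not VoltAgent internals.
const DEFAULT_MAX_RETRIES = 3;

// The first defined value wins: per-call, then per-model, then agent, then the default.
function resolveMaxRetries(perCall?: number, perModel?: number, agent?: number): number {
  return perCall ?? perModel ?? agent ?? DEFAULT_MAX_RETRIES;
}

// Delay before retry number `retryIndex` (0-based): 1s, 2s, 4s, 8s.
// Assumption: later retries are capped at 10s.
function backoffDelayMs(retryIndex: number): number {
  return Math.min(1000 * 2 ** retryIndex, 10_000);
}

// Total attempts for a single model call.
function totalAttempts(maxRetries: number): number {
  return maxRetries + 1;
}

console.log(resolveMaxRetries(0, 2, 3)); // 0: the per-call value wins and disables retries
console.log([0, 1, 2, 3, 4].map(backoffDelayMs)); // [1000, 2000, 4000, 8000, 10000]
console.log(totalAttempts(3)); // 4
```

Using `??` rather than `||` matters here: a per-call value of 0 is a valid setting and must not fall through to the next level.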
Configure Fallbacks
Provide a list of models. VoltAgent tries them in order until one succeeds. Each entry can have its own retry count and can be toggled on/off.
import { Agent } from "@voltagent/core";
import { anthropic } from "@ai-sdk/anthropic";

const agent = new Agent({
  name: "FallbackAgent",
  instructions: "Be concise.",
  // Models are tried in list order; each entry may set its own retry count.
  model: [
    { id: "primary", model: "openai/gpt-4o-mini", maxRetries: 2 },
    { id: "secondary", model: anthropic("claude-3-5-sonnet"), maxRetries: 1 },
    // No maxRetries here, so the agent default applies.
    { id: "tertiary", model: "google/gemini-2.0-flash", enabled: true },
  ],
});
Notes:
- The first enabled model is the primary.
- maxRetries falls back to the agent default if omitted.
- enabled: false lets you keep a model configured but temporarily disabled.
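To make these notes concrete, here is a rough sketch of how a fallback list could be reduced to the entries that will actually be tried. The FallbackEntry and ResolvedEntry shapes and the resolveEntries helper are hypothetical, and the sketch assumes an entry without an enabled flag counts as enabled.

```typescript
// Hypothetical shapes for illustration; the real entry type in @voltagent/core may differ.
interface FallbackEntry {
  id?: string;
  model: string;
  maxRetries?: number;
  enabled?: boolean;
}

interface ResolvedEntry {
  id: string;
  model: string;
  maxRetries: number;
}

function resolveEntries(entries: FallbackEntry[], agentMaxRetries: number = 3): ResolvedEntry[] {
  return entries
    // Assumption: an entry is enabled unless it is explicitly disabled.
    .filter((entry) => entry.enabled !== false)
    .map((entry, index) => ({
      id: entry.id ?? `model-${index}`,
      model: entry.model,
      // An omitted maxRetries falls back to the agent-level value.
      maxRetries: entry.maxRetries ?? agentMaxRetries,
    }));
}

const resolvedDemo = resolveEntries([
  { id: "primary", model: "openai/gpt-4o-mini", maxRetries: 2, enabled: false },
  { id: "secondary", model: "anthropic/claude-3-5-sonnet", maxRetries: 1 },
  { id: "tertiary", model: "google/gemini-2.0-flash" },
]);

// "secondary" acts as the primary because it is the first enabled entry.
console.log(resolvedDemo.map((entry) => `${entry.id}:${entry.maxRetries}`)); // ["secondary:1", "tertiary:3"]
```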
How Fallback Works
VoltAgent tries each enabled model in order:
- Resolve the model (dynamic or static).
- Execute the call with that model, retrying up to maxRetries on error.
- If retries are exhausted, move to the next model.
Fallback does not trigger for:
- Abort or bail errors.
- Guardrail blocks.
- Tool execution errors.
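The control flow described above can be modeled with a short, simplified simulation. It is a sketch under stated assumptions, not VoltAgent's implementation: all type and function names are hypothetical, model calls are plain synchronous functions that return a result instead of throwing, and backoff delays are left out.

```typescript
// Hypothetical error model: provider errors may trigger retries and fallback,
// while abort, guardrail, and tool errors are surfaced immediately.
type CallError =
  | { kind: "provider"; isRetryable: boolean }
  | { kind: "abort" | "guardrail" | "tool" };

type CallResult = { ok: true; text: string } | { ok: false; error: CallError };

interface Candidate {
  id: string;
  maxRetries: number;
  enabled?: boolean;
  call: () => CallResult;
}

function runWithFallback(candidates: Candidate[]): { result: CallResult; attempts: string[] } {
  const attempts: string[] = [];
  let lastFailure: CallResult | undefined;

  for (const candidate of candidates) {
    if (candidate.enabled === false) continue;

    for (let attempt = 0; attempt <= candidate.maxRetries; attempt++) {
      attempts.push(candidate.id);
      const result = candidate.call();
      if (result.ok) return { result, attempts };

      lastFailure = result;
      const error = result.error;
      // Abort, guardrail, and tool errors do not trigger fallback.
      if (error.kind !== "provider") return { result, attempts };
      // Non-retryable errors skip the remaining retries; the next model can still run.
      if (!error.isRetryable) break;
    }
  }

  // Assumption: the MODEL_LIST_EMPTY condition is represented here as a plain Error.
  if (lastFailure === undefined) throw new Error("MODEL_LIST_EMPTY");
  return { result: lastFailure, attempts };
}

const fallbackDemo = runWithFallback([
  { id: "primary", maxRetries: 2, call: () => ({ ok: false, error: { kind: "provider", isRetryable: true } }) },
  { id: "secondary", maxRetries: 1, call: () => ({ ok: true, text: "done" }) },
]);
console.log(fallbackDemo.attempts); // ["primary", "primary", "primary", "secondary"]
```

The primary is attempted three times (one call plus two retries) before the secondary runs once and succeeds.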
Streaming behavior:
- Retries and fallback happen only if the stream fails before the first output chunk.
- If a stream fails after output starts, the error is surfaced to the caller and the stream is not restarted.
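The "before the first output chunk" rule can be illustrated with a small callback-based model of a stream. Everything here is hypothetical and synchronous for brevity: a StreamAttempt calls emit for each chunk and throws on failure, and each entry in the attempts list stands for one try, whether that is a retry of the same model or the next fallback model.

```typescript
// Hypothetical model of one streaming attempt: emits chunks, throws on failure.
type StreamAttempt = (emit: (chunk: string) => void) => void;

function streamWithRecovery(attempts: StreamAttempt[], emit: (chunk: string) => void): void {
  let emittedChunks = 0;
  let lastError: unknown = new Error("no stream attempts configured");

  for (const attempt of attempts) {
    try {
      attempt((chunk) => {
        emittedChunks++;
        emit(chunk);
      });
      return; // the stream completed
    } catch (error) {
      // Once output has reached the caller, surface the error instead of restarting.
      if (emittedChunks > 0) throw error;
      // Failed before the first chunk: it is safe to try the next attempt.
      lastError = error;
    }
  }
  throw lastError;
}

const receivedChunks: string[] = [];
streamWithRecovery(
  [
    () => {
      throw new Error("503 before the first chunk");
    },
    (emit) => {
      emit("Hello");
      emit(" world");
    },
  ],
  (chunk) => {
    receivedChunks.push(chunk);
  },
);
console.log(receivedChunks.join("")); // "Hello world"
```

Restarting after partial output would make the caller see duplicated or inconsistent text, which is why the sketch rethrows as soon as at least one chunk has been delivered.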
Middleware Retries
Middleware retries are separate from model retries and fallback. A middleware can call abort("reason", { retry: true }) to restart the full attempt.
- Controlled by maxMiddlewareRetries (agent-level or per-call).
- Each retry reruns the full pipeline: middleware, guardrails, hooks, model selection, model retries, and fallback.
- In streaming calls, only input middlewares can trigger a retry before the stream starts.
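As a rough mental model, the middleware retry loop wraps the whole pipeline rather than a single model call. The sketch below uses hypothetical names throughout and represents the abort signal as a return value; it also treats maxMiddlewareRetries like maxRetries, as a count of extra runs after the first one, which is an assumption.

```typescript
// Hypothetical outcome of one full pipeline run. In VoltAgent the retry signal
// comes from a middleware calling abort("reason", { retry: true }).
type PipelineOutcome =
  | { ok: true; text: string }
  | { ok: false; reason: string; retry: boolean };

function runWithMiddlewareRetries(
  runPipeline: (runIndex: number) => PipelineOutcome,
  maxMiddlewareRetries: number,
): { outcome: PipelineOutcome; runs: number } {
  // Each run covers everything: middleware, guardrails, hooks,
  // model selection, model retries, and fallback.
  let outcome = runPipeline(0);
  let runs = 1;
  while (!outcome.ok && outcome.retry && runs <= maxMiddlewareRetries) {
    outcome = runPipeline(runs);
    runs++;
  }
  return { outcome, runs };
}

// A middleware asks for a retry on the first two runs, then lets the call through.
const middlewareDemo = runWithMiddlewareRetries(
  (runIndex) =>
    runIndex < 2
      ? { ok: false, reason: "stale input", retry: true }
      : { ok: true, text: "done" },
  2,
);
console.log(middlewareDemo.runs, middlewareDemo.outcome.ok); // 3 true
```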
See Middleware for configuration and API details.
Real-World Scenarios
1. Provider Outage During a Traffic Spike
The primary provider returns 5xx errors during a traffic spike. Configure a fallback provider with its own retry count.
import { Agent } from "@voltagent/core";

const agent = new Agent({
  name: "LaunchAssistant",
  instructions: "Handle onboarding questions.",
  model: [
    { model: "openai/gpt-4o-mini", maxRetries: 1 },
    { model: "anthropic/claude-3-5-sonnet", maxRetries: 2 },
  ],
});
Expected behavior:
- Primary errors are retried up to maxRetries.
- After retries are exhausted, the secondary provider is attempted.
2. Rate Limits During Peak Hours
The primary model returns 429 during peak hours. Set maxRetries: 0 on the primary and allow retries on the fallback.
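import { Agent } from "@voltagent/core";

const agent = new Agent({
  name: "PeakTrafficAgent",
  instructions: "Answer pricing questions.",
  model: [
    // No retries: a 429 on the primary moves straight to the fallback.
    { model: "openai/gpt-4o-mini", maxRetries: 0 },
    { model: "groq/llama-3.3-70b-versatile", maxRetries: 2 },
  ],
});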
Expected behavior:
- The primary fails without retry delay.
- The fallback is attempted and can retry if needed.
3. Timeouts in One Region
Requests in a region start timing out. Configure a regional primary with a global fallback.
import { Agent } from "@voltagent/core";

const agent = new Agent({
  name: "RegionalAgent",
  instructions: "Support users in APAC.",
  model: [
    { model: "google/gemini-2.0-flash", maxRetries: 1 },
    { model: "openai/gpt-4o-mini", maxRetries: 1 },
  ],
});
Expected behavior:
- Timeouts on the primary move the call to the fallback after retries.
- The fallback handles calls that fail on the primary.
4. Compliance or Data Residency Constraints
Some users require a specific provider for data residency. Select the primary model dynamically and keep a fallback entry.
import { Agent } from "@voltagent/core";

const agent = new Agent({
  name: "ComplianceAgent",
  instructions: "Handle regulated requests.",
  model: [
    {
      // Resolved on every call from the request context; defaults to "us".
      model: async ({ context }) => {
        const region = (context.get("region") as string) || "us";
        return region === "eu" ? "mistral/mistral-large-latest" : "openai/gpt-4o-mini";
      },
      maxRetries: 1,
    },
    { model: "anthropic/claude-3-5-sonnet", maxRetries: 1 },
  ],
});
Expected behavior:
- The model selection uses context values on each call.
- If the selected model fails, the fallback is attempted.
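Because a routing mistake here is a compliance issue and not just a performance one, it can help to pull the selection logic into a named function and check it in isolation. The sketch below is one way to do that; selectModelForRegion and the CallContext alias are hypothetical, and the context is assumed to behave like a Map.

```typescript
// Hypothetical stand-in for the per-call context; assumed here to behave like a Map.
type CallContext = Map<string, unknown>;

// Same rule as the resolver above: "eu" goes to Mistral, everything else
// (including a missing region) goes to the default model.
function selectModelForRegion(context: CallContext): string {
  const region = context.get("region");
  return region === "eu" ? "mistral/mistral-large-latest" : "openai/gpt-4o-mini";
}

const euContext = new Map<string, unknown>([["region", "eu"]]);
const emptyContext = new Map<string, unknown>();

console.log(selectModelForRegion(euContext)); // "mistral/mistral-large-latest"
console.log(selectModelForRegion(emptyContext)); // "openai/gpt-4o-mini"
```

Note that the static fallback entry in the list is not region-aware, so an EU request could still reach it after the selected model fails; whether that is acceptable depends on the residency requirement.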
5. Cost Control with Fallback
Use a lower-cost primary model and a second model for error cases.
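import { Agent } from "@voltagent/core";

const agent = new Agent({
  name: "CostAwareAgent",
  instructions: "Handle receipts and billing questions.",
  model: [
    { model: "openai/gpt-4o-mini", maxRetries: 1 },
    // Higher-cost model, used only when the primary errors; no retries.
    { model: "openai/gpt-4o", maxRetries: 0 },
  ],
});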
Expected behavior:
- The primary handles successful calls.
- The fallback is only used when the primary errors.
Troubleshooting Tips
- If fallbacks never trigger, check for guardrail blocks or tool errors.
- If retries are skipped, check whether the provider marks the error as isRetryable: false.
- If all configured models are disabled, VoltAgent throws MODEL_LIST_EMPTY.
- For long streaming sessions, consider application-level retry on stream errors.
- Keep the fallback list short to avoid long end-to-end latency when providers are degraded.
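To put a number on the latency concern in the last tip, the worst-case time spent waiting in backoff can be estimated from the retry counts alone. This is a back-of-the-envelope sketch with a hypothetical helper; it assumes the backoff schedule restarts for each model, caps at 10s, and it ignores the time the failing requests themselves take.

```typescript
// Worst-case total backoff wait when every attempt on every model fails.
// Assumptions: delays are 1s, 2s, 4s, 8s, then 10s, and restart for each model.
function worstCaseBackoffMs(maxRetriesPerModel: number[]): number {
  let totalMs = 0;
  for (const maxRetries of maxRetriesPerModel) {
    for (let retryIndex = 0; retryIndex < maxRetries; retryIndex++) {
      totalMs += Math.min(1000 * 2 ** retryIndex, 10_000);
    }
  }
  return totalMs;
}

console.log(worstCaseBackoffMs([3, 3])); // 14000: two models with the default of 3 retries
console.log(worstCaseBackoffMs([1, 0])); // 1000: one retry on the primary, none on the fallback
```

Under these assumptions, each extra model with the default retry count adds up to 7 seconds of waiting before the caller sees a final error.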