# Neural Chunker
Segments text at boundaries returned by a detector function, then applies token chunking to any segment that exceeds the token budget.
## Usage
```ts
import { NeuralChunker } from "@voltagent/rag";

// Detector returns character offsets where chunks should split
const detector = (text: string) => [34, 50];

const chunks = await new NeuralChunker().chunk(
  "First segment that is long enough. Second segment. Third segment.",
  { detector, maxTokens: 120 }
);

// Output:
// [
//   { content: "First segment that is long enough.", metadata: { sourceType: "neural" } },
//   { content: " Second segment.", metadata: { sourceType: "neural" } },
//   { content: " Third segment.", metadata: { sourceType: "neural" } },
// ]
```
If any detector-produced chunk exceeds the token budget, it is further split with token chunking and labeled `sourceType: "neural-token"`.
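A sketch of that fallback in action, reusing the call shape above (the exact number of fallback chunks depends on the tokenizer):

```ts
// Sketch: the second detector segment far exceeds maxTokens.
const longText = "Short intro. " + "word ".repeat(500);

const chunks = await new NeuralChunker().chunk(longText, {
  detector: () => [12], // split right after "Short intro."
  maxTokens: 50,
});

// "Short intro." comes back as a single "neural" chunk; the long tail
// is re-split into several chunks labeled sourceType: "neural-token".
```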
## Options
- `detector` (required): `(text) => number[] | Promise<number[]>` returning character offsets (see the async sketch below)
- `maxTokens` (number): token budget for each detector or fallback chunk, default `300`
- `tokenizer`: `{ tokenize(text), countTokens(text) }`, default tiktoken (`cl100k_base`)
- `label` (string): prefix for chunk ids, default `"neural"`
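Because `detector` may return a promise, it can call out to an external model or service. A minimal async sketch, using a sentence-boundary regex as a stand-in for a real boundary model:

```ts
// Async detector sketch: returns the offset after each sentence-ending
// period, mimicking what a boundary model might produce.
const detector = async (text: string): Promise<number[]> => {
  const offsets: number[] = [];
  const re = /\.\s/g;
  let m: RegExpExecArray | null;
  while ((m = re.exec(text)) !== null) offsets.push(m.index + 1);
  return offsets;
};
```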
The `tokenizer` may be provided on the constructor or per call to adjust token counts.
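A minimal sketch of a per-call tokenizer, assuming whitespace counting is an acceptable stand-in for the default tiktoken counts:

```ts
// Hypothetical whitespace tokenizer matching the { tokenize, countTokens } shape.
const whitespaceTokenizer = {
  tokenize: (text: string) => text.split(/\s+/).filter(Boolean),
  countTokens: (text: string) => text.split(/\s+/).filter(Boolean).length,
};

const chunks = await new NeuralChunker().chunk("Alpha beta. Gamma delta.", {
  detector: () => [11], // split right after the first period
  maxTokens: 200,
  tokenizer: whitespaceTokenizer,
});
```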
## Output
- `metadata.sourceType: "neural"` for detector chunks
- `metadata.sourceType: "neural-token"` for token fallback chunks
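Downstream code can use this metadata to separate fallback chunks, for example:

```ts
const fallback = chunks.filter((c) => c.metadata.sourceType === "neural-token");
```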