# JSON Chunker

Walks JSON into `path: value` lines and chunks the combined text. Invalid JSON falls back to token chunking.
## Usage
```ts
import { JsonChunker } from "@voltagent/rag";

const json = JSON.stringify(
  { user: { name: "alice", age: 30 }, tags: ["a", "b"], active: true },
  null,
  2
);

const chunks = new JsonChunker().chunk(json, { maxTokens: 120 });
// Output:
// [
//   {
//     content: "user.name: alice\nuser.age: 30\ntags.0: a\ntags.1: b\nactive: true",
//     metadata: {
//       sourceType: "json",
//       fields: 5,
//       samplePaths: [
//         { path: "user.name", line: 1 },
//         { path: "user.age", line: 2 },
//         { path: "tags.0", line: 3 },
//       ],
//     },
//   },
// ]
```
If the JSON cannot be parsed, the raw string is split with token chunking.
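The flattening walk can be pictured as a recursive traversal that joins object keys with dots and indexes array elements numerically. The sketch below is purely illustrative, not the library's implementation:

```typescript
// Minimal sketch of a path:value flattening walk (illustrative only;
// not the actual @voltagent/rag implementation).
type JsonValue =
  | string | number | boolean | null
  | JsonValue[] | { [key: string]: JsonValue };

function flatten(value: JsonValue, path = ""): string[] {
  if (Array.isArray(value)) {
    // Arrays use numeric indexes in the path, e.g. "tags.0".
    return value.flatMap((v, i) => flatten(v, path ? `${path}.${i}` : `${i}`));
  }
  if (value !== null && typeof value === "object") {
    return Object.entries(value).flatMap(([k, v]) =>
      flatten(v, path ? `${path}.${k}` : k)
    );
  }
  // Leaf: emit one "path: value" line.
  return [`${path}: ${String(value)}`];
}

const lines = flatten({
  user: { name: "alice", age: 30 },
  tags: ["a", "b"],
  active: true,
});
console.log(lines.join("\n"));
// user.name: alice
// user.age: 30
// tags.0: a
// tags.1: b
// active: true
```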
## Options
- `maxTokens` (number): token budget per chunk, default `300`
- `tokenizer`: `{ tokenize(text), countTokens(text) }`, default tiktoken (`cl100k_base`)
- `label` (string): prefix for chunk ids, default `"json"`
`tokenizer` can be set on the constructor or per call.
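A custom tokenizer only needs the `tokenize` and `countTokens` methods. As a shape illustration, here is a naive whitespace tokenizer (a hypothetical stand-in; the default remains tiktoken):

```typescript
// Naive whitespace tokenizer satisfying the { tokenize, countTokens }
// interface; a hypothetical example, not the default implementation.
const wordTokenizer = {
  tokenize: (text: string): string[] => text.split(/\s+/).filter(Boolean),
  countTokens: (text: string): number =>
    text.split(/\s+/).filter(Boolean).length,
};

console.log(wordTokenizer.countTokens("user.name: alice")); // 2
```

It could then be passed either on the constructor, `new JsonChunker({ tokenizer: wordTokenizer })`, or per call via the chunk options.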
## Output
- `metadata.fields`: count of walked fields
- `metadata.samplePaths`: first few paths with their line numbers in the flattened view
- `metadata.sourceType`: `"json"`
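Given the flattened `path: value` lines, the metadata could be derived as follows. This is a sketch based only on the documented output shape, not the library's code:

```typescript
// Sketch: derive chunk metadata from flattened "path: value" lines.
// Field names follow the documented output shape above.
function buildMetadata(lines: string[], sampleCount = 3) {
  return {
    sourceType: "json" as const,
    fields: lines.length,
    samplePaths: lines.slice(0, sampleCount).map((line, i) => ({
      path: line.slice(0, line.indexOf(":")),
      line: i + 1, // 1-based line number in the flattened view
    })),
  };
}

const meta = buildMetadata([
  "user.name: alice",
  "user.age: 30",
  "tags.0: a",
  "tags.1: b",
  "active: true",
]);
// meta.fields === 5
// meta.samplePaths[0] is { path: "user.name", line: 1 }
```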
## Limitations
- Paths are flattened strings; schema-aware typing or attribute metadata is not added.
- Only path/value/line are emitted; complex structures (e.g., arrays of objects) are still flattened into `path: value` lines.