trajectory.json file to $OUTPUT_DIR with token usage, tool calls, and latency data. This powers the analytics on your dashboard — cost tracking, token breakdowns, latency percentiles, and tool call patterns.
It’s optional. Your agent will work fine without it. But if you want to understand what your agent is doing across hundreds of benchmark instances, trajectory data is how you get there.
The format
We designed trajectory.json to be minimal and flexible. Only two fields are required:
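As a minimal sketch of what that looks like in practice, the snippet below writes a trajectory.json containing only `schema_version` and `instance_id`. The helper name `write_minimal_trajectory` is hypothetical and exists only for illustration; the field names and the `"1.0"` value come from the field reference.

```python
import json
from pathlib import Path


def write_minimal_trajectory(output_dir: str, instance_id: str) -> Path:
    """Write a trajectory.json that holds only the two required fields."""
    trajectory = {
        "schema_version": "1.0",     # always "1.0" per the field reference
        "instance_id": instance_id,  # normally taken from $INSTANCE_ID
    }
    path = Path(output_dir) / "trajectory.json"
    path.write_text(json.dumps(trajectory, indent=2))
    return path
```

In a real run you would likely pass `os.environ["OUTPUT_DIR"]` and `os.environ["INSTANCE_ID"]` as the two arguments.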
Field reference
Top-level fields
| Field | Type | Required | Description |
|---|---|---|---|
| schema_version | string | Yes | Always "1.0" |
| instance_id | string | Yes | From $INSTANCE_ID env var |
| model | string | No | Model name/ID used |
| total_tokens | int | No | prompt_tokens + completion_tokens |
| prompt_tokens | int | No | Total input tokens |
| completion_tokens | int | No | Total output tokens |
| total_latency_ms | int | No | Wall-clock time in milliseconds |
| cache_read_tokens | int | No | Tokens served from cache |
| cache_write_tokens | int | No | Tokens written to cache |
| steps | array | No | Ordered list of agent actions |
Step fields
| Field | Type | Required | Description |
|---|---|---|---|
| step | int | Yes (if steps) | 1-indexed step number |
| type | string | Yes (if steps) | "tool_call", "model_call", or "observation" |
| tool | string | No | Tool name (Bash, Edit, Read, etc.) |
| input | any | No | Input to the tool or model |
| output_tokens | int | No | Tokens produced in this step |
| latency_ms | int | No | Wall-clock time for this step |
| cache_hit | bool | No | Whether this step hit a cache |
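The required-field rules in the two tables can be checked before you upload anything. The sketch below is one possible local check, not the validation the dashboard itself performs (which is not described here); `validate_trajectory` and `STEP_TYPES` are hypothetical names.

```python
from typing import Any

# Allowed values for a step's "type", taken from the step fields table.
STEP_TYPES = {"tool_call", "model_call", "observation"}


def validate_trajectory(doc: dict[str, Any]) -> list[str]:
    """Return a list of problems; an empty list means the required fields look fine."""
    errors = []
    if doc.get("schema_version") != "1.0":
        errors.append('schema_version must be "1.0"')
    if not isinstance(doc.get("instance_id"), str):
        errors.append("instance_id must be a string")
    # "steps" is optional, but each entry needs "step" and "type" when present.
    for i, step in enumerate(doc.get("steps", [])):
        if not isinstance(step.get("step"), int) or step["step"] < 1:
            errors.append(f"steps[{i}].step must be a 1-indexed int")
        if step.get("type") not in STEP_TYPES:
            errors.append(f"steps[{i}].type must be one of {sorted(STEP_TYPES)}")
    return errors
```

Optional fields are deliberately left unchecked here, since the format only requires the ones marked "Yes".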
What the dashboard computes
From trajectory data across all instances in a run, the dashboard shows:
- Resolve rate: instances solved / total
- Token usage: avg, p50, p95 across instances
- Latency: avg, p50, p95 wall-clock time
- Tool calls: avg per instance, breakdown by tool name
- Cache hit rate: across all steps
- Cost: estimated from token counts and model pricing
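To sanity-check your own data against these numbers, a rough local approximation might look like the sketch below. It assumes a nearest-rank percentile, which may differ from whatever method the dashboard uses, and it skips resolve rate and cost because those need results and pricing data that are not part of trajectory.json. The names `percentile` and `summarize` are hypothetical.

```python
import math
from collections import Counter


def percentile(values, pct):
    """Nearest-rank percentile (assumed method; the dashboard's may differ)."""
    ordered = sorted(values)
    rank = max(1, math.ceil(pct / 100 * len(ordered)))
    return ordered[rank - 1]


def summarize(trajectories):
    """Aggregate token, tool-call, and cache stats over a list of trajectory dicts."""
    tokens = [t["total_tokens"] for t in trajectories if "total_tokens" in t]
    steps = [s for t in trajectories for s in t.get("steps", [])]
    tool_calls = [s for s in steps if s.get("type") == "tool_call"]
    cache_flags = [s["cache_hit"] for s in steps if "cache_hit" in s]
    return {
        "tokens_avg": sum(tokens) / len(tokens) if tokens else None,
        "tokens_p50": percentile(tokens, 50) if tokens else None,
        "tokens_p95": percentile(tokens, 95) if tokens else None,
        "tool_calls_avg": len(tool_calls) / len(trajectories) if trajectories else None,
        "tool_breakdown": dict(Counter(s.get("tool", "unknown") for s in tool_calls)),
        "cache_hit_rate": sum(cache_flags) / len(cache_flags) if cache_flags else None,
    }
```

Latency can be summarized the same way by swapping `total_tokens` for `total_latency_ms`.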
Writing a converter
Your agent probably already logs its output in some format, such as JSONL events, structured logs, or a custom format. You don’t need to change your agent. Just write a small converter that runs at the end of runner.sh and transforms your agent’s native output into trajectory.json.
Here’s the pattern:
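A minimal sketch of such a converter follows. The native log format is invented for illustration: it assumes one JSON object per line, with hypothetical `kind`, `name`, `input_tokens`, and `output_tokens` keys. Swap in whatever your agent actually emits; only the output fields come from the field reference.

```python
import json
from pathlib import Path


def convert(native_log: Path, instance_id: str) -> dict:
    """Turn a hypothetical JSONL agent log into a trajectory dict."""
    steps = []
    prompt_tokens = completion_tokens = 0
    with native_log.open() as fh:
        for line in fh:
            line = line.strip()
            if not line:
                continue  # skip blank lines
            event = json.loads(line)
            if event.get("kind") == "tool":  # hypothetical event shape
                steps.append({
                    "step": len(steps) + 1,  # 1-indexed
                    "type": "tool_call",
                    "tool": event.get("name"),
                })
            elif event.get("kind") == "usage":  # hypothetical event shape
                prompt_tokens += event.get("input_tokens", 0)
                completion_tokens += event.get("output_tokens", 0)
    return {
        "schema_version": "1.0",
        "instance_id": instance_id,
        "prompt_tokens": prompt_tokens,
        "completion_tokens": completion_tokens,
        "total_tokens": prompt_tokens + completion_tokens,
        "steps": steps,
    }


def write_trajectory(native_log: Path, output_dir: Path, instance_id: str) -> Path:
    """Convert the log and write trajectory.json into the output directory."""
    out = output_dir / "trajectory.json"
    out.write_text(json.dumps(convert(native_log, instance_id), indent=2))
    return out
```

The last step of runner.sh could then call `write_trajectory` with the agent's log path and the values of `$OUTPUT_DIR` and `$INSTANCE_ID`.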
Real examples
Claude Code (stream-json format)
Claude Code outputs JSONL with type: "assistant" and type: "result" events:
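A converter for that stream could be sketched as below. Everything about the event layout beyond the two `type` values is an assumption: the code assumes assistant events carry a `message` with `model` and a `content` list of blocks (tool calls as `tool_use` blocks with `name` and `input`), and that the result event carries `duration_ms` and a `usage` object with `input_tokens`, `output_tokens`, `cache_read_input_tokens`, and `cache_creation_input_tokens`. Check these against your own Claude Code output before relying on them. The function name is hypothetical.

```python
import json
from typing import Iterable


def claude_code_to_trajectory(lines: Iterable[str], instance_id: str) -> dict:
    """Build a trajectory dict from stream-json lines (field names assumed)."""
    doc = {"schema_version": "1.0", "instance_id": instance_id, "steps": []}
    for raw in lines:
        raw = raw.strip()
        if not raw:
            continue
        event = json.loads(raw)
        if event.get("type") == "assistant":
            message = event.get("message", {})
            if "model" in message:
                doc.setdefault("model", message["model"])
            # Assumed: each tool call appears as a "tool_use" content block.
            for block in message.get("content", []):
                if isinstance(block, dict) and block.get("type") == "tool_use":
                    doc["steps"].append({
                        "step": len(doc["steps"]) + 1,
                        "type": "tool_call",
                        "tool": block.get("name"),
                        "input": block.get("input"),
                    })
        elif event.get("type") == "result":
            usage = event.get("usage", {})
            # Assumed: input_tokens excludes cached tokens; adjust if you
            # want prompt_tokens to include cache reads.
            prompt = usage.get("input_tokens", 0)
            completion = usage.get("output_tokens", 0)
            doc["prompt_tokens"] = prompt
            doc["completion_tokens"] = completion
            doc["total_tokens"] = prompt + completion
            if "cache_read_input_tokens" in usage:
                doc["cache_read_tokens"] = usage["cache_read_input_tokens"]
            if "cache_creation_input_tokens" in usage:
                doc["cache_write_tokens"] = usage["cache_creation_input_tokens"]
            if "duration_ms" in event:
                doc["total_latency_ms"] = event["duration_ms"]
    return doc
```

Passing an open file handle as `lines` works, since iterating a text file yields one line at a time.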
Ante (event stream format)
Ante outputs JSONL with event.ToolStart and event.UsageUpdate: