This guide uses a built-in agent (Claude Code) so you can see Benchspan in action without writing any code. To onboard your own agent afterward, see Onboard Your Agent.

1. Install the CLI

pip install benchspan

2. Log in

benchspan login
This opens your browser to authenticate. Once logged in, the CLI stores your session locally. Verify with:
benchspan whoami
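
If you want to confirm steps 1 and 2 from a script before continuing, the check can be sketched roughly as below. This is not part of Benchspan: the `preflight` function and the `BENCHSPAN_BIN` variable are hypothetical names, and the sketch assumes `benchspan whoami` exits with a non-zero status when no session is stored, which this guide does not confirm.

```shell
#!/bin/sh
# Hypothetical preflight helper; not part of the Benchspan CLI.
# BENCHSPAN_BIN is an illustrative override for the executable name.
BENCHSPAN_BIN="${BENCHSPAN_BIN:-benchspan}"

preflight() {
    # Step 1: is the CLI on PATH?
    if ! command -v "$BENCHSPAN_BIN" >/dev/null 2>&1; then
        echo "CLI not found; run: pip install benchspan"
        return 1
    fi
    # Step 2: is there a stored session?
    # Assumption: `whoami` exits non-zero when you are not logged in.
    if ! "$BENCHSPAN_BIN" whoami >/dev/null 2>&1; then
        echo "not logged in; run: benchspan login"
        return 1
    fi
    echo "ready"
}
```

Calling `preflight` prints which step still needs attention, or `ready` if both checks pass under the assumption above.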

3. Add your API key

The built-in Claude Code agent needs an Anthropic API key. Go to your dashboard settings and add:
Env Var              Value
ANTHROPIC_API_KEY    Your Anthropic API key
You can see what env vars each built-in agent needs by running benchspan agents.

4. Run the healthcheck

benchspan run --benchmark agent-healthcheck.quick --agent claude-code
This runs 2 simple tasks against the built-in Claude Code agent. You should see results in about 30 seconds.

5. Check results

benchspan runs list
benchspan runs show <run_id>
Or view them on your dashboard.

6. Run a real benchmark

benchspan run --benchmark swebench --agent claude-code --instances 5
This runs 5 SWE-bench instances. Each instance gives the agent a real GitHub issue to solve, then runs the project’s test suite to check whether the fix works.
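
Steps 4 and 6 can be chained so that the longer benchmark starts only if the healthcheck passes. The following is a sketch under assumptions this guide does not confirm: that `benchspan run` blocks until the run finishes and that it exits non-zero on failure. `run_gated` and `BENCHSPAN_BIN` are hypothetical names introduced for illustration.

```shell
#!/bin/sh
# Hypothetical wrapper; not part of the Benchspan CLI.
# BENCHSPAN_BIN is an illustrative override for the executable name.
BENCHSPAN_BIN="${BENCHSPAN_BIN:-benchspan}"

run_gated() {
    # Quick healthcheck first (step 4).
    # Assumption: a failed run produces a non-zero exit status.
    if ! "$BENCHSPAN_BIN" run --benchmark agent-healthcheck.quick --agent claude-code; then
        echo "healthcheck failed; skipping swebench" >&2
        return 1
    fi
    # Then the 5-instance benchmark run (step 6).
    "$BENCHSPAN_BIN" run --benchmark swebench --agent claude-code --instances 5
}
```

If the exit-status assumption does not hold for your version of the CLI, check the run status with `benchspan runs show <run_id>` instead.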

Next steps

Onboard your own agent

Write a runner.sh for your agent and test it against benchmarks.

See built-in agents

Browse pre-configured agents you can use immediately.