Install the CLI, log in, and run your first benchmark in 5 minutes.
This guide uses a built-in agent (Claude Code) so you can see Benchspan in action without writing any code. When you are ready to onboard your own agent, continue with Onboard Your Agent.
benchspan run --benchmark swebench --agent claude-code --instances 5
This runs 5 SWE-bench instances. Each instance gives the agent a real GitHub issue to solve, then runs the project’s test suite to check whether the fix works.
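If you want to run the command above from a script, it can help to fail early with a clear message when the CLI is not installed. The following is a minimal sketch, not part of Benchspan: the function name run_quickstart and the error message are hypothetical, and the only Benchspan command used is the one shown in this guide.

```shell
#!/bin/sh
# Hypothetical helper (not shipped with Benchspan): wraps the quick-start
# command and stops early if the CLI cannot be found.
run_quickstart() {
    # `command -v` is a POSIX builtin that succeeds only if the name resolves.
    if ! command -v benchspan >/dev/null 2>&1; then
        echo "benchspan not found on PATH; install the CLI first" >&2
        return 127  # conventional "command not found" exit status
    fi

    # The command from this guide, unchanged.
    benchspan run --benchmark swebench --agent claude-code --instances 5
}
```

Calling run_quickstart after sourcing this file runs the benchmark when the CLI is available, and returns a non-zero status otherwise so that a surrounding script can stop.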