The fastest way: use the onboarding skill

If you use a coding agent like Claude Code, Codex, or OpenCode, install our onboarding skill and it will do everything for you. Install the skill:
curl -fsSL https://benchspan.com/install-skill.sh | sh
This installs the /onboard-agent skill. Then start a session with your coding agent and run:
/onboard-agent
The skill will:
  1. Ask about your agent (where’s the code, what runtime, what env vars it needs)
  2. Explore your codebase to understand how to invoke it
  3. Generate a working runner.sh
  4. Help you set up env vars on the dashboard
  5. Run the healthcheck benchmark to verify everything works
  6. Diagnose and fix any failures automatically
Most agents are fully onboarded in a single session.
The skill follows the Agent Skills open standard and works with Claude Code, Cursor, Codex, GitHub Copilot, and other compatible agents.

Or do it manually

If you prefer to do it yourself, here’s the process:
1. Understand how it works

Your agent runs in a Docker container. It gets a problem statement and a working directory, and it produces output. Benchspan grades the result.

You write one file, runner.sh, that installs and runs your agent. That’s the entire integration. See “How it works” for details.
2. Write your runner.sh

Follow the pattern that matches your setup: pip package, npm package, binary, or build from source. See “Writing your runner.sh” for details.
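
As a rough illustration of the pip-package pattern, a runner.sh might be organized along these lines. This is a sketch under stated assumptions, not Benchspan's documented interface: the package and command name `myagent`, its `--task` flag, and the `PROBLEM_STATEMENT` variable are all hypothetical placeholders. Check the runner.sh guide for how the problem statement is actually delivered.

```shell
#!/bin/sh
# Sketch of a runner.sh for an agent published as a pip package.
# ASSUMPTIONS (placeholders, not from the Benchspan docs):
#   - the package and its command are both called "myagent"
#   - the command accepts a --task flag
#   - the problem statement arrives in $PROBLEM_STATEMENT

install_agent() {
  # Install the agent inside the container.
  pip install --quiet myagent
}

run_agent() {
  # Run the agent in the current working directory on the given task text.
  myagent --task "$1"
}

main() {
  if [ -z "${PROBLEM_STATEMENT:-}" ]; then
    echo "no problem statement provided" >&2
    return 2
  fi
  install_agent || { echo "install failed" >&2; return 1; }
  run_agent "$PROBLEM_STATEMENT"
}

# A real runner.sh would end by calling:  main "$@"
```

Splitting install and run into separate functions keeps the other patterns (npm package, binary, build from source) a matter of swapping the body of `install_agent`.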
3. Set env vars on the dashboard

Add the env vars your agent requires (API keys, model config) in your dashboard settings. Whatever you set there is injected into the container.
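
Because a missing key typically surfaces only once the agent makes its first LLM call, it can help to have runner.sh fail fast when an expected variable was not injected. The helper below is a minimal sketch; the function name `check_required_env` is our own, and any variable names passed to it are whatever your agent needs, not names Benchspan defines.

```shell
#!/bin/sh
# Report every required env var that is unset or empty.
# Returns 0 if all are present, 1 otherwise.
check_required_env() {
  missing=0
  for name in "$@"; do
    # Indirect lookup: read the value of the variable whose name is in $name.
    eval "value=\${$name:-}"
    if [ -z "$value" ]; then
      echo "missing env var: $name" >&2
      missing=1
    fi
  done
  return "$missing"
}
```

In a runner.sh, a call such as `check_required_env AGENT_API_KEY AGENT_MODEL || exit 1` (both names are placeholders) before starting the agent would stop the run with a readable message instead of a failure deep inside the agent.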
4. Test with the healthcheck benchmark

Run 10 simple tasks that verify your runner.sh works across different environments.
benchspan run --benchmark agent-healthcheck --agent ./my-agent
See “Healthcheck details” for more.
5. Run a real benchmark

benchspan run --benchmark swebench --agent ./my-agent --instances 5
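
The last two steps can be chained so that the real benchmark starts only after the healthcheck succeeds. The wrapper below is a sketch that uses only the two commands shown above; the function name is our own, and it assumes that `benchspan run` exits with a non-zero status when a run fails, which the CLI documentation should confirm.

```shell
#!/bin/sh
# Run the healthcheck first; start the real benchmark only if it succeeds.
# ASSUMPTION: `benchspan run` exits non-zero when the run fails.
healthcheck_then_benchmark() {
  agent_dir="$1"   # e.g. ./my-agent
  benchmark="$2"   # e.g. swebench
  instances="$3"   # e.g. 5

  benchspan run --benchmark agent-healthcheck --agent "$agent_dir" || {
    echo "healthcheck failed; skipping $benchmark" >&2
    return 1
  }
  benchspan run --benchmark "$benchmark" --agent "$agent_dir" --instances "$instances"
}
```

Calling `healthcheck_then_benchmark ./my-agent swebench 5` would then reproduce the two commands from steps 4 and 5 in order.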

What you’ll need

  • A Benchspan account (sign up)
  • The CLI installed (pip install benchspan)
  • Your agent’s source code or published package
  • API keys for whatever LLM your agent uses