The minimal runner.sh
#!/bin/bash
set -uo pipefail
# Ensure curl is available (the only thing we can't install without)
(which curl >/dev/null 2>&1) || \
(apt-get update -qq && apt-get install -y -qq curl) >/dev/null 2>&1 || true
# Install uv + agent
curl -LsSf https://astral.sh/uv/install.sh | sh >/dev/null 2>&1
export PATH="$HOME/.local/bin:$PATH"
uv pip install --system my-agent >/dev/null 2>&1
cd "$WORKING_DIR"
my-agent --task "$PROBLEM_STATEMENT"
Your agent gets the problem, works on the files, and exits. The platform handles everything else.
But most real agents need a bit more. Here are the complete patterns.
The two rules
Before diving into patterns, internalize these — they’re the #1 cause of failures:
Keep stdout clean during install. The platform captures all stdout. If pip install or apt-get noise goes to stdout, it drowns out your agent’s actual answer. Redirect install output to >/dev/null 2>&1 or >&2.
Use tee, never >. If you redirect agent output to a file with >, stdout is empty and the platform can’t see the answer. Use | tee file to write to both.
Pattern: pip package
#!/bin/bash
# Benchspan agent: My Agent
# Env: MY_API_KEY
set -uo pipefail
# System deps
(which curl >/dev/null 2>&1) || \
(apt-get update -qq && apt-get install -y -qq curl) >/dev/null 2>&1 || true
# Install uv (works even if pip/python aren't on PATH)
curl -LsSf https://astral.sh/uv/install.sh | sh >/dev/null 2>&1
export PATH="$HOME/.local/bin:$PATH"
# Install agent
uv pip install --system my-agent >/dev/null 2>&1
# Run
cd "$WORKING_DIR"
my-agent --task "$PROBLEM_STATEMENT" \
2>"$OUTPUT_DIR/agent_stderr.log" | tee "$OUTPUT_DIR/agent.log"
We recommend uv over pip — it’s faster, handles Python version management, and works reliably across all container images. If you’re sure pip is available, pip install my-agent >/dev/null 2>&1 works too.
Pattern: npm package
#!/bin/bash
# Benchspan agent: My Agent
# Env: MY_API_KEY
set -uo pipefail
# System deps (slim images may lack curl/xz)
(which curl >/dev/null 2>&1 && which xz >/dev/null 2>&1) || \
(apt-get update -qq && apt-get install -y -qq curl xz-utils) >/dev/null 2>&1 || true
# Node.js
curl -fsSL https://nodejs.org/dist/v22.14.0/node-v22.14.0-linux-x64.tar.xz \
| tar -xJ -C /usr/local --strip-components=1
# Install (noise to stderr)
npm install -g my-agent 2>&1 | tail -5 >&2
# Run
cd "$WORKING_DIR"
my-agent --task "$PROBLEM_STATEMENT" \
2>"$OUTPUT_DIR/agent_stderr.log" | tee "$OUTPUT_DIR/agent.log"
Pattern: build from source (Python/uv)
Place runner.sh at the root of your repo. Point --agent at the repo directory.
Also create a .benchspanignore file to exclude files that don’t need to be in the container (the CLI always excludes .git, __pycache__, node_modules, .venv, and *.pyc by default):
# .benchspanignore
dist/
build/
*.egg-info/
tests/fixtures/large/
#!/bin/bash
# Benchspan agent: My Agent
# Env: LLM_API_KEY, LLM_MODEL
set -uo pipefail
# System deps
(which curl >/dev/null 2>&1 && which git >/dev/null 2>&1) || \
(apt-get update -qq && apt-get install -y -qq curl git) >/dev/null 2>&1 || true
# Install uv
curl -LsSf https://astral.sh/uv/install.sh | sh >/dev/null 2>&1
export PATH="$HOME/.local/bin:$PATH"
# Build from source (uv auto-downloads the right Python version)
cd /runner
uv sync --python 3.12 >/dev/null 2>&1
# CRITICAL: point agent at the task directory, not /runner
export MY_AGENT_WORK_DIR="$WORKING_DIR"
# Run
cd "$WORKING_DIR"
uv run --directory /runner my-agent \
--headless -t "$PROBLEM_STATEMENT" \
2>"$OUTPUT_DIR/agent_stderr.log" | tee "$OUTPUT_DIR/agent.log"
/runner/ is where your codebase lives in the container. $WORKING_DIR is the benchmark task. Your agent must work on $WORKING_DIR. If your agent has a work directory config, set it explicitly.
Pattern: build from source (Node.js)
#!/bin/bash
# Benchspan agent: My Agent
# Env: API_KEY
set -uo pipefail
# System deps + Node.js
(which curl >/dev/null 2>&1 && which xz >/dev/null 2>&1) || \
(apt-get update -qq && apt-get install -y -qq curl xz-utils) >/dev/null 2>&1 || true
curl -fsSL https://nodejs.org/dist/v22.14.0/node-v22.14.0-linux-x64.tar.xz \
| tar -xJ -C /usr/local --strip-components=1
# Build from source
cd /runner
npm ci --production >/dev/null 2>&1
# Run
cd "$WORKING_DIR"
node /runner/dist/index.js \
--task "$PROBLEM_STATEMENT" \
2>"$OUTPUT_DIR/agent_stderr.log" | tee "$OUTPUT_DIR/agent.log"
Non-root user workaround
Some agents refuse to run as root. Create a non-root user and use su:
useradd -m -s /bin/bash benchkit 2>/dev/null || true
chown -R benchkit:benchkit "$OUTPUT_DIR" 2>/dev/null || true
chmod -R 777 "$WORKING_DIR" 2>/dev/null || true
# Write a wrapper script (avoids quoting issues with su -c)
cat > /tmp/run_agent.sh << 'RUNEOF'
#!/bin/bash
cd "$WORKING_DIR"
export HOME=/home/benchkit
my-agent --task "$PROBLEM_STATEMENT" \
2>"$OUTPUT_DIR/agent_stderr.log" | tee "$OUTPUT_DIR/agent.log"
RUNEOF
chmod +x /tmp/run_agent.sh
su benchkit -p -c "bash /tmp/run_agent.sh"
Optional: trajectory.json
For token usage analytics, write telemetry to $OUTPUT_DIR/trajectory.json:
{
"schema_version": "1.0",
"instance_id": "$INSTANCE_ID",
"model": "claude-sonnet-4-6",
"total_tokens": 48500,
"prompt_tokens": 36000,
"completion_tokens": 12500,
"steps": [
{"step": 1, "type": "tool_call", "tool": "Bash", "latency_ms": 310}
]
}
If your agent outputs JSONL or structured logs, parse them in a python3 heredoc at the end of runner.sh. If not, a minimal trajectory still helps:
echo '{"schema_version":"1.0","instance_id":"'"$INSTANCE_ID"'","total_tokens":0,"steps":[]}' \
> "$OUTPUT_DIR/trajectory.json"
The # Env: convention
The second line of your runner.sh should declare what env vars the agent needs:
#!/bin/bash
# Benchspan agent: My Agent
# Env: LLM_API_KEY, LLM_MODEL, BENCHSPAN_MAX_TURNS (optional)
The CLI reads this line to check that required env vars are set on the dashboard before starting a run. Users see this info when they run benchspan agents.