Skip to main content

The minimal runner.sh

#!/bin/bash
set -uo pipefail

# Ensure curl is available (the only thing we can't install without)
(which curl >/dev/null 2>&1) || \
  (apt-get update -qq && apt-get install -y -qq curl) >/dev/null 2>&1 || true

# Install uv + agent
curl -LsSf https://astral.sh/uv/install.sh | sh >/dev/null 2>&1
export PATH="$HOME/.local/bin:$PATH"
uv pip install --system my-agent >/dev/null 2>&1

cd "$WORKING_DIR"
my-agent --task "$PROBLEM_STATEMENT"
Your agent gets the problem, works on the files, and exits. The platform handles everything else. But most real agents need a bit more. Here are the complete patterns.

The two rules

Before diving into patterns, internalize these — they’re the #1 cause of failures:
Keep stdout clean during install. The platform captures all stdout. If pip install or apt-get noise goes to stdout, it drowns out your agent’s actual answer. Redirect install output to >/dev/null 2>&1 or >&2.
Use tee, never >. If you redirect agent output to a file with >, stdout is empty and the platform can’t see the answer. Use | tee file to write to both.

Pattern: pip package

#!/bin/bash
# Benchspan agent: My Agent
# Env: MY_API_KEY
set -uo pipefail

# System deps
(which curl >/dev/null 2>&1) || \
  (apt-get update -qq && apt-get install -y -qq curl) >/dev/null 2>&1 || true

# Install uv (works even if pip/python aren't on PATH)
curl -LsSf https://astral.sh/uv/install.sh | sh >/dev/null 2>&1
export PATH="$HOME/.local/bin:$PATH"

# Install agent
uv pip install --system my-agent >/dev/null 2>&1

# Run
cd "$WORKING_DIR"
my-agent --task "$PROBLEM_STATEMENT" \
  2>"$OUTPUT_DIR/agent_stderr.log" | tee "$OUTPUT_DIR/agent.log"
We recommend uv over pip — it’s faster, handles Python version management, and works reliably across all container images. If you’re sure pip is available, pip install my-agent >/dev/null 2>&1 works too.

Pattern: npm package

#!/bin/bash
# Benchspan agent: My Agent
# Env: MY_API_KEY
set -uo pipefail

# System deps (slim images may lack curl/xz)
(which curl >/dev/null 2>&1 && which xz >/dev/null 2>&1) || \
  (apt-get update -qq && apt-get install -y -qq curl xz-utils) >/dev/null 2>&1 || true

# Node.js
curl -fsSL https://nodejs.org/dist/v22.14.0/node-v22.14.0-linux-x64.tar.xz \
  | tar -xJ -C /usr/local --strip-components=1

# Install (noise to stderr)
npm install -g my-agent 2>&1 | tail -5 >&2

# Run
cd "$WORKING_DIR"
my-agent --task "$PROBLEM_STATEMENT" \
  2>"$OUTPUT_DIR/agent_stderr.log" | tee "$OUTPUT_DIR/agent.log"

Pattern: build from source (Python/uv)

Place runner.sh at the root of your repo. Point --agent at the repo directory. Also create a .benchspanignore file to exclude files that don’t need to be in the container (the CLI always excludes .git, __pycache__, node_modules, .venv, and *.pyc by default):
# .benchspanignore
dist/
build/
*.egg-info/
tests/fixtures/large/
#!/bin/bash
# Benchspan agent: My Agent
# Env: LLM_API_KEY, LLM_MODEL
set -uo pipefail

# System deps
(which curl >/dev/null 2>&1 && which git >/dev/null 2>&1) || \
  (apt-get update -qq && apt-get install -y -qq curl git) >/dev/null 2>&1 || true

# Install uv
curl -LsSf https://astral.sh/uv/install.sh | sh >/dev/null 2>&1
export PATH="$HOME/.local/bin:$PATH"

# Build from source (uv auto-downloads the right Python version)
cd /runner
uv sync --python 3.12 >/dev/null 2>&1

# CRITICAL: point agent at the task directory, not /runner
export MY_AGENT_WORK_DIR="$WORKING_DIR"

# Run
cd "$WORKING_DIR"
uv run --directory /runner my-agent \
  --headless -t "$PROBLEM_STATEMENT" \
  2>"$OUTPUT_DIR/agent_stderr.log" | tee "$OUTPUT_DIR/agent.log"
/runner/ is where your codebase lives in the container. $WORKING_DIR is the benchmark task. Your agent must work on $WORKING_DIR. If your agent has a work directory config, set it explicitly.

Pattern: build from source (Node.js)

#!/bin/bash
# Benchspan agent: My Agent
# Env: API_KEY
set -uo pipefail

# System deps + Node.js
(which curl >/dev/null 2>&1 && which xz >/dev/null 2>&1) || \
  (apt-get update -qq && apt-get install -y -qq curl xz-utils) >/dev/null 2>&1 || true
curl -fsSL https://nodejs.org/dist/v22.14.0/node-v22.14.0-linux-x64.tar.xz \
  | tar -xJ -C /usr/local --strip-components=1

# Build from source
cd /runner
npm ci --production >/dev/null 2>&1

# Run
cd "$WORKING_DIR"
node /runner/dist/index.js \
  --task "$PROBLEM_STATEMENT" \
  2>"$OUTPUT_DIR/agent_stderr.log" | tee "$OUTPUT_DIR/agent.log"

Non-root user workaround

Some agents refuse to run as root. Create a non-root user and use su:
useradd -m -s /bin/bash benchkit 2>/dev/null || true
chown -R benchkit:benchkit "$OUTPUT_DIR" 2>/dev/null || true
chmod -R 777 "$WORKING_DIR" 2>/dev/null || true

# Write a wrapper script (avoids quoting issues with su -c)
cat > /tmp/run_agent.sh << 'RUNEOF'
#!/bin/bash
cd "$WORKING_DIR"
export HOME=/home/benchkit
my-agent --task "$PROBLEM_STATEMENT" \
  2>"$OUTPUT_DIR/agent_stderr.log" | tee "$OUTPUT_DIR/agent.log"
RUNEOF
chmod +x /tmp/run_agent.sh

su benchkit -p -c "bash /tmp/run_agent.sh"

Optional: trajectory.json

For token usage analytics, write telemetry to $OUTPUT_DIR/trajectory.json:
{
  "schema_version": "1.0",
  "instance_id": "$INSTANCE_ID",
  "model": "claude-sonnet-4-6",
  "total_tokens": 48500,
  "prompt_tokens": 36000,
  "completion_tokens": 12500,
  "steps": [
    {"step": 1, "type": "tool_call", "tool": "Bash", "latency_ms": 310}
  ]
}
If your agent outputs JSONL or structured logs, parse them in a python3 heredoc at the end of runner.sh. If not, a minimal trajectory still helps:
echo '{"schema_version":"1.0","instance_id":"'"$INSTANCE_ID"'","total_tokens":0,"steps":[]}' \
  > "$OUTPUT_DIR/trajectory.json"

The # Env: convention

The second line of your runner.sh should declare what env vars the agent needs:
#!/bin/bash
# Benchspan agent: My Agent
# Env: LLM_API_KEY, LLM_MODEL, BENCHSPAN_MAX_TURNS (optional)
The CLI reads this line to check that required env vars are set on the dashboard before starting a run. Users see this info when they run benchspan agents.