Common Issues

Install noise polluting stdout

Symptom: Verifiers fail because stdout is full of apt-get / pip / npm text instead of the agent’s answer.

Fix: Redirect all install output away from stdout:

pip install my-agent >/dev/null 2>&1
(apt-get update -qq && apt-get install -y -qq curl) >/dev/null 2>&1 || true
npm install -g my-agent 2>&1 | tail -5 >&2
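One way to apply this consistently is a small wrapper that sends everything to a log file and shows the log only when a step fails. This is a sketch rather than part of the platform: the quiet function and the INSTALL_LOG path are hypothetical names chosen for illustration.

```shell
# Hypothetical helper: quiet and INSTALL_LOG are illustrative names.
INSTALL_LOG="${INSTALL_LOG:-/tmp/install.log}"

quiet() {
  # Append stdout and stderr of the command to the log file.
  if ! "$@" >>"$INSTALL_LOG" 2>&1; then
    # On failure, report on stderr so stdout stays clean.
    echo "install step failed: $*" >&2
    tail -n 20 "$INSTALL_LOG" >&2
    return 1
  fi
}

# Usage with the commands shown above:
#   quiet pip install my-agent
#   quiet npm install -g my-agent
```

Keeping the log on disk means a failed install can still be diagnosed without any of its output reaching the verifier.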

Agent output not captured

Symptom: /logs/agent/output.txt is empty, and verifiers say the answer is missing.

Fix: Pipe through tee instead of redirecting with >, so the output reaches both stdout and the log file:

# Bad: stdout goes only to the file, so nothing reaches the verifier
my-agent --task "$PROBLEM_STATEMENT" > "$OUTPUT_DIR/agent.log"

# Good: tee writes the log file and still passes stdout through
my-agent --task "$PROBLEM_STATEMENT" 2>"$OUTPUT_DIR/stderr.log" | tee "$OUTPUT_DIR/agent.log"
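The behaviour can be checked locally with a stand-in for the agent. In this sketch fake_agent is a hypothetical function, not a real command, and the fallback value for OUTPUT_DIR is an assumption used only when the variable is unset.

```shell
# Fallback path is an assumption for local testing only.
OUTPUT_DIR="${OUTPUT_DIR:-/tmp/agent-output}"
mkdir -p "$OUTPUT_DIR"

# fake_agent is a hypothetical stand-in for the real agent command.
fake_agent() {
  echo "final answer"          # stdout: what the verifier should see
  echo "debug chatter" >&2     # stderr: diagnostics only
}

# stderr goes to its own file; stdout is copied to agent.log and passed through.
fake_agent 2>"$OUTPUT_DIR/stderr.log" | tee "$OUTPUT_DIR/agent.log"

# Warn on stderr if the agent produced no stdout at all.
if [ ! -s "$OUTPUT_DIR/agent.log" ]; then
  echo "warning: agent produced no stdout" >&2
fi
```

Note that in bash the exit status of such a pipeline is that of tee unless set -o pipefail is enabled, so an agent failure can go unnoticed if only the pipeline status is checked.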

Missing system dependencies

Symptom: curl: command not found, git: command not found, or xz errors.

Fix: Check for the tools and install any that are missing before using them:

# command -v is a shell builtin, so the check works even where which is not installed
(command -v curl >/dev/null 2>&1 && command -v git >/dev/null 2>&1) || \
  (apt-get update -qq && apt-get install -y -qq curl git) >/dev/null 2>&1 || true
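When more than a couple of tools are involved, it can help to list exactly which ones are missing before installing anything. The following is a minimal sketch; missing_cmds is a hypothetical helper name and the tool list is only an example.

```shell
# Hypothetical helper: prints the names of commands that are not on PATH.
missing_cmds() {
  for cmd in "$@"; do
    command -v "$cmd" >/dev/null 2>&1 || printf '%s\n' "$cmd"
  done
}

missing="$(missing_cmds curl git xz)"
if [ -n "$missing" ]; then
  # Report on stderr so stdout stays clean for the agent's answer.
  echo "missing tools:" $missing >&2
fi
```

On Debian-based images the package name can differ from the command name; for example, the xz command is provided by the xz-utils package.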

Agent refuses to run as root

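Symptom: Agent exits with “cannot run as root” or similar.

Fix: Create a non-root user and run the agent as that user (su -p preserves the environment, so variables such as $PROBLEM_STATEMENT stay available):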

useradd -m -s /bin/bash benchkit 2>/dev/null || true
chown -R benchkit:benchkit "$OUTPUT_DIR" 2>/dev/null || true
chmod -R 777 "$WORKING_DIR" 2>/dev/null || true
su benchkit -p -c "bash /tmp/run_agent.sh"
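If the same script may run both as root and as an ordinary user, the switch can be made conditional. This is a sketch under that assumption; run_agent_script is a hypothetical helper, and benchkit is simply the user name from the example above.

```shell
# Hypothetical helper: run a script as a non-root user only when needed.
run_agent_script() {
  script="$1"
  user="${2:-benchkit}"
  if [ "$(id -u)" -eq 0 ] && id "$user" >/dev/null 2>&1; then
    # Running as root and the user exists: drop privileges, keep the environment (-p).
    su "$user" -p -c "bash '$script'"
  else
    # Already non-root, or no such user: run the script directly.
    bash "$script"
  fi
}

# Usage:
#   run_agent_script /tmp/run_agent.sh
```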

Python version mismatch

Symptom: uv sync fails or hangs, with an error about requires-python.

Fix: Use uv’s Python management to select a matching interpreter:

uv sync --python 3.12 >/dev/null 2>&1
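Before picking a version for --python, it can be useful to see what the project actually asks for. The sketch below reads the constraint from pyproject.toml with sed; requires_python is a hypothetical helper, and it only handles the common case of a double-quoted value on a single line.

```shell
# Hypothetical helper: print the requires-python constraint from a pyproject.toml.
# Only matches a double-quoted value on one line, e.g. requires-python = ">=3.12"
requires_python() {
  sed -n 's/^requires-python[[:space:]]*=[[:space:]]*"\(.*\)"[[:space:]]*$/\1/p' "${1:-pyproject.toml}"
}

if [ -f pyproject.toml ]; then
  # Print to stderr so stdout stays clean.
  echo "project requires Python: $(requires_python)" >&2
fi
```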

Conda PATH issues

Symptom: python: command not found in conda-based benchmark images.

Fix: Source conda explicitly:

eval "$(conda shell.bash hook 2>/dev/null)" && conda activate base 2>/dev/null || true

Agent works in the wrong directory

Symptom: Agent edits files in /runner/ (its own code) instead of the benchmark task.

Fix: Set your agent’s working directory to $WORKING_DIR:

export MY_AGENT_WORK_DIR="$WORKING_DIR"
cd "$WORKING_DIR"
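A failed or skipped cd leaves the agent in whatever directory the script started in, so it may be worth checking the variable first. This is a sketch; enter_working_dir is a hypothetical helper name.

```shell
# Hypothetical helper: change into $WORKING_DIR, failing loudly if it is unusable.
enter_working_dir() {
  dir="${WORKING_DIR:-}"
  if [ -z "$dir" ] || [ ! -d "$dir" ]; then
    echo "WORKING_DIR is unset or not a directory: '$dir'" >&2
    return 1
  fi
  cd "$dir" || return 1
}

# Usage:
#   enter_working_dir || exit 1
```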

Agent says “not configured” or “missing API key”

Symptom: Agent errors about missing settings in headless mode.

Fix: Two things to check:

Make sure the required env vars are set on your dashboard.
If the agent normally reads from a config file, pass a flag like --override-with-envs so that it reads from env vars instead.
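A quick check at the top of the run script can make a missing variable obvious before the agent starts. The sketch below assumes a POSIX shell; require_env is a hypothetical helper and MY_AGENT_API_KEY is a placeholder name, not a variable the platform defines.

```shell
# Hypothetical helper: fail if any of the named env vars is unset or empty.
# Variable names passed in are assumed to be trusted (they go through eval).
require_env() {
  missing=0
  for name in "$@"; do
    eval "value=\${$name:-}"
    if [ -z "$value" ]; then
      echo "missing required env var: $name" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Usage (MY_AGENT_API_KEY is a placeholder):
#   require_env MY_AGENT_API_KEY || exit 1
```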

Quoting issues with $PROBLEM_STATEMENT

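Symptom: Agent gets a truncated or mangled problem statement.

Fix: Always double-quote the variable. For sub-scripts, write the script with a single-quoted heredoc delimiter so that $PROBLEM_STATEMENT is expanded when the script runs, not when it is written (the variable must be exported for the sub-script to see it):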

cat > /tmp/run.sh << 'RUNEOF'
#!/bin/bash
my-agent --task "$PROBLEM_STATEMENT"
RUNEOF
bash /tmp/run.sh
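The effect of quoting is easy to see with a small experiment. In this sketch the sample statement, the count_args function, and the /tmp/show_statement.sh path are all made up for illustration.

```shell
# Sample statement with quotes, a newline, and repeated spaces (illustrative only).
PROBLEM_STATEMENT='Fix the "parser" bug:
  keep   this spacing'
export PROBLEM_STATEMENT

# Prints how many arguments it received.
count_args() { echo "$#"; }

count_args $PROBLEM_STATEMENT     # unquoted: the shell splits it into several words
count_args "$PROBLEM_STATEMENT"   # quoted: passed as a single argument

# The quoted delimiter keeps $PROBLEM_STATEMENT literal in the file;
# it is expanded only when the sub-script runs.
cat > /tmp/show_statement.sh << 'RUNEOF'
#!/bin/bash
printf '%s\n' "$PROBLEM_STATEMENT"
RUNEOF
bash /tmp/show_statement.sh
```

The sub-script prints the statement unchanged, including the newline and the repeated spaces, because the text never passes through an unquoted expansion.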

Timeout

Symptom: Run shows a “timed out” error.

Fix: Check for interactive prompts (for example, a missing --headless flag), or use a faster model for testing. The healthcheck tasks have a 300s timeout; if install plus agent run takes longer than that, optimize your install step.
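To find out locally whether a step fits in the budget, it can be run under the same limit. This is a sketch that assumes the GNU coreutils timeout command is available; run_with_budget is a hypothetical helper. Stdin is redirected from /dev/null so that an interactive prompt typically sees end-of-file instead of waiting for input.

```shell
# Hypothetical helper: run a command under a time budget and report how long it took.
run_with_budget() {
  budget="$1"; shift
  start=$(date +%s)
  code=0
  # timeout exits with status 124 when the time limit is hit.
  timeout "$budget" "$@" </dev/null || code=$?
  elapsed=$(( $(date +%s) - start ))
  if [ "$code" -eq 124 ]; then
    echo "timed out after ${budget}s" >&2
  else
    echo "finished in ${elapsed}s (exit $code)" >&2
  fi
  return "$code"
}

# Usage, with the 300s healthcheck limit mentioned above:
#   run_with_budget 300 bash /tmp/run.sh
```

Because timeout starts a new process, the command passed in must be an executable or script rather than a shell function.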
