Install noise polluting stdout
Symptom: Verifiers fail. Output is full of apt-get / pip / npm text instead of the agent’s answer.
Fix: Redirect all install output away from stdout:
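One way to do this is a small wrapper that sends both stdout and stderr of an install command to a log file. This is a minimal sketch: the `quiet` helper and the `INSTALL_LOG` path are illustrative names, not part of the platform, and redirecting to stderr with `>&2` would work as well.

```shell
# Hypothetical log location; any writable path works.
INSTALL_LOG="${INSTALL_LOG:-/tmp/install.log}"

# quiet CMD...: run CMD with stdout and stderr appended to the log,
# so nothing from the install step reaches stdout.
quiet() {
  "$@" >>"$INSTALL_LOG" 2>&1
}

# Example usage (commented out so the sketch has no side effects):
# quiet apt-get update
# quiet apt-get install -y curl git
# quiet pip install -r requirements.txt
# quiet npm install
```

The wrapper returns the exit status of the wrapped command, so `quiet pip install ... || exit 1` still stops on a failed install.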
Agent output not captured
Symptom: /logs/agent/output.txt is empty. Verifiers say the answer is missing.
Fix: Use tee, not >:
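A sketch of the tee approach: tee writes a copy to the file while still passing the output through to stdout, whereas `>` sends it only to the file. The `capture` helper and the `my-agent` command are hypothetical; the default path is the one from the symptom above.

```shell
# Output path from the symptom above; overridable for local testing.
OUTPUT_FILE="${OUTPUT_FILE:-/logs/agent/output.txt}"

# capture CMD...: run CMD, echo its stdout, and save a copy to OUTPUT_FILE.
capture() {
  mkdir -p "$(dirname "$OUTPUT_FILE")"
  "$@" | tee "$OUTPUT_FILE"
}

# Example usage (my-agent is a placeholder for your agent's CLI):
# capture my-agent --headless "$PROBLEM_STATEMENT"
```

Note that the pipeline's exit status is that of tee, not the agent. In bash, `set -o pipefail` makes a failing agent visible to the caller.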
Missing system dependencies
Symptom: curl: command not found, git: command not found, or xz errors.
Fix: Check and install before using:
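The check can be wrapped in a helper that installs a package only when its command is missing. This sketch assumes a Debian or Ubuntu image with apt-get and root access; the `need` helper is an illustrative name, and other base images will need a different package manager.

```shell
# need CMD [PKG]: install PKG (defaults to CMD) only if CMD is not on PATH.
# Assumes apt-get is available; install output is discarded to keep stdout clean.
need() {
  cmd="$1"
  pkg="${2:-$1}"
  if command -v "$cmd" >/dev/null 2>&1; then
    return 0
  fi
  apt-get update >/dev/null 2>&1 &&
    apt-get install -y "$pkg" >/dev/null 2>&1
}

# Example usage (commented out so the sketch has no side effects):
# need curl
# need git
# need xz xz-utils   # on Debian/Ubuntu the xz command ships in xz-utils
```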
Agent refuses to run as root
Symptom: Agent exits with “cannot run as root” or similar.
Fix: Create a non-root user and run the agent as that user.
Python version mismatch
Symptom: uv sync fails or hangs, with an error about requires-python.
Fix: Use uv’s Python management:
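A minimal sketch of letting uv install and select the interpreter itself, rather than relying on whatever Python the image ships. The version `3.12` is a placeholder and should match the `requires-python` range in your pyproject.toml; the function name is illustrative.

```shell
# Placeholder version; set it to one that satisfies requires-python.
PY_VERSION="${PY_VERSION:-3.12}"

# Install a uv-managed interpreter, then sync the project against it.
sync_with_managed_python() {
  uv python install "$PY_VERSION" &&
    uv sync --python "$PY_VERSION"
}

# Example usage:
# sync_with_managed_python
```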
Conda PATH issues
Symptom: python: command not found in conda-based benchmark images.
Fix: Source conda explicitly:
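Non-interactive shells usually skip the profile scripts that put conda on PATH, so the setup script has to be sourced by hand. The sketch below tries a few common install locations; these paths are assumptions, not something the benchmark images guarantee, and `CONDA_ROOT` and `activate_conda` are illustrative names.

```shell
# activate_conda [ENV]: source conda.sh from the first root that has it,
# then activate ENV (default: base).
activate_conda() {
  for root in "${CONDA_ROOT:-}" /opt/conda /opt/miniconda3 "$HOME/miniconda3"; do
    if [ -n "$root" ] && [ -f "$root/etc/profile.d/conda.sh" ]; then
      . "$root/etc/profile.d/conda.sh"
      conda activate "${1:-base}"
      return $?
    fi
  done
  echo "conda.sh not found; set CONDA_ROOT to the conda install directory" >&2
  return 1
}

# Example usage:
# activate_conda base && python --version
```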
Agent works on wrong directory
Symptom: Agent edits files in /runner/ (its own code) instead of the benchmark task.
Fix: Set your agent’s working directory to $WORKING_DIR:
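A small guard makes this failure loud instead of silent: change into `$WORKING_DIR` before launching the agent and stop if the variable is unset or does not point at a directory. The `enter_workdir` helper and the `my-agent` command are hypothetical.

```shell
# enter_workdir: cd into $WORKING_DIR, or fail with a message on stderr.
enter_workdir() {
  if [ -z "${WORKING_DIR:-}" ] || [ ! -d "$WORKING_DIR" ]; then
    echo "WORKING_DIR is unset or not a directory" >&2
    return 1
  fi
  cd "$WORKING_DIR"
}

# Example usage (my-agent is a placeholder for your agent's CLI):
# enter_workdir && my-agent --headless "$PROBLEM_STATEMENT"
```

If the agent has its own working-directory option, passing `"$WORKING_DIR"` to it achieves the same thing.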
Agent says “not configured” or “missing API key”
Symptom: Agent errors about missing settings in headless mode.
Fix: Two things to check:
- Make sure the required env vars are set on your dashboard
- If the agent normally reads from a config file, pass a flag like --override-with-envs to read from env vars instead
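To confirm the first point from inside the run, a pre-flight check can list every required variable that is unset or empty before the agent starts. This is a sketch; `require_env` is an illustrative helper and the variable names in the usage line are hypothetical.

```shell
# require_env VAR...: print each unset or empty variable to stderr,
# and return non-zero if any were missing.
require_env() {
  missing=0
  for name in "$@"; do
    # Indirect lookup of the variable called $name (names must be literals you trust).
    eval "value=\${$name:-}"
    if [ -z "$value" ]; then
      echo "missing env var: $name" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Example usage (replace with the variables your agent actually reads):
# require_env LLM_API_KEY LLM_MODEL || exit 1
```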
Quoting issues with $PROBLEM_STATEMENT
Symptom: Agent gets a truncated or mangled problem statement.
Fix: Always double-quote the variable ("$PROBLEM_STATEMENT"). For sub-scripts, use single-quoted heredocs (<<'EOF') so the shell does not expand the script body.
Timeout
Symptom: Run shows a “timed out” error.
Fix: Check for interactive prompts (for example, a missing --headless flag), or use a faster model for testing. The healthcheck tasks have a 300s timeout; if install plus agent run takes longer, optimize your install step.
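To reproduce the limit locally, the whole install-plus-agent step can be run under the same budget with stdin closed, so a command waiting on a prompt fails quickly instead of hanging. This sketch assumes GNU coreutils `timeout` is available; `within_budget` and `run.sh` are hypothetical names.

```shell
# Budget in seconds; 300 matches the healthcheck timeout mentioned above.
BUDGET_SECONDS="${BUDGET_SECONDS:-300}"

# within_budget CMD...: run CMD with stdin closed and a hard time limit.
# GNU timeout exits with status 124 when the limit is hit.
within_budget() {
  timeout "$BUDGET_SECONDS" "$@" </dev/null
  rc=$?
  if [ "$rc" -eq 124 ]; then
    echo "exceeded ${BUDGET_SECONDS}s budget" >&2
  fi
  return "$rc"
}

# Example usage (run.sh stands in for your install + agent script):
# within_budget sh run.sh
```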