Benchspan currently supports 27+ benchmarks covering coding, reasoning, math, function calling, and more.
To list them from the command line, run:

    benchspan benchmarks list

Available benchmarks

Category          Benchmarks
Coding            SWEbench, SWEbench-Pro, HumanEvalFix, AutoCodeBench, CRUSTBench, QuixBugs
Reasoning         AIME, GPQA-Diamond, IneqMath, ReasoningGym
Function Calling  BFCL
QA                SimpleQA, MMMLU, MMAU
Multi-modal       ARC-AGI-2
Data Science      DS1000, DABStep
Other             GAIA, LawBench, QCircuitBench, ReplicationBench, SatBench, StrongReject, USACO

Onboarding new benchmarks

Benchmark onboarding is currently white-glove: we set up each new benchmark with you by hand. If there is a specific benchmark you’d like to run on Benchspan, reach out and we’ll work with you to get it set up.

Contact us

Email avi@benchspan.com with your benchmark requirements.