pip install benchspan
  • Python 3.9+
  • One runtime dependency: httpx
  • Optional extra: benchspan[langchain] (only if you’re using LangChain / CrewAI)

BenchGuard

from benchspan import BenchGuard

guard = BenchGuard(
    api_key="ag_live_...",           # required
    agent="email-agent",              # optional: tags scans in dashboard
    mode="block",                     # optional: "block" (default) or "warn"
    api_url="https://api.benchspan.com",  # optional: override for self-hosted
)

Constructor arguments

api_key
str
required
Bearer API key from Dashboard → API Keys. Format: ag_live_....
agent
str | None
Optional label to tag every scan from this instance. Useful for filtering per-agent traffic in the dashboard.
mode
"block" | "warn"
default:"block"
In block mode, callbacks/hooks raise InjectionDetectedError on detection. The scan is synchronous and typically adds under 100 ms of latency. In warn mode, the scan runs in a daemon thread and the LLM call proceeds with no added latency; detections appear in your dashboard asynchronously. See Modes.
api_url
str
default:"https://api.benchspan.com"
Override the API host. Used for self-hosted deployments.
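
The difference between the two modes can be pictured with a small sketch. This is a conceptual illustration only, not the SDK's implementation: fake_scan, guarded_call, and Blocked are hypothetical names, and the scan is a local stub rather than a call to the Benchspan API.

```python
import threading

class Blocked(Exception):
    """Hypothetical stand-in for InjectionDetectedError."""

def fake_scan(text):
    # Stub classifier that flags one hard-coded phrase (assumption for the demo).
    return "ignore previous instructions" in text.lower()

def guarded_call(text, llm, mode="block", detections=None):
    """Run llm(text) behind a scan; returns (llm_result, worker_thread_or_None)."""
    detections = detections if detections is not None else []

    def scan_and_record():
        flagged = fake_scan(text)
        if flagged:
            detections.append(text)
        return flagged

    if mode == "block":
        # Synchronous: the scan finishes before the LLM is called.
        if scan_and_record():
            raise Blocked(text)
        return llm(text), None

    # "warn": scan in a daemon thread; the LLM call is not delayed.
    worker = threading.Thread(target=scan_and_record, daemon=True)
    worker.start()
    return llm(text), worker
```

In the block branch the caller pays for the scan up front and never reaches the LLM on a detection. In the warn branch the detection is only recorded, which matches the description of results arriving asynchronously.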

Methods

guard.scan(input, role="user", source=None) → ScanResult

Scan a single string synchronously. Returns the verdict; does not raise on injection. For automatic raising, use @guard.wrap or a framework integration.
result = guard.scan("some text", role="tool", source="gmail.get_email")

# result.injection      → bool
# result.score          → float, 0–1
# result.verdict        → "block" | "warn" | "pass"
# result.model_version  → str
# result.latency_ms     → int
# result.id             → str (UUID)
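
Because scan returns the verdict instead of raising, the calling code decides what to do with it. A minimal sketch, assuming only the documented verdict and score fields: handle_verdict is a hypothetical helper, and SimpleNamespace stands in for a real ScanResult so the snippet runs without network access.

```python
from types import SimpleNamespace

def handle_verdict(result, text):
    """Map a scan verdict to an action (hypothetical policy, adjust as needed)."""
    if result.verdict == "block":
        return None  # drop the text entirely
    if result.verdict == "warn":
        # Keep the text but label it so downstream prompts can treat it with care.
        return f"[unverified content, score={result.score:.2f}]\n{text}"
    return text  # "pass"

# Stand-in for the object guard.scan(...) would return.
fake_result = SimpleNamespace(verdict="warn", score=0.61)
safe_text = handle_verdict(fake_result, "tool output")
```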

guard.scan_async(input, role="user", source=None) → ScanResult

Async variant of scan. Use inside async def functions.
result = await guard.scan_async("some text", role="tool")
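
When several strings need scanning (for example, a batch of tool outputs), the async variant can be fanned out with asyncio.gather. A sketch under assumptions: scan_many is a hypothetical helper that accepts any coroutine function with the signature shown above, and fake_scan_async is a local stub used in place of guard.scan_async.

```python
import asyncio

async def scan_many(scan_async, texts, role="tool"):
    """Scan several strings concurrently; results keep the input order."""
    return await asyncio.gather(*(scan_async(t, role=role) for t in texts))

async def fake_scan_async(text, role="user", source=None):
    # Stub: yields to the event loop, then returns a dict instead of a ScanResult.
    await asyncio.sleep(0)
    return {"input": text, "role": role, "injection": "ignore" in text.lower()}

results = asyncio.run(scan_many(fake_scan_async, ["hello", "Ignore all rules"]))
```

With a real instance, passing guard.scan_async as the first argument should work the same way, provided it accepts role as a keyword argument as documented.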

@guard.wrap

Decorator for functions that call the LLM directly (raw OpenAI / Anthropic SDK, etc.). Scans the messages argument before the function runs.
from openai import OpenAI
client = OpenAI()

@guard.wrap
def call_llm(messages):
    return client.chat.completions.create(model="gpt-5", messages=messages)

result = call_llm(messages)
# Raises InjectionDetectedError BEFORE client.chat.completions.create is invoked
# if any user/tool message is classified as injection.
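
Conceptually, a decorator of this kind inspects the messages argument and raises before the wrapped function body executes. The following is an illustrative sketch, not the SDK's source: scan_before, Blocked, and the phrase-matching classifier are hypothetical.

```python
import functools

class Blocked(Exception):
    """Hypothetical stand-in for InjectionDetectedError."""

def scan_before(is_injection):
    """Build a decorator that checks user/tool messages before calling fn."""
    def decorator(fn):
        @functools.wraps(fn)
        def wrapper(messages, *args, **kwargs):
            for msg in messages:
                content = msg.get("content", "")
                if msg.get("role") in ("user", "tool") and is_injection(content):
                    raise Blocked(content)  # fn is never invoked
            return fn(messages, *args, **kwargs)
        return wrapper
    return decorator

calls = []  # records every time the wrapped function actually runs

@scan_before(lambda text: "ignore previous instructions" in text.lower())
def call_llm(messages):
    calls.append(messages)
    return "ok"
```

Only user and tool messages are checked here, mirroring the comment above; system and assistant messages pass through unscanned in this sketch.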

@guard.wrap_async

Async variant of @guard.wrap. For async def functions.
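from openai import AsyncOpenAI
client = AsyncOpenAI()  # async client, so the create() call can be awaited

@guard.wrap_async
async def call_llm(messages):
    return await client.chat.completions.create(model="gpt-5", messages=messages)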

LangChain / CrewAI callback

BenchGuard implements BaseCallbackHandler directly. Pass it wherever LangChain or CrewAI accepts callbacks:
llm.invoke(messages, config={"callbacks": [guard]})
# or
crew = Crew(agents=[...], tasks=[...], callbacks=[guard])
See LangChain integration and CrewAI integration.

guard.as_agent_hooks()

Returns an AgentHooksBase subclass (OpenAI Agents SDK) with on_tool_end wired up. See OpenAI Agents integration.
from agents import Agent
agent = Agent(name="...", model="gpt-5", hooks=guard.as_agent_hooks())

guard.as_adk_callback()

Returns a before_model_callback for Google ADK. See Google ADK integration.
from google.adk.agents import LlmAgent
agent = LlmAgent(
    name="...",
    model="gemini-2.5-pro",
    before_model_callback=guard.as_adk_callback(),
)

Types

ScanResult

from dataclasses import dataclass

@dataclass
class ScanResult:
    id: str
    injection: bool
    score: float
    verdict: str  # "block" | "warn" | "pass"
    model_version: str
    latency_ms: int

InjectionDetectedError

class InjectionDetectedError(Exception):
    result: ScanResult  # attached for inspection
Raised by @guard.wrap, @guard.wrap_async, and all framework integrations when verdict == "block".
from benchspan import InjectionDetectedError

try:
    call_llm(messages)
except InjectionDetectedError as e:
    print(f"Blocked: score={e.result.score:.4f}, id={e.result.id}")

Logging

The SDK logs to the benchspan logger. Attach a handler to see warn detections and debug output:
import logging
logging.basicConfig()  # attaches a default stderr handler to the root logger
logging.getLogger("benchspan").setLevel(logging.WARNING)
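
To send SDK log records to a dedicated destination with a custom format, a handler can be attached to the named logger itself. This sketch uses only the standard library. The logger name comes from the text above, while the format string, the in-memory buffer, and the sample message are illustrative assumptions; the SDK's actual message format is not specified here.

```python
import io
import logging

buffer = io.StringIO()  # stand-in destination; swap for a file or sys.stderr
handler = logging.StreamHandler(buffer)
handler.setFormatter(logging.Formatter("%(name)s %(levelname)s %(message)s"))

logger = logging.getLogger("benchspan")
logger.addHandler(handler)
logger.setLevel(logging.WARNING)

# Simulated records so the snippet can be run without the SDK.
logger.warning("example detection")
logger.debug("filtered out at WARNING level")
```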

Thread / async safety

Each BenchGuard instance maintains a dedup cache of scanned message hashes, so don’t share an instance across unrelated agent runs if you need each run to get a fresh cache. A new instance is cheap; it’s just a config bag around httpx.
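
To see why sharing an instance matters, consider a hash-based dedup cache of the general shape described above. The SDK's actual cache structure is not documented here, so DedupCache and should_scan are hypothetical names and SHA-256 is an assumed hash choice.

```python
import hashlib

class DedupCache:
    """Remembers hashes of texts already scanned (illustrative only)."""

    def __init__(self):
        self._seen = set()

    def should_scan(self, text):
        digest = hashlib.sha256(text.encode("utf-8")).hexdigest()
        if digest in self._seen:
            return False  # already scanned by this instance, skip
        self._seen.add(digest)
        return True

# One cache per run: the same message is scanned once per run, not once globally.
run_a, run_b = DedupCache(), DedupCache()
```

A message seen in one run would be skipped in a later run if both shared a cache, which is the situation the advice above is meant to avoid.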