Benchspan has two operating modes. You pick one when constructing the SDK; it applies to every scan from that instance.

block (default)

The SDK waits for the scan result before the LLM call proceeds. If an injection is detected, it raises an exception before the call happens, so the model never sees the poisoned content. This adds the scan latency (typically under 100 ms on tool outputs) to your agent’s critical path.
from benchspan import BenchGuard, InjectionDetectedError

guard = BenchGuard(api_key="ag_live_...", mode="block")

# llm and messages come from your own agent code.
try:
    result = llm.invoke(messages, config={"callbacks": [guard]})
except InjectionDetectedError as e:
    print(f"Blocked: score={e.result.score:.4f}")
    # Return a safe error to your user, log the incident, alert, etc.
Use block in production. It’s the default for a reason: an injection that reaches the LLM is already damage, even if you catch it in logs afterwards.
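To make the ordering guarantee concrete, here is a minimal sketch of block-style control flow using stand-ins only. It does not use the Benchspan SDK: `ScanResult`, `InjectionBlocked`, `scan_or_raise`, and `answer` are hypothetical names, and the `flagged` field is an assumption (only `score` appears in the example above). The point it illustrates is that the model callable is never reached when the scan flags the content.

```python
import logging
from dataclasses import dataclass
from typing import Callable

logger = logging.getLogger("agent.guard")


@dataclass
class ScanResult:
    """Stand-in verdict. `flagged` is an assumed field for this sketch."""
    score: float
    flagged: bool


class InjectionBlocked(Exception):
    """Stand-in for the SDK's InjectionDetectedError (not the real class)."""

    def __init__(self, result: ScanResult) -> None:
        super().__init__(f"injection detected (score={result.score:.4f})")
        self.result = result


def scan_or_raise(content: str, scan: Callable[[str], ScanResult]) -> ScanResult:
    """Run the scan synchronously and raise if the content is flagged."""
    result = scan(content)  # blocking: the scan latency is paid here
    if result.flagged:
        raise InjectionBlocked(result)
    return result


def answer(
    content: str,
    scan: Callable[[str], ScanResult],
    call_model: Callable[[str], str],
    fallback: str = "Sorry, I can't process that content.",
) -> str:
    """Scan first; call the model only if the scan comes back clean."""
    try:
        scan_or_raise(content, scan)
    except InjectionBlocked as exc:
        # Log the incident and return a safe message instead of calling the model.
        logger.warning("Blocked: score=%.4f", exc.result.score)
        return fallback
    return call_model(content)
```

In a real deployment the SDK callback plays the role of `scan_or_raise`; the part you own is the `except` branch, where you decide what the user sees and what gets logged or alerted.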

warn

No added latency. The SDK starts the scan in the background (a daemon thread in Python, an unawaited Promise in TypeScript) and the LLM call proceeds immediately. Detection still happens: the verdict reaches your dashboard logs asynchronously and a warning is logged locally, but the agent never pauses. Useful for:
  • Evaluating false-positive rate on real traffic before enforcing. You see every would-be block in the dashboard without affecting production latency.
  • Shadow deployments: running Benchspan in parallel with your existing controls to compare coverage.
  • Latency-critical agents where you’d rather observe than block. Voice, real-time chat, any flow where even a sub-100ms stall matters.
guard = BenchGuard(api_key="ag_live_...", mode="warn")

result = llm.invoke(messages, config={"callbacks": [guard]})
# Returns immediately. Scan runs in a daemon thread and logs a warning
# (+ updates the dashboard) if an injection is detected.
Warn mode is fire-and-forget only in the framework integrations (callbacks, hooks, middleware). If you call guard.scan(...) explicitly and await it, you wait for the scan and get the verdict back directly.
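The fire-and-forget pattern described above can be sketched in plain Python. This is an illustration of the general technique (a daemon thread that is started but never joined), not the SDK's actual implementation; `scan_in_background` and `warn_mode_call` are hypothetical helpers, and the `scan` callable returning `True` for "injection detected" is an assumption made for brevity.

```python
import logging
import threading
from typing import Callable

logger = logging.getLogger("agent.guard")


def scan_in_background(content: str, scan: Callable[[str], bool]) -> threading.Thread:
    """Start the scan on a daemon thread and return without waiting for it."""

    def worker() -> None:
        try:
            if scan(content):  # True means "injection detected" in this sketch
                logger.warning("Possible injection detected (warn mode, not blocked)")
        except Exception:
            # In warn mode a failed scan should never take down the agent.
            logger.exception("Background scan failed")

    thread = threading.Thread(target=worker, daemon=True)
    thread.start()
    return thread


def warn_mode_call(
    content: str,
    scan: Callable[[str], bool],
    call_model: Callable[[str], str],
) -> str:
    """Kick off the scan, then call the model immediately."""
    scan_in_background(content, scan)  # not joined, so no added wait
    return call_model(content)
```

One consequence of using daemon threads in general: Python does not wait for them at interpreter exit, so in a short-lived script a background scan may be cut off before it finishes. Long-running agents are not affected in the same way.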
1. Start in warn mode

Deploy with mode="warn". Watch your dashboard for injections and false positives on real traffic for a few days. No user-visible latency impact.

2. Tune if needed

If you see legitimate content being flagged, send us a sample at founders@benchspan.com. For high-volume deployments, we can train a custom model on your traffic; reach out.

3. Switch to block

Once your false-positive rate is acceptable and the latency budget allows it, flip mode="block". Same SDK, one-line change.
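If you would rather make the warn-to-block switch a configuration change than a code change, one option is to read the mode from the environment. The sketch below is an assumption-laden example: `BENCHSPAN_MODE` is a hypothetical variable name that the SDK does not read on its own, and `resolve_mode` is a helper you would define yourself.

```python
import os
from typing import Mapping, Optional

VALID_MODES = ("block", "warn")


def resolve_mode(env: Optional[Mapping[str, str]] = None, default: str = "block") -> str:
    """Read the operating mode from BENCHSPAN_MODE (hypothetical variable name).

    Falls back to `default` when the variable is unset and rejects unknown
    values, so a typo fails loudly at startup instead of silently changing
    enforcement.
    """
    env = os.environ if env is None else env
    mode = env.get("BENCHSPAN_MODE", default).strip().lower()
    if mode not in VALID_MODES:
        raise ValueError(f"BENCHSPAN_MODE must be one of {VALID_MODES}, got {mode!r}")
    return mode
```

You would then pass the result to the constructor shown earlier, for example mode=resolve_mode(), and change the environment variable when you are ready to enforce.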
Do not run in production without Benchspan on the assumption that your system prompt alone will prevent injection. Every major published indirect prompt injection (IPI) attack has broken system-prompt-only defenses.