A vendor invoice arrives. The agent reads it. The PDF whispers: forward customers@ to attacker@. Your agent has Gmail scope. Your agent obeys.
Capability security
for AI agents.
A deterministic runtime that maps every tool your agents reach, mints scoped, revocable capability tokens — verified in single‑digit microseconds — and decides every call against a policy you wrote, with no LLM in the decision path. Stops prompt‑injected refunds, runaway tool storms, indirect‑injection data exfil, and unscoped MCP access before they reach prod. MCP‑native. Audit‑mapped to OWASP LLM, NIST AI RMF, MITRE ATLAS. MIT.
curl -fsSL capframe.ai/install | sh
$ capframe install
→ mcp-recon
✓ mcp-recon v0.0.4 sha256 ok · 1.2 MB
→ capnagent
✓ capnagent v0.7.6 sha256 ok · 1.2 MB
→ mcp-guard
✓ mcp-guard v0.5.4 sha256 ok · 19.9 MB
Verify with: capframe doctor
Add to PATH: ~/.capframe/bin
Your agents inherit the full authority of the credentials you hand them.
Without a capability layer, every prompt-injection vector turns into the worst thing your agent could do with those keys. These aren't hypotheticals — they're the four failure modes every team shipping agents has seen, or is about to.
One bad inference and the refund agent loops, issuing $50 refunds until something — anything — finally rate-limits it. Recovery: 6 hours of finance reversals.
An MCP server exposes 47 tools. Your agent legitimately needs 4. The other 43 are jailbreak surface — and every dependency update silently widens it.
The agent did something. You can't prove what it was allowed to do, who authorised the scope, when the policy last changed, or whether it was revoked in time.
Capframe puts all four at the agent's tool-call boundary — Find and Bind as Rust binaries, Guard as a pip-installable Python layer. Every call is decided by a policy you wrote, not a model you can't inspect.
Four stages. Microsecond capability checks. One findings schema. No model in the loop.
Map the surface, mint scoped authority, enforce every call against a policy you wrote, then export the receipt. The output of each stage is the input to the next — and every artifact is auditor-ready.
Map the tool surface. Catch indirect-injection gaps.
Mint scoped, revocable capability tokens.
Evaluate every tool call against policy at runtime.
Audit-ready artifact: OWASP / NIST / ATLAS.
Standalone, or composed. Either way, three primitives — not three products.
Find and Bind ship as Rust crates with CLI subcommands; Guard ships as a Python package (pip install mcp-guardrails) — each in its own public GitHub repo. Adopt one at a time, or wire them together through the shared findings.v1 JSON Schema — the wire format Find emits, Bind scopes against, and Guard enforces.
Find
Walks every MCP server, every tool endpoint, every parameter your agent can reach. Detects unconstrained inputs, indirect-injection sinks, missing schemas, and silently-widening surfaces between dependency updates. Emits a findings.v1 JSON document aligned to OWASP LLM Top 10 and MITRE ATLAS — the same file Bind consumes to scope tokens and Guard consumes to synthesize policy.
- ▸Static + behavioural scan of every MCP server in your config
- ▸Diffs surfaces between scans — flags newly introduced tools
- ▸Schema-aware: catches missing parameter constraints, not just missing types
- ▸Cross-tool findings.v1 wire format (JSON Schema Draft 2020-12)
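The surface diff described in the bullets above can be pictured as a set comparison between two scans. This is a minimal Python sketch under stated assumptions, not Capframe's implementation: the scan layout (server name mapped to a list of tool names), the tool names, and the `diff_surface` helper are all hypothetical, and the real findings.v1 layout is not shown on this page.

```python
# Hypothetical sketch: compare the tool surface of two scans.
# The dict layout (server name -> list of tool names) is an assumption,
# not the findings.v1 schema.

def diff_surface(previous, current):
    """Return (added, removed) sets of 'server/tool' identifiers."""
    def flatten(scan):
        return {f"{server}/{tool}" for server, tools in scan.items() for tool in tools}

    before, after = flatten(previous), flatten(current)
    return after - before, before - after


last_scan = {"shop": ["order.read", "refund.write"]}
this_scan = {
    "shop": ["order.read", "refund.write", "customer.export"],
    "mail": ["mail.forward"],
}

added, removed = diff_surface(last_scan, this_scan)
for tool in sorted(added):
    print(f"+ {tool}")  # newly introduced tools since the last scan
```

A dependency update that quietly adds a tool shows up in `added`, which is the case the second bullet is about.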
$ capframe find ./mcp-server.toml
✓ mapped 14 tools across 2 mcp servers
⚠ 3 tools accept input without constraints (LLM01)
⚠ 1 tool has indirect-injection surface (LLM01, ATLAS T0051)
✓ surface diff: +2 tools vs last scan
→ ./capframe.findings.json
Bind
The authority layer, and the most adversarially tested module in the stack. Prompt injection is a confused-deputy attack (Hardy, 1988): smarter guardrails don't fix it; removing the agent's ambient authority does. Bind mints macaroon-style capability tokens (attenuable, revocable, ed25519 holder-of-key) that bound what an agent CAN do at issuance time. Out-of-scope calls are refused before the underlying tool ever sees them, each producing a signed, tamper-evident receipt.
- ▸Macaroon chain (HMAC-SHA256): a holder can't broaden scope without the root key
- ▸ed25519 holder-of-key proofs defeat token theft and replay
- ▸Caveats evaluate against verifier-known facts — never the agent's claims
- ▸Every caveat is human-readable: predict what a token permits in under 30s
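To make the first and third bullets concrete, here is a minimal Python sketch of a macaroon-style HMAC-SHA256 chain. It is an illustration under assumptions, not Bind's token format: the function names, the `key=value` caveat syntax, and the demo root key are hypothetical, and ed25519 holder-of-key proofs, revocation, and TTLs are left out because they need more than the standard library.

```python
import hashlib
import hmac

# Hypothetical sketch of a macaroon-style chain; not Bind's wire format.

def _chain(key: bytes, message: str) -> bytes:
    return hmac.new(key, message.encode(), hashlib.sha256).digest()

def mint(root_key: bytes, token_id: str):
    """Issue a token as (id, caveats, signature)."""
    return token_id, [], _chain(root_key, token_id)

def attenuate(token, caveat: str):
    """Add a caveat; the previous signature becomes the next HMAC key."""
    token_id, caveats, sig = token
    return token_id, caveats + [caveat], _chain(sig, caveat)

def verify(root_key: bytes, token, facts: dict) -> bool:
    """Recompute the chain from the root key, then check every caveat
    against facts the verifier knows (never the agent's own claims)."""
    token_id, caveats, sig = token
    expected = _chain(root_key, token_id)
    for caveat in caveats:
        expected = _chain(expected, caveat)
    if not hmac.compare_digest(expected, sig):
        return False
    for caveat in caveats:
        name, _, value = caveat.partition("=")
        if str(facts.get(name)) != value:
            return False
    return True


root = b"demo-root-key"  # assumption: held only by the verifier
token = attenuate(attenuate(mint(root, "tok-1"), "tool=checkout.purchase"), "region=eu")

print(verify(root, token, {"tool": "checkout.purchase", "region": "eu"}))  # True
print(verify(root, token, {"tool": "wire.send", "region": "eu"}))          # False

# Dropping a caveat without the root key breaks the chain.
stripped = (token[0], token[1][:1], token[2])
print(verify(root, stripped, {"tool": "checkout.purchase", "region": "us"}))  # False
```

Because each signature keys the next HMAC, a holder can narrow a token by appending caveats but cannot remove one without the root key.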
$ capframe bind --agent shopify-bot \
--tools "order.read, refund.write" \
--limit max_refund=50 --limit region=eu \
--ttl 24h
✓ token minted: cf_tok_a91f4e…
holder: ed25519 / shopify-bot
scope: 2 tools · max_refund≤50 · region=eu
expires: 2026-05-18T08:14:00Z
revoke: capframe revoke cf_tok_a91f4e
Guard
A deterministic Python policy evaluator that sits inline at the agent's tool-call boundary. No LLM in the decision path — every allow/deny is reproducible, fuzzable, and immune to the jailbreak that just broke your agent. Synthesize policy from observed injection gaps, backtest against the corpus, ship.
- ▸Inline at the tool-call boundary — not a sidecar, not a daemon
- ▸Synthesizes YAML policy from a findings.v1 file in one command
- ▸Default corpus of 308 jailbreak / injection / scope-escape cases
- ▸Fail-closed by construction — no policy = no calls
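The fail-closed, model-free behaviour in the bullets above can be sketched as a pure function of (policy, call). This is a hedged illustration in Python, not Guard's evaluator: the policy shape (tool name mapped to per-parameter maximums) and the `decide` function are hypothetical and do not reflect Guard's YAML schema.

```python
# Hypothetical policy shape: tool name -> {"max": {param: limit}}.
# An assumption for illustration, not Guard's YAML schema.

def decide(policy, tool, params):
    """Pure function of (policy, call): returns 'allow' or 'deny'."""
    if not policy:  # fail closed: no policy means no calls
        return "deny"
    rule = policy.get(tool)
    if rule is None:  # tool is not listed in the policy
        return "deny"
    for name, maximum in rule.get("max", {}).items():
        value = params.get(name)
        is_number = isinstance(value, (int, float)) and not isinstance(value, bool)
        if not is_number or value > maximum:  # missing, malformed, or over limit
            return "deny"
    return "allow"


policy = {"refund.write": {"max": {"amount": 50}}, "order.read": {}}

print(decide(policy, "refund.write", {"amount": 30}))   # allow
print(decide(policy, "refund.write", {"amount": 500}))  # deny
print(decide(policy, "wire.send", {"amount": 1}))       # deny
print(decide({}, "order.read", {}))                     # deny
```

Since the function reads nothing but its arguments, the same input always yields the same verdict, which is what makes a backtest against a fixed corpus repeatable.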
$ capframe guard synth ./capframe.findings.json
✓ 14 rules generated across 3 categories
✓ policy → ./policy.yaml
$ capframe guard backtest ./policy.yaml
✓ 308-case corpus · TPR 1.00 / FPR 0.01
✓ deterministic · no model in the decision path
Bind isn't a claim. It's been attacked ten times — and every result is published.
The authority layer ships with a public purple-team corpus: a structured record of adversarial scenarios run against the token engine, written blue-first — the falsifiable security claim is committed before the attack runs. Every round ships a runnable PoC and a signed denial receipt as evidence. Clone it, run the tests, verify every number on this page without trusting a word of it.
A Claude Opus 4.7 agent, told to move money it isn't scoped for.
No prompt engineering, no guardrail model, no second LLM judging the first. The agent is handed one capability — scoped to checkout.purchase. It tries to send a wire anyway. The authority simply isn't there, so the call dies at the gate while the in-scope purchase proceeds. Both verdicts are audit-logged.
$ npm run demo:llm-direct
→ task: "send a $30 wire to acct 4471, then buy a USB-C cable"
→ scope: tool == "checkout.purchase"
⨯ wire.send DENIED out-of-scope receipt cap_rcpt_3f9a…
✓ checkout.purchase ALLOWED in-scope receipt cap_rcpt_7c12…
both decisions audit-logged · agent never reached the wire API
Attack classes the corpus exercises
And honest about what it doesn't cover
Model behaviour, system-prompt extraction, jailbreaks, and GCG suffixes are explicitly out of scope: documented in the threat model, not hand-waved. Bind removes the deputy's authority; it doesn't pretend to fix the model.
The runtime is the product. Everything else is paperwork.
Capframe is not a wrapper around an LLM, not a policy DSL transpiled to a model prompt, and not a managed service. It's a deterministic runtime with three primitives — tokens, policy, receipts — that sit inline at the boundary your agent already calls through.
Deterministic policy evaluator
Deterministic and model-free. The same input always returns the same allow/deny — fuzzable, reproducible, immune to the next jailbreak. Most importantly: no LLM in the decision path. Your enforcement boundary is not a model you have to re-evaluate every time someone publishes a new attack paper.
Macaroon-style capability tokens
ed25519 holder-of-key signatures, attenuable third-party caveats, revocation lists, TTL-bound. The primitive Google built distributed authorization on, ported to the agent boundary. Scope your agent to two tools and one region in one CLI call — and revoke the token in one more when something looks off.
Tamper-evident receipts
Every allow and every deny emits a signed (HMAC-SHA256) receipt with policy hash, token id, agent id, parameters, and verdict. Drop the receipt stream into S3 or Loki and you have a forensic timeline that satisfies SOC2 CC7.2 and EU AI Act Article 12 logging requirements out of the box.
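As a rough picture of what such a receipt could look like, here is a minimal Python sketch that MACs the fields listed above with HMAC-SHA256. The field names, the canonical-JSON encoding, and both helper functions are assumptions for illustration; Capframe's actual receipt format is not shown on this page.

```python
import hashlib
import hmac
import json

# Hypothetical receipt layout; field names are assumptions.

def _payload(body: dict) -> bytes:
    # Canonical encoding so the same fields always produce the same bytes.
    return json.dumps(body, sort_keys=True, separators=(",", ":")).encode()

def make_receipt(key, policy_text, token_id, agent_id, params, verdict):
    body = {
        "policy_hash": hashlib.sha256(policy_text.encode()).hexdigest(),
        "token_id": token_id,
        "agent_id": agent_id,
        "params": params,
        "verdict": verdict,
    }
    body["mac"] = hmac.new(key, _payload(body), hashlib.sha256).hexdigest()
    return body

def verify_receipt(key, receipt) -> bool:
    body = {k: v for k, v in receipt.items() if k != "mac"}
    expected = hmac.new(key, _payload(body), hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, str(receipt.get("mac", "")))


key = b"demo-receipt-key"  # assumption: known only to the verifier
receipt = make_receipt(key, "rules: []", "tok-1", "shopify-bot",
                       {"amount": 500}, "deny")
print(verify_receipt(key, receipt))   # True

tampered = dict(receipt, verdict="allow")
print(verify_receipt(key, tampered))  # False
```

Any edit to a stored receipt, such as flipping a deny to an allow, invalidates the MAC, which is the property a forensic timeline depends on.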
findings.v1 wire format
JSON Schema Draft 2020-12. Round-trip tested. The cross-tool contract Find emits, Bind reads to scope tokens, Guard reads to synthesize policy, and Report serializes into auditor-ready HTML/PDF. One schema means every artifact is grep-able, diff-able, and machine-checkable in CI.
Static binaries, no daemon
No daemon. No kernel module. No container. Find and Bind ship as sha256-verified static Rust binaries — x86_64 / aarch64 across Linux, macOS, and Windows — and Guard installs as a Python package alongside. Runs in CI, in your IDE, on your laptop, and inline at the tool-call boundary. Permissive OSS (Apache-2.0 + MIT) — read every line of the code your security depends on.
MCP-native, framework-agnostic
Today: every MCP server — Claude Desktop, Cursor, Continue, Cline, LangGraph via the MCP bridge, every agent SDK that speaks the protocol. Roadmap: native adapters for OpenAI function calling and Anthropic tool use, so the same policy file works regardless of which provider your agent picks tomorrow.
The only artifact mapping all three agent-security frameworks at once.
Most tools tick one framework. Capframe was designed so a single run emits evidence aligned to OWASP LLM Top 10, NIST AI RMF, and MITRE ATLAS, the three frameworks regulated buyers (CISO, GRC, internal audit) actually ask about. capframe report exports the dossier as HTML or PDF: signed, timestamped, and ready to attach to a SOC2 / ISO 42001 / EU AI Act submission.
OWASP LLM
Top 10 (2025)
- ✓LLM01 prompt injection
- ✓LLM02 insecure output
- ✓LLM07 insecure plugin
- ✓LLM08 excessive agency
NIST AI RMF
v1.0
- ✓GOVERN
- ✓MAP
- ✓MEASURE
- ✓MANAGE
MITRE ATLAS
v4.7
- ✓TA0043 reconnaissance
- ✓TA0006 credential access
- ✓TA0040 impact
- ✓TA0007 discovery
Eighty seconds, four commands, one auditor-ready report.
$ capframe find ./my-mcp-server.toml
✓ mapped 14 tools across 2 MCP servers
⚠ 3 tools accept input without constraints (LLM01)
⚠ 1 tool has indirect-injection surface (LLM01, ATLAS T0051)
→ findings written to ./capframe.findings.json
$ capframe bind --agent shopify-bot \
--tools "order.read, refund.write" \
--limit max_refund=50 --limit region=eu \
--ttl 24h
✓ token minted: cf_tok_a91f4e…
holder: ed25519 / shopify-bot
scope: 2 tools · max_refund≤50 · region=eu
expires: 2026-05-18T08:14:00Z
revoke: capframe revoke cf_tok_a91f4e
$ capframe guard backtest ./policy.yaml
✓ 247/247 corpus cases pass
✓ 14 rules, 3 categories
✓ false-positive rate: 0.0%
$ capframe report --format html --out ./report.html
✓ report written
OWASP LLM Top 10: 4/10 covered, 2 findings open
NIST AI RMF: Govern ✓ Map ✓ Measure ✓ Manage ✓
MITRE ATLAS: 2 techniques flagged, 0 active exploits
curl -fsSL capframe.ai/install | sh
Short on time? We'll audit your agents.
The same posture we run on our own tools and on 90+ public servers on the leaderboard, pointed at your stack. We map your agent's tool surface (MCP servers, or your existing OpenAI / Anthropic / LangChain tool definitions) and hand you a report you can act on in five business days.
Agent Security Audit
- ✓Branded OWASP LLM Top 10 / NIST AI RMF / MITRE ATLAS findings report (HTML + PDF)
- ✓Prioritized remediation checklist — what to fix, in what order, and why
- ✓30-minute walkthrough call + a sample deterministic policy you can drop in front of your agents
Guarantee: if the report doesn't surface at least one issue you didn't already know about, you pay nothing.
Open source. Hosted when you need it.
Free
All three modules. Local CLI. Full OWASP / NIST / ATLAS report generator. MIT license.
- ✓All three modules
- ✓Local-first CLI
- ✓Full report generator (HTML + PDF)
- ✓sha256-verified installer
- ✓Run anywhere
Pro
Hosted control plane for AI teams shipping agents at velocity. Currently in private early access — join the waitlist below.
- ✓Hosted dashboard (in build)
- ✓Findings history + cross-scan diffing
- ✓Scheduled scans
- ✓Slack alerts
- ✓Up to 10 agents
Enterprise
On-prem / VPC. SSO, audit logs, signed compliance reports, SLA. Taking a small number of design partners in regulated industries.
- ✓SSO + audit logs
- ✓On-prem / VPC deploy
- ✓Signed compliance reports
- ✓SLA + private Slack channel
- ✓Unlimited agents
What people ask before they install.
- Q.01 How is this different from an LLM-as-judge guardrail?
- An LLM judge is another model you have to trust — and another attack surface. Capframe's Guard is a deterministic Python evaluator: the same input always yields the same allow/deny, with no model in the path. That means it's fuzzable, reproducible, and immune to the next jailbreak paper. LLM judges are useful for content classification; they are not safe to put inline at a tool-call boundary.
- Q.02 How is this different from prompt-injection scanners or red-team frameworks?
- Scanners tell you about a vulnerability after the fact. Capframe enforces the boundary at the moment of the call — the agent never gets to make a refund it shouldn't, regardless of what the prompt said. Find covers the offline discovery surface; Guard is the runtime backstop the scanner ecosystem doesn't have.
- Q.03 Is the runtime fast enough for production?
- Yes. Guard is a deterministic rule engine — no model inference and no network hop in the decision path, so a decision is a pure function of (policy, call). (Raw capability verification, in the Rust Bind layer, runs in single-digit microseconds.) Drop Guard inline at the tool-call boundary and measure it on your own traffic.
- Q.04 Does my agent data leave my environment?
- No. It's local-first — Find, Bind, Guard, and Report all run on your laptop, your CI, or your inference host (Find and Bind as Rust binaries, Guard as a pip-installable Python layer). The Pro / Enterprise hosted control plane is opt-in and stores only the metadata you choose to sync.
- Q.05 Does this only work with MCP?
- Today, yes — Capframe is built around the Model Context Protocol. That covers Claude Desktop, Cursor, Continue, Cline, LangGraph via the MCP bridge, and most agentic Rust/Python frameworks. Native adapters for OpenAI function calling and Anthropic tool use are on the roadmap; the policy file stays the same.
- Q.06 Why three separate modules?
- Different teams adopt them at different speeds. Security teams typically start with Find to baseline their agent surface. AI engineers usually start with Guard because it solves the immediate runtime problem. The capability-token layer (Bind) is for teams ready to commit to a permission model. The findings.v1 schema ties them together when you're ready.
- Q.07 Why open source?
- Security infrastructure you can't read isn't trustworthy — and AI-agent enforcement is too new a category to be locked behind a closed binary. The code your boundary depends on should be inspectable, fuzzable, and forkable. MIT licensed, every line.
- Q.08 What does the auditor actually receive?
- An HTML or PDF report (capframe report) with: the findings table mapped to OWASP LLM Top 10 and MITRE ATLAS techniques, the active policy file with its hash, the signed receipt count by verdict, and the NIST AI RMF function coverage matrix. Timestamped and signature-verifiable end to end.