v0.2.0 · live
Ccapframe
← All posts
· security· schema· ai-agents

Why we shipped findings.v1 as a public JSON Schema

AI agent security tools each emit findings in their own shape. We think they shouldn't. Here's the wire format we're proposing as a starting point — and why we put it in the repo before writing the scanner.

When we started Capframe, the first artifact we committed wasn't a scanner. It wasn't a runtime sentry. It wasn't even documentation. It was a JSON Schema named findings.v1.json.

That feels backwards. Tools usually ship code first and a spec later, once the shape is forced into existence by integration pain. We did it the other way around on purpose. This post is the why.

The gap we're trying to fill

If you ship a SAST (static application security testing) tool today, you emit SARIF. It's a standardised JSON envelope that every code-scanning tool — Semgrep, CodeQL, Snyk, Bandit — knows how to read and write. Your CI can run five different scanners and a downstream dashboard can show their findings in one view, because everyone agreed on the wire format.

If you ship an AI agent security tool today, there is no SARIF. Every project picks its own shape. We looked at the half-dozen tools in the space and saw six different envelopes for the same fundamental object: "here's a thing we found about this tool surface, with a severity, a category, and some context."

The downstream cost of that fragmentation is real. A buyer running three of these tools in parallel ends up writing three glue scripts to normalise findings before they can be ranked, deduped, or routed to a ticket queue. Every new tool that enters the space pays the same tax. And the tools themselves can't compose — one tool's discovery output can't drive another tool's policy synthesizer, because the schemas don't line up.

We don't think this is a permanent state of affairs. We think one of the existing tools is going to define the format that wins, the way SARIF did, and from then on everyone else implements it. We'd rather that format be designed openly than emerge by accident from whichever vendor ships first and biggest.

So we wrote one. It's at schemas/findings.v1.json in the repo and at capframe.ai/schema as a stable URL. MIT-licensed. PRs welcome.

What's in it

The full schema is short enough to read top-to-bottom. The shape roughly mirrors SARIF's design intuition (envelope → results, each result is a flat record with classification metadata) but is much narrower because the domain is narrower.

A findings.v1 document is an envelope:

{
  "schema_version": "capframe.findings.v1",
  "scanned_at":     "2026-05-20T14:32:00Z",
  "scan_id":        "0c81b6d2-7a91-4c52-9c1e-aa8e90e3f6b1",
  "scanner":        { "name": "mcp-recon", "version": "0.0.4" },
  "target":         { "kind": "mcp_server", "name": "shopify-mcp" },
  "tools":          [ ...inventory of discovered tools... ],
  "findings":       [ ...one entry per detected issue... ],
  "summary":        { "total": 6, "by_severity": {...} }
}

The interesting design decisions live in the four sub-shapes: Tool, Finding, Mappings, and the closed-set enums.

Tool — a tool surface entry

Each tool gets a record with name, optional description, optional parameters (a JSON Schema), a closed-set list of declared side_effects (read / write / network / filesystem / execute / money / irreversible), and optional auth_required / rate_limited booleans.

We deliberately kept the side-effect taxonomy small. Seven options is enough to express "this tool moves money" or "this tool can delete things" without forcing tool authors into a 47-bucket ontology. The cost is some loss of precision — database.update and s3.put both flatten to write — but that's a deliberate tradeoff for adoption ergonomics. A taxonomy nobody uses is worse than a coarse one everyone uses correctly.

Finding — one detected issue

A Finding has an id (stable hash for diffing across scans), a severity (info / low / medium / high / critical), a category from a closed set of 13 issue classes (e.g. indirect_injection, excessive_agency, unconstrained_input, missing_authz, secret_exposure, ssrf_surface, …), a title, optional description / tool / evidence / remediation, and the compliance-framework mappings discussed below.

The closed-set category enum is load-bearing. Every category maps to a known threat-model class. If your tool surfaces something that doesn't fit, you can use other — but if other becomes more than a single-digit percentage of your output, we want to know about it, because that's a signal the taxonomy is wrong rather than your output is.

Mappings — the compliance-framework receipts

This is the part the regulated buyers care about. Each finding can carry a mappings object with three optional arrays:

"mappings": {
  "owasp_llm":   ["LLM01", "LLM08"],
  "nist_rmf":    ["MEASURE-2.3", "MANAGE-2.2"],
  "mitre_atlas": ["T0051"]
}

The IDs are regex-validated at the schema level. owasp_llm accepts ^LLM(0[1-9]|10)$ — so LLM99 is invalid, not because the validator guesses semantically but because it's not a real OWASP LLM Top 10 ID. Same for nist_rmf (must match ^(GOVERN|MAP|MEASURE|MANAGE)-[0-9]+(\.[0-9]+)*$) and mitre_atlas (must be a T#### or T####.###).

That regex enforcement matters. The whole "audit-ready" pitch falls apart if half the findings reference LLM6 or Manage-2 or prompt-injection instead of the real IDs. Schemas without validation are documentation, not contracts.

Closed-set everything

Both severity and category are tight enums with no additionalProperties. Severity is a info|low|medium|high|critical ladder. Category is a 13-class enum. additionalProperties: false is set on every nested object. The schema is intentionally rigid.

That's not a fashion choice — it's the whole point of having a schema. Findings that don't validate against the schema are findings nobody downstream can reliably route, dedupe, or audit. We'd rather fail loud at the producer than silently degrade at the consumer.

Tradeoffs we made

Three design choices we're least sure about, in case any of you have a better answer:

1. Flat findings, no nesting. We considered letting findings nest (a parent excessive_agency finding with child missing_cap, unbounded_param findings). Decided against because it complicates deduplication and reporting, and the cases where nesting would help seem rare. SARIF's result.relatedLocations is the same call — flat results, with optional references between them. We went a step further and dropped relations entirely.

2. evidence is a free-form object. We had a moment of wanting to schematize the evidence field per category (so a secret_exposure finding has { "matched_pattern": ..., "byte_offset": ... } and an indirect_injection finding has different keys). We backed off — that's a massive scope expansion, and most consumers won't read evidence anyway. Today evidence is {"type": "object"}. If this becomes a real problem we'll versionize to findings.v2.

3. No support for differential findings. A real scanner wants to emit "this finding is new since the last scan" or "this finding was previously suppressed." We left that to the consumer: first_seen and last_seen are RFC3339 datetimes on each finding, which is enough to compute the diff externally. Building it into the schema would have meant building a state model, and we wanted v1 to stay stateless.

What we're hoping happens

The honest goal: we want one or two other tools in the space to either (a) adopt this format as-is, or (b) push back with concrete proposals for what's missing.

Adoption-as-is is the better outcome for everyone. If three tools emit findings.v1.json, a downstream consumer writes one parser instead of three, and the format de-facto solidifies.

Pushback is the second-best outcome. If we got something wrong — the taxonomy is incomplete, the regex patterns are too strict, the evidence shape needs constraint — file an issue and we'll co-evolve toward findings.v1.1 or, if it's breaking, v2. The schema is versioned by exact string match in the schema_version field, so we can ship both.

The thing we don't want is silent fragmentation. If you're working on a tool in this space and you've already picked your own format — please at least take a look at this one before you let yours ossify. If we got it right, great. If we got it wrong, even better — we'd rather find out now than two years and three breaking changes from now.

How to engage

  • Read the schema: capframe.ai/schema
  • Read the example payload — a realistic Shopify-MCP inventory with one excessive_agency finding.
  • Browse the conformance tests: five tests today, including round-trip validation and explicit rejection of malformed framework IDs.
  • Open a PR or an issue: github.com/capframe/capframe — the findings.v1 schema lives at schemas/findings.v1.json. PRs welcome.
  • Email us: hello@capframe.ai if you'd rather discuss before opening anything public.

The schema is MIT-licensed. Implement it in your tool. Bring concrete disagreements. The format only becomes a standard if more than one project implements it — until then it's just our wire format with ambitious naming.