v0.2.0 · live
CAPFRAME
← leaderboard/Firecrawl MCP/tool · firecrawl_extract
§ toolsandboxFirecrawl MCP

firecrawl_extract

on npm:firecrawl-mcp@3.20.2

Severity

critical0
high0
medium2
low0
info0

2 findings on this tool

  1. mediumunconstrained inputf-r1-firecrawl_extract

    Tool `firecrawl_extract` accepts unconstrained string input

    The following string parameter(s) have no `maxLength` constraint: `prompt`. Unbounded strings let an attacker stuff arbitrary payloads through the tool, including indirect-injection content.

    fix: Add a `maxLength` to each string property, or constrain with an `enum` or `pattern`. Most legitimate tool inputs fit under a few hundred bytes.

    OWASP LLM01NIST MEASURE-2.3ATLAS T0051
  2. mediumindirect injectionf-r6-firecrawl_extract

    Tool `firecrawl_extract` fetches external web content -- indirect-injection surface

    Description: " Extract structured information from web pages using LLM capabilities. Supports both cloud AI and self-hosted LLM extraction. **Best for:** Extracting specific structured data like prices, names, details from web pages. **Not recommended for:** When you need the full content of a page (use scrape); when you're not looking for specific structured data. **Arguments:** - urls: Array of URLs to extract information from - prompt: Custom prompt for the LLM extraction - schema: JSON schema for structured data extraction - allowExternalLinks: Allow extraction from external links - enableWebSearch: Enable web search for additional context - includeSubdomains: Include subdomains in extraction **Prompt Example:** "Extract the product name, price, and description from these product pages." **Usage Example:** ```json { "name": "firecrawl_extract", "arguments": { "urls": ["https://example.com/page1", "https://example.com/page2"], "prompt": "Extract product information including name, price, and description", "schema": { "type": "object", "properties": { "name": { "type": "string" }, "price": { "type": "number" }, "description": { "type": "string" } }, "required": ["name", "price"] }, "allowExternalLinks": false, "enableWebSearch": false, "includeSubdomains": false } } ``` **Returns:** Extracted structured data as defined by your schema. " -- this tool pulls externally-controlled content into the agent's context window, the canonical indirect-injection vector. Even when the user supplies the URL, content at that URL can carry hostile instructions.

    fix: Sandbox the fetched content: strip prompts before forwarding to the model, constrain to an allow-list of domains, and route through capframe-guard with a `domain in [...]` caveat.

    OWASP LLM01NIST MEASURE-2.3ATLAS T0051

About this tool

firecrawl_extract is one of 20 tools exposed by Firecrawl MCP. The server scored 0/100 overall against the capframe rule engine (source: sandbox). Last scanned 2026-06-05.

The findings above are emitted by the public capframe.findings.v1 schema. Disagree with one? Open an issue.