v0.2.0 · live
CAPFRAME
← leaderboard/Firecrawl MCP/tool · firecrawl_parse
§ toolsandboxFirecrawl MCP

firecrawl_parse

on npm:firecrawl-mcp@3.20.2

Severity

critical0
high0
medium3
low0
info0

3 findings on this tool

  1. mediumunconstrained inputf-r1-firecrawl_parse

    Tool `firecrawl_parse` accepts unconstrained string input

    The following string parameter(s) have no `maxLength` constraint: `contentType`, `filePath`, `proxy`. Unbounded strings let an attacker stuff arbitrary payloads through the tool, including indirect-injection content.

    fix: Add a `maxLength` to each string property, or constrain with an `enum` or `pattern`. Most legitimate tool inputs fit under a few hundred bytes.

    OWASP LLM01NIST MEASURE-2.3ATLAS T0051
  2. mediumexcessive agencyf-r5-firecrawl_parse

    Tool `firecrawl_parse` description mentions money but no `money` side-effect is declared

    Description: " Parse a file from the local filesystem using a self-hosted Firecrawl API's /v2/parse endpoint. This is the fastest and most reliable way to extract content from a document on disk — if the file lives locally and the MCP is pointed at a self-hosted Firecrawl instance, you should always prefer this tool over uploading the file elsewhere and then scraping it. **Best for:** Extracting content from a local document (PDF, Word, Excel, HTML, etc.) when you don't want to host it on the public web first; pulling structured data out of a file with JSON format; converting binary documents into markdown for downstream reasoning. **Not recommended for:** Remote URLs (use firecrawl_scrape); multiple files at once (call parse multiple times); documents that require interactive actions, screenshots, or change tracking — those aren't supported by the parse endpoint. **Common mistakes:** Passing a URL instead of a local file path; requesting an unsupported format (screenshot, branding, changeTracking); setting waitFor, location, mobile, or a non-basic/auto proxy — parse uploads reject all of those. **Supported file types:** .html, .htm, .xhtml, .pdf, .docx, .doc, .odt, .rtf, .xlsx, .xls **Unsupported options:** actions, screenshot/branding/changeTracking formats, waitFor > 0, location, mobile, proxy values other than "auto" or "basic". **Privacy:** Set `redactPII: true` to return content with personally identifiable information redacted. **CRITICAL - Format Selection (same rules as firecrawl_scrape):** When the user asks for SPECIFIC data points from a document, you MUST use JSON format with a schema. Only use markdown when the user needs the ENTIRE document content. **Use JSON format when the user asks for:** - Specific fields, parameters, or values from a form / PDF / spreadsheet - Prices, numbers, or other structured data - Lists of items or properties **Use markdown format when:** - User wants to read, summarize, or analyze the full document - User explicitly asks for the complete content **Handling PDFs:** Add `"parsers": ["pdf"]` (optionally with `pdfOptions.maxPages`) when parsing a PDF so the PDF engine is invoked explicitly. For very long documents, cap `maxPages` to keep the response within token limits. **Usage Example (markdown from a local PDF):** ```json { "name": "firecrawl_parse", "arguments": { "filePath": "/absolute/path/to/document.pdf", "formats": ["markdown"], "parsers": ["pdf"], "onlyMainContent": true } } ``` **Usage Example (structured JSON extraction from a local HTML file):** ```json { "name": "firecrawl_parse", "arguments": { "filePath": "./invoice.html", "formats": ["json"], "jsonOptions": { "prompt": "Extract the invoice number, total, and line items", "schema": { "type": "object", "properties": { "invoiceNumber": { "type": "string" }, "total": { "type": "number" }, "lineItems": { "type": "array", "items": { "type": "object", "properties": { "description": { "type": "string" }, "amount": { "type": "number" } } } } } } } } } ``` **Returns:** A parsed document with markdown, html, links, summary, json, or query results depending on the requested formats. " -- this references money/payment/refund/etc., but the declared side_effects ([]) don't include `money`. A capframe-bind policy that relies on declared side_effects to scope spend caveats will under-scope this tool.

    fix: Add `money` to the tool's `side_effects` declaration, or rewrite the description to clarify that no actual money moves.

    OWASP LLM08NIST MEASURE-2.6ATLAS T0040
  3. mediumindirect injectionf-r6-firecrawl_parse

    Tool `firecrawl_parse` fetches external web content -- indirect-injection surface

    Description: " Parse a file from the local filesystem using a self-hosted Firecrawl API's /v2/parse endpoint. This is the fastest and most reliable way to extract content from a document on disk — if the file lives locally and the MCP is pointed at a self-hosted Firecrawl instance, you should always prefer this tool over uploading the file elsewhere and then scraping it. **Best for:** Extracting content from a local document (PDF, Word, Excel, HTML, etc.) when you don't want to host it on the public web first; pulling structured data out of a file with JSON format; converting binary documents into markdown for downstream reasoning. **Not recommended for:** Remote URLs (use firecrawl_scrape); multiple files at once (call parse multiple times); documents that require interactive actions, screenshots, or change tracking — those aren't supported by the parse endpoint. **Common mistakes:** Passing a URL instead of a local file path; requesting an unsupported format (screenshot, branding, changeTracking); setting waitFor, location, mobile, or a non-basic/auto proxy — parse uploads reject all of those. **Supported file types:** .html, .htm, .xhtml, .pdf, .docx, .doc, .odt, .rtf, .xlsx, .xls **Unsupported options:** actions, screenshot/branding/changeTracking formats, waitFor > 0, location, mobile, proxy values other than "auto" or "basic". **Privacy:** Set `redactPII: true` to return content with personally identifiable information redacted. **CRITICAL - Format Selection (same rules as firecrawl_scrape):** When the user asks for SPECIFIC data points from a document, you MUST use JSON format with a schema. Only use markdown when the user needs the ENTIRE document content. **Use JSON format when the user asks for:** - Specific fields, parameters, or values from a form / PDF / spreadsheet - Prices, numbers, or other structured data - Lists of items or properties **Use markdown format when:** - User wants to read, summarize, or analyze the full document - User explicitly asks for the complete content **Handling PDFs:** Add `"parsers": ["pdf"]` (optionally with `pdfOptions.maxPages`) when parsing a PDF so the PDF engine is invoked explicitly. For very long documents, cap `maxPages` to keep the response within token limits. **Usage Example (markdown from a local PDF):** ```json { "name": "firecrawl_parse", "arguments": { "filePath": "/absolute/path/to/document.pdf", "formats": ["markdown"], "parsers": ["pdf"], "onlyMainContent": true } } ``` **Usage Example (structured JSON extraction from a local HTML file):** ```json { "name": "firecrawl_parse", "arguments": { "filePath": "./invoice.html", "formats": ["json"], "jsonOptions": { "prompt": "Extract the invoice number, total, and line items", "schema": { "type": "object", "properties": { "invoiceNumber": { "type": "string" }, "total": { "type": "number" }, "lineItems": { "type": "array", "items": { "type": "object", "properties": { "description": { "type": "string" }, "amount": { "type": "number" } } } } } } } } } ``` **Returns:** A parsed document with markdown, html, links, summary, json, or query results depending on the requested formats. " -- this tool pulls externally-controlled content into the agent's context window, the canonical indirect-injection vector. Even when the user supplies the URL, content at that URL can carry hostile instructions.

    fix: Sandbox the fetched content: strip prompts before forwarding to the model, constrain to an allow-list of domains, and route through capframe-guard with a `domain in [...]` caveat.

    OWASP LLM01NIST MEASURE-2.3ATLAS T0051

About this tool

firecrawl_parse is one of 20 tools exposed by Firecrawl MCP. The server scored 0/100 overall against the capframe rule engine (source: sandbox). Last scanned 2026-06-05.

The findings above are emitted by the public capframe.findings.v1 schema. Disagree with one? Open an issue.