firecrawl_parse

on npm:firecrawl-mcp@3.20.2

Severity

critical: 0
high: 0
medium: 3
low: 0
info: 0

3 findings on this tool

medium | unconstrained input | f-r1-firecrawl_parse
Tool `firecrawl_parse` accepts unconstrained string input
The following string parameter(s) have no `maxLength` constraint: `contentType`, `filePath`. Unbounded strings let an attacker stuff arbitrary payloads through the tool, including indirect-injection content.
fix: Add a `maxLength` to each string property, or constrain with an `enum` or `pattern`. Most legitimate tool inputs fit under a few hundred bytes.
OWASP LLM01 | NIST MEASURE-2.3 | ATLAS T0051 | CAST-03
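The fix for the unconstrained-input finding can be sketched as a tightened input schema for the two parameters the finding names. This is a minimal sketch, not the server's actual schema: the limits (4096 and 255), the MIME-style `pattern`, and the names `parseInputSchema` and `violations` are assumptions chosen for illustration.

```typescript
// Hypothetical constrained schema for the two string parameters in the finding.
// The numeric limits and the pattern are illustrative assumptions.
interface StringProperty {
  type: "string";
  maxLength: number;
  pattern?: string;
}

const parseInputSchema: { type: "object"; properties: Record<string, StringProperty> } = {
  type: "object",
  properties: {
    filePath: { type: "string", maxLength: 4096 },
    contentType: { type: "string", maxLength: 255, pattern: "^[\\w.+-]+/[\\w.+-]+$" },
  },
};

// Minimal checker for the two keywords used above. A real server would pass
// the schema to a full JSON Schema validator instead.
function violations(args: Record<string, unknown>): string[] {
  const problems: string[] = [];
  for (const [name, rule] of Object.entries(parseInputSchema.properties)) {
    const value = args[name];
    if (value === undefined) continue; // presence is not checked in this sketch
    if (typeof value !== "string") {
      problems.push(`${name}: expected a string`);
      continue;
    }
    // .length counts UTF-16 code units, which is never smaller than the
    // code-point count JSON Schema uses, so this check is never looser.
    if (value.length > rule.maxLength) {
      problems.push(`${name}: exceeds maxLength ${rule.maxLength}`);
    }
    if (rule.pattern !== undefined && !new RegExp(rule.pattern).test(value)) {
      problems.push(`${name}: does not match pattern`);
    }
  }
  return problems;
}
```

With these assumed limits, an ordinary call such as `violations({ filePath: "/tmp/a.pdf", contentType: "application/pdf" })` reports no problems, while an oversized path or a free-text `contentType` is rejected before it reaches the parse endpoint.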
medium | excessive agency | f-r5-firecrawl_parse
Tool `firecrawl_parse` description mentions money but no `money` side-effect is declared
Description: " Parse a file from the local filesystem using a self-hosted Firecrawl API's /v2/parse endpoint. This is the fastest and most reliable way to extract content from a document on disk — if the file lives locally and the MCP is pointed at a self-hosted Firecrawl instance, you should always prefer this tool over uploading the file elsewhere and then scraping it. **Best for:** Extracting content from a local document (PDF, Word, Excel, HTML, etc.) when you don't want to host it on the public web first; pulling structured data out of a file with JSON format; converting binary documents into markdown for downstream reasoning. **Not recommended for:** Remote URLs (use firecrawl_scrape); multiple files at once (call parse multiple times); documents that require interactive actions, screenshots, or change tracking — those aren't supported by the parse endpoint. **Common mistakes:** Passing a URL instead of a local file path; requesting an unsupported format (screenshot, branding, changeTracking); setting waitFor, location, mobile, or a non-basic/auto proxy — parse uploads reject all of those. **Supported file types:** .html, .htm, .xhtml, .pdf, .docx, .doc, .odt, .rtf, .xlsx, .xls **Unsupported options:** actions, screenshot/branding/changeTracking formats, waitFor > 0, location, mobile, proxy values other than "auto" or "basic". **Privacy:** Set `redactPII: true` to return content with personally identifiable information redacted. **CRITICAL - Format Selection (same rules as firecrawl_scrape):** When the user asks for SPECIFIC data points from a document, you MUST use JSON format with a schema. Only use markdown when the user needs the ENTIRE document content. 
**Use JSON format when the user asks for:** - Specific fields, parameters, or values from a form / PDF / spreadsheet - Prices, numbers, or other structured data - Lists of items or properties **Use markdown format when:** - User wants to read, summarize, or analyze the full document - User explicitly asks for the complete content **Handling PDFs:** Add `"parsers": ["pdf"]` (optionally with `pdfOptions.maxPages`) when parsing a PDF so the PDF engine is invoked explicitly. For very long documents, cap `maxPages` to keep the response within token limits. **Usage Example (markdown from a local PDF):** ```json { "name": "firecrawl_parse", "arguments": { "filePath": "/absolute/path/to/document.pdf", "formats": ["markdown"], "parsers": ["pdf"], "onlyMainContent": true } } ``` **Usage Example (structured JSON extraction from a local HTML file):** ```json { "name": "firecrawl_parse", "arguments": { "filePath": "./invoice.html", "formats": ["json"], "jsonOptions": { "prompt": "Extract the invoice number, total, and line items", "schema": { "type": "object", "properties": { "invoiceNumber": { "type": "string" }, "total": { "type": "number" }, "lineItems": { "type": "array", "items": { "type": "object", "properties": { "description": { "type": "string" }, "amount": { "type": "number" } } } } } } } } } ``` **Returns:** A parsed document with markdown, html, links, summary, json, or query results depending on the requested formats. " -- this references money/payment/refund/etc., but the declared side_effects ([]) don't include `money`. A capframe-bind policy that relies on declared side_effects to scope spend caveats will under-scope this tool.
fix: Add `money` to the tool's `side_effects` declaration, or rewrite the description to clarify that no actual money moves.
OWASP LLM08 | NIST MEASURE-2.6 | ATLAS T0040 | CAST-01
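To see why a keyword-based rule of this kind fires on this description, the check can be approximated as below. This is a rough illustration only: the term list and the functions `moneyTermsIn` and `underDeclaresMoney` are hypothetical and are not capframe's actual rule implementation.

```typescript
// Hypothetical keyword list; the real rule's vocabulary is not documented here.
const MONEY_TERMS = ["money", "payment", "refund", "price", "invoice"];

// Return the money-related terms (singular or simple plural) found in a description.
function moneyTermsIn(description: string): string[] {
  const words = description.toLowerCase().match(/[a-z]+/g) ?? [];
  return MONEY_TERMS.filter((term) =>
    words.some((word) => word === term || word === term + "s"),
  );
}

// True when the description mentions money but `money` is not a declared side effect.
function underDeclaresMoney(description: string, sideEffects: string[]): boolean {
  return moneyTermsIn(description).length > 0 && !sideEffects.includes("money");
}

// An excerpt from the quoted description is enough to trigger the check
// against the declared side_effects ([]).
const excerpt = "Prices, numbers, or other structured data";
console.log(moneyTermsIn(excerpt), underDeclaresMoney(excerpt, []));
```

In the quoted description the matching terms appear in extraction examples (prices as a data type, an invoice schema with `total` and `amount` fields), which is useful context when choosing between the two suggested fixes.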
medium | indirect injection | f-r6-firecrawl_parse
Tool `firecrawl_parse` fetches external web content -- indirect-injection surface
Description: identical to the description quoted under f-r5-firecrawl_parse above -- this tool pulls externally-controlled content into the agent's context window, the canonical indirect-injection vector. Even when the user supplies the URL, content at that URL can carry hostile instructions.
fix: Sandbox the fetched content: strip prompts before forwarding to the model, constrain to an allow-list of domains, and route through capframe-guard with a `domain in [...]` caveat.
OWASP LLM01 | NIST MEASURE-2.3 | ATLAS T0051 | CAST-02
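The allow-list part of this fix can be sketched as a host check that runs before any content is fetched or forwarded. This is a minimal sketch under stated assumptions: `isAllowedHost` and the example domains are hypothetical, and it does not show how capframe-guard evaluates a `domain in [...]` caveat.

```typescript
// Hypothetical allow-list check. Accepts a URL only if its host is an
// allow-listed domain or a subdomain of one.
function isAllowedHost(rawUrl: string, allowList: string[]): boolean {
  let host: string;
  try {
    host = new URL(rawUrl).hostname.toLowerCase();
  } catch {
    return false; // not an absolute URL (for example a relative file path)
  }
  return allowList.some(
    (domain) => host === domain || host.endsWith("." + domain),
  );
}

// Example domains are placeholders.
const allowList = ["example.com"];
console.log(isAllowedHost("https://docs.example.com/report.pdf", allowList));
console.log(isAllowedHost("https://example.com.evil.test/report.pdf", allowList));
```

Matching on the exact host or a dot-prefixed suffix avoids accepting look-alike hosts that merely start with an allowed name. Since the quoted description says this tool takes a local file path, an equivalent check for it would more likely compare the resolved path against allowed directories, which is not shown here.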

About this tool

firecrawl_parse is one of 20 tools exposed by Firecrawl MCP. The server scored 0/100 overall against the capframe rule engine (source: sandbox). Last scanned 2026-07-20.

The findings above are emitted by the public capframe.findings.v1 schema. Disagree with one? Open an issue.