@r-mcp/docs-extractor

MCP Docs Extractor

A tool that extracts and summarizes documentation from web links for AI consumption.

Features

Extract and summarize documentation from web URLs
Intelligently crawl related pages within the same domain for comprehensive documentation
Convert web content into AI-optimized markdown
Remove extraneous content such as ads and navigation menus
Produce concise, well-structured documentation
Focus on the information relevant to the user's query

Installation

# As an MCP server
npm install -g @r-mcp/docs-extractor

# For programmatic usage in your project
npm install @r-mcp/docs-extractor

If OPENAI_API_KEY and FIRECRAWL_API_KEY are not already set in your environment, open the MCP config file and set both keys there:

OPENAI_API_KEY=your_openai_api_key
FIRECRAWL_API_KEY=your_firecrawl_api_key
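
For programmatic usage, it can help to confirm both keys are present before making any calls. The following is a minimal sketch, not part of the package: `missingKeys` and `REQUIRED_KEYS` are hypothetical names, and the check assumes only that the two variables named above are the ones required.

```typescript
// Hypothetical helper (not exported by @r-mcp/docs-extractor): report which
// of the required API keys are absent from a given environment object.
const REQUIRED_KEYS = ["OPENAI_API_KEY", "FIRECRAWL_API_KEY"] as const;

function missingKeys(env: Record<string, string | undefined>): string[] {
  // A key counts as missing when it is unset or contains only whitespace.
  return REQUIRED_KEYS.filter((key) => !env[key]?.trim());
}

// Illustrative environment with only one key set.
const missing = missingKeys({ OPENAI_API_KEY: "your_openai_api_key" });
if (missing.length > 0) {
  console.warn(`Missing API keys: ${missing.join(", ")}`);
}
```

In a Node.js project you would typically pass `process.env` instead of the literal object shown here.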

Usage

MCP Tool Usage

This tool is designed to be used with Claude or other AI systems that support MCP.

In Claude, you can extract documentation by calling:

{{mcp_docs-extractor_get-documentation}}

With the parameters:

{
  "links": ["https://example.com/docs"]
}
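
If you assemble these parameters in code, filtering out malformed links first avoids sending URLs the tool cannot fetch. This is a sketch under assumptions: `GetDocumentationParams` and `buildParams` are hypothetical names, and only the `links` field shown above is taken from this README.

```typescript
// Hypothetical parameter shape; only `links` is confirmed by the example above.
interface GetDocumentationParams {
  links: string[];
}

function buildParams(rawLinks: string[]): GetDocumentationParams {
  const links = rawLinks
    .map((link) => link.trim())
    .filter((link) => {
      try {
        // Keep only absolute http(s) URLs.
        const { protocol } = new URL(link);
        return protocol === "http:" || protocol === "https:";
      } catch {
        return false; // not a parseable absolute URL
      }
    });
  if (links.length === 0) {
    throw new Error("At least one valid http(s) link is required");
  }
  return { links };
}

console.log(JSON.stringify(buildParams(["https://example.com/docs", "not a url"])));
// prints {"links":["https://example.com/docs"]}
```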

Programmatic Usage

You can use this tool programmatically in your JavaScript/TypeScript projects in two ways:

Direct Function (Recommended)

The simplest approach is to use the default export:

import extractDocumentation from "@r-mcp/docs-extractor";

async function example() {
  try {
    // Extract documentation from URLs
    const documentation = await extractDocumentation({
      links: ["https://example.com/docs"],
      documentationFocus: "API endpoints", // optional
      includeReasoning: false, // optional
    });

    console.log(documentation);
  } catch (error) {
    console.error("Error extracting documentation:", error);
  }
}

example();

Using as a Tool with AI SDK

If you're already using the ai SDK:

import { generateText } from "ai";
import { openai } from "@ai-sdk/openai";
import { docExtractorTool } from "@r-mcp/docs-extractor";

const { text } = await generateText({
  model: openai("gpt-4.1"),
  temperature: 0,
  prompt: "Explain the documentation for example.com",
  tools: {
    getDocumentation: docExtractorTool,
  },
  maxSteps: 5,
});

Advanced Options

You can also specify a focus for the documentation:

const documentation = await extractDocumentation({
  links: ["https://example.com/docs"],
  documentationFocus: "API endpoints",
});

To include the reasoning process in the result:

const documentation = await extractDocumentation({
  links: ["https://example.com/docs"],
  includeReasoning: true,
});
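
The options above can also be applied once per topic when you need several focused summaries of the same pages. This is a rough sketch, not part of the package: `extractByFocus` is a hypothetical helper, the extractor is passed in as an argument so the example runs without API keys, and it assumes the result is a markdown string (the README does not state the return type).

```typescript
// Option shape taken from the examples above.
interface ExtractOptions {
  links: string[];
  documentationFocus?: string;
  includeReasoning?: boolean;
}

// Assumption: the extractor resolves to a markdown string.
type Extractor = (options: ExtractOptions) => Promise<string>;

// Hypothetical helper: run one extraction per focus, sequentially,
// and collect the results keyed by focus.
async function extractByFocus(
  extract: Extractor,
  links: string[],
  focuses: string[],
): Promise<Record<string, string>> {
  const results: Record<string, string> = {};
  for (const documentationFocus of focuses) {
    results[documentationFocus] = await extract({ links, documentationFocus });
  }
  return results;
}

// Stub standing in for the real extractDocumentation function.
const stubExtract: Extractor = async ({ documentationFocus }) =>
  `# Notes on ${documentationFocus ?? "everything"}`;

extractByFocus(stubExtract, ["https://example.com/docs"], ["API endpoints", "Authentication"])
  .then((results) => console.log(results));
```

In real use you would pass the package's default export in place of `stubExtract`. The calls run one at a time, which keeps the number of concurrent API requests low.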

How It Works

The tool uses:

FireCrawl to scrape web content
OpenAI's GPT-4.1 to format and optimize the content
MCP to integrate with Claude and other AI systems

When called, the tool:

Receives links to documentation
Uses FireCrawl to retrieve content from those links
Intelligently discovers and crawls related pages within the same domain to gather comprehensive documentation
Processes the content through GPT-4.1 to extract and format relevant information
Returns well-structured documentation in markdown format
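
The "same domain" step above can be illustrated with a small link filter. This is only a sketch of the general idea, not the package's actual crawling logic (which is not shown in this README): `sameDomainLinks` is a hypothetical function, and it assumes "same domain" means an identical hostname.

```typescript
// Hypothetical illustration: from the links found on a page, keep those on
// the same hostname as the starting URL, resolved and de-duplicated.
function sameDomainLinks(startUrl: string, discovered: string[]): string[] {
  const start = new URL(startUrl);
  start.hash = "";
  const seen = new Set<string>();
  for (const href of discovered) {
    let url: URL;
    try {
      url = new URL(href, start); // resolves relative links against the start page
    } catch {
      continue; // skip anything that cannot be parsed as a URL
    }
    if (url.hostname !== start.hostname) continue;
    url.hash = ""; // "#section" anchors point at the same page
    const normalized = url.toString();
    if (normalized !== start.toString()) seen.add(normalized);
  }
  return Array.from(seen);
}

console.log(
  sameDomainLinks("https://example.com/docs", [
    "/docs/api",
    "https://example.com/docs/api#auth",
    "https://other.com/guide",
  ]),
);
// prints [ 'https://example.com/docs/api' ]
```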

License

MIT

