1.0.3 • Published 7 months ago

@danger-dream/web-crawler-mcp v1.0.3

Web Crawler MCP Service

LobeChat's web-crawler is very useful, so it has been extracted into a standalone MCP service.

Features

Tools

searchWithSearXNG

  • Searches the web through the SearXNG metasearch engine
  • Supports multiple search engines: Google, Bing, DuckDuckGo, Bilibili, etc.
  • Returns structured search results
  • Customizable search engine selection
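
As a rough illustration of engine selection, the sketch below builds a request URL for SearXNG's search endpoint. The helper name buildSearxngUrl is hypothetical and is not taken from this package; q, format, and engines are standard SearXNG query parameters, and JSON output assumes the instance has the json format enabled in its settings.

```typescript
// Hypothetical helper (not part of this package): builds a SearXNG search URL
// that requests JSON results, optionally restricted to specific engines.
function buildSearxngUrl(
  baseUrl: string,
  query: string,
  engines: string[] = [],
): string {
  const params = [`q=${encodeURIComponent(query)}`, "format=json"];
  if (engines.length > 0) {
    // SearXNG expects a comma-separated list of engine names.
    params.push(`engines=${encodeURIComponent(engines.join(","))}`);
  }
  // Strip trailing slashes so the path is joined exactly once.
  return `${baseUrl.replace(/\/+$/, "")}/search?${params.join("&")}`;
}
```

Passing an empty engine list leaves the choice to the instance's own defaults.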

crawlSinglePage

  • Extracts content from web pages, optimized for LLM consumption
  • Tries multiple retrieval methods for reliability
  • Automatically extracts webpage titles and main content
  • Handles errors and fails over to the next retrieval method

crawlMultiPages

  • Crawls multiple web pages simultaneously
  • Parallel processing for improved efficiency
  • Shares the same features as single page crawling
  • Returns merged structured data
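
The parallel behaviour described above could look roughly like the following sketch. It is not the package's actual code: crawlMany, CrawlFn, and PageResult are hypothetical names, and the single-page crawler is passed in as a parameter so the example stays self-contained. Each failure is captured per URL, so one bad page does not discard the rest.

```typescript
// Hypothetical result shape for one page (an assumption, not the package's type).
interface PageResult {
  url: string;
  ok: boolean;
  content?: string;
  error?: string;
}

// Any function that crawls a single page and resolves to its content.
type CrawlFn = (url: string) => Promise<string>;

// Starts all crawls at once and merges the outcomes into one array,
// preserving the order of the input URLs.
async function crawlMany(urls: string[], crawlOne: CrawlFn): Promise<PageResult[]> {
  return Promise.all(
    urls.map(async (url): Promise<PageResult> => {
      try {
        return { url, ok: true, content: await crawlOne(url) };
      } catch (err) {
        // Record the failure instead of rejecting the whole batch.
        const message = err instanceof Error ? err.message : String(err);
        return { url, ok: false, error: message };
      }
    }),
  );
}
```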

Crawling Implementation

The service uses multiple crawling methods, tried in priority order:

  1. Naive: Basic implementation that fetches the web page content directly
  2. Jina: Uses Jina AI's web reader API
  3. Search1API: Uses the Search1API service
  4. Browserless: Uses the Browserless.io service for browser rendering
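
A minimal sketch of this priority-ordered fallback follows, assuming each method can be modelled as an async function from URL to content. The Crawler interface and crawlWithFallback are hypothetical names used for illustration only, and treating empty content as a failure is likewise an assumption.

```typescript
// Hypothetical abstraction over one crawling method (Naive, Jina, ...).
interface Crawler {
  name: string;
  crawl: (url: string) => Promise<string>;
}

// Tries each crawler in the given order and returns the first usable result.
async function crawlWithFallback(
  url: string,
  crawlers: Crawler[],
): Promise<{ via: string; content: string }> {
  const errors: string[] = [];
  for (const crawler of crawlers) {
    try {
      const content = await crawler.crawl(url);
      // Assumption: an empty body counts as a failure, so move on.
      if (content.trim().length > 0) {
        return { via: crawler.name, content };
      }
      errors.push(`${crawler.name}: empty content`);
    } catch (err) {
      const message = err instanceof Error ? err.message : String(err);
      errors.push(`${crawler.name}: ${message}`);
    }
  }
  throw new Error(`All crawlers failed for ${url}: ${errors.join("; ")}`);
}
```

Collecting the per-method errors makes the final failure message show why every method was skipped.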

Setup

Prerequisites

To use every feature, you'll need access to the following services:

  • SearXNG search engine instance
  • Jina AI API key (optional)
  • Search1API key (optional)
  • Browserless token (optional)

Installation

Method 1: NPX (Recommended)

npx -y @danger-dream/web-crawler-mcp

MCP client configuration:

{
  "mcpServers": {
    "deepsearch": {
      "command": "npx",
      "args": ["-y", "@danger-dream/web-crawler-mcp"],
      "env": {
        "SEARXNG_BASE_URL": "<Your SearXNG Instance URL>",
        "JINA_READER_API_KEY": "<Your JINA Key>",
        "BROWSERLESS_TOKEN": "<Your BROWSERLESS Token>",
        "SEARCH1API_API_KEY": "<Your SEARCH1API Key>"
      }
    }
  }
}

Environment Variables

The service supports the following environment variables or command line parameters:

  • SEARXNG_BASE_URL: SearXNG search engine base URL (default: http://localhost:8080)
  • JINA_READER_API_KEY: Jina Reader API key
  • BROWSERLESS_URL: Browserless service URL (default: https://chrome.browserless.io)
  • BROWSERLESS_TOKEN: Browserless service token
  • SEARCH1API_API_KEY: Search1API service API key
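
To illustrate how the variables and defaults listed above fit together, here is a small sketch that resolves them into a config object. The CrawlerConfig shape and the loadConfig function are hypothetical; only the variable names and the two default URLs come from the list above.

```typescript
// Hypothetical config shape; field names are illustrative only.
interface CrawlerConfig {
  searxngBaseUrl: string;
  browserlessUrl: string;
  jinaReaderApiKey?: string;
  browserlessToken?: string;
  search1ApiKey?: string;
}

// Resolves settings from an environment-like map, applying the documented
// defaults. Empty strings are treated as unset (an assumption).
function loadConfig(env: Record<string, string | undefined>): CrawlerConfig {
  return {
    searxngBaseUrl: env.SEARXNG_BASE_URL || "http://localhost:8080",
    browserlessUrl: env.BROWSERLESS_URL || "https://chrome.browserless.io",
    jinaReaderApiKey: env.JINA_READER_API_KEY || undefined,
    browserlessToken: env.BROWSERLESS_TOKEN || undefined,
    search1ApiKey: env.SEARCH1API_API_KEY || undefined,
  };
}
```

In a Node.js process this would typically be called as loadConfig(process.env).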

License

MIT