@danger-dream/web-crawler-mcp v1.0.3
Web Crawler MCP Service
This package extracts the web crawler from LobeChat into a standalone MCP (Model Context Protocol) server.
Features
Tools
searchWithSearXNG
- Searches the web through the SearXNG meta search engine
- Supports multiple search engines: Google, Bing, DuckDuckGo, Bilibili, etc.
- Returns structured search results
- Customizable search engine selection
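The sketch below shows one way to query SearXNG and reduce its output to structured results. It assumes the SearXNG instance has the "json" output format enabled in its settings.yml. buildSearxngUrl and toStructuredResults are hypothetical helpers, not this package's API, and the result fields kept (title, url, content, engine) are a chosen subset of SearXNG's JSON response.

```typescript
// Hypothetical helper: builds a SearXNG search URL.
// Assumes the instance allows format=json (enabled in its settings.yml).
function buildSearxngUrl(baseUrl: string, query: string, engines: string[] = []): string {
  // Ensure a trailing slash so a base URL with a sub-path keeps that path.
  const base = baseUrl.endsWith("/") ? baseUrl : baseUrl + "/";
  const url = new URL("search", base);
  url.searchParams.set("q", query);
  url.searchParams.set("format", "json");
  if (engines.length > 0) {
    // Restrict the query to the selected engines (comma-separated names).
    url.searchParams.set("engines", engines.join(","));
  }
  return url.toString();
}

interface SearchResult {
  title: string;
  url: string;
  content: string;
  engine: string | undefined;
}

// Hypothetical helper: keeps only the fields an LLM typically needs.
function toStructuredResults(body: { results?: Array<Record<string, unknown>> }): SearchResult[] {
  return (body.results ?? []).map((r) => {
    const engine = r["engine"];
    return {
      title: String(r["title"] ?? ""),
      url: String(r["url"] ?? ""),
      content: String(r["content"] ?? ""),
      engine: typeof engine === "string" ? engine : undefined,
    };
  });
}
```

For example, buildSearxngUrl("http://localhost:8080", "mcp server", ["google", "bing"]) produces a URL for /search with q, format=json and engines set; the response body can then be passed to toStructuredResults.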
crawlSinglePage
- Extracts content from web pages, optimized for LLM consumption
- Multiple web content retrieval methods for reliability
- Automatically extracts webpage titles and main content
- Error handling with failover to alternative retrieval methods
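To illustrate what "extracting the title and main content" means, here is a deliberately naive, regex-based sketch. It is not the extraction logic this package or LobeChat uses; extractPage is a hypothetical name, it does not decode HTML entities, and a production crawler should use a real HTML parser.

```typescript
interface PageContent {
  title: string;
  content: string;
}

// Naive illustration only: regexes are not a robust way to parse HTML.
function extractPage(html: string): PageContent {
  // Capture the <title> text before the <head> block is discarded.
  const titleMatch = html.match(/<title\b[^>]*>([\s\S]*?)<\/title>/i);
  const title = titleMatch?.[1]?.trim() ?? "";
  const content = html
    .replace(/<head\b[^>]*>[\s\S]*?<\/head>/i, " ") // drop metadata
    .replace(/<(script|style|noscript)\b[^>]*>[\s\S]*?<\/\1>/gi, " ") // drop non-content blocks
    .replace(/<[^>]+>/g, " ") // strip remaining tags
    .replace(/\s+/g, " ") // collapse whitespace for compact LLM input
    .trim();
  return { title, content };
}
```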
crawlMultiPages
- Crawls multiple web pages in parallel for improved efficiency
- Shares the same features as single page crawling
- Returns merged structured data
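A minimal sketch of parallel crawling with merged output, assuming a single-page crawler function is available. crawlMany, Crawler and MultiCrawlResult are hypothetical names; collecting per-URL errors instead of failing the whole batch is an assumption about how "merged structured data" might be shaped.

```typescript
type CrawlResult = { url: string; title: string; content: string };
type Crawler = (url: string) => Promise<CrawlResult>;

interface MultiCrawlResult {
  results: CrawlResult[];
  errors: { url: string; message: string }[];
}

// Hypothetical: starts every crawl at once; a failing page does not sink the batch.
async function crawlMany(urls: string[], crawlOne: Crawler): Promise<MultiCrawlResult> {
  const outcomes = await Promise.all(
    urls.map(async (url) => {
      try {
        return { ok: true as const, page: await crawlOne(url) };
      } catch (err) {
        return { ok: false as const, url, message: err instanceof Error ? err.message : String(err) };
      }
    }),
  );
  // Merge into one structure, preserving the input order of URLs.
  const merged: MultiCrawlResult = { results: [], errors: [] };
  for (const o of outcomes) {
    if (o.ok) merged.results.push(o.page);
    else merged.errors.push({ url: o.url, message: o.message });
  }
  return merged;
}
```

Catching inside each mapped promise gives the same behavior as Promise.allSettled while keeping each error paired with its URL.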
Crawling Implementation
The service uses multiple crawling methods, tried in priority order:
- Naive: Basic crawling implementation, directly fetches web page content
- Jina: Uses Jina AI's web reader API
- Search1API: Uses the Search1API service
- Browserless: Uses the Browserless.io service for browser rendering
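The priority-order fallback can be sketched as a loop that returns the first successful result. This is a hypothetical illustration, not the package's code: crawlWithFailover and CrawlImpl are invented names, and treating empty content as a failure (so the next method is tried) is an assumption.

```typescript
type CrawlImpl = {
  name: string;
  crawl: (url: string) => Promise<{ title: string; content: string }>;
};

// Hypothetical: tries each implementation in the order given and returns the
// first result with non-empty content, recording why earlier ones failed.
async function crawlWithFailover(
  url: string,
  impls: CrawlImpl[],
): Promise<{ crawler: string; title: string; content: string }> {
  const failures: string[] = [];
  for (const impl of impls) {
    try {
      const page = await impl.crawl(url);
      // Assumption: blank content counts as a failure.
      if (page.content.trim().length > 0) return { crawler: impl.name, ...page };
      failures.push(`${impl.name}: empty content`);
    } catch (err) {
      failures.push(`${impl.name}: ${err instanceof Error ? err.message : String(err)}`);
    }
  }
  throw new Error(`All crawlers failed for ${url} (${failures.join("; ")})`);
}
```

With implementations listed as naive, jina, search1api and browserless, a page that blocks the naive fetch would fall through to the next configured service.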
Setup
Prerequisites
To use every feature, you need access to the following services:
- A SearXNG search engine instance
- Jina AI API key (optional)
- Search1API key (optional)
- Browserless token (optional)
Installation
Method 1: NPX (Recommended)
npx -y @danger-dream/web-crawler-mcp
Configuration for your MCP client:
{
  "mcpServers": {
    "deepsearch": {
      "command": "npx",
      "args": ["-y", "@danger-dream/web-crawler-mcp"],
      "env": {
        "SEARXNG_BASE_URL": "<Your SearXNG Instance URL>",
        "JINA_READER_API_KEY": "<Your JINA Key>",
        "BROWSERLESS_TOKEN": "<Your BROWSERLESS Token>",
        "SEARCH1API_API_KEY": "<Your SEARCH1API Key>"
      }
    }
  }
}
Environment Variables
The service supports the following environment variables or command line parameters:
- SEARXNG_BASE_URL: SearXNG search engine base URL (default: http://localhost:8080)
- JINA_READER_API_KEY: Jina Reader API key
- BROWSERLESS_URL: Browserless service URL (default: https://chrome.browserless.io)
- BROWSERLESS_TOKEN: Browserless service token
- SEARCH1API_API_KEY: Search1API service API key
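A minimal sketch of reading these variables with the documented defaults. loadConfig and CrawlerConfig are hypothetical names; the function takes the environment as a parameter (pass process.env in a Node.js program), treats blank values as unset (an assumption), and does not cover the command line parameters.

```typescript
interface CrawlerConfig {
  searxngBaseUrl: string;
  jinaReaderApiKey: string | undefined;
  browserlessUrl: string;
  browserlessToken: string | undefined;
  search1apiApiKey: string | undefined;
}

// Hypothetical loader using the defaults listed above.
function loadConfig(env: Record<string, string | undefined>): CrawlerConfig {
  // Assumption: an empty or whitespace-only value is treated as "not set".
  const nonEmpty = (v: string | undefined) => (v && v.trim() !== "" ? v.trim() : undefined);
  return {
    searxngBaseUrl: nonEmpty(env["SEARXNG_BASE_URL"]) ?? "http://localhost:8080",
    jinaReaderApiKey: nonEmpty(env["JINA_READER_API_KEY"]),
    browserlessUrl: nonEmpty(env["BROWSERLESS_URL"]) ?? "https://chrome.browserless.io",
    browserlessToken: nonEmpty(env["BROWSERLESS_TOKEN"]),
    search1apiApiKey: nonEmpty(env["SEARCH1API_API_KEY"]),
  };
}
```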
License
MIT