0.0.3 • Published 5 months ago
@agent-infra/browser-search v0.0.3
@agent-infra/browser-search
English | 简体中文
A tiny stealth-mode web search and content extraction library built on top of Puppeteer, inspired by EGOIST's local-web-search.
Features
- 🔍 Multi-Engine Search - Support for Google, Bing, and Baidu search engines
- 📑 Content Extraction - Extract readable content from web pages using Readability
- 🚀 Concurrent Processing - Built-in queue system for efficient parallel processing
- 🛡️ Stealth Mode - Advanced browser fingerprint spoofing
- 📝 Markdown Output - Automatically converts HTML content to clean Markdown
- ⚡ Performance Optimized - Smart request interception for faster page loads
Installation
npm install @agent-infra/browser-search
# or
yarn add @agent-infra/browser-search
# or
pnpm add @agent-infra/browser-search
Quick Start
import { BrowserSearch } from '@agent-infra/browser-search';
import { ConsoleLogger } from '@agent-infra/logger';
// Create a logger (optional)
const logger = new ConsoleLogger('[BrowserSearch]');
// Initialize the search client
const browserSearch = new BrowserSearch({
logger,
browserOptions: {
headless: true,
},
});
// Perform a search
const results = await browserSearch.perform({
query: 'climate change solutions',
count: 5,
});
console.log(`Found ${results.length} results`);
results.forEach((result) => {
console.log(`Title: ${result.title}`);
console.log(`URL: ${result.url}`);
console.log(`Content preview: ${result.content.substring(0, 150)}...`);
});
API Reference
BrowserSearch
Constructor
constructor(config?: BrowserSearchConfig)
Configuration options:
interface BrowserSearchConfig {
logger?: Logger; // Custom logger
browser?: BrowserInterface; // Custom browser instance
}
perform(options)
Performs a search and extracts content from result pages.
async perform(options: BrowserSearchOptions): Promise<SearchResult[]>
Search options:
interface BrowserSearchOptions {
query: string | string[]; // Search query or array of queries
count?: number; // Maximum results to fetch
concurrency?: number; // Concurrent requests (default: 15)
excludeDomains?: string[]; // Domains to exclude
truncate?: number; // Truncate content length
browserOptions?: {
headless?: boolean; // Run in headless mode
proxy?: string; // Proxy server
executablePath?: string; // Custom browser path
profilePath?: string; // Browser profile path
};
}
Response Type
interface SearchResult {
title: string; // Page title
url: string; // Page URL
content: string; // Extracted content in Markdown format
}
Advanced Usage
Multiple Queries
const results = await browserSearch.perform({
query: ['renewable energy', 'solar power technology'],
count: 10, // Will fetch approximately 5 results per query
concurrency: 5,
});
Domain Exclusion
const results = await browserSearch.perform({
query: 'artificial intelligence',
excludeDomains: ['reddit.com', 'twitter.com', 'youtube.com'],
count: 10,
});
Content Truncation
const results = await browserSearch.perform({
query: 'machine learning',
count: 5,
truncate: 1000, // Limit content to 1000 characters
});
Using with Custom Browser Instance
import { ChromeBrowser } from '@agent-infra/browser';
import { BrowserSearch } from '@agent-infra/browser-search';
const browser = new ChromeBrowser({
// Custom browser configuration
});
const browserSearch = new BrowserSearch({
browser,
});
const results = await browserSearch.perform({
query: 'typescript best practices',
});
Examples
See examples.
Credits
Thanks to:
- EGOIST for creating a great AI chatbot product ChatWise from which we draw a lot of inspiration for local-browser based search.
- The puppeteer project which helps us operate the browser better.
License
Copyright (c) 2025 ByteDance, Inc. and its affiliates.
Licensed under the Apache License, Version 2.0.
0.0.3
5 months ago
0.0.3-beta.3
5 months ago
0.0.3-beta.2
5 months ago
0.0.3-beta.1
5 months ago
0.0.2
5 months ago
0.0.2-beta.0
6 months ago
0.0.1
7 months ago