0.0.3 • Published 10 months ago
@agent-infra/browser-search v0.0.3
@agent-infra/browser-search
English | 简体中文
A tiny stealth-mode web search and content extraction library built on top of Puppeteer, inspired by EGOIST's local-web-search.
Features
- 🔍 Multi-Engine Search - Support for Google, Bing, and Baidu search engines
- 📑 Content Extraction - Extract readable content from web pages using Readability
- 🚀 Concurrent Processing - Built-in queue system for efficient parallel processing
- 🛡️ Stealth Mode - Advanced browser fingerprint spoofing
- 📝 Markdown Output - Automatically converts HTML content to clean Markdown
- ⚡ Performance Optimized - Smart request interception for faster page loads
Installation
npm install @agent-infra/browser-search
# or
yarn add @agent-infra/browser-search
# or
pnpm add @agent-infra/browser-searchQuick Start
import { BrowserSearch } from '@agent-infra/browser-search';
import { ConsoleLogger } from '@agent-infra/logger';
// Create a logger (optional)
const logger = new ConsoleLogger('[BrowserSearch]');
// Initialize the search client
const browserSearch = new BrowserSearch({
logger,
browserOptions: {
headless: true,
},
});
// Perform a search
const results = await browserSearch.perform({
query: 'climate change solutions',
count: 5,
});
console.log(`Found ${results.length} results`);
results.forEach((result) => {
console.log(`Title: ${result.title}`);
console.log(`URL: ${result.url}`);
console.log(`Content preview: ${result.content.substring(0, 150)}...`);
});API Reference
BrowserSearch
Constructor
constructor(config?: BrowserSearchConfig)Configuration options:
interface BrowserSearchConfig {
logger?: Logger; // Custom logger
browser?: BrowserInterface; // Custom browser instance
}perform(options)
Performs a search and extracts content from result pages.
async perform(options: BrowserSearchOptions): Promise<SearchResult[]>Search options:
interface BrowserSearchOptions {
query: string | string[]; // Search query or array of queries
count?: number; // Maximum results to fetch
concurrency?: number; // Concurrent requests (default: 15)
excludeDomains?: string[]; // Domains to exclude
truncate?: number; // Truncate content length
browserOptions?: {
headless?: boolean; // Run in headless mode
proxy?: string; // Proxy server
executablePath?: string; // Custom browser path
profilePath?: string; // Browser profile path
};
}Response Type
interface SearchResult {
title: string; // Page title
url: string; // Page URL
content: string; // Extracted content in Markdown format
}Advanced Usage
Multiple Queries
const results = await browserSearch.perform({
query: ['renewable energy', 'solar power technology'],
count: 10, // Will fetch approximately 5 results per query
concurrency: 5,
});Domain Exclusion
const results = await browserSearch.perform({
query: 'artificial intelligence',
excludeDomains: ['reddit.com', 'twitter.com', 'youtube.com'],
count: 10,
});Content Truncation
const results = await browserSearch.perform({
query: 'machine learning',
count: 5,
truncate: 1000, // Limit content to 1000 characters
});Using with Custom Browser Instance
import { ChromeBrowser } from '@agent-infra/browser';
import { BrowserSearch } from '@agent-infra/browser-search';
const browser = new ChromeBrowser({
// Custom browser configuration
});
const browserSearch = new BrowserSearch({
browser,
});
const results = await browserSearch.perform({
query: 'typescript best practices',
});Examples
See examples.
Credits
Thanks to:
- EGOIST for creating a great AI chatbot product ChatWise from which we draw a lot of inspiration for local-browser based search.
- The puppeteer project which helps us operate the browser better.
License
Copyright (c) 2025 ByteDance, Inc. and its affiliates.
Licensed under the Apache License, Version 2.0.
0.0.3
10 months ago
0.0.3-beta.3
10 months ago
0.0.3-beta.2
10 months ago
0.0.3-beta.1
10 months ago
0.0.2
10 months ago
0.0.2-beta.0
11 months ago
0.0.1
12 months ago