@agent-infra/browser-search NPM

@agent-infra/browser-search

A tiny stealth-mode web search and content extraction library built on top of Puppeteer, inspired by EGOIST's local-web-search.

Features

🔍 Multi-Engine Search - Support for Google, Bing, and Baidu search engines
📑 Content Extraction - Extract readable content from web pages using Readability
🚀 Concurrent Processing - Built-in queue system for efficient parallel processing
🛡️ Stealth Mode - Advanced browser fingerprint spoofing
📝 Markdown Output - Automatically converts HTML content to clean Markdown
⚡ Performance Optimized - Smart request interception for faster page loads

Installation

npm install @agent-infra/browser-search
# or
yarn add @agent-infra/browser-search
# or
pnpm add @agent-infra/browser-search

Quick Start

import { BrowserSearch } from '@agent-infra/browser-search';
import { ConsoleLogger } from '@agent-infra/logger';

// Create a logger (optional)
const logger = new ConsoleLogger('[BrowserSearch]');

// Initialize the search client
const browserSearch = new BrowserSearch({
  logger,
  browserOptions: {
    headless: true,
  },
});

// Perform a search
const results = await browserSearch.perform({
  query: 'climate change solutions',
  count: 5,
});

console.log(`Found ${results.length} results`);
results.forEach((result) => {
  console.log(`Title: ${result.title}`);
  console.log(`URL: ${result.url}`);
  console.log(`Content preview: ${result.content.substring(0, 150)}...`);
});

API Reference

BrowserSearch

Constructor

constructor(config?: BrowserSearchConfig)

Configuration options:

interface BrowserSearchConfig {
  logger?: Logger; // Custom logger
  browser?: BrowserInterface; // Custom browser instance
}

perform(options)

Performs a search and extracts content from result pages.

async perform(options: BrowserSearchOptions): Promise<SearchResult[]>

Search options:

interface BrowserSearchOptions {
  query: string | string[]; // Search query or array of queries
  count?: number; // Maximum results to fetch
  concurrency?: number; // Concurrent requests (default: 15)
  excludeDomains?: string[]; // Domains to exclude
  truncate?: number; // Truncate content length
  browserOptions?: {
    headless?: boolean; // Run in headless mode
    proxy?: string; // Proxy server
    executablePath?: string; // Custom browser path
    profilePath?: string; // Browser profile path
  };
}

Response Type

interface SearchResult {
  title: string; // Page title
  url: string; // Page URL
  content: string; // Extracted content in Markdown format
}

Advanced Usage

Multiple Queries

const results = await browserSearch.perform({
  query: ['renewable energy', 'solar power technology'],
  count: 10, // Will fetch approximately 5 results per query
  concurrency: 5,
});

Domain Exclusion

const results = await browserSearch.perform({
  query: 'artificial intelligence',
  excludeDomains: ['reddit.com', 'twitter.com', 'youtube.com'],
  count: 10,
});

Content Truncation

const results = await browserSearch.perform({
  query: 'machine learning',
  count: 5,
  truncate: 1000, // Limit content to 1000 characters
});

Using with Custom Browser Instance

import { ChromeBrowser } from '@agent-infra/browser';
import { BrowserSearch } from '@agent-infra/browser-search';

const browser = new ChromeBrowser({
  // Custom browser configuration
});

const browserSearch = new BrowserSearch({
  browser,
});

const results = await browserSearch.perform({
  query: 'typescript best practices',
});

Examples

See examples.

Credits

Thanks to:

EGOIST for creating a great AI chatbot product ChatWise from which we draw a lot of inspiration for local-browser based search.
The puppeteer project which helps us operate the browser better.

License

Licensed under the Apache License, Version 2.0.

@agent-infra/browser @agent-infra/logger @agent-infra/shared

@agent-tars/core @agent-infra/search @multimodal/deep-research-agent

9 months ago

9 months ago

9 months ago

9 months ago

9 months ago

10 months ago

11 months ago