1.0.1 • Published 7 months ago

parsera-ts v1.0.1

Weekly downloads
-
License
MIT
Repository
-
Last release
7 months ago

Parsera SDK

Official TypeScript SDK for Parsera API - Extract structured data from any webpage.

Installation

npm install parsera-ts
  1. Visit parsera.org to get your API key with 20 free credits
  2. Follow the installation and usage instructions below

Basic Usage

import { Parsera } from 'parsera-ts';

// Initialize the client
const parsera = new Parsera({
    apiKey: 'your-api-key',
    timeout: 60000, // 60 second timeout
    retryOptions: {
        maxRetries: 3,
        backoffFactor: 2,
        initialDelay: 1000,
    }
});

// Extract data from a webpage
const data = await parsera.extract({
    url: 'https://example.com/products',
    attributes: {
        title: 'Extract the product title',
        price: 'Get the product price',
        description: 'Extract the product description'
    }
});

Features

  • 🚀 Simple and intuitive API
  • 🔄 Automatic retries with exponential backoff
  • 🌍 Global proxy support
  • 🍪 Cookie injection
  • 📊 Event-driven architecture
  • 🎯 Precision mode for accurate extractions
  • ✨ TypeScript support with full type definitions

Advanced Features

Precision Mode

Precision mode provides higher accuracy extractions at the cost of additional credits (10 credits per extraction instead of 1).

const data = await parsera.extract({
    url: 'https://example.com/products',
    attributes: {
        title: 'Extract the product title',
        price: 'Get the product price'
    },
    precisionMode: true // Enable precision mode (10 credits)
});

Cookie Injection

Inject custom cookies for authenticated page access or specific site configurations:

const data = await parsera.extract({
    url: 'https://example.com/private-page',
    attributes: {
        content: 'Extract protected content'
    },
    cookies: [
        {
            name: 'sessionId',
            value: 'abc123',
            domain: '.example.com',
            path: '/',
            sameSite: 'Lax'
        },
        {
            name: 'preferences',
            value: 'theme=dark',
            domain: '.example.com',
            path: '/',
            sameSite: 'Strict'
        }
    ]
});

Proxy Configuration

Configure global or per-request proxy locations:

// Global proxy configuration
const parsera = new Parsera({
    apiKey: 'your-api-key',
    defaultProxyCountry: 'UnitedStates'
});

// Per-request proxy override
const data = await parsera.extract({
    url: 'https://example.co.uk/products',
    attributes: {
        price: 'Get the local price'
    },
    proxyCountry: 'random' // Override for this request only
});

Event Handling

The SDK provides comprehensive event handling for monitoring extraction progress:

// Monitor extraction progress
parsera.on('extract:start', (event) => {
    console.log(`Starting extraction for ${event.url}`);
});

parsera.on('extract:complete', (event) => {
    console.log(`Extraction completed with ${event.data.length} items`);
});

// Monitor request lifecycle
parsera.on('request:start', (event) => {
    console.log(`API request started: ${event.method} ${event.url}`);
});

parsera.on('request:retry', (event) => {
    console.log(`Retrying request (attempt ${event.retryCount})`);
});

parsera.on('rateLimit', (event) => {
    console.log(`Rate limit hit. Reset in ${event.resetTime} seconds`);
});

Request Cancellation

Cancel ongoing requests using AbortController:

const controller = new AbortController();

try {
    const data = await parsera.extract({
        url: 'https://example.com/products',
        attributes: {
            title: 'Extract the product title'
        },
        signal: controller.signal
    });
} catch (error) {
    if (error.name === 'AbortError') {
        console.log('Request was cancelled');
    }
}

// Cancel the request
controller.abort();

API Reference

Constructor Options

interface ParseraOptions {
    apiKey: string;                // Your Parsera API key
    baseUrl?: string;             // Custom API endpoint (optional)
    defaultProxyCountry?: string; // Default proxy location
    timeout?: number;             // Request timeout in ms
    retryOptions?: {              // Retry configuration
        maxRetries?: number;      // Maximum retry attempts
        backoffFactor?: number;   // Exponential backoff multiplier
        initialDelay?: number;    // Initial retry delay in ms
    }
}

Extract Options

interface ExtractOptions {
    url: string;                  // Target webpage URL
    attributes: {                 // Data to extract
        [key: string]: string;    // Key: attribute name, Value: extraction instruction
    };
    proxyCountry?: string;       // Override default proxy
    cookies?: {                  // Custom cookies
        name: string;            // Cookie name
        value: string;           // Cookie value
        domain: string;          // Cookie domain
        path?: string;           // Cookie path
        sameSite: "None" | "Lax" | "Strict";
        secure?: boolean;        // Require HTTPS
        httpOnly?: boolean;      // Accessible via HTTP only
    }[];
    precisionMode?: boolean;     // Enable precision mode (10 credits)
    signal?: AbortSignal;        // For request cancellation
}

Events

The SDK emits the following events:

EventDescriptionData
extract:startExtraction begins{ url: string }
extract:completeExtraction completes{ data: any[], url: string }
extract:errorExtraction fails{ error: Error, url: string }
request:startAPI request begins{ method: string, url: string }
request:endAPI request completes{ method: string, url: string, duration: number }
request:retryRequest retry attempt{ retryCount: number, error: Error }
request:errorRequest fails{ error: Error }
rateLimitRate limit reached{ resetTime: number }
timeoutRequest timeout{ timeout: number }

Error Handling

try {
    const data = await parsera.extract({
        url: 'https://example.com/products',
        attributes: {
            title: 'Extract the product title'
        }
    });
} catch (error) {
    if (error instanceof ParseraRateLimitError) {
        console.log(`Rate limit exceeded. Reset in ${error.resetTime} seconds`);
    } else if (error instanceof ParseraTimeoutError) {
        console.log('Request timed out');
    } else if (error instanceof ParseraAPIError) {
        console.log(`API error: ${error.message}`);
    }
}

Credits Usage

  • Standard extraction: 1 credit per request
  • Precision mode: 10 credits per request
  • Failed requests: No credits charged
  • Retried requests: Counted as single request

License

MIT

1.0.1

7 months ago

1.0.0

7 months ago