1.0.2 • Published 1 year ago
url-reader v1.0.2
URL READER
This project reads the content of URLs and returns each page's title, length, HTML, text, Markdown, and excerpt.
Requires Node.js: "node": ">=20.11.0"
Installation
yarn add url-reader
# or npm install url-reader
Usage
import URLReader from 'url-reader';
const reader = new URLReader();
await reader.init();
const results = await reader.read({
  urls: ['https://www.google.com'],
  timeout: 10000, // ms, default: 60000
  enableMarkdown: false, // default: true
  runScripts: 'dangerously', // run the scripts included in the HTML and fetch remote resources; disabled by default
});
Parsed Result:
interface IReaderResult {
  title: string;
  length: number;
  html: string;
  text: string;
  markdown?: string;
  excerpt: string;
}
Server
- start server
git clone https://github.com/yokingma/url-reader.git
cd url-reader
# listens on port 3030 by default
yarn install && yarn run start
- api
GET /reader?url=https://www.google.com
POST /reader
Body:
{
  "urls": ["https://www.google.com", "https://www.bing.com"]
}
Docker
docker build -t urlreader . # urlreader is your image's tag name
The service will listen on port 3030.
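Once the service is listening, a client can call the two API forms described above. The following is a minimal sketch, not part of this package: the helper names (`buildGetUrl`, `buildPostRequest`, `readMany`), the `localhost` base URL, and the `Content-Type` header are assumptions, and the response is typed as `unknown` because its shape is not documented here.

```typescript
// Assumption: the server from this README is reachable locally on port 3030.
const BASE_URL = 'http://localhost:3030';

// Shape of the request options passed to fetch (kept local to stay self-contained).
interface ReaderRequest {
  url: string;
  init: { method: string; headers: Record<string, string>; body: string };
}

// GET form: /reader?url=<target>. searchParams percent-encodes the target URL.
function buildGetUrl(target: string, base: string = BASE_URL): string {
  const endpoint = new URL('/reader', base);
  endpoint.searchParams.set('url', target);
  return endpoint.toString();
}

// POST form: JSON body carrying a `urls` array.
function buildPostRequest(urls: string[], base: string = BASE_URL): ReaderRequest {
  return {
    url: new URL('/reader', base).toString(),
    init: {
      method: 'POST',
      // Assumed header; the README does not state the expected content type.
      headers: { 'Content-Type': 'application/json' },
      body: JSON.stringify({ urls }),
    },
  };
}

// Hypothetical helper: sends the POST request and returns the parsed JSON as-is.
async function readMany(urls: string[]): Promise<unknown> {
  const { url, init } = buildPostRequest(urls);
  const res = await fetch(url, init);
  if (!res.ok) {
    throw new Error(`reader request failed with status ${res.status}`);
  }
  return res.json();
}
```

With the server running, `await readMany(['https://www.google.com', 'https://www.bing.com'])` would send the POST request shown in the api section. The GET helper percent-encodes the target so that any query string in the target URL is not mistaken for parameters of `/reader`.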
Tips
- puppeteer: When you install Puppeteer, it automatically downloads a recent version of Chrome for Testing (~170MB macOS, ~282MB Linux, ~280MB Windows) and a chrome-headless-shell binary.
Troubleshooting
- install error with puppeteer
Error [ERR_TLS_CERT_ALTNAME_INVALID]: Hostname/IP does not match certificate's altnames...
Fix: remove the .npmrc file and re-install.