1.0.3 • Published 4 months ago

gutenbergscraper v1.0.3

License
ISC

Gutenberg Scraper

The Gutenberg Scraper is a tool designed to scrape content from Project Gutenberg. But how does it work?

The Gutenberg Scraper runs requests in parallel to speed up scraping in Node.js applications. It is written primarily in TypeScript.

To set up the scraper, start with the file named index.ts. By default, it contains some example code, such as:

import { Scraper } from './Scraper';

const scraper = new Scraper({
  useBooknum: [12, 50],  // Scrape books from 12 to 50
  FormatOutput: 'csv',   // Output format will be CSV
  userAgent: 'Mozilla/5.0',
  timeout: 5000          // Set a timeout for requests
}, 10, 3); // Scrape 10 books at once and retry 3 times in case of failure

scraper.scrape();

In this example:

  • useBooknum: [12, 50] specifies the range of books to scrape, from book number 12 to 50.
  • FormatOutput: 'csv' indicates that the output will be in CSV format. The other supported formats are TXT and JSON.
  • userAgent: 'Mozilla/5.0' sets a custom user-agent to help prevent the scraper from being blocked by the website.
  • timeout: 5000 sets the timeout for each request to 5000 milliseconds (5 seconds).
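
As a rough illustration of how a range such as [12, 50] could be turned into individual book numbers, here is a minimal sketch. The ScraperOptions interface and the expandBookRange helper are hypothetical names introduced for this example and are not part of the package's documented API. The sketch also assumes that the range is inclusive at both ends and that the other format values are spelled 'txt' and 'json'.

```typescript
// Hypothetical option shape mirroring the fields used in the example above.
// The lowercase 'txt' and 'json' spellings are an assumption.
interface ScraperOptions {
  useBooknum: [number, number];
  FormatOutput: 'csv' | 'txt' | 'json';
  userAgent: string;
  timeout: number; // milliseconds
}

// Expand an inclusive [start, end] range into a list of book numbers.
// Assumption: both ends of the range are included.
function expandBookRange([start, end]: [number, number]): number[] {
  if (!Number.isInteger(start) || !Number.isInteger(end) || start > end) {
    throw new RangeError(`Invalid book range: [${start}, ${end}]`);
  }
  return Array.from({ length: end - start + 1 }, (_, i) => start + i);
}

const options: ScraperOptions = {
  useBooknum: [12, 50],
  FormatOutput: 'csv',
  userAgent: 'Mozilla/5.0',
  timeout: 5000,
};

const bookNumbers = expandBookRange(options.useBooknum);
console.log(bookNumbers.length); // 39 under the inclusive assumption
```

Under the inclusive assumption, the range [12, 50] covers 39 books, which is the amount of work the parallel requests then share.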

The second and third constructor arguments, 10 and 3, represent:

  • 10: The number of parallel requests to make at once. This allows the scraper to scrape multiple books simultaneously, speeding up the process.
  • 3: The number of retry attempts in case a request fails. If a book fails to scrape, the scraper will retry up to 3 times before it gives up.
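
The combination of parallel requests and retries described above can be sketched as a small worker pool. This is not the package's actual implementation. The names Task, withRetries, and scrapeAll are hypothetical, and the sketch assumes that "retry up to 3 times" means one initial attempt plus up to three further attempts.

```typescript
// A task fetches one book; here it is just any async function of a book number.
type Task<T> = (bookNumber: number) => Promise<T>;

// Try the task once, then up to `retries` more times if it keeps failing.
async function withRetries<T>(task: Task<T>, bookNumber: number, retries: number): Promise<T> {
  let lastError: unknown;
  for (let attempt = 0; attempt <= retries; attempt++) {
    try {
      return await task(bookNumber);
    } catch (err) {
      lastError = err; // remember the failure and try again
    }
  }
  throw lastError;
}

// Process all book numbers with at most `concurrency` tasks in flight.
async function scrapeAll<T>(
  bookNumbers: number[],
  task: Task<T>,
  concurrency: number,
  retries: number,
): Promise<Map<number, T | Error>> {
  const results = new Map<number, T | Error>();
  let next = 0; // index of the next book number to hand out

  async function worker(): Promise<void> {
    while (next < bookNumbers.length) {
      const bookNumber = bookNumbers[next++];
      try {
        results.set(bookNumber, await withRetries(task, bookNumber, retries));
      } catch (err) {
        // Record the final failure instead of aborting the whole run.
        results.set(bookNumber, err instanceof Error ? err : new Error(String(err)));
      }
    }
  }

  const workerCount = Math.min(concurrency, bookNumbers.length);
  await Promise.all(Array.from({ length: workerCount }, () => worker()));
  return results;
}

// Demo with a fake task: book 13 fails twice before succeeding.
const attempts = new Map<number, number>();
const fakeFetch: Task<string> = async (n) => {
  const count = (attempts.get(n) ?? 0) + 1;
  attempts.set(n, count);
  if (n === 13 && count < 3) throw new Error(`temporary failure for book ${n}`);
  return `book-${n}`;
};

scrapeAll([12, 13, 14], fakeFetch, 10, 3).then((results) => {
  console.log(results);
});
```

Each worker pulls the next book number only when it has finished the previous one, so the number of requests in flight never exceeds the concurrency limit, and a book that keeps failing does not stop the others from completing.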

Once this is set up, calling scraper.scrape() starts the scraping process using the provided configuration and produces output in the chosen format (CSV, JSON, or TXT).
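
To make the difference between the three output formats concrete, here is a minimal sketch of serializing the same records as CSV, JSON, or TXT. The BookRecord shape, the formatBooks function, and the tab-separated TXT layout are all assumptions made for illustration. The fields and layout that the scraper actually produces are not documented here, and the sample record is placeholder data.

```typescript
// Hypothetical record shape; the real scraper's fields may differ.
interface BookRecord {
  id: number;
  title: string;
  author: string;
}

type OutputFormat = 'csv' | 'json' | 'txt';

// Quote a CSV field when it contains a comma, a quote, or a line break.
function csvField(value: string | number): string {
  const text = String(value);
  return /[",\n\r]/.test(text) ? `"${text.replace(/"/g, '""')}"` : text;
}

// Serialize the records in the requested format.
function formatBooks(books: BookRecord[], format: OutputFormat): string {
  switch (format) {
    case 'csv': {
      const header = 'id,title,author';
      const rows = books.map((b) => [b.id, b.title, b.author].map(csvField).join(','));
      return [header, ...rows].join('\n');
    }
    case 'json':
      return JSON.stringify(books, null, 2);
    case 'txt':
      // Assumed layout: one tab-separated record per line.
      return books.map((b) => `${b.id}\t${b.title}\t${b.author}`).join('\n');
  }
}

// Placeholder data, not real scraper output.
const sample: BookRecord[] = [{ id: 12, title: 'Example Title, Vol. 1', author: 'Example Author' }];

console.log(formatBooks(sample, 'csv'));
console.log(formatBooks(sample, 'json'));
console.log(formatBooks(sample, 'txt'));
```

The CSV branch quotes the title because it contains a comma, which is the main thing that separates CSV output from a plain join on commas.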

To use it, first install the package by running npm i gutenbergscraper. Once that has finished, run npm i followed by npm run start in Command Prompt or PowerShell, and you're done!

1.0.3 • 4 months ago
1.0.2 • 4 months ago
1.0.1 • 4 months ago
1.0.0 • 4 months ago