2.5.1 • Published 1 month ago

scrapefrom v2.5.1

Weekly downloads: 34
License: ISC
Repository: github
Last release: 1 month ago

scrapefrom

Scrape data from any webpage.

installation

$ npm i scrapefrom

import

const scrapefrom = require("scrapefrom");
// or,
// import scrapefrom from "scrapefrom"

use cases

Pass a URL string on its own to extract html, htmlSplits, htmlStripped, and htmlStrippedSplits.

scrapefrom("https://www.npmjs.com/package/scrapefrom").then(console.log);

Extract an array of strings for all h1 tags on a page.

scrapefrom({
  url: "https://www.npmjs.com/package/scrapefrom",
  extract: "h1",
  defaultDelimiter: null,
}).then(console.log); // "{ h1: [...] }"

Extract an array of strings for all h1 tags on a page as "titles".

scrapefrom({
  url: "https://www.npmjs.com/package/scrapefrom",
  extract: { name: "titles", selector: "h1", delimiter: null },
}).then(console.log); // "{ titles: [...] }"

Extract the text of all h1 tags on a page as a single string joined by a delimiter, as "title".

scrapefrom({
  url: "https://www.npmjs.com/package/scrapefrom",
  extract: { name: "title", selector: "h1", delimiter: "," },
}).then(console.log); // "{ title: "...,..." }"
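
The difference between the last two examples comes down to the delimiter: null leaves the matches as an array, while a string joins them into one value. Here is a minimal sketch of that behavior as the examples describe it. The helper name applyDelimiter and the sample strings are made up for illustration and are not part of scrapefrom.

```javascript
// Hypothetical helper, not part of scrapefrom: mimics how the examples
// above describe the effect of `delimiter` on the matched strings.
function applyDelimiter(values, delimiter) {
  // null keeps one array entry per matched element
  if (delimiter === null) return values;
  // any string joins the matches into a single string
  return values.join(delimiter);
}

// Made-up sample matches
console.log(applyDelimiter(["first", "second"], null)); // [ 'first', 'second' ]
console.log(applyDelimiter(["first", "second"], ",")); // first,second
```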

Extract an array of datetime attribute values for all time tags on a page as "dates".

scrapefrom({
  url: "https://www.npmjs.com/package/scrapefrom",
  extract: {
    name: "dates",
    selector: "time",
    attribute: "datetime",
    delimiter: null,
  },
}).then(console.log); // "{ dates: [...] }"

Combine the previous use cases in a single config.

scrapefrom({
  url: "https://www.npmjs.com/package/scrapefrom",
  defaultDelimiter: null,
  extracts: [
    { name: "titles", selector: "h1" },
    { name: "dates", selector: "time", attribute: "datetime" },
  ],
}).then(console.log); // "{ titles: [...], dates: [...] }"
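
The examples so far use three config shapes: a bare selector string, a single extract object, and an extracts array. One way to picture how they relate is to normalize them into a single list. This is only a sketch under assumptions, not scrapefrom's internals: normalizeExtracts is a hypothetical name, and the rule that a per-extract delimiter overrides defaultDelimiter is an assumption the README does not state.

```javascript
// Hypothetical helper, not scrapefrom's internals: turns the config shapes
// shown above into one flat list of { name, selector, attribute, delimiter }.
function normalizeExtracts(config) {
  // accept either a single `extract` or an `extracts` array
  const raw =
    config.extracts ?? (config.extract !== undefined ? [config.extract] : []);
  return raw.map((item) => {
    // a bare string such as "h1" is used as both selector and result key
    const extract =
      typeof item === "string" ? { name: item, selector: item } : item;
    return {
      name: extract.name ?? extract.selector,
      selector: extract.selector,
      attribute: extract.attribute ?? null,
      // assumption: a per-extract delimiter wins over defaultDelimiter
      delimiter:
        extract.delimiter !== undefined
          ? extract.delimiter
          : config.defaultDelimiter,
    };
  });
}

console.log(normalizeExtracts({ extract: "h1", defaultDelimiter: null }));
```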

Run the previous use cases against multiple URLs.

scrapefrom([
  {
    url: "https://www.npmjs.com/package/scrapefrom",
    defaultDelimiter: null,
    extracts: [
      { name: "titles", selector: "h1" },
      { name: "dates", selector: "time", attribute: "datetime" },
    ],
  },
  {
    url: "https://www.npmjs.com/package/async-fetch",
    defaultDelimiter: null,
    extracts: [
      { name: "titles", selector: "h1" },
      { name: "dates", selector: "time", attribute: "datetime" },
    ],
  },
]).then(console.log); // "[{ titles: [...], dates: [...] }, { titles: [...], dates: [...] }]"
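
The array form returns one result per config. The sketch below shows how such a mapping could work, assuming each config is scraped independently and the results keep the input order. Both scrapeOne (a stub that does no network access) and scrapeAll are hypothetical names, not scrapefrom functions.

```javascript
// Stand-in for a single-page scrape; a real call would hit the network.
async function scrapeOne(config) {
  return { url: config.url, titles: [], dates: [] };
}

// Hypothetical wrapper: one config in, one result out; arrays map element-wise.
function scrapeAll(configs, scrape = scrapeOne) {
  if (!Array.isArray(configs)) return scrape(configs);
  // Promise.all preserves input order regardless of which request finishes first
  return Promise.all(configs.map((config) => scrape(config)));
}

scrapeAll([
  { url: "https://example.com/a" },
  { url: "https://example.com/b" },
]).then((results) => console.log(results.map((result) => result.url)));
```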

Extract a list of items from a page.

scrapefrom({
  url: "https://www.npmjs.com/package/async-fetch",
  extract: {
    selector: "tbody tr",
    name: "rows",
    extracts: [
      { selector: "td:nth-child(1)", name: "key" },
      { selector: "td:nth-child(2)", name: "type" },
      { selector: "td:nth-child(3)", name: "definition" },
      { selector: "td:nth-child(4)", name: "default" },
    ],
  },
}).then(console.log); // "[ { key: "...", type: "...", definition: "...", default: "..." }, ...]"
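
In the nested config above, the outer selector picks the rows and each child extract names one cell. The sketch below illustrates that row-to-object mapping on plain arrays, without any DOM access. The cell values are placeholders and rowsToObjects is a hypothetical helper, not part of scrapefrom.

```javascript
// Placeholder cell text for two table rows; made up for illustration.
const rows = [
  ["a1", "b1", "c1", "d1"],
  ["a2", "b2", "c2", "d2"],
];

// Child extract names in column order, mirroring td:nth-child(1..4) above.
const columns = ["key", "type", "definition", "default"];

// Build one object per row by pairing each column name with its cell text.
function rowsToObjects(rows, columns) {
  return rows.map((cells) =>
    Object.fromEntries(columns.map((name, index) => [name, cells[index]]))
  );
}

console.log(rowsToObjects(rows, columns)[1].default); // d2
```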

if a page requires javascript...

By default, scrapefrom uses fetch under the hood. If a page is unavailable because it requires JavaScript, you can use puppeteer instead, which should be able to bypass this requirement by rendering the page in a headless Chrome browser.
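
To illustrate the difference between the two approaches, here is a rough sketch of loading a page's HTML with either fetch or puppeteer. It does not show how scrapefrom enables puppeteer: loadHtml, the useBrowser flag, and the loaders parameter are all hypothetical names introduced for this example. It assumes Node 18+ for the global fetch and that puppeteer is installed separately.

```javascript
// Plain HTTP load using the global fetch available in Node 18+.
async function loadWithFetch(url) {
  const response = await fetch(url);
  if (!response.ok) throw new Error(`HTTP ${response.status} for ${url}`);
  return response.text();
}

// Headless-browser load; puppeteer is required lazily so it stays optional.
async function loadWithPuppeteer(url) {
  const puppeteer = require("puppeteer");
  const browser = await puppeteer.launch();
  try {
    const page = await browser.newPage();
    // wait until network activity settles so client-side rendering can finish
    await page.goto(url, { waitUntil: "networkidle0" });
    return await page.content();
  } finally {
    // always release the browser process, even if navigation fails
    await browser.close();
  }
}

// Hypothetical switch: `useBrowser` is an illustrative flag, not a scrapefrom option.
// `loaders` lets callers swap in their own implementations, e.g. for testing.
function loadHtml(url, { useBrowser = false, loaders = {} } = {}) {
  const viaFetch = loaders.fetch ?? loadWithFetch;
  const viaBrowser = loaders.browser ?? loadWithPuppeteer;
  return useBrowser ? viaBrowser(url) : viaFetch(url);
}
```

The fetch path returns the HTML as the server sent it, while the puppeteer path returns the DOM after scripts have run, which is why it can help with pages that render their content client-side.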

version history

2.5.1: 1 month ago
2.5.0: 5 months ago
2.4.9: 5 months ago
2.4.8: 5 months ago
2.4.7: 11 months ago
2.4.6: 11 months ago
2.4.3: 1 year ago
2.4.5: 12 months ago
2.4.4: 1 year ago
2.4.1: 1 year ago
2.4.0: 1 year ago
2.4.2: 1 year ago
2.3.8: 1 year ago
2.3.9: 1 year ago
2.3.7: 2 years ago
2.3.6: 2 years ago
2.3.2: 2 years ago
2.3.1: 2 years ago
2.3.4: 2 years ago
2.3.3: 2 years ago
2.3.5: 2 years ago
2.3.0: 2 years ago
2.2.9: 2 years ago
2.2.8: 3 years ago
2.2.7: 3 years ago
2.2.5: 3 years ago
2.2.4: 3 years ago
2.2.6: 3 years ago
2.2.3: 3 years ago
2.2.1: 3 years ago
2.2.0: 3 years ago
2.2.2: 3 years ago
2.1.6: 3 years ago
2.1.8: 3 years ago
2.1.7: 3 years ago
2.1.9: 3 years ago
2.1.5: 3 years ago
2.1.4: 3 years ago
2.1.3: 3 years ago
2.1.2: 4 years ago
2.1.1: 4 years ago
2.0.9: 4 years ago
2.1.0: 4 years ago
2.0.8: 4 years ago
2.0.7: 4 years ago
2.0.6: 4 years ago
2.0.5: 4 years ago
2.0.4: 4 years ago
2.0.3: 4 years ago
2.0.2: 4 years ago
2.0.1: 4 years ago
2.0.0: 4 years ago
1.1.3: 4 years ago
1.1.2: 4 years ago
1.1.1: 4 years ago
1.1.0: 4 years ago
1.0.9: 4 years ago
1.0.8: 4 years ago
1.0.7: 4 years ago
1.0.6: 4 years ago
1.0.5: 4 years ago
1.0.2: 4 years ago
1.0.4: 4 years ago
1.0.3: 4 years ago
1.0.1: 4 years ago
1.0.0: 4 years ago