eshop-scraper v1.3.0
Eshop Scraper (eshop-scraper)
Eshop Scraper is a npm package.
npm i eshop-scraper
What it does (in short)
This package can be used for getting some important data like price, currency, name from various famous e-commerce websites like Amazon, Steam, Walmart etc.
Support
{
"node": ">=16.16.0",
"npm": ">=8.11.0"
}
Getting Started
Create a instance of eshop-scraper class
import eshop_scraper from 'eshop-scraper'
const scraper = new eshop_scraper()
Use .getData()
method of the class to scrape
import eshop_scraper from 'eshop-scraper'
const scraper = new eshop_scraper()(async () => {
let res = await scraper.getData('https://www.test.com/product/355223235')
console.log(res)
})
.getData()
The method is used to scrape an website data that's entry is available in _webprops
.
Parameter
The method takes only one single parameter. Pass the link of the item you want to scrape inside the function.
scraper.getData(link)
Output
It will will output a promise. Use async/await to handle the output.
Sample output:
{
price: 140.36,
currency: 'USD',
name: 'Test Item',
site: 'Test',
link: 'https://www.test.com/product/355223235'
}
Config
Pass new configs inside the class to config some extra things. It's optional because some common configs already included in the scraper to use without any problem.
Insert new entries
You can insert new entries in the scraper, then you can scrape items from that website just like default entries.
import eshop_scraper from 'eshop-scraper'
// create a map with new entries
const propsList = new Map([
[
'test.com',
{
// website's domain or subdomain
site: 'Test', // website's name
selector: {
price: ['span[itemprop="price"]'], // items's price html selector
name: ['h1[itemprop="name"]'], // items's name html selector
},
},
],
// follow the same structure and add many more sites, inside the map
])
const config = {
webprops: propsList, // pass a map with new entries in webprops
}
const scraper = new eshop_scraper(config)
Replace or exclude extra things
Exclude extra things to make the scraper work. The scraper needs to get a string like "\$50.30" or "USD 40" or "30 \$" from the price selector.
import eshop_scraper from 'eshop-scraper'
const obj = {
'price is:': '', // pass empty string to exclude
now: '',
usd: '$', // replace one string with another
}
const config = {
replaceobj: obj, // pass an object in replaceobj
}
const scraper = new eshop_scraper(config)
Insert new currencies
Some websites may show prices in bitcoin or some unknown currency, to show them in proper way you need to map them. Otherwise you will get undefined
in currency
output.
import eshop_scraper from 'eshop-scraper'
// create a map with new currencies
const currencyList = new Map([
[
'$',
'USD',
['euro', '€'],
'EUR', // to map multiple strings to one currency put the strings inside an array
],
// follow the same structure and add many more currencies, inside the map
])
const config = {
currencymap: currencyList, // pass a map with new currency entries in currencymap
}
const scraper = new eshop_scraper(config)
Insert new set of headers
To make scraper looks realistic and prevent the website from blocking the ip, realistic headers needed to be set.
import eshop_scraper from 'eshop-scraper'
const newheaders = [
{
Accept:
'text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.9',
'User-Agent':
'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/92.0.4515.107 Safari/537.36',
},
{
Accept:
'text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8',
'User-Agent':
'Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:90.0) Gecko/20100101 Firefox/90.0',
},
// add many more headers inside the array
]
const config = {
headersarr: newheaders, // pass a arrray with new set of headers in headersarr
}
const scraper = new eshop_scraper(config)
Set timeout
If the request takes longer time than the set amount of time and the website doesn't response within that time then the request will be cancelled.
import eshop_scraper from 'eshop-scraper'
const config = {
timeout: 10, // pass an integer in timeout (counted as second)
}
const scraper = new eshop_scraper(config)
Check default values
Use these only to check default valuess, directly replacing values with new values not recommended.
import eshop_scraper from 'eshop-scraper'
const scraper = new eshop_scraper()(async () => {
let defProps = scraper._webprops // default supported websites
let defReplaceStrings = scraper._replaceobj // default replaced strings
let defHeaders = scraper._headers // default set of headers
let defTimeout = scraper._timeoutAmount // default timeout amount
let defCurrencyMap = scraper._currencymap // default currency map
console.log(defProps)
console.log(defReplaceStrings)
console.log(defHeaders)
console.log(defTimeout)
console.log(defCurrencyMap)
})
Supported websites
It supports 12 websites by default and more can be added very easily.
Websites list
- Steam (store.steampowered.com)
- Amazon (amazon.com, amazon.in)
- Walmart (walmart.com)
- Crutchfield (crutchfield.com)
- Playstation (store.playstation.com, gear.playstation.com, direct.playstation.com)
- Rakuten (fr.shopping.rakuten.com)
- Ebay (ebay.com)
- Ebags (ebags.com)
- Bikroy (bikroy.com)
- Flipkart (flipkart.com)
- Etsy (etsy.com)
- Avito (avito.ru)
Note
Some websites may show unexpected result. Because all websites doesn't support the same way of scraping. Also this scraper is made for static websites. Dynamic / Single Page websites won't work with this scraper. Those will be supported in future version of this scraper.
Some websites may show prices like "2345" instead of "23.45" because those websites initially shows the price without any dot or shows with a comma and later dynamically changed with a dot, as comma is excluded by the scraper and the scraper can't execute javascript while scraping, that's why the price is shown as "2345".
Some websites shows price in local language. The scraper processes the price that's got from the website and it only understands English. So the price has to be in English. Otherwise it will return NaN
in price output and undefined
in currency output.
Contribute
Contribute in the project by opening a pull request on github. Contributions are welcomed!
Proudly Made In Bangladesh 🇧🇩