crawlx-cloudscraper

This plugin replaces the attempts callback of crawlx and uses Puppeteer to get past Cloudflare's anti-DDoS challenge page.
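The plugin only needs to act when Cloudflare's challenge page comes back instead of the real content. Here is a minimal sketch of that decision. It assumes the challenge shows up as one of the statusAllowed status codes (503 in the options example further down) and that the bypass is tried at most attempts times. shouldBypass is a hypothetical helper, not part of crawlx or crawlx-cloudscraper:

```javascript
// Hypothetical helper (not crawlx's API): decide whether a response should
// be retried through the browser-based bypass.
// attempt is the zero-based number of bypass attempts already made.
function shouldBypass(statusCode, attempt, { statusAllowed = [503], attempts = 2 } = {}) {
  // Only status codes listed in statusAllowed are treated as a Cloudflare challenge
  if (!statusAllowed.includes(statusCode)) return false;
  // Give up once the configured number of attempts has been used
  return attempt < attempts;
}

console.log(shouldBypass(503, 0)); // true
console.log(shouldBypass(200, 0)); // false
console.log(shouldBypass(503, 2)); // false
```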

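const x = require("crawlx").default;
// Create the plugin: targetUrl is opened in the browser, and the bypass
// counts as done once the waitForSelector element exists on the page
const cfPlugin = require("crawlx-cloudscraper")({
  targetUrl: "https://www.apotea.se/",
  waitForSelector: '#main-wrapper'
});
// Register the plugin, then crawl as usual
x.use(cfPlugin);
x({ url: "https://www.apotea.se/" }).then(t => {
  console.log(t.res.statusCode);
});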

Results:

Start Bypassing: limit concurrency to 0.
Start Bypassing: https://www.apotea.se/
Finish Bypassing: {"user-agent":"Mozilla/5.0 (Macintosh; Intel Mac OS X 10_14_5) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/76.0.3809.100 Safari/537.36","cookie":"shopper=************; ASP.NET_SessionId=**************; _culture=sv; __cfduid=***********; cf_clearance=***********"}
Finish Bypassing: resume concurrency to 2
200
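The "Finish Bypassing" line shows what a successful run produces: the browser's user-agent plus a single cookie header that holds Cloudflare's cookies (__cfduid, cf_clearance) along with the site's own cookies. Below is a sketch of how such a header object could be assembled from a list of { name, value } cookie objects, the shape returned by Puppeteer's page.cookies(). buildHeaders is a hypothetical name, and the plugin's actual implementation may differ:

```javascript
// Hypothetical helper: turn browser cookies into request headers shaped like
// the ones logged after "Finish Bypassing". Not the plugin's actual code.
function buildHeaders(userAgent, cookies) {
  // Join cookies into one "name=value; name=value" Cookie header value
  const cookie = cookies.map(c => `${c.name}=${c.value}`).join("; ");
  return { "user-agent": userAgent, cookie };
}

const headers = buildHeaders("Mozilla/5.0 (example)", [
  { name: "_culture", value: "sv" },
  { name: "cf_clearance", value: "abc123" },
]);
console.log(JSON.stringify(headers));
// {"user-agent":"Mozilla/5.0 (example)","cookie":"_culture=sv; cf_clearance=abc123"}
```

Cloudflare generally ties cf_clearance to the browser that solved the challenge. That is likely why the user-agent is stored and sent together with the cookies rather than on its own.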

Options:

const pluginOptions = {
  // required
  targetUrl: "https://www.apotea.se/",
  waitForSelector: '#main-wrapper', // bypass succeeds once this element exists

  // optional
  statusAllowed: [503], // responses with a 503 status are handled by the plugin
  attempts: 2,
  userAgent: "", // empty: use crawlx's default user-agent
  fileDir: require('os').homedir(), // directory of the headers file
  fileName: ".crawlx-cf.json", // file that stores the headers information
  log: console.log,
  delayForBypass: 6000,
};
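The values under "// optional" look like the plugin's defaults, but this README does not say so explicitly. Assuming they are, the sketch below shows how the options could be validated and merged, and where the headers file would be stored (fileDir joined with fileName). resolveOptions and filePath are hypothetical names used only for illustration:

```javascript
const os = require("os");
const path = require("path");

// Assumed defaults, copied from the optional values in the options example above
const DEFAULTS = {
  statusAllowed: [503],
  attempts: 2,
  userAgent: "",
  fileDir: os.homedir(),
  fileName: ".crawlx-cf.json",
  log: console.log,
  delayForBypass: 6000,
};

// Hypothetical helper: check the required options, then fill in the defaults
function resolveOptions(options = {}) {
  for (const key of ["targetUrl", "waitForSelector"]) {
    if (!options[key]) throw new Error(`${key} is required`);
  }
  const resolved = { ...DEFAULTS, ...options };
  // Full path of the JSON file holding the stored headers
  resolved.filePath = path.join(resolved.fileDir, resolved.fileName);
  return resolved;
}

const opts = resolveOptions({
  targetUrl: "https://www.apotea.se/",
  waitForSelector: "#main-wrapper",
  attempts: 3,
});
console.log(opts.attempts, opts.statusAllowed, opts.filePath);
```

User-supplied values override the defaults because they are spread last. With these defaults the headers end up in ~/.crawlx-cf.json.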