0.15.0 • Published 4 years ago

video_ad_spy v0.15.0

License
MIT

Video Ad Spy

Scrapes YouTube videos, then watches them and saves the HAR (HTTP Archive) for every video in the database.

Requirements

Installation

git clone https://bitbucket.org/vadteam/video-ad-spy.git
cd video-ad-spy
yarn install
  • Next, set up the .env file: copy .env.example to .env and fill in PG_CONNECTION_STRING and the other variables.
  • Run the database migration: yarn db:migrate
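Before running the migration, it can help to confirm that the required variables are actually set. The following is a minimal sketch, not code from this project: missingEnvVars is a hypothetical helper, PG_CONNECTION_STRING is the only variable name taken from this README, and the sketch assumes the values from .env have already been loaded into process.env.

```javascript
// Hypothetical helper (not part of video-ad-spy): report which required
// environment variables are unset or blank.
function missingEnvVars(env, required) {
  return required.filter((name) => !env[name] || env[name].trim() === '');
}

// PG_CONNECTION_STRING is named in this README; extend this list with the
// other variables found in .env.example.
const REQUIRED = ['PG_CONNECTION_STRING'];

const missing = missingEnvVars(process.env, REQUIRED);
if (missing.length > 0) {
  console.warn(`Missing environment variables: ${missing.join(', ')}`);
}
```

The check only warns rather than exiting, so it can be dropped into a script without changing its behaviour.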

Getting started

  • You can try out the app using the CLI tools. See CLI.md for details.

  • The other option is to run the full app, which consists of a master and a worker. To do so, run yarn dev

➜ yarn dev
yarn run v1.13.0
$ concurrently --kill-others "yarn worker" "yarn master"
$ nodemon worker.js
$ nodemon master.js
[0] [nodemon] 1.18.4
[0] [nodemon] to restart at any time, enter `rs`
[0] [nodemon] watching: *.*
[0] [nodemon] starting `node worker.js`
[1] [nodemon] 1.18.4
[1] [nodemon] to restart at any time, enter `rs`
[1] [nodemon] watching: *.*
[1] [nodemon] starting `node master.js`
[1] [7/20/2019] [9:12:00 AM] [master.js] › ✔  success   [Server] MASTER listening on port :8080
[0] [7/20/2019] [9:12:00 AM] [worker.js] › ✔  success   [Server] WORKER listening on port :8081
[0] [7/20/2019] [9:12:00 AM] [worker.js] › ✔  success   [Worker] socket connected - address: ::ffff:127.0.0.1
[1] [7/20/2019] [9:12:00 AM] [index.js] › ℹ  info      [Master] ws://localhost:8081: connected!

HAR

We save HAR files of scraped YouTube videos in a folder called "har" in the app's root. Each file in that folder is named <timestamp>-<youtubeVideoId>.gz; the ".gz" extension indicates that the file is compressed with gzip. To decompress these files, use the gunzip utility:

gunzip < compressed_file.gz > decompressed_file.json

For instance, a file called 1564225586214-SKqu2kH6cPw.gz can be decompressed with:

gunzip < 1564225586214-SKqu2kH6cPw.gz > 1564225586214-SKqu2kH6cPw.json
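The same decompression can be done from Node with the built-in zlib module. This is a sketch under stated assumptions rather than code from this project: parseHarFileName and readHar are hypothetical helpers, the timestamp prefix is assumed to be numeric, and the decompressed content is assumed to be JSON.

```javascript
const fs = require('fs');
const path = require('path');
const zlib = require('zlib');

// Split "<timestamp>-<youtubeVideoId>.gz" into its parts. Only the first
// hyphen is treated as the separator, because YouTube video IDs may
// themselves contain hyphens.
function parseHarFileName(fileName) {
  const base = path.basename(fileName, '.gz');
  const sep = base.indexOf('-');
  if (sep === -1) throw new Error(`Unexpected HAR file name: ${fileName}`);
  return {
    timestamp: Number(base.slice(0, sep)),
    videoId: base.slice(sep + 1),
  };
}

// Read a gzip-compressed HAR file and return the parsed JSON
// (assumption: the decompressed content is valid JSON).
function readHar(filePath) {
  const compressed = fs.readFileSync(filePath);
  return JSON.parse(zlib.gunzipSync(compressed).toString('utf8'));
}
```

For example, readHar('har/1564225586214-SKqu2kH6cPw.gz') would return the parsed HAR object for that video, assuming the file exists at that path.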

HAR files older than 1 week are automatically deleted. See the delete-old-har.js file.
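The one-week retention rule can be sketched as a small selection function. This is not the contents of delete-old-har.js, which is not shown here: expiredHarFiles is a hypothetical helper, and it assumes that age is measured from the timestamp prefix of the file name and that the prefix is Unix epoch milliseconds. The real script may use a different criterion, such as file modification time.

```javascript
const ONE_WEEK_MS = 7 * 24 * 60 * 60 * 1000;

// Matches "<timestamp>-<youtubeVideoId>.gz" and captures the timestamp.
const HAR_NAME = /^(\d+)-.+\.gz$/;

// Hypothetical helper: return the HAR file names whose timestamp prefix is
// more than one week older than `now`. Names that do not match the
// expected pattern are ignored rather than treated as expired.
function expiredHarFiles(fileNames, now = Date.now()) {
  return fileNames.filter((name) => {
    const match = HAR_NAME.exec(name);
    return match !== null && now - Number(match[1]) > ONE_WEEK_MS;
  });
}
```

The returned names could then be joined with the "har" folder path and removed, for example with fs.unlinkSync.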

UPDATE: this is deprecated; we no longer use HAR data.