video_ad_spy v0.15.0
Video Ad Spy
Scrapes YouTube videos, watches them, and saves a HAR (HTTP Archive) file for every video in the database.
Requirements
- Node.js (version > 9.9)
- Yarn
- PostgreSQL
- Redis
Installation
git clone https://bitbucket.org/vadteam/video-ad-spy.git
cd video-ad-spy
yarn install
- Next we need to set up our .env file. Simply copy .env.example to .env and fill in PG_CONNECTION_STRING and the other variables.
- Run the database migration:
yarn db:migrate
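A missing or empty variable in .env tends to surface later as a confusing connection error, so it can help to check the environment before running the migration. The following is a minimal sketch, not part of the app: the helper names missingEnvVars and assertEnv are hypothetical, and it assumes the values from .env have already been loaded into process.env by whatever loader the app uses. PG_CONNECTION_STRING is the only variable name taken from this README.

```javascript
// Hypothetical helper: return the names of required variables that are
// missing or blank in `env` (defaults to the current process environment).
function missingEnvVars(required, env = process.env) {
  return required.filter((name) => !env[name] || env[name].trim() === '');
}

// Hypothetical helper: throw a descriptive error if anything is missing.
function assertEnv(required, env = process.env) {
  const missing = missingEnvVars(required, env);
  if (missing.length > 0) {
    throw new Error(`Missing environment variables: ${missing.join(', ')}`);
  }
}
```

Calling assertEnv(['PG_CONNECTION_STRING']) at startup would then fail fast with a readable message instead of a database error.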
Getting started
You can try out the app using the CLI tools; see CLI.md for details.
The other option is to run the full app, which consists of a master and a worker process. To do so, run
yarn dev
➜ yarn dev
yarn run v1.13.0
$ concurrently --kill-others "yarn worker" "yarn master"
$ nodemon worker.js
$ nodemon master.js
[0] [nodemon] 1.18.4
[0] [nodemon] to restart at any time, enter `rs`
[0] [nodemon] watching: *.*
[0] [nodemon] starting `node worker.js`
[1] [nodemon] 1.18.4
[1] [nodemon] to restart at any time, enter `rs`
[1] [nodemon] watching: *.*
[1] [nodemon] starting `node master.js`
[1] [7/20/2019] [9:12:00 AM] [master.js] › ✔ success [Server] MASTER listening on port :8080
[0] [7/20/2019] [9:12:00 AM] [worker.js] › ✔ success [Server] WORKER listening on port :8081
[0] [7/20/2019] [9:12:00 AM] [worker.js] › ✔ success [Worker] socket connected - address: ::ffff:127.0.0.1
[1] [7/20/2019] [9:12:00 AM] [index.js] › ℹ info [Master] ws://localhost:8081: connected!
HAR
We save HAR files of scraped YouTube videos in a folder called "har", which resides in the app's root. Each file in that folder is named <timestamp>-<youtubeVideoId>.gz, where the ".gz" extension indicates that the file is compressed with gzip. To decompress these files, use the gunzip utility: gunzip < compressed_file.gz > decompressed_file.json
For instance, a file called 1564225586214-SKqu2kH6cPw.gz can be decompressed with gunzip < 1564225586214-SKqu2kH6cPw.gz > 1564225586214-SKqu2kH6cPw.json
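The same files can also be read from Node.js without shelling out to gunzip. The sketch below is an illustration, not code from the app: parseHarFileName and readHarFile are hypothetical names, and it assumes the timestamp in the file name is milliseconds since the Unix epoch and that the decompressed content is JSON. It uses only the built-in fs, path, and zlib modules.

```javascript
const fs = require('fs');
const path = require('path');
const zlib = require('zlib');

// Split "<timestamp>-<youtubeVideoId>.gz" into its parts.
// Video IDs can contain "-" themselves, so split on the first dash only.
function parseHarFileName(fileName) {
  const base = path.basename(fileName, '.gz');
  const dash = base.indexOf('-');
  const timestamp = Number(base.slice(0, dash));
  if (dash <= 0 || !Number.isFinite(timestamp)) {
    throw new Error(`Unexpected HAR file name: ${fileName}`);
  }
  return { timestamp, videoId: base.slice(dash + 1) };
}

// Read a gzip-compressed HAR file and return the parsed JSON object.
function readHarFile(filePath) {
  const compressed = fs.readFileSync(filePath);
  const json = zlib.gunzipSync(compressed).toString('utf8');
  return JSON.parse(json);
}
```

For example, parseHarFileName('1564225586214-SKqu2kH6cPw.gz') yields the timestamp 1564225586214 and the video ID SKqu2kH6cPw.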
HAR files older than 1 week are deleted automatically; see the delete-old-har.js file.
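As a rough illustration of this kind of cleanup, here is a minimal sketch. It is not the contents of delete-old-har.js: the function name deleteOldHarFiles is hypothetical, and it assumes a file's age is judged by its modification time rather than by the timestamp in its name.

```javascript
const fs = require('fs');
const path = require('path');

const ONE_WEEK_MS = 7 * 24 * 60 * 60 * 1000;

// Delete .gz files in `harDir` whose modification time is more than
// `maxAgeMs` before `now`. Returns the names of the deleted files.
function deleteOldHarFiles(harDir, maxAgeMs = ONE_WEEK_MS, now = Date.now()) {
  const deleted = [];
  for (const name of fs.readdirSync(harDir)) {
    if (!name.endsWith('.gz')) continue; // leave unrelated files alone
    const filePath = path.join(harDir, name);
    const stats = fs.statSync(filePath);
    if (stats.isFile() && now - stats.mtimeMs > maxAgeMs) {
      fs.unlinkSync(filePath);
      deleted.push(name);
    }
  }
  return deleted;
}
```

A script like this could be run on a schedule, for example as deleteOldHarFiles(path.join(__dirname, 'har')).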
UPDATE: this is deprecated, we no longer use HAR data