# icg-crawler v1.0.54

Published 6 months ago

Market Maker Crawler: a bot to capture every piece of data on the market.
## Routine work

- Update the OKX futures symbols weekly on Friday and the BitMEX futures symbols quarterly on the 28th.
- Run data-to-csv.js every weekend to back up data:
  - Back up the selected databases;
  - Restore from the backed-up databases into the local InfluxDB;
  - Export CSV files from the local databases;
  - Delete databases if storage is full.
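The export step of the weekend routine can be sketched in Python as below. This is illustrative only: `fetch_points` stands in for a query against the restored local InfluxDB (the real pipeline uses data-to-csv.js), and the field names are made up.

```python
import csv

def fetch_points():
    # Placeholder for an InfluxDB query; the real routine pulls
    # measurement rows from the restored local database.
    return [
        {"time": "2018-08-03T18:33:46Z", "symbol": "BTC-USD", "price": 7400.5},
        {"time": "2018-08-03T18:33:47Z", "symbol": "ETH-USD", "price": 410.2},
    ]

def export_csv(path):
    """Write the fetched points to a CSV file and return the row count."""
    points = fetch_points()
    with open(path, "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=["time", "symbol", "price"])
        writer.writeheader()
        writer.writerows(points)
    return len(points)
```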
## How to reveal credentials

- Install gawk:
  ```
  sudo apt-get update && sudo apt-get install gawk
  ```
- Install git-secret (http://git-secret.io/installation). The GPG version should be 1.4.* to keep the key files consistent.
- Copy your local GPG keys to the remote host:
  ```
  scp -r ~/.gnupg user@remotehost:~/
  ```
- Run `git secret reveal` to reveal credentials.js.
## How to renew HTTPS certs

- Run the Let's Encrypt certbot (https://certbot.eff.org/):
  ```
  sudo certbot renew --dry-run
  ```
- Make sure the permissions are correct:
  ```
  chmod 755 /etc/letsencrypt/live/
  chmod 755 /etc/letsencrypt/archive/
  ```
- Make sure every key in the archive folder has the same permissions (rw-r--r--).
- Run `systemctl restart grafana-server`.
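The permission check above can be automated with a small helper. This is a sketch, not part of the repo: the default path matches the Let's Encrypt layout mentioned above, and 0o644 corresponds to the rw-r--r-- mode the steps require.

```python
import os
import stat

def has_mode(path, expected=0o644):
    """Return True if path's permission bits equal expected (rw-r--r--)."""
    mode = stat.S_IMODE(os.stat(path).st_mode)
    return mode == expected

def check_keys(archive_dir="/etc/letsencrypt/archive"):
    """Walk the archive folder and return key files whose mode differs."""
    bad = []
    for root, _dirs, files in os.walk(archive_dir):
        for name in files:
            full = os.path.join(root, name)
            if not has_mode(full):
                bad.append(full)
    return bad
```

Running `check_keys()` after a renewal and fixing anything it returns avoids the Grafana restart failing on an unreadable key.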
## Sentiment Analysis

Collects news content.
### Requirements

Before running the project, make sure python-dev and libpq-dev are installed:

```
sudo apt-get install python-dev libpq-dev
```

then install the requirements with:

```
pip install -r requirements.txt
```

Each requirements file under the bishijie / twitter / chaindd / jinse folders works for its own virtual env. If you want to run all the projects in the same env, use the requirements file at the root level.
### Structure
```
.
├── README.md
├── bishijie
│   ├── bishijieFinal.csv
│   ├── bishijie_webdriver.py
│   ├── bishijie_xpath.py
│   ├── historical
│   │   ├── data
│   │   │   └── bishijie_correct_time.csv
│   │   ├── get_all_historical.py
│   │   └── get_all_historical_correct_time.py
│   └── requirements.txt
├── chaindd
│   ├── chaindd_xpath.py
│   └── requirements.txt
├── jinse
│   ├── historical
│   │   ├── data
│   │   │   └── jinse.csv
│   │   └── jinse_historical.py
│   ├── jinse_xpath.py
│   └── requirements.txt
├── requirements.txt
├── spider.py
└── twitter
    ├── historical
    │   ├── data
    │   │   └── result.csv
    │   ├── twitter_historical_preprocessing.py
    │   └── twitter_historical_searching_result.py
    ├── mongoDB
    │   ├── data_preprocessing.py
    │   └── tweepy_connect.py
    ├── requirements.txt
    └── tweepy_script.py
```
| folder | website link |
|---|---|
| bishijie | http://www.bishijie.com/kuaixun/ |
| chaindd | http://www.chaindd.com/nictation/ |
| jinse | https://www.jinse.com/lives |
| twitter | https://twitter.com/ |
- Each script collects real-time data; the historical folder collects historical data for the past year or so.
- Scripts ending with `xpath` fetch pages with Requests and parse them with XPath.
- Scripts ending with `webdriver` use Selenium WebDriver; install selenium first:
  ```
  pip install selenium
  ```
- If you run a WebDriver script on the server (Ubuntu), uncomment these lines:
  ```
  options = Options()
  options.binary_location = '/usr/bin/google-chrome'
  options.add_argument('--headless')
  options.add_argument('--disable-gpu')
  options.add_argument('--no-sandbox')
  driver = webdriver.Chrome(executable_path='/root/driver/chromedriver', chrome_options=options)
  ```
  and comment out this line:
  ```
  driver = webdriver.Chrome()
  ```
- If you run bishijie_xpath.py as a server to make HTTPS available, don't forget to add both the ./fullchain and ./privkey keys for local tests; for server tests, the keys are placed in the /etc/letsencrypt/live/sentiment.icg.io-0003/ folder.
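The `xpath` scripts follow a fetch-then-select pattern. The sketch below shows only that pattern, using the standard library's limited XPath support on an inline sample; the real scripts fetch the live pages with Requests and use fuller XPath expressions, and the markup and headlines here are invented.

```python
import xml.etree.ElementTree as ET

# Stand-in for a fetched news-flash page (the real scripts GET the live URL).
SAMPLE_PAGE = """
<div>
  <ul>
    <li class="news">BTC breaks resistance</li>
    <li class="news">ETH gas fees drop</li>
    <li class="ad">sponsored</li>
  </ul>
</div>
"""

def extract_headlines(page):
    """Select only the list items tagged as news, XPath-style."""
    root = ET.fromstring(page)
    return [li.text for li in root.findall(".//li[@class='news']")]

print(extract_headlines(SAMPLE_PAGE))
# → ['BTC breaks resistance', 'ETH gas fees drop']
```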
## Usage

### How to run

Run each script with:

```
python3 tweepy_script.py
```

If you want to keep the script running as a daemon after quitting the terminal, use:

```
nohup python3 -u tweepy_script.py > log.out &
```

to save the log in log.out.
### Log info

The log catches every error and prints some debug info, such as:

```
Writing 8 points to DB
Bishijie collected at 2018-08-03 18:33:46.114923
enter the main function
1.41
Create database: bishijie
Write DataFrame with Tags
```
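Log lines in the style above can come from a thin wrapper around the write path. This is a hypothetical sketch, not the crawler's actual code; the function name and arguments are illustrative.

```python
import datetime

def log_write(points, source):
    """Print debug lines mirroring the log format shown above."""
    lines = [
        f"Writing {len(points)} points to DB",
        f"{source} collected at {datetime.datetime.now()}",
    ]
    for line in lines:
        print(line)
    return lines
```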