icg-crawler v1.0.54
Market Maker Crawler
A bot to capture every piece of data on the market.
Routine work
- Update OKX futures symbols weekly on Friday and BitMEX futures symbols quarterly on the 28th.
- Run data-to-csv.js every weekend to back up data (a rough sketch of the export step follows this list):
  - Back up selected databases;
  - Restore from the backed-up databases to the local InfluxDB;
  - Export CSV from the local databases;
  - Delete databases if storage is full.
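
data-to-csv.js itself is a Node.js script and is not reproduced here; purely as an illustration of the "export CSV from the local databases" step, here is a minimal Python sketch, assuming a local InfluxDB 1.x on the default port, the influxdb Python client, and placeholder database/measurement names:

    # Rough sketch of the CSV-export step, not the actual data-to-csv.js.
    # Assumes a local InfluxDB 1.x on the default port and the `influxdb`
    # Python client; database/measurement names are placeholders.
    from influxdb import DataFrameClient

    client = DataFrameClient(host='localhost', port=8086)

    database = 'bishijie'      # placeholder: one of the restored databases
    measurement = 'bishijie'   # placeholder measurement name

    result = client.query('SELECT * FROM "%s"' % measurement, database=database)
    if measurement in result:
        df = result[measurement]               # pandas DataFrame
        df.to_csv('%s.csv' % measurement)
        print('Exported %d rows to %s.csv' % (len(df), measurement))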
 
How to reveal credentials
- Install gawk: sudo apt-get update && sudo apt-get install gawk.
- Install git-secret (see http://git-secret.io/installation).
- GPG version should be 1.4.* to make key files consistent.
- Copy local gpg keys with scp: scp -r ~/.gnupg user@remotehost:~/
- Run git secret reveal to reveal credentials.js.
How to renew https certs
- Run the Let's Encrypt certbot (https://certbot.eff.org/):

    sudo certbot renew --dry-run

- Make sure permissions are correct:

    chmod 755 /etc/letsencrypt/live/
    chmod 755 /etc/letsencrypt/archive/

- Make sure the permission of every key in the archive folder is the same (rw-r--r--).
- Run systemctl restart grafana-server
Sentiment Analysis
Collect News content
Requirements
Before running the project, make sure python-dev and libpq-dev are installed:

    sudo apt-get install python-dev libpq-dev

and install the requirements with

    pip install -r requirements.txt

Each requirements file under the bishijie / twitter / chaindd / jinse folders works for its own virtual env.
If you want to run all the projects in the same env, use the requirements file at the root level.
Structure
.
├── README.md
├── bishijie
│   ├── bishijieFinal.csv
│   ├── bishijie_webdriver.py
│   ├── bishijie_xpath.py
│   ├── historical
│   │   ├── data
│   │   │   └── bishijie_correct_time.csv
│   │   ├── get_all_historical.py
│   │   └── get_all_historical_correct_time.py
│   └── requirements.txt
├── chaindd
│   ├── chaindd_xpath.py
│   └── requirements.txt
├── jinse
│   ├── historical
│   │   ├── data
│   │   │   └── jinse.csv
│   │   └── jinse_historical.py
│   ├── jinse_xpath.py
│   └── requirements.txt
├── requirements.txt
├── spider.py
└── twitter
    ├── historical
    │   ├── data
    │   │   └── result.csv
    │   ├── twitter_historical_preprocessing.py
    │   └── twitter_historical_searching_result.py
    ├── mongoDB
    │   ├── data_preprocessing.py
    │   └── tweepy_connect.py
    ├── requirements.txt
    └── tweepy_script.py

| folder | website link |
|---|---|
| bishijie | http://www.bishijie.com/kuaixun/ |
| chaindd | http://www.chaindd.com/nictation/ |
| jinse | https://www.jinse.com/lives |
| twitter | https://twitter.com/ |
- Each script collects real-time data; the historical folder collects historical data for the past year or so.
- Scripts ending with _xpath use Requests and parse the page with XPath (see the sketch after this list).
- Scripts ending with _webdriver use WebDriver; install Selenium first:

    pip install selenium

- If you run a WebDriver script on the server (Ubuntu), uncomment these lines

    options = Options()
    options.binary_location = '/usr/bin/google-chrome'
    options.add_argument('--headless')
    options.add_argument('--disable-gpu')
    options.add_argument('--no-sandbox')
    driver = webdriver.Chrome(executable_path='/root/driver/chromedriver', chrome_options=options)

  and comment this line

    driver = webdriver.Chrome()

- If you run bishijie_xpath.py as a server and want HTTPS available, don't forget to add both the ./fullchain and ./privkey keys for local tests; for server tests, the keys are placed in the /etc/letsencrypt/live/sentiment.icg.io-0003/ folder (see the TLS sketch after this list).
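
The _xpath scripts are not reproduced here; as a rough illustration of the Requests + XPath pattern they follow, here is a minimal sketch (the URL is taken from the table above, but the XPath expression is a placeholder, not the selector the real scripts use):

    # Minimal sketch of the Requests + XPath pattern used by the *_xpath scripts.
    # The XPath expression is a placeholder, not the real selector.
    import requests
    from lxml import html

    url = 'http://www.bishijie.com/kuaixun/'
    response = requests.get(url, timeout=10)
    response.raise_for_status()

    tree = html.fromstring(response.content)
    items = tree.xpath('//div[@class="news-item"]//text()')   # placeholder XPath

    for text in items:
        text = text.strip()
        if text:
            print(text)

For the HTTPS note, here is a hedged sketch of how the fullchain/privkey pair can be loaded with the standard-library ssl module; the plain http.server app is only a stand-in for whatever server bishijie_xpath.py actually runs, and the .pem filenames follow the usual Let's Encrypt layout:

    # Stand-in HTTPS server showing how the cert/key pair is loaded; not the
    # actual bishijie_xpath.py server code.
    import http.server
    import ssl

    CERT = '/etc/letsencrypt/live/sentiment.icg.io-0003/fullchain.pem'  # or ./fullchain for local tests
    KEY = '/etc/letsencrypt/live/sentiment.icg.io-0003/privkey.pem'     # or ./privkey for local tests

    httpd = http.server.HTTPServer(('0.0.0.0', 443), http.server.SimpleHTTPRequestHandler)
    context = ssl.SSLContext(ssl.PROTOCOL_TLS_SERVER)
    context.load_cert_chain(certfile=CERT, keyfile=KEY)
    httpd.socket = context.wrap_socket(httpd.socket, server_side=True)
    httpd.serve_forever()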
Usage
How to run
Run each script with

    python3 tweepy_script.py

If you want to keep the script running as a daemon after quitting the terminal, you can use

    nohup python3 -u tweepy_script.py > log.out &

to save the log in log.out.
Log info
The log catches every error and prints some debug info, such as:

    Writing 8 points to DB
    Bishijie collected at 2018-08-03 18:33:46.114923
    enter the main function
    1.41
    Create database: bishijie
    Write DataFrame with Tags
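
The code that prints these lines is not shown here; purely to illustrate what they correspond to, here is a hypothetical Python sketch of the write path using the influxdb DataFrameClient (database name, fields, and tag columns are placeholders):

    # Hypothetical sketch of the write path behind the debug lines above;
    # database, fields, and tag columns are placeholders.
    import datetime
    import pandas as pd
    from influxdb import DataFrameClient

    client = DataFrameClient(host='localhost', port=8086)
    client.create_database('bishijie')              # "Create database: bishijie"

    df = pd.DataFrame({'title': ['example headline'], 'source': ['bishijie']},
                      index=[pd.Timestamp.utcnow()])

    print('Writing %d points to DB' % len(df))      # "Writing 8 points to DB"
    client.write_points(df, 'bishijie', tag_columns=['source'])   # "Write DataFrame with Tags"
    print('Bishijie collected at %s' % datetime.datetime.now())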