# icg-crawler v1.0.54

Published 6 months ago

Market Maker Crawler: a bot to capture every piece of data on the market.
## Routine work

- Update the OKX futures symbols weekly on Friday and the BitMEX futures symbols quarterly on the 28th.
- Run data-to-csv.js every weekend to back up data:
  - Back up the selected databases;
  - Restore from the backed-up databases into the local InfluxDB;
  - Export CSV files from the local databases;
  - Delete databases if storage is full.
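The export step of the weekend routine can be sketched in Python as below. This is illustrative only: `fetch_points` stands in for a query against the restored local InfluxDB (the real pipeline uses data-to-csv.js), and the field names are made up.

```python
import csv

def fetch_points():
    # Placeholder for an InfluxDB query; the real routine pulls
    # measurement rows from the restored local database.
    return [
        {"time": "2018-08-03T18:33:46Z", "symbol": "BTC-USD", "price": 7400.5},
        {"time": "2018-08-03T18:33:47Z", "symbol": "ETH-USD", "price": 410.2},
    ]

def export_csv(path):
    """Write the fetched points to a CSV file and return the row count."""
    points = fetch_points()
    with open(path, "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=["time", "symbol", "price"])
        writer.writeheader()
        writer.writerows(points)
    return len(points)
```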
## How to reveal credentials

- Install gawk:
  ```
  sudo apt-get update && sudo apt-get install gawk
  ```
- Install git-secret (http://git-secret.io/installation). The GPG version should be 1.4.* to keep the key files consistent.
- Copy your local GPG keys to the remote host:
  ```
  scp -r ~/.gnupg user@remotehost:~/
  ```
- Run `git secret reveal` to reveal credentials.js.
## How to renew HTTPS certs

- Run the Let's Encrypt certbot (https://certbot.eff.org/):
  ```
  sudo certbot renew --dry-run
  ```
- Make sure the permissions are correct:
  ```
  chmod 755 /etc/letsencrypt/live/
  chmod 755 /etc/letsencrypt/archive/
  ```
- Make sure every key in the archive folder has the same permissions (rw-r--r--).
- Run `systemctl restart grafana-server`.
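The permission check above can be automated with a small helper. This is a sketch, not part of the repo: the default path matches the Let's Encrypt layout mentioned above, and 0o644 corresponds to the rw-r--r-- mode the steps require.

```python
import os
import stat

def has_mode(path, expected=0o644):
    """Return True if path's permission bits equal expected (rw-r--r--)."""
    mode = stat.S_IMODE(os.stat(path).st_mode)
    return mode == expected

def check_keys(archive_dir="/etc/letsencrypt/archive"):
    """Walk the archive folder and return key files whose mode differs."""
    bad = []
    for root, _dirs, files in os.walk(archive_dir):
        for name in files:
            full = os.path.join(root, name)
            if not has_mode(full):
                bad.append(full)
    return bad
```

Running `check_keys()` after a renewal and fixing anything it returns avoids the Grafana restart failing on an unreadable key.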
## Sentiment Analysis

Collects news content.
### Requirements

Before running the project, make sure python-dev and libpq-dev are installed:

```
sudo apt-get install python-dev libpq-dev
```

then install the requirements with:

```
pip install -r requirements.txt
```

Each requirements file under the bishijie / twitter / chaindd / jinse folders works for its own virtual env. If you want to run all the projects in the same env, use the requirements file at the root level.
### Structure
```
.
├── README.md
├── bishijie
│   ├── bishijieFinal.csv
│   ├── bishijie_webdriver.py
│   ├── bishijie_xpath.py
│   ├── historical
│   │   ├── data
│   │   │   └── bishijie_correct_time.csv
│   │   ├── get_all_historical.py
│   │   └── get_all_historical_correct_time.py
│   └── requirements.txt
├── chaindd
│   ├── chaindd_xpath.py
│   └── requirements.txt
├── jinse
│   ├── historical
│   │   ├── data
│   │   │   └── jinse.csv
│   │   └── jinse_historical.py
│   ├── jinse_xpath.py
│   └── requirements.txt
├── requirements.txt
├── spider.py
└── twitter
    ├── historical
    │   ├── data
    │   │   └── result.csv
    │   ├── twitter_historical_preprocessing.py
    │   └── twitter_historical_searching_result.py
    ├── mongoDB
    │   ├── data_preprocessing.py
    │   └── tweepy_connect.py
    ├── requirements.txt
    └── tweepy_script.py
```
| folder | website link |
|---|---|
| bishijie | http://www.bishijie.com/kuaixun/ |
| chaindd | http://www.chaindd.com/nictation/ |
| jinse | https://www.jinse.com/lives |
| twitter | https://twitter.com/ |
- Each script collects real-time data; the historical folder collects historical data for the past year or so.
- Scripts ending with `xpath` fetch pages with Requests and parse them with XPath.
- Scripts ending with `webdriver` use Selenium WebDriver; install selenium first:
  ```
  pip install selenium
  ```
- If you run a WebDriver script on the server (Ubuntu), uncomment these lines:
  ```
  options = Options()
  options.binary_location = '/usr/bin/google-chrome'
  options.add_argument('--headless')
  options.add_argument('--disable-gpu')
  options.add_argument('--no-sandbox')
  driver = webdriver.Chrome(executable_path='/root/driver/chromedriver', chrome_options=options)
  ```
  and comment out this line:
  ```
  driver = webdriver.Chrome()
  ```
- If you run bishijie_xpath.py as a server to make HTTPS available, don't forget to add both the ./fullchain and ./privkey keys for local tests; for server tests, the keys are placed in the /etc/letsencrypt/live/sentiment.icg.io-0003/ folder.
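The `xpath` scripts follow a fetch-then-select pattern. The sketch below shows only that pattern, using the standard library's limited XPath support on an inline sample; the real scripts fetch the live pages with Requests and use fuller XPath expressions, and the markup and headlines here are invented.

```python
import xml.etree.ElementTree as ET

# Stand-in for a fetched news-flash page (the real scripts GET the live URL).
SAMPLE_PAGE = """
<div>
  <ul>
    <li class="news">BTC breaks resistance</li>
    <li class="news">ETH gas fees drop</li>
    <li class="ad">sponsored</li>
  </ul>
</div>
"""

def extract_headlines(page):
    """Select only the list items tagged as news, XPath-style."""
    root = ET.fromstring(page)
    return [li.text for li in root.findall(".//li[@class='news']")]

print(extract_headlines(SAMPLE_PAGE))
# → ['BTC breaks resistance', 'ETH gas fees drop']
```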
## Usage

### How to run

Run each script with:

```
python3 tweepy_script.py
```

If you want to keep the script running as a daemon after quitting the terminal, use:

```
nohup python3 -u tweepy_script.py > log.out &
```

to save the log in log.out.
### Log info

The log catches every error and prints some debug info, such as:

```
Writing 8 points to DB
Bishijie collected at 2018-08-03 18:33:46.114923
enter the main function
1.41
Create database: bishijie
Write DataFrame with Tags
```
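Log lines in the style above can come from a thin wrapper around the write path. This is a hypothetical sketch, not the crawler's actual code; the function name and arguments are illustrative.

```python
import datetime

def log_write(points, source):
    """Print debug lines mirroring the log format shown above."""
    lines = [
        f"Writing {len(points)} points to DB",
        f"{source} collected at {datetime.datetime.now()}",
    ]
    for line in lines:
        print(line)
    return lines
```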