Market Maker Crawler

A bot to capture every piece of data on the market.

Routine work

  1. Update the Okx futures symbols weekly on Friday and the Bitmex futures symbols quarterly on the 28th.
  2. Run data-to-csv.js every weekend to back up data:
    1. Backup selected databases;
    2. Restore from backed-up databases to local influxdb;
    3. Export CSV from local databases;
    4. Delete databases if storage is full.
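
The four backup steps above can be sketched as a command plan. This is not the content of data-to-csv.js; it is a minimal Python illustration that assumes the InfluxDB 1.x command-line tools (influxd backup, influxd restore, influx) are what the routine drives. The database name, measurement name, backup directory, and the 90% "full" threshold are all hypothetical.

```python
import shutil


def plan_weekend_backup(databases, backup_dir, measurement):
    """Return argv lists for steps 1 to 3 (backup, restore, CSV export)."""
    commands = []
    for db in databases:
        db_backup = f"{backup_dir}/{db}"
        # Step 1: back up the selected database in portable format.
        commands.append(
            ["influxd", "backup", "-portable", "-database", db, db_backup]
        )
        # Step 2: restore the backup into the local InfluxDB instance.
        commands.append(
            ["influxd", "restore", "-portable", "-db", db, db_backup]
        )
        # Step 3: export one measurement as CSV (written to stdout,
        # so the caller is expected to redirect it into a file).
        commands.append(
            ["influx", "-database", db, "-format", "csv",
             "-execute", f'SELECT * FROM "{measurement}"']
        )
    return commands


def storage_is_full(path, max_used_fraction=0.9):
    """Step 4 guard: True when the disk holding `path` is nearly full.

    The 0.9 threshold is an assumption, not a value from this project.
    """
    usage = shutil.disk_usage(path)
    return usage.used / usage.total >= max_used_fraction
```

Each returned list can be passed to subprocess.run; keeping the plan as data makes it easy to print and review before anything is deleted.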

How to reveal credentials

  1. Install gawk: sudo apt-get update && sudo apt-get install gawk.
  2. Install git-secret: (http://git-secret.io/installation)
  3. GPG version should be 1.4.* to make key files consistent.
  4. Copy the local GPG keys to the remote host: scp -r ~/.gnupg user@remotehost:~/
  5. Run git secret reveal to reveal credentials.js.

How to renew https certs

  1. Run the Let's Encrypt certbot (https://certbot.eff.org/). The --dry-run flag only tests the renewal; drop it to actually renew.
sudo certbot renew --dry-run
  2. Make sure the directory permissions are correct.
chmod 755 /etc/letsencrypt/live/
chmod 755 /etc/letsencrypt/archive/
  3. Make sure every key in the archive folder has the same permission (rw-r--r--).
  4. Run systemctl restart grafana-server
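
The permission check on the archive folder can be automated. The following is a minimal sketch, not part of this repo: the function name is hypothetical, and it simply reports every file whose mode differs from rw-r--r-- (0o644).

```python
import os
import stat


def find_mismatched_modes(archive_dir, expected_mode=0o644):
    """List (path, mode string) for files whose mode is not expected_mode."""
    mismatched = []
    for root, _dirs, files in os.walk(archive_dir):
        for name in sorted(files):
            path = os.path.join(root, name)
            st_mode = os.stat(path).st_mode
            # S_IMODE keeps only the permission bits (e.g. 0o644).
            if stat.S_IMODE(st_mode) != expected_mode:
                mismatched.append((path, stat.filemode(st_mode)))
    return mismatched


if __name__ == "__main__":
    for path, mode in find_mismatched_modes("/etc/letsencrypt/archive/"):
        print(f"{mode} {path}")
```

An empty result means every key in the folder already has the same rw-r--r-- permission.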

Sentiment Analysis

Collect News content

Before running the project

Make sure python-dev and libpq-dev are installed:

sudo apt-get install python-dev libpq-dev

Then install the requirements with:

pip install -r requirements.txt

Each requirements file under the bishijie, twitter, chaindd, and jinse folders is meant for that project's own virtual env.

If you want to run all the projects in the same env, use the requirements file at the root level.
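
The two setups (one virtual env per project, or one shared env) can be sketched as a list of commands. This is an illustration only: the .venv directory names are assumptions, and the function is hypothetical rather than part of the repo.

```python
PROJECTS = ["bishijie", "twitter", "chaindd", "jinse"]


def setup_commands(shared_env=False):
    """Return argv lists that create the env(s) and install requirements."""
    if shared_env:
        # One env for everything, using the root-level requirements file.
        return [
            ["python3", "-m", "venv", ".venv"],
            [".venv/bin/pip", "install", "-r", "requirements.txt"],
        ]
    commands = []
    for project in PROJECTS:
        env_dir = f"{project}/.venv"  # assumed location, one env per project
        commands.append(["python3", "-m", "venv", env_dir])
        commands.append(
            [f"{env_dir}/bin/pip", "install", "-r",
             f"{project}/requirements.txt"]
        )
    return commands


if __name__ == "__main__":
    for argv in setup_commands():
        print(" ".join(argv))
```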


├── README.md
├── bishijie
│   ├── bishijieFinal.csv
│   ├── bishijie_webdriver.py
│   ├── bishijie_xpath.py
│   ├── historical
│   │   ├── data
│   │   │   └── bishijie_correct_time.csv
│   │   ├── get_all_historical.py
│   │   └── get_all_historical_correct_time.py
│   └── requirements.txt
├── chaindd
│   ├── chaindd_xpath.py
│   └── requirements.txt
├── jinse
│   ├── historical
│   │   ├── data
│   │   │   └── jinse.csv
│   │   └── jinse_historical.py
│   ├── jinse_xpath.py
│   └── requirements.txt
├── requirements.txt
├── spider.py
└── twitter
    ├── historical
    │   ├── data
    │   │   └── result.csv
    │   ├── twitter_historical_preprocessing.py
    │   └── twitter_historical_searching_result.py
    ├── mongoDB
    │   ├── data_preprocessing.py
    │   └── tweepy_connect.py
    ├── requirements.txt
    └── tweepy_script.py
website link
  1. Each script collects real-time data, and each historical folder collects historical data for roughly the past year.
  • Scripts ending in xpath fetch pages with Requests and parse them with XPath.
  • Scripts ending in webdriver use WebDriver; install selenium first:
pip install selenium
  2. If you run a WebDriver script on the server (Ubuntu), uncomment these lines:
    options = Options()
    options.binary_location = '/usr/bin/google-chrome'
    driver = webdriver.Chrome(executable_path='/root/driver/chromedriver', chrome_options=options)

and comment out this line:

    driver = webdriver.Chrome()
  3. If you run bishijie_xpath.py as a server with HTTPS enabled, don't forget to add both the ./fullchain and ./privkey key files for local testing. For server testing, the keys are in the /etc/letsencrypt/live/sentiment.icg.io-0003/ folder.
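
The local-versus-server key lookup can be sketched as follows. How bishijie_xpath.py actually loads its certificate is not shown in this README, so treat this as an assumption-laden illustration: the function names are hypothetical, and the .pem file names follow certbot's standard layout rather than anything stated above.

```python
import os
import ssl

SERVER_CERT_DIR = "/etc/letsencrypt/live/sentiment.icg.io-0003"


def cert_paths(on_server, local_dir="."):
    """Return (certfile, keyfile) for the server or for a local test."""
    base = SERVER_CERT_DIR if on_server else local_dir
    # File names are assumed to match certbot's defaults.
    return (
        os.path.join(base, "fullchain.pem"),
        os.path.join(base, "privkey.pem"),
    )


def build_ssl_context(on_server):
    """Build a server-side TLS context from the selected key pair."""
    certfile, keyfile = cert_paths(on_server)
    context = ssl.SSLContext(ssl.PROTOCOL_TLS_SERVER)
    # Raises an error if either file is missing or unreadable.
    context.load_cert_chain(certfile=certfile, keyfile=keyfile)
    return context
```

The resulting context can be handed to whichever HTTP server the script uses; a missing key then fails loudly at startup rather than on the first request.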


How to run

Run each script with python3, for example:

python3 tweepy_script.py

If you want the script to keep running as a daemon after you quit the terminal, use:

nohup python3 -u tweepy_script.py > log.out &

This saves the log to log.out.
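
The same effect can be approximated from Python when one script needs to launch another. This is a rough equivalent of the nohup command above, not something the repo ships; the function name is hypothetical.

```python
import subprocess
import sys


def start_detached(script_args, log_path="log.out"):
    """Start `python -u <script_args>` in its own session, logging to a file."""
    # "wb" truncates the log, like the shell's `>` redirection.
    with open(log_path, "wb") as log_file:
        proc = subprocess.Popen(
            [sys.executable, "-u", *script_args],
            stdin=subprocess.DEVNULL,
            stdout=log_file,
            stderr=subprocess.STDOUT,  # errors land in the same log
            # New session: the child is detached from the launching
            # terminal, roughly what nohup achieves.
            start_new_session=True,
        )
    # The child keeps its own copy of the log file descriptor.
    return proc


if __name__ == "__main__":
    process = start_detached(["tweepy_script.py"])
    print(f"started pid {process.pid}")
```

The -u flag matters in both versions: without it, Python buffers output and log.out can look empty for a long time.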

Log info

The log captures every error and prints debug info such as:

Writing 8 points to DB
Bishijie collected at 2018-08-03 18:33:46.114923
enter the main function
Create database: bishijie
Write DataFrame with Tags
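
A collector loop that produces this kind of log could look like the sketch below. It is an illustration using Python's standard logging module, not the scripts' actual code: the logger name, the run_once function, and the failure message are hypothetical, and only the "Writing N points to DB" wording is taken from the sample above.

```python
import logging

logger = logging.getLogger("collector")


def run_once(collect, source_name):
    """Run one collection pass; log the outcome instead of crashing."""
    try:
        points = collect()
        logger.info("Writing %d points to DB", len(points))
        return True
    except Exception:
        # logger.exception records the message plus the full traceback.
        logger.exception("%s collection failed", source_name)
        return False


if __name__ == "__main__":
    logging.basicConfig(level=logging.INFO)
    run_once(lambda: ["a", "b"], "Bishijie")
```

Catching the exception at the top of each pass keeps a long-running script alive while still leaving a traceback in log.out.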