@chcaa/twitter-academic-search v1.0.0 • License: ISC

Twitter Academic Search

A command-line tool for searching, fetching conversations, and hydrating tweets using the Twitter v2 API.

Installation

  • Install Node.js version 16.x or higher

Usage

Run one of the following:

$ npx -p @chcaa/twitter-academic-search@latest search -h 
$ npx -p @chcaa/twitter-academic-search@latest conversations -h 
$ npx -p @chcaa/twitter-academic-search@latest hydrate -h 

Local Usage

  • Clone this repository
  • Navigate to the root of the repository and run
$ npm install

You can now run the files directly by navigating to the src/cli directory of the repository and running one of the following, depending on what you want to do (each command displays its help manual).

$ node search -h 
$ node conversations -h 
$ node hydrate -h 

Search

Search the full Twitter history using the Twitter query language.

CLI options

  • -k, --api-credentials <apiCredentials> required - The Twitter API bearerToken, OR the consumerKey and consumerSecret as a string in the format KEY:SECRET, OR a path to a file containing one of the two
  • -q, --query <query> required - The Twitter search query in the Twitter query language
  • -d, --destination <directory> required - The directory where the result should be stored
  • -p, --filename <string> optional - The name of the result file
  • -f, --from <date> required - The date to fetch data from, in the format "yyyy-mm-dd" or as a timestamp in ms
  • -t, --to <date> required - The date to fetch data to (inclusive), in the format "yyyy-mm-dd" or as a timestamp in ms. To fetch a single day, set this to the same value as "from" when using the "yyyy-mm-dd" format
  • -w, --resume-token <string> optional - Resume token used to resume if e.g. an error occurred or the fetch was aborted
  • -z, --development-mode optional - Print logging data to stdout
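Since -f and -t also accept a timestamp in milliseconds, one way to produce such a timestamp from a calendar date is sketched below (assumes GNU date; the exact value depends on your timezone):

```shell
# Convert a date to a millisecond timestamp usable with -f/--from or -t/--to.
# The -d flag and the %N format are GNU date extensions.
from_ts=$(date -d "2020-10-10" +%s%3N)
echo "$from_ts"   # a 13-digit millisecond timestamp
```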

Example

$ npx -p @chcaa/twitter-academic-search@latest search -k KEY:SECRET -q "war OR peace" -d "/data/twitter" -p "war_peace" -f "2020-10-10" -t "2021-01-01"
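Because -k also accepts a path to a file containing the credentials, the token can be kept out of your shell history. A minimal sketch (KEY:SECRET and the file name are placeholders):

```shell
# Write placeholder credentials to a file; the real content would be either
# a bearer token or a consumerKey:consumerSecret pair.
printf 'KEY:SECRET' > twitter-credentials.txt

# Pass the file path instead of the raw credentials:
# npx -p @chcaa/twitter-academic-search@latest search -k twitter-credentials.txt \
#   -q "war OR peace" -d "/data/twitter" -f "2020-10-10" -t "2021-01-01"
```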

Conversations

Fetch full conversations based on a conversation id.

CLI options

  • -k, --api-credentials <apiCredentials> required - The Twitter API bearerToken, OR the consumerKey and consumerSecret as a string in the format KEY:SECRET, OR a path to a file containing one of the two
  • -i, --ids <string|path> required - A comma-separated list of ids ("123,345,678") or a path to a file with ids, one id per line
  • -d, --destination <directory> required - The directory where the result should be stored
  • -f, --from <date> required - The date to fetch data from, in the format "yyyy-mm-dd" or as a timestamp in ms
  • -t, --to <date> required - The date to fetch data to (inclusive), in the format "yyyy-mm-dd" or as a timestamp in ms. To fetch a single day, set this to the same value as "from" when using the "yyyy-mm-dd" format
  • -z, --development-mode optional - Print logging data to stdout

Example

$ npx -p @chcaa/twitter-academic-search@latest conversations -k KEY:SECRET -i "1234,3444" -d "/data/twitter" -f "2020-10-10" -t "2021-01-01"
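The -i option also accepts a path to a file with one id per line. A sketch of creating such a file (the ids are placeholders):

```shell
# Create an ids file, one conversation id per line.
cat > ids.txt <<'EOF'
1234
3444
EOF

# Then pass the path instead of the comma-separated list:
# npx -p @chcaa/twitter-academic-search@latest conversations -k KEY:SECRET \
#   -i ids.txt -d "/data/twitter" -f "2020-10-10" -t "2021-01-01"
wc -l < ids.txt   # → 2
```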

Hydrate

Hydrate a list of tweet ids to their full JSON form.

CLI options

  • -k, --api-credentials <apiCredentials> required - The Twitter API bearerToken, OR the consumerKey and consumerSecret as a string in the format KEY:SECRET, OR a path to a file containing one of the two
  • -i, --ids <string|path> required - A comma-separated list of ids ("123,345,678") or a path to a file with ids, one id per line
  • -d, --destination <directory> required - The directory where the result should be stored
  • -p, --filename <string> required - The name of the result file
  • -z, --development-mode optional - Print logging data to stdout

Example

$ npx -p @chcaa/twitter-academic-search@latest hydrate -k KEY:SECRET -i "1234,3444" -d "/data/twitter" -p "corona-tweets"

Hint

For large sets of ids it is advisable to split the list into smaller chunks of e.g. 100,000 ids per file, or even smaller. Then, if something goes wrong (v2 of the API is sometimes unstable), it is easier to just start over than to figure out where to resume from. Furthermore, this approach makes it possible to use multiple access tokens for faster hydration: just start a fetch for every available access token and divide the ids between them.
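The chunking described above can be sketched with the standard split utility (the file names and sample ids are illustrative; use e.g. -l 100000 for real data):

```shell
# Create a small sample id list: 10 placeholder ids, one per line.
seq 1000000000 1000000009 > all-ids.txt

# Split into chunks of 3 ids per file; produces ids-chunk-aa, ids-chunk-ab, ...
split -l 3 all-ids.txt ids-chunk-

# Each chunk can then be hydrated separately (and in parallel, one run per
# access token), e.g.:
# npx -p @chcaa/twitter-academic-search@latest hydrate -k token1.txt \
#   -i ids-chunk-aa -d "/data/twitter" -p "chunk-aa"
ls ids-chunk-*
```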