@chcaa/twitter-academic-search v1.0.0 • License: ISC

Twitter Academic Search

A command-line tool for searching, fetching conversations, and hydrating tweets using the Twitter v2 API.

Installation

  • Install Node.js version 16.x or higher

Usage

Run one of the following:

$ npx -p @chcaa/twitter-academic-search@latest search -h 
$ npx -p @chcaa/twitter-academic-search@latest conversations -h 
$ npx -p @chcaa/twitter-academic-search@latest hydrate -h 

Local Usage

  • Clone this repository
  • Navigate to the root of the repository and run
$ npm install

You can now run the files directly by navigating to the src/cli directory of the repository and running one of the following, depending on what you want to do (each command displays its help manual).

$ node search -h 
$ node conversations -h 
$ node hydrate -h 

Search

Search the full Twitter history using the Twitter query language.

CLI options

  • -k, --api-credentials <apiCredentials> required - The Twitter API bearerToken, OR the consumerKey and consumerSecret as a string in the format KEY:SECRET, OR a path to a file containing one of the two
  • -q, --query <query> required - The Twitter search query in the Twitter query language
  • -d, --destination <directory> required - The directory where the result should be stored
  • -p, --filename <string> optional - The name of the result file
  • -f, --from <date> required - The date to fetch data from, in the format "yyyy-mm-dd" or as a timestamp in ms
  • -t, --to <date> required - The date to fetch data to (inclusive), in the format "yyyy-mm-dd" or as a timestamp in ms. To fetch a single day, set this to the same value as "from" when using the "yyyy-mm-dd" format
  • -w, --resume-token <string> optional - Resume token used to resume if e.g. an error occurred or the fetch was aborted
  • -z, --development-mode optional - Print logging data to stdout
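Since -f and -t also accept a timestamp in milliseconds, one way to produce such a timestamp from a calendar date is sketched below (assumes GNU date; the exact value depends on your timezone):

```shell
# Convert a date to a millisecond timestamp usable with -f/--from or -t/--to.
# The -d flag and the %N format are GNU date extensions.
from_ts=$(date -d "2020-10-10" +%s%3N)
echo "$from_ts"   # a 13-digit millisecond timestamp
```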

Example

$ npx -p @chcaa/twitter-academic-search@latest search -k KEY:SECRET -q "war OR peace" -d "/data/twitter" -p "war_peace" -f "2020-10-10" -t "2021-01-01"
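Because -k also accepts a path to a file containing the credentials, the token can be kept out of your shell history. A minimal sketch (KEY:SECRET and the file name are placeholders):

```shell
# Write placeholder credentials to a file; the real content would be either
# a bearer token or a consumerKey:consumerSecret pair.
printf 'KEY:SECRET' > twitter-credentials.txt

# Pass the file path instead of the raw credentials:
# npx -p @chcaa/twitter-academic-search@latest search -k twitter-credentials.txt \
#   -q "war OR peace" -d "/data/twitter" -f "2020-10-10" -t "2021-01-01"
```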

Conversations

Fetch full conversations based on a conversation id.

CLI options

  • -k, --api-credentials <apiCredentials> required - The Twitter API bearerToken, OR the consumerKey and consumerSecret as a string in the format KEY:SECRET, OR a path to a file containing one of the two
  • -i, --ids <string|path> required - A comma-separated list of ids ("123,345,678") or a path to a file with ids, one id per line
  • -d, --destination <directory> required - The directory where the result should be stored
  • -f, --from <date> required - The date to fetch data from, in the format "yyyy-mm-dd" or as a timestamp in ms
  • -t, --to <date> required - The date to fetch data to (inclusive), in the format "yyyy-mm-dd" or as a timestamp in ms. To fetch a single day, set this to the same value as "from" when using the "yyyy-mm-dd" format
  • -z, --development-mode optional - Print logging data to stdout

Example

$ npx -p @chcaa/twitter-academic-search@latest conversations -k KEY:SECRET -i "1234,3444" -d "/data/twitter" -f "2020-10-10" -t "2021-01-01"
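The -i option also accepts a path to a file with one id per line. A sketch of creating such a file (the ids are placeholders):

```shell
# Create an ids file, one conversation id per line.
cat > ids.txt <<'EOF'
1234
3444
EOF

# Then pass the path instead of the comma-separated list:
# npx -p @chcaa/twitter-academic-search@latest conversations -k KEY:SECRET \
#   -i ids.txt -d "/data/twitter" -f "2020-10-10" -t "2021-01-01"
wc -l < ids.txt   # → 2
```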

Hydrate

Hydrate a list of tweet ids to their full JSON form.

CLI options

  • -k, --api-credentials <apiCredentials> required - The Twitter API bearerToken, OR the consumerKey and consumerSecret as a string in the format KEY:SECRET, OR a path to a file containing one of the two
  • -i, --ids <string|path> required - A comma-separated list of ids ("123,345,678") or a path to a file with ids, one id per line
  • -d, --destination <directory> required - The directory where the result should be stored
  • -p, --filename <string> required - The name of the result file
  • -z, --development-mode optional - Print logging data to stdout

Example

$ npx -p @chcaa/twitter-academic-search@latest hydrate -k KEY:SECRET -i "1234,3444" -d "/data/twitter" -p "corona-tweets"

Hint

For large sets of ids it is advisable to split the list into smaller chunks of e.g. 100,000 ids per file, or even smaller. Then, if something goes wrong (v2 of the API is sometimes unstable), it is easier to just start over than to figure out where to resume from. Furthermore, this approach makes it possible to use multiple access tokens for faster hydration: just start a fetch for every available access token and divide the ids between them.
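The chunking described above can be sketched with the standard split utility (the file names and sample ids are illustrative; use e.g. -l 100000 for real data):

```shell
# Create a small sample id list: 10 placeholder ids, one per line.
seq 1000000000 1000000009 > all-ids.txt

# Split into chunks of 3 ids per file; produces ids-chunk-aa, ids-chunk-ab, ...
split -l 3 all-ids.txt ids-chunk-

# Each chunk can then be hydrated separately (and in parallel, one run per
# access token), e.g.:
# npx -p @chcaa/twitter-academic-search@latest hydrate -k token1.txt \
#   -i ids-chunk-aa -d "/data/twitter" -p "chunk-aa"
ls ids-chunk-*
```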