0.0.3 • Published 7 years ago

preprocess-tweets v0.0.3

License
MIT

Preprocessing of tweets

This module simplifies the preparation of Twitter data for training. It removes:

  • mentions
  • links
  • emojis
  • the keyword RT
  • sentences that contain a single word
  • some special characters

A filter option controls whether links (URLs), mentions and emojis are removed.

The default filter is:

var filter = {
    "mentions": true,
    "links": true,
    "emojis": true
}

For example:

The tweet:

New @Imaginedragons song 'Whatever It Takes' and a new album 'Evolve'. I'm so #excited this song is incredible ❤️ https://t.co/PS9NM4pTBQ

Will become:

New song 'Whatever It Takes' and a new album 'Evolve'. I'm so excited this song is incredible 
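
To make the transformation above concrete, here is a minimal sketch of how a single tweet could be cleaned under such a filter. It is not the package's actual implementation: the function name cleanTweet and every regular expression in it are assumptions made for illustration, and the removal of single-word sentences is left out.

```javascript
// Illustrative sketch only, NOT the package's real code.
// cleanTweet and its regular expressions are hypothetical.
function cleanTweet(text, filter) {
    var result = text;
    if (filter.mentions) {
        // Remove @username mentions
        result = result.replace(/@\w+/g, '');
    }
    if (filter.links) {
        // Remove http and https URLs
        result = result.replace(/https?:\/\/\S+/g, '');
    }
    if (filter.emojis) {
        // Remove pictographic characters, variation selectors and joiners
        result = result.replace(/[\p{Extended_Pictographic}\uFE0F\u200D]/gu, '');
    }
    // Remove the retweet keyword and the '#' sign (the hashtag word is kept)
    result = result.replace(/\bRT\b:?/g, '').replace(/#/g, '');
    // Collapse the whitespace left behind by the removals
    return result.replace(/\s+/g, ' ').trim();
}
```

Calling cleanTweet with "links" set to false would leave the URL at the end of the tweet in place while still stripping the mention and the emoji.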

Install

npm install preprocess-tweets

Prerequisites

The file with the extracted tweets should be a .txt file containing one tweet per line.
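
If the tweets are held in memory as an array of strings, a file in this format could be produced along the following lines. This is only a sketch: the helper names toOneTweetPerLine and writeTweetsFile are hypothetical and not part of the package. Line breaks inside a tweet are replaced with a space so that each tweet stays on its own line.

```javascript
var fs = require('fs');

// Hypothetical helper: flattens each tweet onto one line and joins them with '\n'
function toOneTweetPerLine(tweets) {
    return tweets
        .map(function (tweet) {
            // Replace line breaks inside a tweet with a single space
            return tweet.replace(/\s*[\r\n]+\s*/g, ' ').trim();
        })
        .filter(function (tweet) {
            // Drop tweets that are empty after trimming
            return tweet.length > 0;
        })
        .join('\n') + '\n';
}

// Hypothetical helper: writes the flattened tweets to a UTF-8 text file
function writeTweetsFile(path, tweets) {
    fs.writeFileSync(path, toOneTweetPerLine(tweets), 'utf8');
}
```

For example, writeTweetsFile('./originalFile.txt', tweets) would create the input file used in the example in the next section.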

Example

In this example, links are kept, while mentions and emojis are removed.

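var preprocessing = require('preprocess-tweets');

// Input file with one tweet per line, and the output file to create
var file = './originalFile.txt';
var writeFile = './modifiedFile.txt';

// Remove mentions and emojis, but keep links
var filter = {
    "mentions": true,
    "links": false,
    "emojis": true
};

// The filter is passed as a JSON string
preprocessing.clean(file, writeFile, JSON.stringify(filter));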

The result is a new file containing the modified tweets.
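
Once the output file exists, its contents could be loaded back for inspection with a small helper like the one below. The name readTweets is hypothetical and not part of the package. The source does not state whether clean writes the file synchronously, so this sketch assumes it is run only after the file has been fully written.

```javascript
var fs = require('fs');

// Hypothetical helper: returns the non-empty lines of a text file as an array
function readTweets(path) {
    return fs.readFileSync(path, 'utf8')
        .split(/\r?\n/)
        .filter(function (line) {
            return line.trim().length > 0;
        });
}
```

For example, readTweets('./modifiedFile.txt').length gives the number of tweets in the output file.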