conllu-core
A core type to handle CoNLL-U format
A CJK text tokenizer
Transform a directory of CoNLL files (a treebank) into a directory of SVG files.
Merge multiple sentiment libraries for better sentiment analysis
A Castor theme, based on SB Admin v2.0, that visualizes a summary of a corpus using pie charts and histograms.
Some classes to represent elements in a text corpus.
Feature hashing, also known as the hashing trick, a fast and space-efficient way of vectorizing features.
Calculate how many documents contain a certain term, within a list (`Array`) of text documents.
Text mining library
The text of Moby Dick by Herman Melville.
State of the Union addresses by U.S. Presidents.
Spam Assassin public mail corpus.
IFCT 2017 Corpus