treebank2svg
Transform a directory of CoNLL files (a treebank) into a directory of SVG files.
Merge multiple sentiment libraries for better sentiment analysis
A core type to handle CoNLL-U format
A CJK text tokenizer
A Castor theme to visualize a synthesis of a corpus using pie charts and histograms, based on SB Admin v2.0
Some classes to represent elements in a text corpus.
Text mining library
Feature hashing, also known as the hashing trick, a fast and space-efficient way of vectorizing features.
Calculate how many documents contain a certain term, within a list (`Array`) of text documents.
The text of Moby Dick by Herman Melville.
State of the Union addresses by U.S. Presidents.
Spam Assassin public mail corpus.
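
Two of the entries above describe small algorithms: feature hashing and counting how many documents contain a term. The following is a minimal Python sketch of both ideas, not the API of any package listed here. The function names `hash_features` and `document_frequency`, the use of MD5 as the hash, the bucket count, and whitespace tokenization are all assumptions made for illustration.

```python
import hashlib
from typing import Iterable, List


def hash_features(tokens: Iterable[str], n_buckets: int = 16) -> List[int]:
    """Feature hashing (hypothetical helper): map tokens to a fixed-length count vector."""
    vector = [0] * n_buckets
    for token in tokens:
        # Assumption: MD5 is used only as a stable, well-distributed hash.
        digest = hashlib.md5(token.encode("utf-8")).digest()
        index = int.from_bytes(digest[:4], "big") % n_buckets
        vector[index] += 1  # colliding tokens share a bucket
    return vector


def document_frequency(term: str, documents: List[str]) -> int:
    """Count documents containing `term` as a whitespace-separated token (case-insensitive)."""
    term = term.lower()
    return sum(1 for doc in documents if term in doc.lower().split())


docs = ["the whale surfaced", "call me Ishmael", "the white whale"]
print(document_frequency("whale", docs))  # 2
print(hash_features("the white whale".split(), n_buckets=8))
```

The hashed vector has a fixed length regardless of vocabulary size, which is where the space efficiency comes from; the cost is that distinct tokens can collide in the same bucket.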