0.1.1 • Published 4 years ago

textclass v0.1.1

Weekly downloads
5
License
Apache-2.0
Repository
github
Last release
4 years ago

TextClass // Text classification algorithms

With TextClass, you get a text classification library with zero dependencies.

Example use cases:

  • Spam-filter
  • Potential insult checking
  • Simple ChatBot
  • Self-learning bot applications
  • Text sample detection

How the algorithms work

The algorithms in TextClass work slightly differently than, for example, Bayesian classification. They are also not based on neural networks or some kind of AI. TextClass works with a points system. So the inputs will be tokenized to only words in lower case without any special chars. In the "weighted" classes, the tokens are counted according to their frequency. Then on use of the class it will check and calculate the results using frequencies and weightings to rate the texts. The difference between normal and "weighted" classes is that with "weighted" classes, the frequency and value of importance does also matter, not just the number of matches. A big difference to other classifications like Bayes is that TextClass needs less but more accurate training data to be more accurate. If there is a lot of data, I recommend the use of "weighted" classes, because they show the importance of the tokens. Furthermore, the "single" classes do not need any additional training data for comparing. That means, if you build a spam filter, you don't need even spam AND normal data for training. You only need the spam data to train. And thats how TextClass works!

Examples

Simple greeting detection

let tcs = new TextClass.Single();

tcs.learn('Hello there!');
tcs.learn('Hi');
tcs.learn('Welcome everyone');
tcs.learn('Hey how are you?');

let result1 = tcs.run('Hi, how are you?');
/*
{
  inputTokens: [ 'hi', 'how', 'are', 'you' ],
  matchedTokens: [ 'hi', 'how', 'are', 'you' ],
  result: { confidence: 1, percent: 100 }
}
*/

let result2 = tcs.run('I want to have food');
/*
{
  inputTokens: [ 'i', 'want', 'to', 'have', 'food' ],
  matchedTokens: [],
  result: { confidence: 0, percent: 0 }
}
*/

Classes

TextClass.Single()

Most simple and primitive classification. The system only checks whether the tokens are present in the model and uses them to count and calculate the points for the result.

Methods:

learn(text: string)

Train the model of your instance with text

run(text: string)

Classify text and get results

Returns:

let result = {
    inputTokens: Array,
    matchedTokens: Array,
    result: {
        confidence: Number,
        percent: Number
    }
}

Or if no data is in model it returns null.

Properties:

model: Array

The model. In this TextClass, its just an array with all tokens.

TextClass.SingleWeighted()

More advanced classification. It calculates the importance of every token in the model and can deliver more accurate results on heavy data.

Methods:

learn(text: string)

Train the model of your instance with text

run(text: string)

Classify text and get results

Returns:

let result = {
    inputTokens: Array,
    matchedTokens: Array,
    result: {
        confidence: Number,
        percent: Number
    }
}

Or if no data is in model it returns null.

Properties:

model: Object

The model. In this TextClassWeighted, it looks like this:

let model = {
    tokens: Object,
    importantTokens: Object,
    processed: Boolean
}

TextClassMulti

Comming soon

TextClassMultiWeighted

Comming soon

WordClass

Comming soon