2.0.0 • Published 8 years ago

blather v2.0.0

License
MIT

Blather

Blather is a little js library for generating text with Markov chains.

###A Simple Example

var Blather = require('blather');

var blatherer = Blather();

var fragments = [
    'I love dogs because they are fun',
    'I love cats because they are chill',
    'I wouldn\'t say I love snakes',
    'I do not care for ants',
    'Honestly, I hate snakes because they are disgusting and weird',
    'Zebras, because they live on a different continent than I do, I am indifferent towards',
    'I am a person with a lot of opinions about animals I love and don\'t love'
];

fragments.forEach(function(fragment) {
    blatherer.addFragment(fragment);
});

blatherer.generateFragment(); // 'I am a person with a lot of opinions about animals I love snakes'
blatherer.generateFragment(); // 'I wouldn't say I love dogs because they are fun'
blatherer.generateFragment(); // 'Honestly, I hate snakes because they are chill'

You get the picture. It mangles up the text. Accurately explaining Markov chains is beyond the scope of this README or my capabilities - I just use this to make silly toys and bots.
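For the curious, here is a rough sketch of the general idea, assuming word-level units and a fixed lookup window. This is not Blather's actual source: `buildChain`, `generate`, and the `maxUnits` cap are hypothetical names made up for illustration.

```javascript
// Toy word-level Markov chain, for illustration only (not Blather's code).
function buildChain(fragments, depth) {
    // Object.create(null) avoids clashes with built-in property names.
    var chain = { starts: [], next: Object.create(null) };
    fragments.forEach(function(fragment) {
        var units = fragment.split(/\s+/);
        if (units.length < depth) { return; }
        // Remember how each fragment opens so generation has a place to begin.
        chain.starts.push(units.slice(0, depth));
        // Record which unit followed each window of `depth` units.
        for (var i = 0; i + depth < units.length; i++) {
            var key = units.slice(i, i + depth).join(' ');
            (chain.next[key] = chain.next[key] || []).push(units[i + depth]);
        }
    });
    return chain;
}

function generate(chain, depth, random, maxUnits) {
    random = random || Math.random;
    maxUnits = maxUnits || 50; // safety cap in case the chain loops forever
    if (chain.starts.length === 0) { return ''; }
    function pick(list) {
        return list[Math.floor(random() * list.length)];
    }
    var out = pick(chain.starts).slice();
    // Extend until the last `depth` units have no recorded follower.
    while (out.length < maxUnits) {
        var options = chain.next[out.slice(-depth).join(' ')];
        if (!options) { break; }
        out.push(pick(options));
    }
    return out.join(' ');
}

var chain = buildChain([
    'I love dogs because they are fun',
    'I hate snakes because they are weird'
], 2);
generate(chain, 2); // one possible result: 'I love dogs because they are weird'
```

Both fragments pass through the words "they are", so that is where the generator can hop from one source sentence to the other.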

###More Complicated Examples

Blather is pretty flexible in how it chews up your text fragments and how it spits them back at you.

####split

split is a function that takes a text fragment and splits it into distinct units. By default, it splits only on /\s+/, meaning those units are (roughly) words. If you wanted to get more fine-grained than that, though, you could do something like this:

var blatherer = Blather({
    split: function(text) {
        return text.split('');
    }
});

split takes a string and must return an array.
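As another sketch, here is a split that sits between the two extremes: it keeps words whole but treats each punctuation mark as its own unit. The name `splitWordsAndPunctuation` is hypothetical, and `\w` only covers ASCII letters, digits, and underscores, so adjust it for other alphabets.

```javascript
// Hypothetical split function: words stay whole, punctuation becomes its own unit.
function splitWordsAndPunctuation(text) {
    // Either a run of word characters/apostrophes, or one other non-space character.
    // match() returns null when nothing matches, so fall back to an empty array.
    return text.match(/[\w']+|[^\s\w']/g) || [];
}

splitWordsAndPunctuation("Honestly, I don't care.");
// ['Honestly', ',', 'I', "don't", 'care', '.']
```

Passed as the `split` option, this would let the chain learn where commas and periods tend to fall. You would probably want a matching `clean` so the punctuation does not end up surrounded by spaces.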

####depth

depth controls how many units each step of the Markov generation looks at when determining what unit could possibly come next. By default, it's 2 - higher numbers mean a higher likelihood of exactly replicating source fragments, and lower numbers mean more chance for gibberish.
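To get a feel for that trade-off, the sketch below counts how many lookup windows have more than one possible next unit at a given depth. It assumes whitespace-separated units and a plain "window of `depth` units, then the unit that follows" model. `countBranchingKeys` is a hypothetical helper, not part of Blather.

```javascript
// Hypothetical helper: how many windows of `depth` units have 2+ distinct followers?
function countBranchingKeys(fragments, depth) {
    var followers = Object.create(null);
    fragments.forEach(function(fragment) {
        var units = fragment.split(/\s+/);
        for (var i = 0; i + depth < units.length; i++) {
            var key = units.slice(i, i + depth).join(' ');
            followers[key] = followers[key] || Object.create(null);
            followers[key][units[i + depth]] = true;
        }
    });
    return Object.keys(followers).filter(function(key) {
        return Object.keys(followers[key]).length > 1;
    }).length;
}

var sample = [
    'I love dogs because they are fun',
    'I love cats because they are chill',
    'I hate snakes because they are weird'
];
countBranchingKeys(sample, 1); // 3
countBranchingKeys(sample, 2); // 2
countBranchingKeys(sample, 3); // 1
countBranchingKeys(sample, 4); // 0
```

With no branching windows left at depth 4, a generator working this way could only replay one of the three source fragments word for word.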

####clean

clean is the last function a generated fragment passes through before coming out. It takes in an array of units and stitches them back together. By default, it just joins using a space, but imagine you were using the custom split defined above - you might want to do something like:

var blatherer = Blather({
    split: function(text) {
        return text.split('');
    },
    clean: function(textArray) {
        return textArray.join('');
    }
});

You could also shove custom cleanup logic here if you wanna get fancy - simple grammar checks, punctuation stripping, whatever. As long as it takes in an array of strings and returns a string, you're cool.
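Here is a sketch of what that fancier cleanup might look like for word-level units. `tidyClean` is a hypothetical name, and the rules it applies (capitalize the first letter, tuck punctuation against the previous word, end with a period) are only one opinionated choice.

```javascript
// Hypothetical clean function: join word units, then tidy up the result.
function tidyClean(textArray) {
    var text = textArray.join(' ')
        .replace(/\s+([,.!?;:])/g, '$1') // no space before punctuation
        .replace(/[,;:]+$/, '');         // drop dangling trailing punctuation
    if (text === '') { return text; }
    // Capitalize the first character.
    text = text.charAt(0).toUpperCase() + text.slice(1);
    // Make sure the fragment ends like a sentence.
    return /[.!?]$/.test(text) ? text : text + '.';
}

tidyClean(['honestly', ',', 'i', 'hate', 'snakes']); // 'Honestly, i hate snakes.'
tidyClean(['I', 'love', 'dogs,']);                   // 'I love dogs.'
```

It takes an array of strings and returns a string, so it fits the contract for the `clean` option.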

####isStart

When adding fragments, Blather stores away certain groups of units as possible fragment-starters. If all your fragments are discrete units of language, like sentences or tweets, the default should do you fine - it just checks if it's the first group of units in the fragment. If you're adding paragraphs at a time, maybe you'd want something closer to this:

var blatherer = Blather({
    isStart: function(fragmentPiece, index) {
        return (fragmentPiece[0].toUpperCase() === fragmentPiece[0]);
    }
});

This would mark every piece that begins with a capital letter as a possible start: sentence openers, plus all the proper nouns, which is a little sloppy but might be useful. fragmentPiece is a string and index is the index of the piece within the fragment being added. The default function here just checks if index is zero.
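One wrinkle with the `toUpperCase()` comparison is that it also passes for pieces starting with a digit or punctuation, since those characters are unchanged by uppercasing, and it throws on an empty string. A slightly stricter sketch, assuming ASCII text, is below. `isSentenceStart` is a hypothetical name; it keeps the same `(fragmentPiece, index)` signature described above.

```javascript
// Hypothetical isStart function: the first piece always counts,
// later pieces count only if they begin with an ASCII capital letter.
function isSentenceStart(fragmentPiece, index) {
    if (index === 0) { return true; }
    return /^[A-Z]/.test(fragmentPiece);
}

isSentenceStart('dogs', 0);      // true  (first piece of the fragment)
isSentenceStart('Honestly,', 3); // true
isSentenceStart('snakes', 3);    // false
isSentenceStart('42', 3);        // false (the toUpperCase check would say true)
```

Proper nouns still slip through, so it stays a little sloppy, just less so.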

####joiner

This one's a little silly. If all is well, it shouldn't matter - it's just an implementation detail. As mentioned above, Blather operates on arrays of units (split by the split function, with their length determined by depth), but it needs to turn those arrays into strings to use as object keys. joiner is the string it uses to join them.

By default, it is '<|>', but if that exact sequence of characters appears anywhere in your source fragments, it's gonna get fudged up. If you're doing weird things, try changing this, I guess - it really shouldn't be part of the public API, but I felt irresponsible leaving it out. Do what you will with this information.
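To see why that matters, here is a sketch of the collision, assuming keys are built by a plain `join`. It also includes a small helper for picking a joiner that does not appear in your fragments. Both `makeKey` and `pickJoiner` are hypothetical names, not Blather functions.

```javascript
// Hypothetical stand-in for the key-building step: join units into one string.
function makeKey(units, joiner) {
    return units.join(joiner);
}

// Two different groups of units collapse to the same key
// when a unit happens to contain the joiner itself.
var keyA = makeKey(['a<|>b', 'c'], '<|>');
var keyB = makeKey(['a', 'b<|>c'], '<|>');
// keyA === keyB, both are 'a<|>b<|>c'

// Hypothetical helper: return the first candidate that appears in no fragment,
// or null if every candidate clashes.
function pickJoiner(fragments, candidates) {
    for (var i = 0; i < candidates.length; i++) {
        var candidate = candidates[i];
        var clash = fragments.some(function(fragment) {
            return fragment.indexOf(candidate) !== -1;
        });
        if (!clash) { return candidate; }
    }
    return null;
}

pickJoiner(['x <|> y', 'plain text'], ['<|>', '\u0000']); // '\u0000'
```

The result could then be handed to Blather as the `joiner` option before any fragments are added.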
