1.0.1 • Published 5 years ago

streaming-markov-chain-builder v1.0.1

License
MIT

streaming-markov-chain-builder


streaming-markov-chain-builder is a Markov chain builder that accepts input text as a stream and outputs a stream of n-grams.

Installation

npm i --save streaming-markov-chain-builder

Usage

const fs = require('fs')
const { MarkovBuilder } = require('streaming-markov-chain-builder')

const builder = MarkovBuilder({
  // number of "context words" to add to each individual word
  // defaults to 1
  order: 1,

  // optional - return true if this `word` should be considered 'proper'
  // see `src/is-proper.ts` for the default implementation, exported as { isProperFn }
  // (this example only checks for a leading ASCII capital letter)
  isProperFn: (word) => /^[A-Z]/.test(word),

  // optional - given a single line, return a list of sub-sentences
  // see `src/sentence-splitter.ts` for the default implementation, exported as { sentenceSplitterFn }
  sentenceSplitterFn: (line) => line.split(/[.?!]/)
})

// now, you can start ingesting data by writing it...
builder.write('the quick brown fox jumped over the lazy dog')

// or by streaming it in from a file...
fs.createReadStream('/tmp/corpus.txt').pipe(builder)

// since MarkovBuilder is a Transform stream, you can consume the output by reading from it...
builder.on('data', (ngram) => {
  // see below for structure of the `ngram`
})

// you can also pipe the Transform stream into any object-mode Writable
// (`storage` is a placeholder for your own consumer)
builder.pipe(storage)
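
The `storage` consumer in the snippet above is not defined in this README. As a minimal sketch, assuming each emitted object has the `ngram` array described under "Ngram structure" and that the last entry is the word following the earlier ones, an in-memory object-mode Writable could tally transitions like this. The names `recordTransition` and `createStorage` are hypothetical and not part of the package.

```javascript
const { Writable } = require('stream')

// Hypothetical helper: tally "context -> next word" counts for one ngram.
// Assumes the last entry of `ngram` is the word that follows the earlier entries.
function recordTransition(counts, { ngram }) {
  const context = ngram.slice(0, -1).join(' ')
  const next = ngram[ngram.length - 1]
  const followers = counts.get(context) || new Map()
  followers.set(next, (followers.get(next) || 0) + 1)
  counts.set(context, followers)
  return counts
}

// Hypothetical consumer: an object-mode Writable that keeps the counts in memory.
function createStorage(counts = new Map()) {
  const storage = new Writable({
    objectMode: true,
    write(ngram, _encoding, callback) {
      recordTransition(counts, ngram)
      callback()
    }
  })
  return { storage, counts }
}
```

With this in place, `const { storage, counts } = createStorage()` followed by `builder.pipe(storage)` would fill `counts` as ngrams arrive; listening for the `finish` event on `storage` is one way to know when the input has been fully consumed.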

Ngram structure

export type MarkovNgram = {
  // list of the words in this ngram
  ngram: string[],
  // is this a sentence starter?
  sentenceStart: boolean,
  // is this a sentence ender?
  sentenceEnd: boolean,
  // is ngram[0] proper?
  startsWithProper: boolean
}
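
To show how these fields might be used downstream, here is a rough sketch of a sentence generator that walks a list of `MarkovNgram`-shaped objects. The sample data is hand-written for illustration (it assumes order 1 yields two-word ngrams, which this README does not state explicitly), and `generate` and `pick` are hypothetical names, not part of the package.

```javascript
// Hand-written sample data in the MarkovNgram shape.
// This is illustrative input, not captured output from the library.
const sample = [
  { ngram: ['the', 'quick'], sentenceStart: true, sentenceEnd: false, startsWithProper: false },
  { ngram: ['quick', 'fox'], sentenceStart: false, sentenceEnd: false, startsWithProper: false },
  { ngram: ['fox', 'jumped'], sentenceStart: false, sentenceEnd: true, startsWithProper: false }
]

// Default chooser: pick a random element from a non-empty list.
const randomPick = (list) => list[Math.floor(Math.random() * list.length)]

// Hypothetical generator: begin at a sentence starter, then repeatedly choose an
// ngram whose leading words match the current trailing words, stopping at a
// sentence ender, a dead end, or after `maxSteps` hops.
function generate(ngrams, pick = randomPick, maxSteps = 50) {
  const starters = ngrams.filter((n) => n.sentenceStart)
  if (starters.length === 0) return ''
  let current = pick(starters)
  const words = [...current.ngram]
  for (let step = 0; step < maxSteps && !current.sentenceEnd; step++) {
    const tail = current.ngram.slice(1).join(' ')
    const candidates = ngrams.filter((n) => n.ngram.slice(0, -1).join(' ') === tail)
    if (candidates.length === 0) break
    current = pick(candidates)
    words.push(current.ngram[current.ngram.length - 1])
  }
  return words.join(' ')
}
```

Passing a deterministic chooser such as `(list) => list[0]` makes the walk repeatable, which is convenient for tests; the linear `filter` per step is only suitable for small inputs, and a real consumer would likely index ngrams by their leading words.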