2.0.0 • Published 7 years ago

streamsplit v2.0.0

Weekly downloads
6
License
MIT
Repository
github
Last release
7 years ago

streamsplit

Travis npm npm npm Coverage Status node

streamsplit. is a Reactive Programming (rx) module which provides a straight-forward and both memory & CPU efficient way to split readable streams into chunks of data.

This module was built with the main goal of splitting a large MySQL dump .sql file into multiple files (one per each table). ( npm -> mysqldumpslit )

Thanks to the great streamsearch library which uses the Boyer-Moore-Horspool algorithm (the same used in the Unix grep command) the token search is blazing fast and have a very low memory footprint.

This library contains several splitters that will emit an item each time a chunk between the split token is found.

The Module is written in ES6 transpiled using the wonderful Babel library.

Requirements

  • node.js - v0.12 or newer!

Installation

npm install streamsplit

Basic Usage

Base splitter

import StreamSplit from 'streamsplit';

const readStream = getReadableStreamSomehow();
const splitToken = '\n';

StreamSplit.split({
  stream: readStream,
  token: splitToken
}).subscribe(
  range => {
    console.log(`start: ${range.start}, end: ${range.end}`);
  }
);

Assuming readStream contains a\nbb\nc, the code above will output:

start: 0, end: 1
start: 2, end: 4
start: 5, end: 6

Buffer splitter

The Buffer splitter uses the Base splitter but instead of emitting ranges of start+end it will provide a buffer for each chunk. Using it is pretty straight forward:

import toBuffSplit from 'streamsplit/dist/splitters/tobuffer';

const readStream = getReadableStreamSomehow();
const splitToken = '\n';

toBuffSplit({
  stream: readStream,
  token: splitToken
}).subscribe(
  buf => {
    console.log(`Buf: '${buf.toString('utf8')}'`);
  }
);

Assuming readStream contains a\nbb\nc, the code above will output:

Buf: 'a'
Buf: 'bb'
Buf: 'c'

Note: The buffer splitter is memory efficient but if you expect large chunk of data between every split token avoid using it :)

JSON Splitter

Sometimes you have a stream of data which contains several JSON items. for example, if you've a stream "containing":

{"iam": "great"}
{"and": "you"}
{"know": "it"}

Aka a stream containing a json per each line; you could parse it using:

import toJsonSplit from 'streamsplit/dist/splitters/tojson';

const readStream = getReadableStreamSomehow();
const splitToken = '\n';

toJsonSplit({
  stream: readStream,
  token: splitToken
}).subscribe(
  obj => {
    console.log(`Obj: ',obj);
  }
);

And you'll get it parsed for you.

Need something different?

Thanks to the powerful Rx operators you can manipulate the data the way you prefer.

Lets say you've a very magical stream of data consisting in base64 encoded PNGs separated by a fictional (but magical) separator -LOL-LIMEWIRE+NYAN-CAT-, you could do the following to get the original image data back

import toBuffSplit from 'streamsplit/dist/splitters/tobuffer';

const readStream = getReadableStreamSomehow();
const splitToken = '-LOL-LIMEWIRE+NYAN-CAT-';

toBuffSplit({
  stream: readStream,
  token: splitToken
})
  .map(base64Buf => new Buffer(base64Buf, 'base64'))
  .subscribe(
    imageData => {
      // imageData contains a single image contente buffer.
    }
  );

Lets say youve a stream of mixed data separated by ";". Every 'piece' might contain:

  • a number
  • a string

Ex: '10;lol;haha;13;12;love you!;....

And you want to fetch only the even numbers; then:

import toBuffSplit from 'streamsplit/dist/splitters/tobuffer';

const readStream = getReadableStreamSomehow();

toBuffSplit({
  stream: readStream,
  token: ';'
})
  .map(buf => buf.toString('utf8')) // convert buf to string!
  .filter(item => !isNaN(item)) // filter out strings!
  .map(parseFloat) // remap item from "string representation of a number" to Number
  .filter(n => n % 2 == 0) // filter out odd numbers.
  .subscribe(
    evenNumber => {
      // ..
    }
  );

If you're unfamiliar with the RX world I suggest you take a look at Getting started with Rx