pstrscan v0.1.6
PStringScanner
Overview
PStringScanner is a simple string tokenizer that provides for lexical scanning operations on a string.
It is the third JavaScript port of Ruby's StringScanner library. Where the other ports concentrated on the interface, this one concentrates on speed.
The original Ruby version was written in C and is very fast. This version, while not written in C, is 4-10x faster than the other two ports of the same library on short strings (under 32 Kb of characters), and orders of magnitude (100-1000x) faster on large strings (a million characters or more). All of that is accomplished by taking advantage of the traits of JavaScript's regular expressions rather than blindly porting from Ruby.
Installation
npm install -g pstrscan
Quick start
Scanning a string means keeping track of and advancing a position (a zero-based index into the source string) and matching regular expressions against the portion of the source string after the position.
var PStrScan = require("pstrscan");
var s = new PStrScan("This is a test");
s.scan(/\w+/); // = "This"
s.scan(/\w+/); // = null
s.scan(/\s+/); // = " "
s.scan(/\s+/); // = null
s.scan(/\w+/); // = "is"
s.hasTerminated(); // = false
s.scan(/\s+/); // = " "
s.scan(/(\w+)\s+(\w+)/); // = "a test"
s.getMatch(); // = "a test"
s.getCapture(1); // = "a"
s.getCapture(2); // = "test"
s.hasTerminated(); // = true
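To make the position-tracking idea from this section concrete, here is a minimal sketch of a scanner that only matches at the current position. It is not the pstrscan source: `MiniScanner` is a hypothetical name, and the use of the sticky (`y`) flag together with `lastIndex` is an assumption about one reasonable way to anchor a match without slicing the string. It only mirrors the `scan` and `hasTerminated` calls shown above.

```javascript
// Hypothetical MiniScanner: an illustration only, not pstrscan's implementation.
function MiniScanner(source) {
  this.source = source;
  this.pos = 0; // zero-based index into the source string
}

// Try to match `regex` exactly at the current position.
// Returns the matched text and advances, or returns null and stays put.
MiniScanner.prototype.scan = function (regex) {
  // Assumption: rebuild the regex with the sticky flag so that exec()
  // can only succeed starting at lastIndex (no searching ahead).
  var flags = regex.flags.replace(/[gy]/g, "") + "y";
  var anchored = new RegExp(regex.source, flags);
  anchored.lastIndex = this.pos;

  var m = anchored.exec(this.source);
  if (m === null) {
    return null; // position is left unchanged on failure
  }
  this.pos = anchored.lastIndex; // end of the match
  return m[0];
};

// True once the position has reached the end of the source string.
MiniScanner.prototype.hasTerminated = function () {
  return this.pos >= this.source.length;
};

var demo = new MiniScanner("This is a test");
console.log(demo.scan(/\w+/)); // matches a word at position 0
console.log(demo.scan(/\w+/)); // a space is next, so no match
console.log(demo.scan(/\s+/)); // consumes the space
console.log(demo.pos);         // current position after the scans
```

Because the match is anchored with `lastIndex` instead of taking a substring at every step, the source string is never copied. Whether pstrscan uses this exact technique should be checked against its source file.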
Documentation
The interface should be familiar to anyone who has used the original Ruby library or the first JavaScript/Node port. There are some slight differences, but you should be able to glean those from the source file.
To Do
- More documentation specific to this implementation.
- Add a more comprehensive unscan history/capability.