# js-tokeniser v1.1.2
## Why another tokenizer?
I needed a tokenizer and tried some of the existing ones, but they missed features I needed:
- node-tokenizer doesn't return information about the token types it finds, only the text.
- tokenizer-array returns this information, but has a strange way of finding tokens: it searches by bisecting the text, which doesn't find all tokens, at least in my case.
Since creating a tokenizer doesn't seem too hard, I decided to write my own. Here it is.
## Requirements
The only requirement is ES6. It makes the code easier to read (in my opinion). You might need a transpiler (like Babel) if you want to use it in some browsers or with older Node versions (time to update?), but ES6 is simply the future.
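If you do need to support an older environment, a minimal Babel configuration might look like the following (a sketch assuming Babel with the babel-preset-env package; your setup may differ), placed in a `.babelrc` file in the project root:

```json
{
    "presets": ["env"]
}
```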
## Installation
```bash
npm i -S js-tokeniser
```

## Usage
```js
const tokeniser = require('js-tokeniser')

let result = tokeniser('// Test file\nName = Test\nAuthor = "Joachim Schirrmacher"',
    [
        {type: "comment", regex: /^\s*\/\/[^\n]*/},
        {type: "definition", regex: /^\s*(\w+)\s*=\s*((?:[^\n]+\\\n)*[^\n]+)/},
    ]
)

console.log(result)
```

This prints the following output:
```
[ { type: 'comment', matches: [] },
  { type: 'definition', matches: [ 'Name', 'Test' ] },
  { type: 'definition', matches: [ 'Author', '"Joachim Schirrmacher"' ] }
]
```

As you can see, you don't only get the three recognised tokens, but also the matches found within each. This makes it easy to handle a lot of cases without defining separate token types; instead, you can use capture groups in the patterns (as in 'definition').
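For example, the capture groups make it straightforward to turn the token stream into a data structure. Here is a minimal sketch, using only the result format shown above (`parseConfig` is a name of my own, not part of the library), that builds a key/value object from the `definition` tokens:

```js
const tokeniser = require('js-tokeniser')

const tokenTypes = [
    {type: "comment", regex: /^\s*\/\/[^\n]*/},
    {type: "definition", regex: /^\s*(\w+)\s*=\s*((?:[^\n]+\\\n)*[^\n]+)/},
]

// Hypothetical helper: collect all 'definition' tokens into a plain
// object, using the two capture groups (name, value) in `matches`.
function parseConfig(text) {
    const config = {}
    for (const token of tokeniser(text, tokenTypes)) {
        if (token.type === 'definition') {
            const [name, value] = token.matches
            config[name] = value
        }
    }
    return config
}

console.log(parseConfig('// Test file\nName = Test\nAuthor = "Joachim Schirrmacher"'))
// → { Name: 'Test', Author: '"Joachim Schirrmacher"' }
```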
## Updating from version 1.0.x
In previous versions, tokeniser returned the unfiltered result of String.match(), which contains an input attribute, an index attribute, and the whole match in addition to the capture groups.
Beginning with version 1.1.0, these elements are stripped, so if your code relies on them, be sure to adapt it.
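A rough sketch of the kind of change this may require (assuming your 1.0.x code accessed the raw match result through the same `matches` attribute; adapt it to how your code actually used it):

```js
// 1.0.x: `matches` was the raw String.match() result, so the whole
// match came first and `index`/`input` were available as attributes.
//   const [fullMatch, name, value] = token.matches
//   const position = token.matches.index

// 1.1.x: `matches` contains only the capture groups.
const [name, value] = token.matches
```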