# doken
A minimalistic, general-purpose tokenizer generator.
## Usage
Use npm to install:

```sh
$ npm install doken
```

Import doken and create a tokenizer with regular expression rules:
```js
const {createTokenizer, regexRule} = require('doken')

const tokenizeJSON = createTokenizer({
  rules: [
    regexRule('_whitespace', /\s+/y, {lineBreaks: true}),
    regexRule('brace', /[{}]/y),
    regexRule('bracket', /[\[\]]/y),
    regexRule('colon', /:/y),
    regexRule('comma', /,/y),
    regexRule('string', /"([^"\n\\]|\\[^\n])*"/y),
    regexRule('number', /(-|\+)?\d+(\.\d+)?/y),
    regexRule('boolean', /(true|false)\b/y),
    regexRule('null', /null\b/y)
  ]
})

let tokens = tokenizeJSON(`{"a": "Hello World!"}`)
console.log([...tokens])
```

## API
### Rule object
A rule object contains the following fields:
- `type` `<string>` - The type of the token this rule generates. If `type` starts with an underscore `_`, the token will not be emitted by the tokenizer.
- `lineBreaks` `<boolean>` (Optional) - Set this property to `true` if this rule might match line breaks and you want to track it correctly.
- `match` `<Function>` - A function with the following signature:

  ```js
  (input: string, position: number) => null | {length: number, value: any}
  ```

  This function will try to get the token of the given `type` at the given `position` in `input`, if applicable. Return `null` if `input` at `position` is not a token of the given `type`; otherwise return an object. The first `length` characters of `input` after `position` will be the matched token. You can optionally return a `value` which can contain any data that will be attached to the token. If `value` is not given, it defaults to the first `length` characters of `input` after `position`.
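For illustration, here is a sketch of a hand-written rule object that follows the shape above. The rule name and matching logic are made up for this example; only the `type` and `match` fields come from doken's API:

```js
// A sketch of a custom rule matching C-style line comments. The match function
// returns null when the input at `position` is not a comment, and an object
// with the matched length and an attached value otherwise.
const lineCommentRule = {
  type: '_comment', // leading underscore: the token is consumed but not emitted
  match: (input, position) => {
    if (!input.startsWith('//', position)) return null

    let end = input.indexOf('\n', position)
    if (end < 0) end = input.length

    return {
      length: end - position,
      value: input.slice(position + 2, end).trim()
    }
  }
}
```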
### Token object
A token will be represented by an object with the following fields:
- `type` `<string>` | `null` - The type of the token, or `null` if no given rule matches the input.
- `length` `<number>` - The length of the token.
- `value` `<any>` - The `value` generated by the rule.
- `pos` `<number>` - The zero-based position of the first character of the token.
- `row` `<number>` - The zero-based row of the first character of the token.
- `col` `<number>` - The zero-based column of the first character of the token.
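As an illustration, a token for the string `"a"` in the usage example above might look roughly like this, assuming the default `value` (the matched text) is used:

```js
// Illustrative values only, derived from the field descriptions above.
const token = {
  type: 'string',
  length: 3,
  value: '"a"',
  pos: 1,   // zero-based character offset into the input
  row: 0,   // zero-based line
  col: 1    // zero-based column
}
```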
### doken.createTokenizer(options)
- `options` `<object>`
  - `rules` `<Array<Rule>>`
  - `strategy` `'first' | 'longest'` (Optional) - Default: `'first'`
- Returns: `<Function>`
Generates a tokenize function with the following signature:
```js
(input: string) => IterableIterator<Token>
```

This function will attempt to tokenize the given input, yielding tokens matched by the given rules one by one.
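Since the tokenizer returns an iterator rather than an array, you can consume tokens one by one and stop early. A small usage sketch, reusing the JSON tokenizer defined above:

```js
// Consume tokens one by one and stop at the first colon token.
for (const token of tokenizeJSON('{"a": "Hello World!"}')) {
  console.log(token.type, JSON.stringify(token.value))
  if (token.type === 'colon') break
}
```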
Set `strategy` to `'longest'` to match the token with the rule that matches the most characters instead of using the rule that matches first.
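For example (a sketch with made-up rule names), with the default `'first'` strategy the `assign` rule below would win on the input `'=='` because it is listed first and matches; with `'longest'` the `equals` rule wins because it consumes more characters:

```js
const {createTokenizer, regexRule} = require('doken')

const tokenize = createTokenizer({
  strategy: 'longest',
  rules: [
    regexRule('assign', /=/y),
    regexRule('equals', /==/y)
  ]
})

// Under 'longest', the first token yielded for '==' should have type 'equals'.
console.log([...tokenize('==')].map(token => token.type))
```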
### doken.regexRule(type, regex[, options])
- `type` `<string>`
- `regex` `<RegExp>`
- `options` `<object>`
  - `lineBreaks` `<boolean>` (Optional) - Set this property to `true` if this rule might match line breaks and you want to track it correctly.
  - `value` `<Function>` (Optional) - A function for calculating the token value out of the match.
  - `condition` `<Function>` (Optional) - A function for indicating whether to discard the match or not.
- Returns: `<Rule>`
Returns a rule that attempts to match the input string with the given `regex`.
`value` can be set to a function `(match: RegExpExecArray) => any`. The generated token will have the returned value as its `value`.
`condition` can be set to a function `(match: RegExpExecArray) => boolean`. Return `false` to discard the matched result and go on with the next rule.
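As a sketch (the rule names and regexes here are illustrative, not part of doken), a number rule that converts the matched text into an actual JavaScript number via `value`, and an identifier rule that uses `condition` to let reserved words fall through to later rules, could look like this:

```js
const {regexRule} = require('doken')

// `value` turns the matched text into a JS number for the token's value.
const numberRule = regexRule('number', /\d+(\.\d+)?/y, {
  value: match => parseFloat(match[0])
})

// `condition` returning false discards the match, so reserved words fall
// through to whatever rule comes next in the rules array.
const identifierRule = regexRule('identifier', /[A-Za-z_]\w*/y, {
  condition: match => !['true', 'false', 'null'].includes(match[0])
})
```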