tokenz v0.1.1
tokenz
Text walking/tokenizer library
- Lightweight (almost zero-dep)
- Performant
- Versatile
- Only works with String (for now)
Install
npm/yarn/pnpm install tokenz
Example usage
import { TextWalker } from 'tokenz'
const walker = new TextWalker(text)
const walker = new TextWalker(html)
const tokens = []
while (!walker.isEnd()) {
const token = walker.walk([
() => endTag(walker),
() => startTag(walker),
() => text(walker),
])
if (token) {
tokens.push(token)
}
}
return tokensAPI
walker.isEnd
walker.isEnd(): Boolean
Returns whether end of text has been reached.
const walker = new TextWalker('asd')
walker.isEnd() // => false
walker.skip(3)
walker.isEnd() // => truewalker.peek
walker.peek([pos: Number, [count: Number]]): String
pos defaults to 0.count defaults to 1.
Returns characters starting from pos relative to current walker's position,
and as many as count or until end of text is reached.
const walker = new TextWalker('asd')
walker.peek() // => 'a'
walker.peek(1) // => 's'
walker.peek(0, 2) // => 'as'walker.match
walker.match(strs: Array<String>, [count: Number]): Boolean
count defaults to 1.
Returns whether any string in strs matches the walker text in its current position.
const walker = new TextWalker('asd')
walker.match('a') // => true
walker.match('b') // => false
walker.match('as') // => true
walker.match('asde') // => falsewalker.read
walker.read([count: Number]): String
count defaults to 1.
Returns characters from walker.pos
and as many characters as count (or less if end of text is reached).
Increments walker.pos by count.
const walker = new TextWalker('asd')
walker.read() // => 'a'
walker.read(2) // => 'sd'walker.readUntil
walker.readUntil(strs: Array<String>|String): String
Returns all characters from walker.pos (or less if end of text is reached)
until the first string on strs is found, or end of text is reached.
Increments walker.pos by the amount of characters returned.
const walker = new TextWalker('abcbd')
walker.readUntil(['b', 'c']) // => 'a'
walker.readUntil('b') // => 'bc'walker.readUntilNot
walker.readUntilNot(strs: Array<String>|String): String
Returns all characters from walker.pos (or less if end of text is reached)
until either no match fro strs is found, or end of text is reached.
Increments walker.pos by the amount of characters returned.
const walker = new TextWalker('112233')
walker.readUntilNot(['1', '2']) // => '1122'
walker.readUntilNot('3') // => ''walker.skip
walker.skip([count]: Number)
Like walker.read but it doesn't return a String.
const walker = new TextWalker('asd')
walker.skip()
walker.peek() // => 's'
walker.skip(2)
walker.isEnd() // => truewalker.skipUntil
walker.skipUntil(strs: Array<String>|String)
Like walker.readUntil but it doesn't return a String.
walker.skipUntilNot
walker.skipUntilNot(strs: Array<String>|String)
Like walker.readUntilNot but it doesn't return a String.
walker.walk
walker.walk(tokenizers: Array<Function>)
Iterates over tokenizers executing them sequentially.
Said iteration stops with the first return value that isn't null|undefined.
This return value is also returned by walker.walk.
After each tokenizer execution that returns null or undefined,
walker.pos is rolled back to the value it had when walker.walk was called.
This allows each tokenizer to move walker.pos at will, but being able to cancel
at any time by return null|undefined.