kodetokenizer v0.0.1
KodeTokenizer
Generic source code tokenizer. WIP.
Installation
Via npm on Node:
npm install kodetokenizer
Usage
Reference in your program:
var kt = require('kodetokenizer');
Given a text, get its content as tokens:
var tokens = kt.getTokens("var myvar = 13;");
The result is an array of tokens; each one is a plain JavaScript object with:
- value: text
- type: a number, from kt.Types
The types are:
- kt.Types.Word: a sequence of letters
- kt.Types.Digits: a sequence of digits
- kt.Types.WhiteSpace: a sequence of whitespace
- kt.Types.NewLine: a new line: \n, \r\n or \r
- kt.Types.Symbol: a sequence of symbol characters (anything that is not a letter, digit, whitespace, new line or separator)
- kt.Types.Unknown
- kt.Types.Separator: a separator character
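Because every token carries a numeric type, a token list can be filtered with ordinary array methods. The following is a minimal sketch, not captured library output: the Types object is a stand-in for kt.Types with hypothetical numeric values, the token array is hand-written in the documented { value, type } shape, and dropBlanks is a hypothetical helper.

```javascript
// Stand-in for kt.Types: the numeric values here are hypothetical.
var Types = { Word: 1, Digits: 2, WhiteSpace: 3, NewLine: 4 };

// Hand-written tokens in the documented { value, type } shape.
var tokens = [
    { value: 'count', type: Types.Word },
    { value: ' ', type: Types.WhiteSpace },
    { value: '42', type: Types.Digits },
    { value: '\n', type: Types.NewLine }
];

// Hypothetical helper: drop whitespace and new line tokens.
function dropBlanks(list, types) {
    return list.filter(function (token) {
        return token.type !== types.WhiteSpace && token.type !== types.NewLine;
    });
}

var significant = dropBlanks(tokens, Types);
var values = significant.map(function (token) { return token.value; });
```

In real use, the array returned by kt.getTokens and kt.Types itself would take the place of the stand-ins.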
The separators are language dependent, so you must indicate them in an options object parameter, e.g.:
var tokens = kt.getTokens("myfun(1,2,3);", { separators: ['(', ')', '{', '}', ',', ';'] });
You can add processors: functions that, given an initial character, return a token:
function stringProcessor(ch, text, position) {
//...
}
var tokens = kt.getTokens("myfun('foo', 'bar');", { processors: { '#': stringProcessor } });
The parameter ch is the detected character. The parameter position is the index in text of the next unprocessed character.
The processor can return:
- null: no token detected, so the tokenizer takes control again.
- { position: anumber, token: atoken }: where position is the new unprocessed char position in text, and token is the token to be used
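The contract above can be sketched as a processor for quoted strings. This is an illustration under stated assumptions, not the code in test/string.js: it assumes ch is the opening quote and position is the index just after it, STRING_TYPE is a hypothetical type number, and leaving the quotes out of the token value is a design choice of this sketch.

```javascript
// Hypothetical type number for string tokens.
var STRING_TYPE = 100;

// Assumes `ch` is the opening quote and `position` is the index just after it.
function stringProcessor(ch, text, position) {
    var value = '';
    for (var k = position; k < text.length; k++) {
        if (text[k] === ch) {
            // Closing quote found: the next unprocessed character follows it.
            return { position: k + 1, token: { value: value, type: STRING_TYPE } };
        }
        value += text[k];
    }
    // Unterminated string: return null so the tokenizer takes control again.
    return null;
}

var text = "myfun('foo', 'bar');";
var found = stringProcessor("'", text, 7);      // called just after the first quote
var missing = stringProcessor("'", "'open", 1); // no closing quote
```

Here found describes a token with value foo and a new position of 11, while missing is null because the string never closes.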
See test/string.js as an example of a processor. Note that you can use:
var Types = kt.Types;
Types.String = ++Types.MaxValue;
to add your own token types.
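When several custom types are needed, the ++Types.MaxValue idiom can be wrapped in a small helper. This is a sketch only: addType is a hypothetical function, not part of the library, and the Types object is a stand-in for kt.Types with a hypothetical starting MaxValue.

```javascript
// Stand-in for kt.Types; the values and the starting MaxValue are hypothetical.
var Types = { Word: 1, Digits: 2, MaxValue: 2 };

// Hypothetical helper: reserve a number for `name`, reusing it if already present.
function addType(types, name) {
    if (typeof types[name] !== 'number')
        types[name] = ++types.MaxValue;
    return types[name];
}

var stringType = addType(Types, 'String');
var commentType = addType(Types, 'Comment');
var again = addType(Types, 'String'); // same number as stringType
```

Checking for an existing entry keeps the helper safe to call twice with the same name, so two modules that both register String end up sharing one number.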
Development
git clone git://github.com/ajlopez/KodeTokenizer.git
cd KodeTokenizer
npm install
npm test
Samples
TBD
Versions
- 0.0.1: Published
References
TBD
Contribution
Feel free to file issues and submit pull requests; contributions are welcome.
If you submit a pull request, please be sure to add or update corresponding
test cases, and ensure that npm test continues to pass.