1.0.5 • Published 4 years ago

pocket-tokenizer v1.0.5

Weekly downloads
-
License
ISC
Repository
-
Last release
4 years ago

Pocket Tokenizer

Simple, pocket-sized javascript tokenizer.

This is a simple, intuitive way to tokenize strings for your projects!

Example code

import Tokenizer from "pocket-tokenizer"

var tokenizer = new Tokenizer();
var code = "console.log(1.0 + 3.0)";
var tokens = tokenizer.exec(code);
for (const token of tokens) {
    console.log(`Name: ${token.name}, Value: "${token.value}"`);
}

/*
The snippet will return:

Name: Identifier, Value: "console"
Name: Default, Value: "."
Name: Identifier, Value: "log"
Name: Default, Value: "("
Name: NumericLiteral, Value: "1.0"
Name: WhiteSpace, Value: " "
Name: ArithmeticOperator, Value: "+"
Name: WhiteSpace, Value: " "
Name: NumericLiteral, Value: "3.0"
Name: Default, Value: ")"
*/

How it works

Basically, the main Tokenizer class will take the user-defined or default rules and compiles into a unique rule, using Regular Expressions (called RegEx). Each rule has the name, Regular Expression to match and priority. The compiled RegEx will be used to tokenize the String and return a iterator, from here you can transform into a Array or iterate with for-of loop, each iteration returns a Object with the original RegEx match, the index, matched name and value.

How can I customize?

Simple! The class has a method called "addTokenRule", so you can just pass a Object describing the name, the target RegEx and the priority. The name and the target RegEx is explained before, but the priority is used to sort the rules by priority, so issues like collisions between rules can be avoided.

Example code:

import Tokenizer from "pocket-tokenizer"

var tokenizer = new Tokenizer();
//Note that you can chain methods.
tokenizer.addTokenRule({
    name: "Foo",
    reg: /foo/,
    priority: 0
}).compile(); // Don't forget to compile!

var code = "console.log(foo - 1e412)";
var tokens = tokenizer.exec(code);
for (const token of tokens) {
    console.log(`Name: ${token.name}, Value: "${token.value}"`);
}

/*
The snippet will return:

Name: Identifier, Value: "console"
Name: Default, Value: "."
Name: Identifier, Value: "log"
Name: Default, Value: "("
Name: Foo, Value: "foo" // The custom rule is matched!
Name: WhiteSpace, Value: " "
Name: ArithmeticOperator, Value: "-"
Name: WhiteSpace, Value: " "
Name: NumericLiteral, Value: "1e412"
Name: Default, Value: ")"
*/

How can I say the priority of a rule?

The priority is inverse-proportional, so the higher the number of the priority, less the chances of being matched, saying so. For example, an rule that matches a Numeric Literal is used before +- arithmetic operators, because of negative values, exponential notation, etc.

So the following code:

1.0 + 4.0 + 1.e+42

Will tokenize the names into something like that:

NumericLiteral,WhiteSpace,ArithmeticOperator,WhiteSpace,NumericLiteral,WhiteSpace,ArithmeticOperator,WhiteSpace,NumericLiteral

Observe that the last number is tokenize to a full Numeric Literal, even if there's a "+" sign in the middle, the reason why is because the rule for Numeric Literals is used before because of the priority.

In case of rules with same priority number, will be order by the order on the list of rules.

Default rules

By default, the Tokenizer class come with some rules, these rules has the max priority number, so they will not conflict with custom rules:

-NumericLiteral

Matches any valid Javascript number literal, including binary, hexadecimal, exponential notation, integer and floating point.

-NewLine

Matches any character that resembles a new line, being LF or CRLF.

-ArithmeticShorthand

Matches JS arithmetic operations by variable assignment shorthands, like "+=", "**=", "%=", etc.

-ConditionalOperator

Matches any JS operation used in conditionals, like "<", "==", "!==", etc.

-Comment and MultilineComment

Comment and MultilineComment matches any block of comment using "//" or "/*something*/".

-Identifier

Before default RegEx, will match any word that can be used as identifier, beign variable name, function name, keywords, etc.

-Default

This is the default match, will match anything else if no other rule is matched.

1.0.5

4 years ago

1.0.4

4 years ago

1.0.3

4 years ago

1.0.2

4 years ago

1.0.1

4 years ago

1.0.0

4 years ago