@sudo-nymd/text-parser v0.3.5
A Simple Parser of Text
A simple, yet flexible library for extracting specialized tokens out of text. After maintaining a library of Regular Expressions for years, I was inspired to build a number of parsers after taking Dmitry Soshnikov's excellent Udemy course Building a Parser from Scratch.
Getting Started
Install the Package
npm i @sudo-nymd/text-parserThe Parser
The Parser is the root of the API. The Parser implemented a parse() method that will return parsed tokens with metadata about the text found.
The following code...
const { Parser } = require('@sudo-nymd/text-parser');
const text = `Sudo-Nymd's "text-parser"!`
const parsed = new Parser().parse(text);
console.log(parsed);... produces the following output:
[
{ type: 'word', flags: 48, value: "Sudo-Nymd's" },
{ type: 'whitespace', flags: 0, value: ' ' },
{
type: 'phrase',
flags: 3,
startChar: { type: 'character', value: '"' },
items: [
{
"type": "word",
"flags": 32,
"value": "text-parser"
}
],
value: 'text-parser',
stopChar: { type: 'character', flags: 0, value: '"' }
},
{ type: 'punctuation', flags: 0, value: '!' }
]If you'd prefer the token ASAP, you can specify a callback to be notified whenever a token is parsed:
const { Parser } = require('@sudo-nymd/text-parser');
const text = `Sudo-Nymd's "text-parser"!`
new Parser.parse(text, (token) => {
// We got a token, use it!
console.log(token);
});The Tokenizer
COMING SOON
Plugins
Keywords Plugin
COMING SOON
Date Plugins
COMING SOON
Grammer
The grammer of the parser is simple, and is outlined below.
Line
A single line of text composed of one or more Literals.
: LiteralsLiterals
A collection of one or more Word, Phrase, Character, Punctuation, WhiteSpace, or Plugin.
: (Word | Phrase | Character | Puncuation | Whitespace | Plugin) +Examples:
The quick, [brown fox] jumped over the "lazy dog", and the cow jumped over the {moon}!The precending Literal consistes of
3 Phrases (brown fox, "lazy dog", and {moon}),
3 Punctuation (2 commas and 1 exclamation point),
11 Words, and
13 Whitespaces.
Phrase
A collection of Words, Characters, or Whitespace enclosed by a StartChar and a StopChar.
: StartChar
: (Word | Character | Whitespace) +
: EndCharExamples:
"The snow is falling"
[Build Completed]
{Start}
'Mission Success'PhraseCharacter
StartChar
One or more repeating characters that signify the start of a Phrase. Includes double quote, single quote, open brace, and open bracket.
: ( { | [ | " | ' )+EndChar
One or more repeating characters that signify the end of a Phrase. Includes double quote, single quote, close brace, and close bracket.
: ( } | ] | " | ' )+Word
Any single word.
: ([\w]+(?:.['-]?[\w]+)*)Whitespace
Any whitespace
: [\s]+Character
Any character that is not alpha-numeric, whitespace, or a phrase start or stop character.
: [^a-zA-Z0-9{}\\[\\]"']