0.3.5 • Published 2 years ago

@sudo-nymd/text-parser v0.3.5

License
MIT

A Simple Parser of Text

A simple yet flexible library for extracting specialized tokens from text. After maintaining a library of regular expressions for years, I was inspired to build a number of parsers after taking Dmitry Soshnikov's excellent Udemy course, Building a Parser from Scratch.

Getting Started

Install the Package

npm i @sudo-nymd/text-parser

The Parser

The Parser is the root of the API. It implements a parse() method that returns the parsed tokens, each with metadata about the text it matched.

The following code...

const { Parser } = require('@sudo-nymd/text-parser');

const text = `Sudo-Nymd's "text-parser"!`
const parsed = new Parser().parse(text);
console.log(parsed);

... produces the following output:

[
  { type: 'word', flags: 48, value: "Sudo-Nymd's" },
  { type: 'whitespace', flags: 0, value: ' ' },
  {
    type: 'phrase',
    flags: 3,
    startChar: { type: 'character', value: '"' },
    items: [
      { type: 'word', flags: 32, value: 'text-parser' }
    ],
    value: 'text-parser',
    stopChar: { type: 'character', flags: 0, value: '"' }
  },
  { type: 'punctuation', flags: 0, value: '!' }
]
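
Because every token carries a type and a value, the result can be post-processed with ordinary array methods. The following is a minimal sketch that works on a copy of the output shown above, so it runs without the package installed. The helper names phraseValues and toText are hypothetical and are not part of the library.

```javascript
// Token array copied from the documented output above.
const parsed = [
  { type: 'word', flags: 48, value: "Sudo-Nymd's" },
  { type: 'whitespace', flags: 0, value: ' ' },
  {
    type: 'phrase',
    flags: 3,
    startChar: { type: 'character', value: '"' },
    items: [{ type: 'word', flags: 32, value: 'text-parser' }],
    value: 'text-parser',
    stopChar: { type: 'character', flags: 0, value: '"' }
  },
  { type: 'punctuation', flags: 0, value: '!' }
];

// Hypothetical helper: collect the inner text of every phrase token.
function phraseValues(tokens) {
  return tokens.filter((t) => t.type === 'phrase').map((t) => t.value);
}

// Hypothetical helper: rebuild the source text, restoring phrase delimiters.
function toText(tokens) {
  return tokens
    .map((t) => (t.type === 'phrase'
      ? t.startChar.value + t.value + t.stopChar.value
      : t.value))
    .join('');
}

console.log(phraseValues(parsed)); // [ 'text-parser' ]
console.log(toText(parsed));       // Sudo-Nymd's "text-parser"!
```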

If you'd prefer to receive each token as soon as it is parsed, pass a callback as the second argument to parse(); it is called once per token:

const { Parser } = require('@sudo-nymd/text-parser');

const text = `Sudo-Nymd's "text-parser"!`
new Parser().parse(text, (token) => {
    // We got a token, use it!
    console.log(token);
});
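
A callback of this kind can accumulate state as tokens arrive. The sketch below tallies tokens by type; createTypeCounter is a hypothetical helper, and the token stream is a stand-in copied from the documented output so the example runs on its own. Whether the callback also fires for the items nested inside a phrase is not stated above, so the stand-in only includes top-level tokens.

```javascript
// Hypothetical helper: builds a callback that tallies tokens by type.
function createTypeCounter() {
  const counts = {};
  const onToken = (token) => {
    counts[token.type] = (counts[token.type] || 0) + 1;
  };
  return { counts, onToken };
}

// Stand-in token stream mirroring the documented output. With the package
// installed, onToken would instead be passed as the second argument to parse().
const sampleTokens = [
  { type: 'word', flags: 48, value: "Sudo-Nymd's" },
  { type: 'whitespace', flags: 0, value: ' ' },
  { type: 'phrase', flags: 3, value: 'text-parser' },
  { type: 'punctuation', flags: 0, value: '!' }
];

const { counts, onToken } = createTypeCounter();
sampleTokens.forEach((token) => onToken(token));
console.log(counts);
// → { word: 1, whitespace: 1, phrase: 1, punctuation: 1 }
```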

The Tokenizer

COMING SOON

Plugins

Keywords Plugin

COMING SOON

Date Plugins

COMING SOON


Grammar

The grammar of the parser is simple, and is outlined below.

Line

A single line of text composed of one or more Literals.

: Literals

Literals

A collection of one or more Word, Phrase, Character, Punctuation, Whitespace, or Plugin tokens.

: (Word | Phrase | Character | Punctuation | Whitespace | Plugin) +

Examples:

The quick, [brown fox] jumped over the "lazy dog", and the cow jumped over the {moon}!

The preceding Literal consists of 3 Phrases ([brown fox], "lazy dog", and {moon}), 3 Punctuation tokens (2 commas and 1 exclamation point), 11 Words, and 13 Whitespaces.

Phrase

A collection of Words, Characters, or Whitespace enclosed by a StartChar and an EndChar (reported as startChar and stopChar in the parser output).

: StartChar
: (Word | Character | Whitespace) +
: EndChar

Examples:

"The snow is falling"
[Build Completed]
{Start}
'Mission Success'

PhraseCharacter

StartChar

One or more repeating characters that signify the start of a Phrase. Includes double quote, single quote, open brace, and open bracket.

: ( { | [ | " | ' )+

EndChar

One or more repeating characters that signify the end of a Phrase. Includes double quote, single quote, close brace, and close bracket.

: ( } | ] | " | ' )+
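
One way to read the StartChar and EndChar rules together is that a run of start characters is closed by a run of the corresponding end characters. The sketch below encodes that reading. It assumes the closing run mirrors the opening run (same delimiter family, same length), which the grammar above does not state explicitly, and the names PAIRS and closingRun are hypothetical.

```javascript
// Assumed pairing of each StartChar with its EndChar.
const PAIRS = { '{': '}', '[': ']', '"': '"', "'": "'" };

// Hypothetical helper: given a run of start characters such as "[[",
// return the end run that would close it ("]]"), or null if the input
// is not a run of one repeated StartChar.
function closingRun(startRun) {
  if (!/^(?:\{+|\[+|"+|'+)$/.test(startRun)) return null;
  return PAIRS[startRun[0]].repeat(startRun.length);
}

console.log(closingRun('['));  // ]
console.log(closingRun('{{')); // }}
console.log(closingRun('[{')); // null
```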

Word

Any single word.

: ([\w]+(?:.['-]?[\w]+)*)

Whitespace

One or more consecutive whitespace characters.

: [\s]+

Character

Any character that is not alpha-numeric, whitespace, or a phrase start or stop character.

: [^a-zA-Z0-9{}\[\]"']
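
To make the grammar concrete, here is a minimal tokenizer sketch that follows the rules above and reproduces the counts given for the Literals example. It is not the library's implementation. The patterns are assumptions chosen for illustration: Word is simplified to word characters joined by an apostrophe or hyphen, Punctuation (which the grammar does not define) is taken to be the Unicode punctuation class, and a Phrase is a single start character paired with the next matching end character.

```javascript
// Assumed patterns for this sketch; they are not taken from the library source.
const WHITESPACE = /\s+/y;
const WORD = /\w+(?:['-]\w+)*/y; // simplified Word: allows internal ' or -
const PUNCTUATION = /\p{P}/uy;   // assumption: Unicode punctuation class
const CLOSERS = new Map([['{', '}'], ['[', ']'], ['"', '"'], ["'", "'"]]);

function tokenize(text) {
  const tokens = [];
  let pos = 0;

  // Try a sticky pattern at the current position and advance on a match.
  const take = (pattern) => {
    pattern.lastIndex = pos;
    const match = pattern.exec(text);
    if (match === null) return null;
    pos = pattern.lastIndex;
    return match[0];
  };

  while (pos < text.length) {
    let value;
    if ((value = take(WHITESPACE)) !== null) {
      tokens.push({ type: 'whitespace', value });
    } else if ((value = take(WORD)) !== null) {
      tokens.push({ type: 'word', value });
    } else {
      const char = text[pos];
      const closer = CLOSERS.get(char);
      const end = closer === undefined ? -1 : text.indexOf(closer, pos + 1);
      if (end !== -1) {
        // Phrase: everything between a start character and its closer.
        const inner = text.slice(pos + 1, end);
        tokens.push({
          type: 'phrase',
          startChar: char,
          items: tokenize(inner),
          value: inner,
          stopChar: closer
        });
        pos = end + 1;
      } else if ((value = take(PUNCTUATION)) !== null) {
        tokens.push({ type: 'punctuation', value });
      } else {
        // Anything else is a single Character token.
        tokens.push({ type: 'character', value: char });
        pos += 1;
      }
    }
  }
  return tokens;
}

const line = 'The quick, [brown fox] jumped over the "lazy dog", and the cow jumped over the {moon}!';
const typeCounts = {};
for (const token of tokenize(line)) {
  typeCounts[token.type] = (typeCounts[token.type] || 0) + 1;
}
console.log(typeCounts);
// → { word: 11, whitespace: 13, punctuation: 3, phrase: 3 }
```

Trying Whitespace and Word before the phrase delimiters keeps an apostrophe inside a word (as in Sudo-Nymd's) from being mistaken for the start of a single-quoted Phrase.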