@conartist6/rd-parse v4.0.0-rc.0
@conartist6/rd-parse
A generic, minimalist, recursive-descent parser which allows you to define your grammar directly in Javascript in a syntax closely related to EBNF. Avoiding the need to compile the grammar makes the resultant parser compact and quick to initialize.
You will be bound by the limitations of recursive descent parsing, for example you must remove left recursion in your grammar.
Usage
npm i rd-parse # or
yarn add rd-parseOnce installed you can import the Parser class and the grammar combinators and build your parser. Here is an example which uses rd-parse to recognize a subset of valid JSON.
const {
Parser,
Node,
All,
Any,
Star,
Y,
} = require('@conartist6/rd-parse');
// `Y` lets us use a function recursively before we're finished defining it
const dict = Y((dict) => {
// Match empty space and do not use the result
const _ = Ignore(/\s*/);
const string = Node(
All('"', /[^"]*/, '"'),
([text]) => text,
);
const key = string;
// `dict` is the top level grammar returned below
const value = Any(string, dict);
const entry = Node(
All(key, _, ':', _, value),
(entry) => entry,
);
// Define `dict` as the object built from [key, value] entries
return Node(
All('{', _, Star(entry), _, '}'),
Object.fromEntries,
);
});
const dictParser = new Parser(dict);
dictParser.parse('{ "foo": "bar" }');
// { foo: 'bar' }
dictParser.parse('{ "foo": { "bar": "baz" } }');
// { foo: { bar: 'baz' }}Grammar
The main task in using rd-parse is to define a grammar. A grammar is made up of rules, where a rule may be a token or a combinator. Tokens can be represented as string literals or regex literals, and each token is itself a grammar. For example the /\w/ token is a grammar that matches any single lowercase latin letter.
Making a useful grammar is a process of combining token grammars using combinators such as All, Optional, and Plus. Each combinator returns a new grammar derived from an input grammar or grammars according to the rules documented below:
'foo'matches if the charactersfooare next in the input/[0-9]/matches if the next character in the input is a digitAll(...grammars)matches when allgrammarsmatchAny(...grammars)matches when anygrammarofgrammarsmatchesOptional(grammar)always matchesIgnore(grammar)matches whengrammarmatchesPlus(grammar)matchesgrammarrepeated one or more timesStar(grammar)matchesgrammarrepeated zero or more times
The last step of parsing - extracting and AST from the grammar - is done with the special Node combinator, which takes a (stack) => node callback. The stack is built up by tokens and combinators, and is always transformed to a single value (usually an object) by Node. The specifics of how the stack grows are documented below:
'foo'does not add anything to the stack/[0-9]/addsmatch[0]to the stack, ornullAll(...grammars)concatenates stacks from allgrammarsAny(...grammars)uses stack from first matching grammar ofgrammarsOptional(grammar)adds to the stack whengrammarmatchesIgnore(grammar)never adds to the stackPlus(grammar)concatenates stacks from all repetitions ofgrammarStar(grammar)concatenates stacks from all repetitions ofgrammar
Fork
This project is forked from dmaevsky's rd-parse. I wanted the freedom to tweak several aspects of the upstream library at once.
In particular the documentation in this version has been rewritten and several notable changes have been made to the API:
- Undocumented exports are removed, particularly
START, andUse. - Regex token changes:
- Capture groups no longer have meaning. Instead the full match result is used.
- required
^is now prepended to expression for you
Ignorerenamed toIgnoreBetween.Ignoreis a new rule type.
5 years ago