1.0.1 • Published 1 year ago

syntactics v1.0.1

Weekly downloads
-
License
MIT
Repository
github
Last release
1 year ago

Syntactics

GitHub Workflow Status GitHub license npm semantic-release: gitmoji Twitter

Syntactics is a monadic bottom-up parser combinator library.

Contents

Features

Planned Features

  • Exports algorithms for parsing JavaScript strings and UTF-8 byte sequences.
  • Exposes a rich set of parser combinators for quickly prototyping languages.
  • Allows specifying error recovery strategies to create error-tolerant parsers.

Examples

API Reference

Represents a parser which consumes tokens of type T and returns either errors of type E or values of type A.

Returns a parser which always succeeds. The parser returns the given value. It doesn't consume any input tokens.

Type Declaration

const pure: <T, E, A>(value: A) => Parser<T, E, A>;

Use Cases

The pure function is commonly used to create optional parsers. For example, if we have a parser for a plus or a minus sign called sign, then we can make it optional as follows.

const optionalSign = alt(pure(""), sign);

The pure function is also commonly used with the read function to accept specific tokens. For example, here's a parser which accepts digits and rejects everything else.

const digit = read((character) =>
  "0123456789".includes(character)
    ? pure(character)
    : fail({ expected: "digit", received: character })
);

Sequences two parsers. The second parser can depend upon the output of the first parser. Hence, it's more powerful than the map function. However, it's also more difficult to use. Always prefer using the map function instead of the bind function for parsing context-free languages.

Type Declaration

const bind: <T, E, A, B>(
  parser: Parser<T, E, A>,
  arrow: (value: A) => Parser<T, E, B>
) => Parser<T, E, B>;

Use Cases

Used for context-sensitive parsing. For example, given a function repeat than can repeat a parser a specified number of times, we can create a parser for the context-sensitive language $a^nb^nc^n$ using bind.

const abcCount = (n: number) =>
  alt(
    bind(a, () => abcCount(n + 1)),
    map(() => n, repeat(b, n), repeat(c, n))
  );

const abc = abcCount(0);

Sequences and transforms the results of multiple parsers. The input parsers are independent of each other. Hence, it's less powerful than bind. However, it's much easier to use. Always prefer using the map function instead of the bind function for parsing context-free languages.

Type Declaration

type Parsers<T, E, A> = {
  [K in keyof A]: Parser<T, E, A[K]>;
};

const map: <T, E, A extends unknown[], B>(
  morphism: (...a: A) => B,
  ...parsers: Parsers<T, E, A>
) => Parser<T, E, B>;

Use Cases

Used for sequencing parsers. For example, given parsers for parsing identifiers, arbitrary text, and expressions, we can create a parser for declarations.

const makeDeclaration = (name, _equals, expr) => ({
  type: "declaration",
  name,
  expr
});

const declaration = map(makeDeclaration, identifier, text("="), expression);

Combines multiple parsers non-deterministically. The resultant parser executes the input parsers in parallel and without backtracking. The order of the input parsers doesn't matter. It can also return multiple results for ambiguous grammars.

Type Declaration

const alt: <T, E, A>(...parsers: Parser<T, E, A>[]) => Parser<T, E, A>;

Use Cases

Used for selecting parsers non-deterministically. For example, given parsers for expressions and declarations, we can create a parser that can parse either expressions or declarations.

const eitherExpressionOrDeclaration = alt(expression, declaration);

Returns a parser which always fails. The parser returns the given error. It doesn't consume any input tokens.

Type Declaration

const fail: <T, E, A>(error: E) => Parser<T, E, A>;

Use Cases

The fail function is also commonly used with the read function to reject specific tokens. For example, here's a parser which accepts digits and rejects everything else.

const digit = read((character) =>
  "0123456789".includes(character)
    ? pure(character)
    : fail({ expected: "digit", received: character })
);

Returns a parser which consumes a single input token and applies the input function to this token. The input function can decide whether to accept the token, reject the token, or continue parsing more input tokens.

Type Declaration

const read: <T, E, A>(arrow: (token: T) => Parser<T, E, A>) => Parser<T, E, A>;

Use Cases

Reading and parsing tokens from the input stream. For example, here's a parser for the keyword if.

const keywordIf = read((char1) => {
  if (char1 !== "i") return fail({ expected: "i", received: char1 });
  return read((char2) => {
    if (char2 !== "f") return fail({ expected: "f", received: char2 });
    return pure("if");
  });
});

Returns an object of mutually-recursive parsers. The input of the fix function is an object of combinators. The fix function feeds the output of all the combinators, which is collected as an object of mutually-recursive parsers, to each of the combinators. Kind of like a dragon eating its own tail.

Self reference, symbolized as the Ouroboros Dragon, allows us to define recursive and mutually-recursive parsers. The fix function also allows you to define left-recursive parsers.

Type Declaration

type Parsers<T, E, A> = {
  [K in keyof A]: Parser<T, E, A[K]>;
};

type Combinators<T, E, A> = {
  [K in keyof A]: (parsers: Parsers<T, E, A>) => Parser<T, E, A[K]>;
};

const fix: <T, E, A extends {}>(
  combinators: Combinators<T, E, A>
) => Parsers<T, E, A>;

Use Cases

Defining recursive and mutually-recursive parsers. For example, given parsers for numbers and arbitrary text we can define parsers for expressions, terms, and factors.

const makeAdd = (left, _plus, right) => ({ type: "add", left, right });

const makeMul = (left, _times, right) => ({ type: "mul", left, right });

const makeGroup = (_left, expr, _right) => expr;

const { expression } = fix({
  expression: ({ expression, term }) =>
    alt(term, map(makeAdd, expression, text("+"), term)),
  term: ({ term, factor }) =>
    alt(factor, map(makeMul, term, text("*"), factor)),
  factor: ({ expression }) =>
    alt(number, map(makeGroup, text("("), expression, text(")")))
});

Represents the result of a parser. It can either contain zero or more errors of type T, or one or more parsed values of type R.

Type Declaration

interface Cons<A> {
  head: A;
  tail: List<A>;
}

type List<A> = Cons<A> | null;

type ParserResult<E, R> =
  | { success: false; errors: List<E> }
  | { success: true; values: Cons<R> };

Represents a possible continuation of the parsing process at a given point. A non-empty list of continuations can be given to the next function to continue the parsing process from that point.

Represents the state of the parsing process at a given point. It contains the result of the parsing process at that point. It also contains the list of continuations of the parsing process from that point.

Type Declaration

interface Cons<A> {
  head: A;
  tail: List<A>;
}

type List<A> = Cons<A> | null;

interface ParseState<T, E, R> {
  result: ParserResult<E, R>;
  continuations: List<Continuation<T, E, R, T>>;
}

Starts the parsing process and returns the first parse state.

Type Declaration

const begin: <T, E, R>(parser: Parser<T, E, R>) => ParseState<T, E, R>;

Use Cases

The begin function is used to start the parsing process.

const parse = <E, R>(
  parser: Parser<string, E, R>,
  input: string
): ParserResult<E, R> => {
  let state = begin(parser);

  for (const char of input) {
    const { result, continuations } = state;
    if (continuations === null) return result;
    state = next(continuations, char);
  }

  return state.result;
};

Continues the parsing process and returns the next parse state. The list of non-empty continuations specify where to continue the parsing process from. In order to continue the parsing process, the next token from the input stream needs to be given to the next function.

Type Declaration

const next: <T, E, R>(
  continuations: Cons<Continuation<T, E, R, T>>,
  token: T
) => ParseState<T, E, R>;

Use Cases

The next function is used to continue the parsing process.

const parse = <E, R>(
  parser: Parser<string, E, R>,
  input: string
): ParserResult<E, R> => {
  let state = begin(parser);

  for (const char of input) {
    const { result, continuations } = state;
    if (continuations === null) return result;
    state = next(continuations, char);
  }

  return state.result;
};