Teg

WARNING: This is currently in beta while I finalize the API and write docs and examples.

Teg is a tiny declarative parser toolkit written in TypeScript. It aims to be a semantic and approachable library for parsing. Teg's semantics are mostly based on PEGs (Parsing Expression Grammars).

0 dependencies
Browser or Node
4.4kb minified (but highly tree-shakeable!)
Well-tested
Helpful error messages
Straightforward and semantic by default
Powerful, composable API when you need it

Install

npm install teg-parser

Usage

import assert from "node:assert"
import { template, line } from "teg-parser"

/** Parse markdown level 1 headings */
const h1Parser = template`# ${line}`

const result = h1Parser.run("# heading\n")

assert(result.isSuccess())
assert.deepEqual(result.value, ["heading"])

const failResult = h1Parser.run("not a heading")

assert(failResult.isFailure())
console.log(failResult)
/**
 * Logs
Parse Failure

| not a heading
| ^

Failed at index 0: Char did not match "#"
In middle of parsing text("#") at 0
In middle of parsing text("# ") at 0
In middle of parsing template(text("# "), line, text("")) at 0
 */

Often, you'll want to do some processing on a successful parse. To make this ergonomic, parsers define a map function that will let you transform successfully parsed content.

import assert from "node:assert"
import { template, zeroOrMore, line, Parser } from "teg-parser"

type Blockquote = {
  content: string
}

const blockquote: Parser<Blockquote> = zeroOrMore(template`> ${line}`)
  .map((lines) => lines.map(([line]) => line).join("\n"))
  .map((content) => ({ content }))

const result = blockquote.run(`> Line 1\n> Line 2\n> Line 3`)

assert(result.isSuccess())
assert.deepEqual(result.value, {
  content: "Line 1\nLine 2\nLine 3",
})

Since Teg is written in TypeScript, types are inferred as much as possible.
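
To make the inference concrete, here is a minimal standalone sketch of a parser type with a `map` method. It is not Teg's implementation: `MiniParser`, `Result`, and `digitChar` are hypothetical names made up for illustration, and the real `Parser` class may differ. The point is only that the output type of each `map` call follows from its callback, so no annotations are needed.

```typescript
// Hypothetical stand-ins for illustration; not Teg's actual types.
type Result<T> =
  | { ok: true; value: T; rest: string }
  | { ok: false; message: string }

class MiniParser<T> {
  constructor(readonly run: (input: string) => Result<T>) {}

  // Transform a successful result; failures pass through untouched.
  map<U>(fn: (value: T) => U): MiniParser<U> {
    return new MiniParser<U>((input) => {
      const result = this.run(input)
      return result.ok
        ? { ok: true, value: fn(result.value), rest: result.rest }
        : result
    })
  }
}

// Matches a single digit character, "0" to "9".
const digitChar = new MiniParser<string>((input) => {
  const char = input.charAt(0)
  return char >= "0" && char <= "9"
    ? { ok: true, value: char, rest: input.slice(1) }
    : { ok: false, message: `Expected a digit at "${input}"` }
})

// Inferred as MiniParser<number>, then MiniParser<{ n: number }>.
const digitNumber = digitChar.map((char) => Number(char))
const wrapped = digitNumber.map((n) => ({ n }))

const mapped = wrapped.run("7up")
const unmapped = wrapped.run("up")
```

Chaining two `map` calls mirrors the blockquote example above, where the first call joins the lines and the second wraps them in an object.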

Much of the idea comes from Chet Corcos's article on parsers. Although parsers currently implement bimap, fold, and chain methods as described in the article, I haven't found them very useful in practice and may remove or change them.

Examples

There are some examples available in the examples directory. Building out more is still a TODO; help out if you want!

Markdown
CLI args
Unordered list
JSON
LaTeX

You can also see an example of a bigger parser I use for my custom blog post format here: https://github.com/tanishqkancharla/tk-parser/blob/main/src/index.ts (although it's using an older version of teg right now).

API

Combinators

/** Matches a text string */
const text: <T extends string>(value: T) => Parser<T>

/**
 * Tagged template text for parsing.
 *
 * "template`# ${line}`" will parse "# Heading" to ["Heading"]
 *
 * Multiple parsers can be used together. Keep in mind that parsers run greedily,
 * so "template`${word}content`" will fail on "textcontent": the `word` parser
 * consumes all of "textcontent", leaving nothing for the text "content" to match.
 */
export const template

/**
 * Match the given parser n or more times, with an optional delimiter parser
 * in between.
 */
const nOrMore: <T, D>(
	n: number,
	parser: Parser<T>,
	delimiter?: Parser<D>
) => Parser<T[]>
/**
 * Match the given parser zero or more times, with an optional delimiter
 * NOTE: this will always succeed.
 */
const zeroOrMore: <T, D>(parser: Parser<T>, delimiter?: Parser<D>) => Parser<T[]>
/**
 * Match the given parser one or more times, with an optional delimiter
 */
const oneOrMore: <T, D>(parser: Parser<T>, delimiter?: Parser<D>) => Parser<T[]>

/** Matches exactly one of the given parsers, checked in the given order */
const oneOf: <ParserArray extends Parser<any>[]>(
	parsers: ParserArray
) => ParserArray[number]

/**
 * Match the given parsers in sequence
 *
 * @example
 * sequence([text("a"), text("b"), text("c")]) => Parser<"abc">
 */
const sequence: (
	parsers: Parser[],
	delimiter?: Parser
) => Parser

/**
 * Look ahead in the stream to match the given parser.
 * NOTE: This consumes no tokens.
 */
const lookahead: <T>(parser: Parser<T>) => Parser<T>

/**
 * Tries matching a parser, returns undefined if it fails
 * NOTE: This parser always succeeds
 */
const maybe: <T>(parser: Parser<T>) => Parser<T | undefined>

/**
 * Keep consuming until the given parser succeeds.
 * Returns all the characters that were consumed before the parser succeeded.
 *
 * @example
 * `takeUntilAfter(text("\n"))` takes until after the newline but
 * doesn't include the newline itself in the result
 */
const takeUntilAfter: <T>(parser: Parser<T>) => Parser<string>
/**
 * Keep consuming until before the given parser succeeds.
 * Returns all the characters that were consumed before the parser succeeded.
 *
 * @example
 * `takeUpTo(text("\n"))` takes all chars until before the newline
 */
const takeUpTo: <T>(parser: Parser<T>) => Parser<string>
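
The greedy behaviour described for `template` is easiest to see in a toy model. The sketch below is not Teg's code: `Step`, `textStep`, `wordStep`, and `inOrder` are hypothetical helpers written only to illustrate the PEG-style rule that a parser never gives back input it has already consumed.

```typescript
// Hypothetical toy model of greedy, non-backtracking parsing; not Teg's API.
// A step returns the unconsumed remainder on success, or undefined on failure.
type Step = (input: string) => string | undefined

// Match an exact string at the start of the input.
const textStep = (expected: string): Step => (input) =>
  input.slice(0, expected.length) === expected
    ? input.slice(expected.length)
    : undefined

// Greedily match one or more English letters.
const wordStep: Step = (input) => {
  const match = /^[A-Za-z]+/.exec(input)
  return match ? input.slice(match[0].length) : undefined
}

// Run the steps one after another; stop at the first failure.
const inOrder = (...steps: Step[]): Step => (input) => {
  let rest: string | undefined = input
  for (const step of steps) {
    if (rest === undefined) return undefined
    rest = step(rest)
  }
  return rest
}

// wordStep swallows all of "textcontent", leaving nothing for "content".
const greedyFails = inOrder(wordStep, textStep("content"))("textcontent")

// A non-letter boundary between the two parts stops the word early.
const withBoundary = inOrder(wordStep, textStep(" content"))("text content")
```

In this model `greedyFails` is `undefined` and `withBoundary` is the empty string, meaning the whole input was consumed. When writing real grammars, the same idea applies: make sure an earlier greedy parser has a natural stopping point before the next piece of the template.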

Built-in primitive parsers

/**
 * Takes the first line in the stream, i.e. consumes up to and including
 * the first newline. The newline itself is not part of the result.
 */
const line = takeUntilAfter(text("\n"));

/** Matches a single lowercase English letter */
const lower: Parser<string>

/** Matches a single uppercase English letter */
const upper: Parser<string>

/** Matches a single English letter, case insensitive */
const letter: Parser<string>

/**
 * Match an English word
 */
const word: Parser<string>

/** Match a single digit from 0 to 9 */
const digit: Parser<string>

/** Match an integer and return it as a number */
const integer: Parser<number>

/** Match a single hexadecimal digit (0-9, A-F), case insensitive */
const hexDigit: Parser<string>

/** Match a single English letter or digit */
const alphaNumeric: Parser<string>
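
Since `line` is defined with `takeUntilAfter`, it helps to see how that combinator differs from `takeUpTo`: the difference lies in what is left in the stream, not in what is returned. The following is a rough model using plain string functions. `untilAfter` and `upTo` are hypothetical stand-ins that take a literal delimiter rather than a parser, and failing when the delimiter is missing is an assumption of this sketch, not documented Teg behaviour.

```typescript
// Hypothetical string-based model; Teg's combinators take a parser, not a string.
type Taken = { value: string; rest: string }

// Return the text before the delimiter and consume the delimiter too.
const untilAfter = (delimiter: string, input: string): Taken | undefined => {
  const index = input.indexOf(delimiter)
  if (index === -1) return undefined // assumption: no delimiter means failure
  return {
    value: input.slice(0, index),
    rest: input.slice(index + delimiter.length),
  }
}

// Return the text before the delimiter but leave the delimiter unconsumed.
const upTo = (delimiter: string, input: string): Taken | undefined => {
  const index = input.indexOf(delimiter)
  if (index === -1) return undefined // assumption: no delimiter means failure
  return { value: input.slice(0, index), rest: input.slice(index) }
}

const after = untilAfter("\n", "first\nsecond")
const before = upTo("\n", "first\nsecond")
```

Both calls yield the value `"first"`, but `after.rest` is `"second"` while `before.rest` still starts with the newline.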

Custom Parser

const custom = new Parser((stream) => {
  // ... logic
  return new ParseSuccess(result, stream)
  // or
  return new ParseFailure(errorMessage, stream)
})

All primitive parsers and combinators are built using these constructors, so you can look at those for examples.
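
For a fuller picture of the shape above, here is a self-contained sketch of a custom parser that matches a single lowercase vowel. Everything in it is a hypothetical stand-in: `ToyParser`, `ToySuccess`, `ToyFailure`, and the `Stream` type only mirror the constructor shapes shown above, and Teg's real stream type and class internals may differ.

```typescript
// Hypothetical stand-ins that mirror the constructor shapes shown above.
// Teg's real stream type and classes are not reproduced here and may differ.
type Stream = { content: string; index: number }

class ToySuccess<T> {
  constructor(readonly value: T, readonly stream: Stream) {}
}

class ToyFailure {
  constructor(readonly message: string, readonly stream: Stream) {}
}

class ToyParser<T> {
  constructor(
    readonly parse: (stream: Stream) => ToySuccess<T> | ToyFailure
  ) {}
}

// A custom parser that matches a single lowercase vowel.
const vowel = new ToyParser<string>((stream) => {
  const char = stream.content.charAt(stream.index)
  if (char !== "" && "aeiou".indexOf(char) !== -1) {
    // Success: return the value and a stream advanced past the match.
    return new ToySuccess(char, {
      content: stream.content,
      index: stream.index + 1,
    })
  }
  // Failure: report what went wrong and where, leaving the stream as it was.
  return new ToyFailure(`Expected a vowel at index ${stream.index}`, stream)
})

const hit = vowel.parse({ content: "apple", index: 0 })
const miss = vowel.parse({ content: "pear", index: 0 })
```

The design choice worth noting is that the failure carries the original stream position, which is what makes error messages like the "Failed at index 0" report in the Usage section possible.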

Testing parsers

Teg ships utilities for testing parsers at teg-parser/testParser. They are used like this:

import { testParser } from "teg-parser/testParser";

const test = testParser(parser)

/** Assert the content passed in completely parses to the expected value */
test.parses(content, expected)

/**
 * Assert the content gets parsed to the expected value, but without asserting
 * all the content is consumed
 */
test.parsePartial(content, expected)

/** Assert the parser successfully matches the given content */
test.matches(content)

/** Assert the parser fails on the given content */
test.fails(content)

ESM and CJS

Teg comes with out-of-the-box support for both ESM and CJS. The correct format will be used depending on whether you use import (ESM) or require (CJS). However, a lot of parsers in Teg are just simple utilities, so if you use ESM, you will probably be able to tree-shake away a significant portion of the library.