0.1.0 • Published 1 year ago

@hgargg-0710/regex v0.1.0

License
MIT

regex

regex is a JavaScript library for parsing regular expressions into an AST and for generating regular expressions back from that AST. It follows the JavaScript flavor of regular expression syntax.

NOTE: the library depends upon the parsers.js package for parser-making.

Installation

npm install @hgargg-0710/regex

Documentation

The package has the following exports:

  1. parse (function)
  2. generate (function)
  3. parser (submodule)
  4. generator (submodule)
  5. tree (submodule)
  6. tokens (submodule)

parse

function parse(regex: string): Flags

Takes a string containing a regular expression and returns its AST.

generate

function generate(AST: Flags): string

Takes an AST node (not necessarily Flags; the full union of node types is too long to express here) and returns the string it represents.

NOTE: partial nodes will give only partial results. For example, passing a PatternEnd will give "$".

parser

APIs of the individual parsing layers.

| export | description |
| --- | --- |
| ExpressionParser | Function. Parses an Expression, initially tokenizing it |
| boundry | Submodule. Handles parsing of boundaries |
| chars | Submodule. Handles tokenization |
| classes | Submodule. Handles parsing of character classes |
| deflag | Submodule. Handles removal of flags |
| disjunction | Submodule. Handles parsing of disjunction expressions |
| escaped | Submodule. Handles parsing of escape sequences |
| group | Submodule. Handles recursion within a regular expression |
| nogreedy | Submodule. Handles the "no-greedy" quantifiers |
| quantifier | Submodule. Handles the quantifiers |

The submodule exports are a part of the parse function's final definition.

The layers are passed within the parse function in the following order:

  1. deflag
  2. chars
  3. classes
  4. escaped
  5. boundry
  6. group (recursive, looped)
  7. quantifier
  8. nogreedy
  9. disjunction
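
The ordering above can be pictured as a left-to-right pipeline in which each layer receives the output of the previous one. The sketch below is only an illustration of that idea: `composeLayers` and the toy layers are hypothetical stand-ins, not the library's code (the real layers are built with parsers.js and operate on streams of tokens).

```javascript
// Hypothetical helper: applies the layers left to right,
// feeding each layer the result of the previous one.
const composeLayers = (...layers) => (input) =>
  layers.reduce((state, layer) => layer(state), input);

// Layer names in the order listed in this README.
const layerNames = [
  "deflag", "chars", "classes", "escaped", "boundry",
  "group", "quantifier", "nogreedy", "disjunction",
];

// Toy stand-ins for the real layers: each one only records that it ran.
const toyLayers = layerNames.map((name) => (state) => ({
  ...state,
  trace: [...state.trace, name],
}));

const toyParse = composeLayers(...toyLayers);
const result = toyParse({ source: "/a|b/g", trace: [] });

console.log(result.trace.join(" -> "));
```

Because every layer sees only what the earlier layers produced, the order matters: for example, quantifiers can only be attached to groups once the group layer has already run.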

deflag

| export | description |
| --- | --- |
| DeFlag | Function for de-flagging a string containing a regular expression. Returns a Flags object whose .expression field contains the expression's string |
| flagTable | Table for identification of flags with appropriate TokenInstances |
| flagInstance | Function based off flagTable. Returns the TokenType of a given flag string |
| identifyFlags | Maps flagInstance over an array of strings |
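
To make the de-flagging step more concrete, here is a minimal sketch of the idea. It is not the library's DeFlag: `splitFlags` is a hypothetical function, it assumes the input is written in literal form (`/pattern/flags`), and the shape of the returned object is an assumption. Only the flag letters and their type strings are taken from the deflag tokens listed in this README.

```javascript
// Flag letters and their token type strings, as listed for the deflag tokens.
const flagTypes = new Map([
  ["d", "indicies"],
  ["g", "global"],
  ["i", "case-insensitive"],
  ["m", "multiline"],
  ["s", "dot-all"],
  ["u", "unicode"],
  ["v", "unicode-sets"],
  ["y", "sticky"],
]);

// Hypothetical stand-in for DeFlag. ASSUMPTION: input looks like "/pattern/flags".
function splitFlags(source) {
  const lastSlash = source.lastIndexOf("/");
  if (source[0] !== "/" || lastSlash === 0) {
    throw new SyntaxError("expected input of the form /pattern/flags");
  }
  // Everything between the first and the last slash is the expression.
  const expression = source.slice(1, lastSlash);
  // Every character after the last slash must be a known flag.
  const flags = [...source.slice(lastSlash + 1)].map((flag) => {
    if (!flagTypes.has(flag)) throw new SyntaxError(`unknown flag: ${flag}`);
    return { type: flagTypes.get(flag) };
  });
  return { type: "flags", expression, flags };
}

const deflagged = splitFlags("/ab+c/gi");
console.log(deflagged.expression); // "ab+c"
console.log(deflagged.flags.map((flag) => flag.type)); // [ 'global', 'case-insensitive' ]
```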

chars

| export | description |
| --- | --- |
| ExpressionTokenizer | A PatternTokenizer for tokenizing the given Pattern with a regular expression in it |
| tokenizerMap | The RegExpMap, on which ExpressionTokenizer is based |

classes

| export | description |
| --- | --- |
| CharacterClassParser | Main parser for character classes |
| classLimit | Limits the given stream up to the next RectOp from the current element |
| classMap | TypeMap, on which CharacterClassParser is based |
| HandleClass | The handler for the RectOp token inside the classMap |
| ClassHandler | A multistep function, serving as the main component of HandleClass |
| EscapeInner | A parser function, first component of the ClassHandler. Escapes inside characters |
| HandleEscaped | Handler for the escaped characters, main part of the EscapeInner |
| IdentifyRanges | Second parsing function of ClassHandler. Identifies and parses ranges |
| HandleRange | The main component of IdentifyRanges, parses encountered ranges |
| InClassEscapedHandler | A slightly modified version of the escapedMap from escaped module for escaping |

escaped

| export | description |
| --- | --- |
| EscapedParser | Main parser of the escaped characters |
| escapePreface | The TypeMap, on which EscapedParser is based |
| escapeMap | The ValueMap which defines the global-scope escaping |
| escapedHandler | Creates a function for handling escaped characters based off given map |
| parseBackreference | Returns a Backreference based on given arguments of curr, input |
| parseMultControl | Returns a ControlCharacter of lengths 4-5 based on curr, input |
| parseDoubleControl | Returns a ControlCharacter of length 2 based on curr, input |
| parseSingleControl | Returns a ControlCharacter of length 1 based on curr, input |
| readUnicodeClassProperty | Parses a UnicodeClassProperty based on curr, input |
| readBraced | Reads the given Stream, until a ClBrace is encountered |
| readNamedBackreference | Reads a NamedBackreference based on readIdentifier |
| readUBrace | Reads a sequence of {hhhh} or {hhhhh} where isHex(h) === true |
| readu | Reads a sequence of hhhh, where isHex(h) === true |
| readx | Reads a sequence of hh, where isHex(h) === true |
| isHex | Returns whether the given character is a hexadecimal digit |
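
The fixed-length hex readers are easy to picture on plain strings. The sketch below mirrors the behaviour described for isHex, readx and readu, but it is not the library's code: the real functions work on Streams, while `isHexDigit`, `readHex`, `readxLike` and `readuLike` are hypothetical names that take a string and an index.

```javascript
// Hypothetical plain-string counterpart of isHex.
const isHexDigit = (char) => /^[0-9a-fA-F]$/.test(char);

// Reads exactly `count` hex digits starting at `index`.
// Returns the digits, or null when the run is too short or not hexadecimal.
function readHex(input, index, count) {
  const digits = input.slice(index, index + count);
  return digits.length === count && [...digits].every(isHexDigit)
    ? digits
    : null;
}

// \xhh carries two hex digits, \uhhhh carries four.
const readxLike = (input, index) => readHex(input, index, 2);
const readuLike = (input, index) => readHex(input, index, 4);

// Index 2 skips the backslash and the letter that follows it.
console.log(readxLike("\\x4Fz", 2)); // "4F"
console.log(readuLike("\\u00e9", 2)); // "00e9"
console.log(readuLike("\\u00g9", 2)); // null
```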

boundry

| export | description |
| --- | --- |
| BoundryParser | Main parser of the submodule. Separates boundaries into TokenInstances |
| boundryMap | The TypeMap, on which the BoundryParser is based |
| HandleEscaped | Handles the NonWordBoundry TokenInstances |

group

| export | description |
| --- | --- |
| EndParser | The main parser of the submodule. The ExpressionParser ends with it |
| GroupParser | The first parsing layer of the EndParser. Recursive. Handles recursion, groups/captures, look-aheads/-behinds |
| groupMap | The TypeMap, on which the GroupParser is based |
| GroupHandler | The main component of the groupMap |
| nestedBrack | Function for limiting the current-level nested bracket-expression |
| CollectionHandler | Function for handling current collection |
| HandleQMark | Function for handling "collections" starting with ? ((?<!...), (?<...>...), ...) |
| HandleCollectionBase | Function for recursively handling a capture group |
| QMarkHandler | Underlying TableParser of HandleQMark |
| HandleQMarkExclMark | Handles a negative look-ahead |
| HandleQMarkEq | Handles a look-ahead |
| HandleLeftAngular | Handles all "collections" starting with < ((?<...>...), (?<=...), ...) |
| HandleColon | Handles a no-capture group |
| LeftAngularHandler | Underlying TableParser for HandleLeftAngular |
| HandleLeftAngularBase | Handles a named capture |
| HandleLeftAngularExclMark | Handles a negative look-behind |
| HandleLeftAngularEq | Handles a look-behind |
| readIdentifier | Reads an identifier (for the named capture/backreference) |

quantifier

| export | description |
| --- | --- |
| QuantifierParser | Main parser of the submodule. Parses quantifiers |
| QuantifierHandler | A TableParser, main component of the QuantifierParser |
| HandlePlus | Handles a Plus token encountered |
| HandleStar | Handles a Star token encountered |
| HandleQMark | Handles a QMark token encountered |
| BraceHandler | Handles an OpBrace token encountered |
| HandleBraced | Returns a handling function for either one of NtoM, NPlus, or NOnly |
| readNumber | Reads a number from the given Stream (note: up to the first isNaN token) |
| limitBraced | Limits the given Stream up to the point of the first encountered ClBrace |

nogreedy

| export | description |
| --- | --- |
| ParseNoGreedy | Main parser of the submodule. Parses NoGreedy tokens |
| noGreedyMap | The TypeMap, on which ParseNoGreedy is based |
| HandleQuantifier | Handler for quantifiers |
| QuantifierHandler | The underlying TableParser-function of HandleQuantifier |
| HandleQMark | Handles QMark following a quantifier (no-greedy quantifiers) |

disjunction

| export | description |
| --- | --- |
| DisjunctionParser | The main export of the submodule. Parses disjunctions |
| EmptyFixer | First parsing layer of DisjunctionParser. Fixes empty expressions \|\| |
| DisjunctionTokenizer | Second parsing layer of DisjunctionParser. Puts non-Pipe bits of current Stream into DisjunctionArguments |
| DisjunctionDelimiter | Third and final parsing layer of DisjunctionParser. Delimits the Stream based off Pipe tokens |
| hasDisjunctions | Checks whether a given Stream has disjunctions to parse from given point on |
| limitPipe | Limits the given Stream until the moment the next Pipe is encountered |
| skipTilPipes | Skips Stream until a Pipe is discovered |

generator

Provides exports for generating regular expressions from the package's AST.

| export | description |
| --- | --- |
| RegexGenerator | The SourceGenerator for the package's AST (generate is based on it) |
| generatorMap | The TypeMap, on which RegexGenerator is based |
| GenerateBackspaceClass | Generates a regex for BackspaceClass |
| GenerateWordBoundry | Generates a regex for WordBoundry |
| GenerateNonWordBoundry | Generates a regex for NonWordBoundry |
| GenerateNewline | Generates a regex for Newline |
| GenerateCarriageReturn | Generates a regex for CarriageReturn |
| GenerateWordClass | Generates a regex for WordClass |
| GenerateNonWordClass | Generates a regex for NonWordClass |
| GenerateFormFeed | Generates a regex for FormFeed |
| GenerateDigitClass | Generates a regex for DigitClass |
| GenerateNonDigitClass | Generates a regex for NonDigitClass |
| GenerateNULClass | Generates a regex for NULClass |
| GenerateVerticalTab | Generates a regex for VerticalTab |
| GenerateHorizontalTab | Generates a regex for HorizontalTab |
| GenerateNonWhitespaceClass | Generates a regex for NonWhitespaceClass |
| GenerateWhitespaceClass | Generates a regex for WhitespaceClass |
| GenerateEmptyExpression | Generates a regex for EmptyExpression |
| GenerateMatchIndicies | Generates a regex for MatchIndicies flag |
| GenerateGlobalSearch | Generates a regex for GlobalSearch flag |
| GenerateCaseInsensitive | Generates a regex for CaseInsensitive flag |
| GenerateMultline | Generates a regex for Multiline flag |
| GenerateDotAll | Generates a regex for DotAll flag |
| GenerateUnicode | Generates a regex for Unicode flag |
| GenerateUnicodeSets | Generates a regex for UnicodeSets flag |
| GenerateSticky | Generates a regex for Sticky flag |
| GeneratePatterStart | Generates a regex for PatternStart |
| GeneratePatternEnd | Generates a regex for PatternEnd |
| GenerateFlags | Generates a regex for Flags |
| GenerateExpression | Generates a regex for Expression |
| GenerateNOnly | Generates a regex for NOnly |
| GenerateNtoM | Generates a regex for NtoM |
| GenerateNPlus | Generates a regex for NPlus |
| GenerateEscaped | Generates a regex for Escaped |
| GenerateBackreference | Generates a regex for Backreference |
| GenerateUnicodeClassProperty | Generates a regex for UnicodeClassProperty |
| GenerateControlCharacter | Generates a regex for ControlCharacter |
| GenerateNamedBackreference | Generates a regex for NamedBackreference |
| GenerateClassRange | Generates a regex for ClassRange |
| GenerateNoGreedy | Generates a regex for NoGreedy |
| GenerateOptional | Generates a regex for Optional |
| GenerateZeroPlus | Generates a regex for ZeroPlus |
| GenerateOnePlus | Generates a regex for OnePlus |
| GenerateClass | Generates a regex for CharacterClass |
| GenerateNegClass | Generates a regex for NegCharacterClass |
| GenerateDisjunction | Generates a regex for Disjunction |
| GenerateDisjunctionArgument | Generates a regex for DisjunctionArgument |
| GenerateNonCaptureGroup | Generates a regex for NonCaptureGroup |
| GenerateCaptureGroup | Generates a regex for CaptureGroup |
| GenerateLookAhead | Generates a regex for LookAhead |
| GenerateLookBehind | Generates a regex for LookBehind |
| GenerateNegLookAhead | Generates a regex for NegLookAhead |
| GenerateNegLookBehind | Generates a regex for NegLookBehind |
| GenerateNamedCapture | Generates a regex for NamedCapture |
| GenerateWildcard | Generates a regex for Wildcard |
| GeneratePipe | Generates a regex for Pipe |
| GenerateComma | Generates a regex for Comma |
| GenerateTrivial | Generates a regex for anything else not in the table already (with a typeof .value === 'string') |

tree

| export | description |
| --- | --- |
| RegexStream | A TreeStream for the library's AST (note: accepts THE AST ITSELF) |
| RegexTree | A Tree interface implementation for the library's AST |
| treeMap | The TypeMap, on which RegexTree is based |
| NamedCaptureTree | The function for conversion of a NamedCapture to a Tree |
| ExpressionTree | The function for conversion of an Expression to a Tree |
| FlagTree | The function for conversion of a Flags to a Tree |
| SeveralTree | The function for conversion of NOnly, NtoM and NPlus to a Tree |
| SingleTree | The function for conversion of ZeroPlus, OnePlus, Optional, LookAhead, LookBehind, NegLookAhead, NegLookBehind, NamedBackreference to a Tree |
| ValueTree | The function for conversion of ClassRange, DisjunctionArgument, CharacterClass, NegCharacterClass and Disjunction to a Tree |
| ChildlessTree | The function for conversion of the rest of the tokens to a Tree |

tokens

The tokens module has the same submodule structure as the parser module.

| submodule | description |
| --- | --- |
| boundry | Various boundary tokens |
| chars | Various basic (first-order) tokens |
| classes | Tokens for representation of character classes |
| deflag | Flags and expressions representation tokens |
| disjunction | Disjunction-related tokens |
| escaped | Escape-sequence-related tokens |
| group | Tokens for groups and other recursive structures |
| nogreedy | Tokens for non-greedy quantifiers |
| quantifier | Tokens for quantifiers |

deflag

| TokenType/TokenInstance | represents | type |
| --- | --- | --- |
| MatchIndicies | The d flag | "indicies" |
| GlobalSearch | The g flag | "global" |
| CaseInsensitive | The i flag | "case-insensitive" |
| Multiline | The m flag | "multiline" |
| DotAll | The s flag | "dot-all" |
| Unicode | The u flag | "unicode" |
| UnicodeSets | The v flag | "unicode-sets" |
| Sticky | The y flag | "sticky" |
| Flags | The complete regular expression with flags | "flags" |
| Expression | A partial expression, without flags (can have other Expressions inside) | "expression" |

chars

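| TokenType | represents | type |
| --- | --- | --- |
| Escape | \\ | "escape" |
| RectOp | [ | "rop" |
| RectCl | ] | "rcl" |
| Hyphen | - | "hyphen" |
| Pipe | \| | "pipe" |
| OpBrack | ( | "opbrack" |
| ClBrack | ) | "clbrack" |
| QMark | ? | "qmark" |
| ExclMark | ! | "emark" |
| Eq | = | "eq" |
| Wildcard | . | "wildcard" |
| Star | * | "star" |
| Plus | + | "plus" |
| OpBrace | { | "opbrc" |
| ClBrace | } | "clbrc" |
| Colon | : | "colon" |
| Comma | , | "comma" |
| LeftAngular | < | "lang" |
| RightAngular | > | "rang" |
| Dollar | $ | "dollar" |
| Xor | ^ | "xor" |
| RegexSymbol | everything else | "symbol" |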
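
As an illustration of how these first-order tokens relate to an input pattern, the sketch below classifies each character using the type strings from the table. It is not the library's ExpressionTokenizer: `tokenizeChars` is a hypothetical function, and the `{ type, value }` token shape is an assumption made for the example.

```javascript
// Character-to-type mapping, copied from the chars token table.
const charTypes = new Map([
  ["\\", "escape"], ["[", "rop"], ["]", "rcl"], ["-", "hyphen"],
  ["|", "pipe"], ["(", "opbrack"], [")", "clbrack"], ["?", "qmark"],
  ["!", "emark"], ["=", "eq"], [".", "wildcard"], ["*", "star"],
  ["+", "plus"], ["{", "opbrc"], ["}", "clbrc"], [":", "colon"],
  [",", "comma"], ["<", "lang"], [">", "rang"], ["$", "dollar"],
  ["^", "xor"],
]);

// Hypothetical tokenizer: one token per character,
// with "symbol" as the fallback type for everything else.
const tokenizeChars = (pattern) =>
  [...pattern].map((value) => ({
    type: charTypes.get(value) ?? "symbol",
    value,
  }));

const charTokens = tokenizeChars("a+|[b-c]");
console.log(charTokens.map((token) => token.type).join(" "));
// symbol plus pipe rop symbol hyphen symbol rcl
```

At this stage nothing is interpreted yet: a hyphen is a "hyphen" whether or not it sits inside a character class, and deciding what it means is left to the later layers.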

classes

| TokenType | represents | type |
| --- | --- | --- |
| CharacterClass | A character class [...] | "charclass" |
| NegCharacterClass | A negative character class [^...] | "neg-charclass" |
| ClassRange | A character class range X-Y | "class-range" |

escaped

| TokenType/TokenInstance | represents | type |
| --- | --- | --- |
| ControlCharacter | \cX, \xhh, \uhhhh, \u{hhhh} or \u{hhhhh} | "control-char" |
| Backreference | \N - numeric backreference | "backref" |
| NamedBackreference | \k<name> - named backreference | "named-backref" |
| UnicodeClassProperty | \p{...} - unicode class property | "uniprop" |
| RegexIdentifier | name - identifier in named captures/backreferences | "identifier" |
| CarriageReturn | \r - carriage return | "cr" |
| NonWordBoundry | \B - non-word boundary (outside classes) | "non-word-boundry" |
| WordBoundry | \b - word boundary | "word-boundry" |
| NULClass | \0 - NUL class | "nul-class" |
| FormFeed | \f - form feed | "form-feed" |
| DigitClass | \d - digit class | "digit-class" |
| NonDigitClass | \D - non-digit class | "non-digit-class" |
| WordClass | \w - word class | "word-class" |
| NonWordClass | \W - non-word class | "non-word-class" |
| WhitespaceClass | \s - whitespace class | "whitespace-class" |
| NonWhitespaceClass | \S - non-whitespace class | "non-whitespace-class" |
| HorizontalTab | \t - horizontal tab | "tab" |
| VerticalTab | \v - vertical tab | "vtab" |
| BackspaceClass | \b - backspace (inside classes) | "backspace" |
| Newline | \n - newline | "newline" |
| Escaped | Any other escaped character | "escaped" |

boundry

| TokenInstance | represents | type |
| --- | --- | --- |
| PatternStart | ^ | "start" |
| PatternEnd | $ | "end" |

group

| TokenType | represents | type |
| --- | --- | --- |
| CaptureGroup | (...) | "capture" |
| NoCaptureGroup | (?:...) | "non-capture" |
| NamedCapture | (?<name>...) | "named-capture" |
| LookAhead | (?=...) | "lookahead" |
| LookBehind | (?<=...) | "lookbehind" |
| NegLookAhead | (?!...) | "neg-lookahead" |
| NegLookBehind | (?<!...) | "neg-lookbehind" |

quantifier

| TokenType | represents | type |
| --- | --- | --- |
| ZeroPlus | ...* | "zero-plus" |
| OnePlus | ...+ | "one-plus" |
| Optional | ...? | "optional" |
| NOnly | ...{...} | "n-only" |
| NPlus | ...{...,} | "n-plus" |
| NtoM | ...{...,...} | "n-to-m" |
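
The three braced forms differ only in what follows the first number. The sketch below shows one way to tell them apart, using the type strings from the table. `classifyBraced` is a hypothetical helper that works on the text between the braces; it is not the library's HandleBraced, and the `n` and `m` field names are assumptions.

```javascript
// Hypothetical helper: classifies the text between { and }.
// Returns null when the text is not a valid braced quantifier body.
function classifyBraced(body) {
  // One or more digits, optionally followed by a comma and more digits.
  const match = /^(\d+)(,(\d*))?$/.exec(body);
  if (!match) return null;
  const [, n, comma, m] = match;
  if (comma === undefined) return { type: "n-only", n: Number(n) }; // {n}
  if (m === "") return { type: "n-plus", n: Number(n) }; // {n,}
  return { type: "n-to-m", n: Number(n), m: Number(m) }; // {n,m}
}

console.log(classifyBraced("3")); // { type: 'n-only', n: 3 }
console.log(classifyBraced("3,")); // { type: 'n-plus', n: 3 }
console.log(classifyBraced("3,5")); // { type: 'n-to-m', n: 3, m: 5 }
console.log(classifyBraced("x")); // null
```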

nogreedy

| export | description | type |
| --- | --- | --- |
| NoGreedy | A TokenType representing no-greedy operators | "nogreedy" |
| isQuantifier | A predicate returning true only for tokens with types from the quantifier module | |

disjunction

| TokenType/TokenInstance | represents | type |
| --- | --- | --- |
| Disjunction | ...\|...\|... | "disjunction" |
| DisjunctionArgument | An element of a Disjunction | "disjunction-arg" |
| EmptyExpression | An empty element of a Disjunction (\|\|) | "empty" |
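
To show how these three token kinds fit together, here is a minimal sketch that splits a flat token list on "pipe" tokens. It is not the library's DisjunctionParser: `splitOnPipes` is a hypothetical function, the `{ type, value }` node shape is an assumption, and nesting inside groups is ignored.

```javascript
// Hypothetical helper: splits a flat list of tokens on "pipe" tokens.
// Non-empty parts become "disjunction-arg" nodes, empty parts become "empty" nodes.
function splitOnPipes(tokens) {
  const parts = [[]];
  for (const token of tokens) {
    if (token.type === "pipe") parts.push([]); // a pipe starts the next part
    else parts[parts.length - 1].push(token);
  }
  const args = parts.map((part) =>
    part.length > 0
      ? { type: "disjunction-arg", value: part }
      : { type: "empty" }
  );
  return { type: "disjunction", value: args };
}

// Tokens for the pattern a||b, written out by hand.
const pipeTokens = [
  { type: "symbol", value: "a" },
  { type: "pipe", value: "|" },
  { type: "pipe", value: "|" },
  { type: "symbol", value: "b" },
];

const disjunction = splitOnPipes(pipeTokens);
console.log(disjunction.value.map((arg) => arg.type).join(" "));
// disjunction-arg empty disjunction-arg
```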