srl-lang v1.1.1

License: ISC
Last release: 2 years ago

SRL - Structured RegExp Language

SRL is a simple language that compiles into a regular expression. It is inspired by a post by u/johnngnky on Reddit.

I don't follow his post exactly, but the syntax I created is inspired by it, and the idea for this project came from him.

The project is at a very early stage and is not meant to be used in real applications.

Language

Data Types

SRL has a few data types you should know before you start working with it.

| Name | Usage | Description |
| --- | --- | --- |
| Number | `$5` | A number is indicated by a leading `$` |
| String | `"Foo"` | A string is enclosed in double quotes (`"`) |
| Constant | `!digit` | A constant is indicated by a leading `!` |
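Because each data type has its own prefix, a single token can be classified just by looking at its first character. The following TypeScript sketch illustrates that idea; `classifyToken` and `SrlToken` are illustrative names, not part of srl-lang, and the sketch also accepts the `!!` negation used in the Constants table.

```typescript
// Hypothetical sketch: classify one SRL token by its prefix.
// This is not srl-lang's actual tokenizer.
type SrlToken =
  | { kind: "number"; value: number }
  | { kind: "string"; value: string }
  | { kind: "constant"; name: string; negated: boolean };

function classifyToken(raw: string): SrlToken {
  if (/^\$\d+$/.test(raw)) {
    // "$5" -> the number 5
    return { kind: "number", value: Number(raw.slice(1)) };
  }
  if (raw.length >= 2 && raw.startsWith('"') && raw.endsWith('"')) {
    // "\"Foo\"" -> the string Foo (escape sequences are not handled here)
    return { kind: "string", value: raw.slice(1, -1) };
  }
  const constant = /^(!{1,2})([a-z]+)$/.exec(raw);
  if (constant) {
    // "!digit" -> digit, "!!digit" -> negated digit
    return { kind: "constant", name: constant[2], negated: constant[1].length === 2 };
  }
  throw new Error(`Unrecognised token: ${raw}`);
}

console.log(JSON.stringify(classifyToken("$5")));      // {"kind":"number","value":5}
console.log(JSON.stringify(classifyToken('"Foo"')));   // {"kind":"string","value":"Foo"}
console.log(JSON.stringify(classifyToken("!!digit"))); // {"kind":"constant","name":"digit","negated":true}
```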

Constants

Constants are predefined patterns that can be reused.

| Name | Usage | RegExp | Description |
| --- | --- | --- | --- |
| Any char | `!any` | `.` | Any single character |
| Any whitespace | `!whitespace` | `\s` | Any whitespace character |
| Any non-whitespace | `!!whitespace` | `\S` | Any non-whitespace character |
| Any digit | `!digit` | `\d` | Any digit |
| Any non-digit | `!!digit` | `\D` | Any non-digit |
| Any word character | `!word` | `\w` | Any word character |
| Any non-word character | `!!word` | `\W` | Any non-word character |
| Any Unicode sequence | `!unicode` | `\X` | Any Unicode sequence, line breaks included |
| Match data unit | `!dataunit` | `\C` | Match one data unit |
| Unicode newline | `!uninl` | `\R` | Unicode newlines |
| Anything but newline | `!!newline` | `\N` | Match anything but a newline |
| Vertical whitespace | `!vwhitespace` | `\v` | Vertical whitespace character |
| Negative vertical whitespace | `!!vwhitespace` | `\V` | Negation of `\v` |
| Horizontal whitespace | `!hwhitespace` | `\h` | Horizontal whitespace character |
| Negative horizontal whitespace | `!!hwhitespace` | `\H` | Negation of `\h` |
| Reset match | `!reset` | `\K` | Reset match |
| Control character Y | `!ycontrol` | `\cY` | Control character Y |
| Backspace | `!backspace` | `\b` | Backspace character (`\b` means backspace only inside a character class such as `[\b]`; elsewhere it is a word boundary) |
| Newline | `!newline` | `\n` | Newline |
| Carriage return | `!carriage` | `\r` | Carriage return |
| Tab | `!tab` | `\t` | Tab |
| Null character | `!null` | `\0` | Null character |
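Many of these escapes (`\X`, `\C`, `\R`, `\N`, `\h`, `\H`, `\K`) come from PCRE-style engines and are not available in JavaScript's RegExp. The TypeScript sketch below expands only the constants that JavaScript understands into RegExp fragments. The `constants` and `negatedConstants` maps and the `expandConstant` function are hypothetical names, not srl-lang's API.

```typescript
// Hypothetical lookup tables covering only the SRL constants whose escapes
// JavaScript's RegExp also understands.
const constants = new Map<string, string>([
  ["any", "."],
  ["whitespace", "\\s"],
  ["digit", "\\d"],
  ["word", "\\w"],
  ["newline", "\\n"],
  ["carriage", "\\r"],
  ["tab", "\\t"],
  ["null", "\\0"],
  ["ycontrol", "\\cY"],
  // In JavaScript, \b means backspace only inside a character class.
  ["backspace", "[\\b]"],
]);

// Negated forms, written with a double "!!".
const negatedConstants = new Map<string, string>([
  ["whitespace", "\\S"],
  ["digit", "\\D"],
  ["word", "\\W"],
]);

// Turns "!digit" into "\d" and "!!digit" into "\D".
function expandConstant(token: string): string {
  const negated = token.startsWith("!!");
  const name = token.slice(negated ? 2 : 1);
  const fragment = (negated ? negatedConstants : constants).get(name);
  if (fragment === undefined) {
    throw new Error(`Unknown or unsupported constant: ${token}`);
  }
  return fragment;
}

const pattern = new RegExp(expandConstant("!digit") + expandConstant("!!whitespace"));
console.log(pattern.source);     // \d\S
console.log(pattern.test("7x")); // true
```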

Anchors

Anchors match positions in the input, such as the start of a line or a word boundary, rather than characters.

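| Name | Usage | RegExp | Description |
| --- | --- | --- | --- |
| Start match | `START MATCH` | `\G` | Start of match |
| Start input * | `START INPUT` | `^` / `\A` | Start of input |
| End input * | `END INPUT` | `$` / `\Z` | End of input |
| Absolute end input | `ABSOLUTE END INPUT` | `\z` | Absolute end of input |
| Word boundary | `WORD BOUNDARY` | `\b` | A word boundary |
| Non-word boundary | `BOUNDARY` | `\B` | Non-word boundary |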

* ^ and $ match the start and end of each line when the MULTILINE flag is enabled
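The footnote's point about MULTILINE is easiest to see by running the compiled anchors directly. This TypeScript sketch uses plain RegExp literals rather than srl-lang itself, and covers only the anchors JavaScript supports (`^`, `$`, `\b`, `\B`); `\G`, `\A`, `\Z` and `\z` are PCRE-style anchors that JavaScript's RegExp does not provide. The sample strings are illustrative.

```typescript
const text = "first line\nsecond line";

// Without MULTILINE, ^ matches only at the very start of the input.
const startOnly = /^second/;
// With MULTILINE (the m flag), ^ also matches right after each newline.
const startOfLine = /^second/m;

console.log(startOnly.test(text));   // false
console.log(startOfLine.test(text)); // true

// $ behaves the same way at line ends.
console.log(/line$/.test("first line\nsecond"));  // false
console.log(/line$/m.test("first line\nsecond")); // true

// WORD BOUNDARY (\b) and BOUNDARY (\B).
console.log(/\bline\b/.test("a line here")); // true
console.log(/\Bine/.test("a line here"));    // true: "ine" starts inside a word
```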

Groups

Groups work like groups in regular expressions and can be used the same way.
SRL supports several kinds of groups.

| Name | Usage | RegExp | Description |
| --- | --- | --- | --- |
| Capture enclosed | `...` | `(...)` | Capture everything enclosed |
| Atomic group | `ATOMIC ...` | `(?>...)` | Atomic group (non-capturing) |
| Named group | `NAME "Foo" FOR ...` | `(?<Foo>...)` | Named capturing group |
| Define group * | `DEFINE "Foo" FOR ...` | | Defines a group for later use |
| Positive lookahead | `POSITIVE LOOKAHEAD ...` | `(?=...)` | Positive lookahead |
| Negative lookahead | `NEGATIVE LOOKAHEAD ...` | `(?!...)` | Negative lookahead |
| Positive lookbehind | `POSITIVE LOOKBEHIND ...` | `(?<=...)` | Positive lookbehind |
| Negative lookbehind | `NEGATIVE LOOKBEHIND ...` | `(?<!...)` | Negative lookbehind |
| If branching | `IF (LITERAL ("foo")) THEN LITERAL ("Bar") ELSE LITERAL ("barFoo")` | `(((?=foo)fooBar)\|(barFoo))` | Uses the THEN branch if the condition matches, otherwise the ELSE branch |

* Not supported by the Emacs RegExp engine, but implemented by SRL
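The following TypeScript sketch runs the RegExp forms from the Groups table directly (not through srl-lang) to show what they match. Atomic groups (`(?>...)`) are not available in JavaScript's RegExp, so they are left out. The group names `year` and `month` and the sample inputs are illustrative; the IF branching pattern is copied verbatim from the table.

```typescript
// Named group: NAME "year" FOR ... would correspond to (?<year>...).
const dated = /(?<year>\d{4})-(?<month>\d{2})/;
console.log(dated.exec("Released 2021-07")?.groups?.year); // 2021

// Lookahead and lookbehind check context without consuming it.
console.log(/foo(?=bar)/.exec("foobar")?.[0]);      // foo
console.log(/foo(?!bar)/.test("foobar"));           // false
console.log(/(?<=\$)\d+/.exec("cost $42")?.[0]);    // 42
console.log(/(?<!\$)\b\d+/.exec("$42 or 17")?.[0]); // 17

// The IF branching output from the table.
const ifBranch = /(((?=foo)fooBar)|(barFoo))/;
console.log(ifBranch.exec("fooBar")?.[0]); // fooBar
console.log(ifBranch.exec("barFoo")?.[0]); // barFoo
console.log(ifBranch.test("fooBaz"));      // false
```

Because the IF output is a plain alternation, a failed THEN branch still falls back to the ELSE branch at the same position.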

Quantifiers

Quantifiers define how many times the preceding element may be repeated.

| Name | Usage | RegExp | Description |
| --- | --- | --- | --- |
| Optional | `LITERAL ("a") OPTIONAL` | `a?` | 0 or 1 of a |
| Zero or more * | `LITERAL ("a") MANY` | `a*` | 0 or more of a |
| One or more | `LITERAL ("a") MANY1` | `a+` | 1 or more of a |
| Exact | `LITERAL ("a") EXACT ($3)` | `a{3}` | Exactly 3 of a |
| More than | `LITERAL ("a") MORE ($3)` | `a{3,}` | 3 or more of a |
| Less than *** | `LITERAL ("a") LESS ($3)` | `a{0,3}` | 3 or fewer of a |
| Between | `LITERAL ("a") BETWEEN ($3 $6)` | `a{3,6}` | Between 3 and 6 of a |
| Greedy ** | `LITERAL ("a") GREEDY` | `a*` | Greedy quantifier |
| Lazy | `LITERAL ("a") LAZY` | `a*?` | Lazy quantifier |
| Possessive | `LITERAL ("a") POSSESSIVE` | `a*+` | Possessive quantifier |

* Interchangeable with GREEDY
** Interchangeable with MANY
*** Not supported by the Emacs RegExp engine, but implemented by SRL
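The sketch below models the quantifier keywords as a TypeScript union and produces the RegExp suffix each one compiles to according to the table. `Quantifier`, `quantifierSuffix` and `matchAs` are hypothetical names, not srl-lang's API. Possessive quantifiers (`a*+`) are left out because JavaScript's RegExp rejects them as a syntax error.

```typescript
// Hypothetical model of the quantifier keywords from the table above.
type Quantifier =
  | { kind: "OPTIONAL" }
  | { kind: "MANY" }
  | { kind: "MANY1" }
  | { kind: "EXACT"; n: number }
  | { kind: "MORE"; n: number }
  | { kind: "LESS"; n: number }
  | { kind: "BETWEEN"; min: number; max: number };

// Returns the RegExp suffix each quantifier compiles to, per the table.
function quantifierSuffix(q: Quantifier): string {
  switch (q.kind) {
    case "OPTIONAL":
      return "?";
    case "MANY":
      return "*";
    case "MANY1":
      return "+";
    case "EXACT":
      return `{${q.n}}`;
    case "MORE":
      return `{${q.n},}`;
    case "LESS":
      // JavaScript has no {,n} form, so "at most n" is spelled {0,n}.
      return `{0,${q.n}}`;
    case "BETWEEN":
      return `{${q.min},${q.max}}`;
  }
}

// Leftmost match of "a" plus a quantifier against seven a's.
const sevenAs = "aaaaaaa";
const matchAs = (q: Quantifier): string | undefined =>
  sevenAs.match(new RegExp(`a${quantifierSuffix(q)}`))?.[0];

console.log(matchAs({ kind: "EXACT", n: 3 }));             // aaa
console.log(matchAs({ kind: "MORE", n: 3 }));              // aaaaaaa
console.log(matchAs({ kind: "LESS", n: 3 }));              // aaa
console.log(matchAs({ kind: "BETWEEN", min: 3, max: 6 })); // aaaaaa

// GREEDY vs LAZY: the lazy form stops as early as the rest of the pattern allows.
const tagged = "<b>bold</b>";
console.log(tagged.match(/<.*>/)?.[0]);  // <b>bold</b>
console.log(tagged.match(/<.*?>/)?.[0]); // <b>
```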

Instructions

Instructions are used to define your patterns.

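| Name | Usage | RegExp | Description |
| --- | --- | --- | --- |
| From | `FROM ("123")` | `[123]` | Any single character from the given set |
| Except | `EXCEPT ("123")` | `[^123]` | Any single character not in the given set |
| Literal | `LITERAL ("a")` | `a` | Matches the string literally |
| Or | `LITERAL ("a") OR LITERAL ("b")` | `a\|b` | a or b |
| Subroutine * | `SUBROUTINE("test")` | | Matches a predefined group |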

* Custom implementation: this feature is not part of the standard ECMAScript regex engine
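As a rough illustration of the Instructions table, the TypeScript helpers below build each compiled fragment. They are hypothetical (`literal`, `from`, `except`, `or`, `define` and `subroutine` are not srl-lang's API). The DEFINE/SUBROUTINE part assumes a simple inlining strategy, since ECMAScript has no subroutine calls; how srl-lang actually implements it is not described here.

```typescript
// LITERAL: escape RegExp metacharacters so the text matches verbatim.
function literal(text: string): string {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// FROM / EXCEPT: a character class; only ], \, ^ and - need escaping inside it.
function classBody(chars: string): string {
  return chars.replace(/[\]\\^-]/g, "\\$&");
}
const from = (chars: string): string => `[${classBody(chars)}]`;
const except = (chars: string): string => `[^${classBody(chars)}]`;

// OR: plain alternation, as in the table (wrap it in a group before
// combining it with other elements).
const or = (left: string, right: string): string => `${left}|${right}`;

// DEFINE / SUBROUTINE (assumed strategy): store the pattern under a name and
// inline it, as a non-capturing group, wherever it is referenced.
const definitions = new Map<string, string>();
function define(name: string, pattern: string): void {
  definitions.set(name, pattern);
}
function subroutine(name: string): string {
  const pattern = definitions.get(name);
  if (pattern === undefined) throw new Error(`Group "${name}" is not defined`);
  return `(?:${pattern})`;
}

console.log(from("123"));                              // [123]
console.log(except("123"));                            // [^123]
console.log(new RegExp(literal("1+1")).test("1+1=2")); // true
console.log(new RegExp(or("a", "b")).test("b"));       // true

define("test", "\\d{2}");
const twoPairs = new RegExp(`^${subroutine("test")}-${subroutine("test")}$`);
console.log(twoPairs.test("12-34")); // true
```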

Flags

Flags set the modes the RegExp engine uses when matching.

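| Name | Usage | Description |
| --- | --- | --- |
| Global | `GLOBAL` | Finds all matches instead of stopping after the first |
| Multiline | `MULTILINE` | `^` and `$` match the start and end of each line |
| Case insensitive | `CASE INSENSITIVE` | Treats uppercase and lowercase letters as equivalent |
| Single line | `SINGLE LINE` | Treats the whole input as one line, so `.` also matches newlines |
| Unicode | `UNICODE` | Treats the pattern and input as Unicode code points instead of UTF-16 code units |
| Sticky | `STICKY` | Matches only at the current search position: the start of the search or the end of the last match |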

* Also called "Ignore Whitespace"
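The TypeScript sketch below shows each flag's effect on a compiled pattern. The mapping from SRL keywords to JavaScript flag letters (SINGLE LINE as `s`, STICKY as `y`, and so on) is an assumption based on the descriptions in the table, and `flagLetters` and `toFlags` are hypothetical names, not srl-lang's API.

```typescript
// Assumed mapping from SRL flag keywords to JavaScript RegExp flag letters.
const flagLetters = new Map<string, string>([
  ["GLOBAL", "g"],
  ["MULTILINE", "m"],
  ["CASE INSENSITIVE", "i"],
  ["SINGLE LINE", "s"],
  ["UNICODE", "u"],
  ["STICKY", "y"],
]);

// Hypothetical helper: converts SRL flag keywords into a RegExp flags string.
function toFlags(keywords: string[]): string {
  return keywords
    .map((keyword) => {
      const letter = flagLetters.get(keyword);
      if (letter === undefined) throw new Error(`Unknown flag: ${keyword}`);
      return letter;
    })
    .join("");
}

// CASE INSENSITIVE + SINGLE LINE: i ignores case and s lets . match "\n".
const caseless = new RegExp("a.b", toFlags(["CASE INSENSITIVE", "SINGLE LINE"]));
console.log(caseless.flags);        // is
console.log(caseless.test("A\nB")); // true

// GLOBAL: String.prototype.match returns every match instead of the first.
console.log("a1b2c3".match(/\d/g)?.join(",")); // 1,2,3

// UNICODE: an astral character is one code point but two UTF-16 code units.
const smiley = "\uD83D\uDE00";
console.log(/^.$/u.test(smiley)); // true
console.log(/^.$/.test(smiley));  // false

// STICKY: the match must start exactly at lastIndex.
const sticky = /foo/y;
sticky.lastIndex = 3;
console.log(sticky.test("barfoo")); // true
sticky.lastIndex = 1;
console.log(sticky.test("barfoo")); // false
```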