pure-regex v1.5.16 (published 2 months ago)

License: ISC

Description

A RegExp engine implemented in pure JavaScript.

This package implements the latest features of modern regular expressions (for example, Unicode mode, named capture groups, named backreferences and lookbehind assertions) without relying on the native RegExp. The engine is an NFA, built on tree traversal and path switching.

Remarks: early development of this package is drawing to a close.

Features

All modern flags are supported:

  • d: hasIndices (ES2022)
  • g: global
  • i: ignoreCase
  • m: multiline
  • s: dotAll
  • u: unicode
  • y: sticky

Tutorial

Installation

npm i pure-regex

Import

var PureRegex = require("pure-regex")

Or use import (in Node.js, or in a browser with a module bundler or loader):

import PureRegex from "pure-regex"

In the browser, exposed as the global PureRegex (experimental):

<script src="https://cdn.jsdelivr.net/npm/pure-regex@1.5/dist.umd.cjs"></script>

Example

Match string

var pReg = new PureRegex("w(.+)")
var matches = pReg.exec("hello world")
console.log(matches)

Match string with capture group name

var pReg = new PureRegex("w(?<cap>.+)")
var matches = pReg.exec("hello world")
console.log(matches)

Match string with named capture reference

var pReg = new PureRegex("hell(?<cap>\\w).+\\k<cap>")
var matches = pReg.exec("hello world")
console.log(matches)

Match string with lookbehind assertion

var pReg = new PureRegex("(?<=hello ).+")
var matches = pReg.exec("hello world") //match "world"
console.log(matches)

Test string

var pReg = new PureRegex("w.+")
console.log(pReg.test("hello world"))
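PureRegex is documented to return match arrays that correspond to the native RegExp, so for reference, here are the same patterns run through the native RegExp in Node.js. The commented values are what the native engine produces; we assume, without having verified it here, that PureRegex returns the same.

```javascript
// Native RegExp equivalents of the examples above.
// Assumption: PureRegex returns the same values (not verified here).
const plain = /w(.+)/.exec("hello world");
console.log(plain[0], plain[1], plain.index); // world orld 6

const named = /w(?<cap>.+)/.exec("hello world");
console.log(named.groups.cap); // orld

// \k<cap> must repeat whatever (?<cap>\w) captured, here "o",
// so ".+" gives characters back until an "o" follows.
const backref = /hell(?<cap>\w).+\k<cap>/.exec("hello world");
console.log(backref[0], backref.groups.cap); // hello wo o

const behind = /(?<=hello ).+/.exec("hello world");
console.log(behind[0], behind.index); // world 6

console.log(/w.+/.test("hello world")); // true
```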

Regex API

new PureRegex(source | regex, flags): instance

This is the constructor, invoked in the same way as the native RegExp constructor. It can also be called as an ordinary function without "new", e.g. PureRegex("w.+").
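The native RegExp constructor accepts the same two kinds of first argument and can also be called without "new". The sketch below uses the native RegExp only; the README does not say whether PureRegex, when given a regex plus new flags, replaces the flags the way the native constructor does, so treat that part as an assumption.

```javascript
// Native RegExp constructor forms.
// Assumption: PureRegex(source | regex, flags) mirrors these (not verified here).
const fromSource = new RegExp("w.+", "g");
const fromRegex = new RegExp(/w.+/g, "i"); // source is copied, flags are replaced
const withoutNew = RegExp("w.+"); // calling without "new" also returns a RegExp

console.log(fromSource.flags); // g
console.log(fromRegex.source, fromRegex.flags); // w.+ i
console.log(withoutNew instanceof RegExp); // true
```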

Match methods:

#exec(string): array | null

Returns a match array, in the same form as the native RegExp result, if the match succeeds; returns null if it fails.

#test(string): bool

Returns true if the string contains a match, otherwise false.

Properties:

#flags: string

Returns a string containing the regex flags.

#lastIndex: number

The position in the string where the last match ended, which serves as the starting position of the next match.

#hasIndices: bool

Returns true if the hasIndices flag ("d") is set.

#global: bool

Returns true if the global flag ("g") is set.

#ignoreCase: bool

Returns true if the ignoreCase flag ("i") is set.

#multiline: bool

Returns true if the multiline flag ("m") is set.

#dotAll: bool

Returns true if the dotAll flag ("s") is set.

#unicode: bool

Returns true if the unicode flag ("u") is set.

#sticky: bool

Returns true if the sticky flag ("y") is set.

#source: string

Returns the regex source (also called the regex pattern).

#toString(): string

Returns the regex source wrapped in slashes, followed by its flags, e.g. "/.+/g".
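For comparison, this is how the same properties behave on the native RegExp. Note that the native flags getter reports the flags in a fixed canonical order rather than the order they were passed in; the README does not say whether PureRegex normalizes the order the same way, so that detail is an assumption.

```javascript
// Native RegExp properties; PureRegex is assumed (not verified) to match.
const re = new RegExp("w.+", "gimsuyd");

console.log(re.flags); // dgimsuy (canonical order, not the order given)
console.log(re.global, re.sticky, re.hasIndices); // true true true
console.log(re.source); // w.+
console.log(re.toString()); // /w.+/dgimsuy
console.log(re.lastIndex); // 0 before any match
```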

Extended String API

String.prototype.match(pureRegex): array | null

var pReg = new PureRegex("(\\w+)$")
var matches = "hello world".match(pReg)

String.prototype.matchAll(pureRegex): Iterator

var pReg = new PureRegex("\\w+", "g")
var matchesIterator = "hello world".matchAll(pReg)
console.log([...matchesIterator])

String.prototype.search(pureRegex): number

var pReg = new PureRegex("world", "i") //ignore letter case
var index = "Hello World".search(pReg) //6

String.prototype.replace(pureRegex, replacement): string

var pReg = new PureRegex("\\b(world)\\b")
var str = "hello world".replace(pReg, "pure-regex")
//"hello pure-regex"

String.prototype.replaceAll(pureRegex, replacement): string

var pReg = new PureRegex("\\s", "g")
var str = "a b c".replaceAll(pReg, "_")
//"a_b_c"

String.prototype.split(pureRegex, limit): array

var pReg = new PureRegex("\\s")
var chunks = "hello world".split(pReg)
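Native String methods such as match(), search(), replace() and split() look up a well-known symbol method (Symbol.match, Symbol.search, Symbol.replace, Symbol.split) on their argument and delegate to it, which is one way a non-native regex object can plug into them. The README does not say how PureRegex hooks in, so the sketch below only illustrates that mechanism, using a hypothetical LiteralMatcher class that searches for a fixed substring.

```javascript
// Hypothetical class, NOT part of pure-regex: it only shows how native
// String methods delegate to well-known symbol methods on their argument.
class LiteralMatcher {
  constructor(needle) {
    this.needle = needle;
  }

  // "str".match(m) calls m[Symbol.match](str)
  [Symbol.match](str) {
    const index = str.indexOf(this.needle);
    if (index === -1) return null;
    const result = [this.needle];
    result.index = index;
    result.input = str;
    return result;
  }

  // "str".search(m) calls m[Symbol.search](str)
  [Symbol.search](str) {
    return str.indexOf(this.needle);
  }

  // "str".replace(m, replacement) calls m[Symbol.replace](str, replacement);
  // this sketch handles string replacements only and ignores "$" patterns.
  [Symbol.replace](str, replacement) {
    const index = str.indexOf(this.needle);
    if (index === -1) return str;
    return str.slice(0, index) + replacement + str.slice(index + this.needle.length);
  }

  // "str".split(m, limit) calls m[Symbol.split](str, limit)
  [Symbol.split](str, limit) {
    return str.split(this.needle, limit);
  }
}

const matcher = new LiteralMatcher("world");
console.log("hello world".search(matcher)); // 6
console.log("hello world".replace(matcher, "pure-regex")); // hello pure-regex
```

matchAll() and replaceAll() go through the same mechanism, but when the argument has a Symbol.match method, the native implementations first read its flags property and throw a TypeError unless it contains "g".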

Flags

i - ignoreCase

With case-insensitive matching, uppercase and lowercase letters are treated as equivalent; this applies to the English alphabet. Thus "A" and "a" match each other, whether they appear in the regex source or in the text to match.

s - dotAll

By default, the dot metacharacter matches any character except line breaks ("\n" and "\r").

When the "s" flag is set, the dot matches nearly every character, line breaks included. Only in Unicode mode does this cover the full code point range from U+0000 to U+10FFFF.
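The native RegExp shows the same distinctions: without "s" the dot skips "\n", and without "u" it consumes a single UTF-16 code unit, so a character outside the Basic Multilingual Plane counts as two. We assume, but have not verified, that PureRegex behaves identically.

```javascript
// Native RegExp behaviour of "." (PureRegex assumed to match, not verified).
console.log(/a.b/.test("a\nb")); // false: "." skips line breaks by default
console.log(/a.b/s.test("a\nb")); // true: dotAll lets "." match "\n"

// "🍀" (U+1F340) is stored as two UTF-16 code units.
console.log(/^.$/s.test("🍀")); // false: "." consumes one code unit
console.log(/^.$/su.test("🍀")); // true: in Unicode mode "." consumes a whole code point
```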

u - unicode

When the "u" flag is specified, the regex source and the text to match are interpreted as Unicode code points, and Unicode syntax and features are supported. That is, the regex works in Unicode mode.

var pReg = new PureRegex("[🍀]\\u{1F338}", "u")
var str = "🍀🌸" //Unicode
console.log(pReg.test(str)) //true
var pReg = new PureRegex("0x\\p{Hex_Digit}+", "u")
var matches = pReg.exec("0xFAF1")
console.log(matches)

m - multiline

When the "m" flag is set, "^" and "$" match at the beginning and end of every line, respectively, rather than only at the beginning and end of the whole string.
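A quick comparison with the native RegExp, which PureRegex is assumed (not verified here) to follow:

```javascript
// Native RegExp: "^" and "$" with and without the "m" flag.
const text = "one\ntwo";
console.log(text.match(/^\w+$/g)); // null: ^ and $ only match at the string's ends
console.log(text.match(/^\w+$/gm)); // [ 'one', 'two' ]: each line is matched
```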

g - global

With the "g" flag, match() returns all matches and replace() replaces all of them, while exec() runs in a stepping mode: each call starts from where the previous match ended. Meanwhile, matchAll() and replaceAll() can only be used when the global flag is set.
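The stepping mode is driven by lastIndex. The sketch below shows it with the native RegExp, along with the native TypeError that matchAll() and replaceAll() raise without the "g" flag; we assume PureRegex behaves the same, but the README does not state which error it produces.

```javascript
// Native RegExp behaviour of the "g" flag (PureRegex assumed similar).
const words = /\w+/g;
const steps = [];
let m;
while ((m = words.exec("hello world")) !== null) {
  // Each exec() resumes at lastIndex, where the previous match ended.
  steps.push([m[0], words.lastIndex]);
}
console.log(steps); // [ [ 'hello', 5 ], [ 'world', 11 ] ]

console.log("a b c".match(/\w/g)); // [ 'a', 'b', 'c' ]: all matches at once
console.log("a b c".replace(/\s/g, "_")); // a_b_c

try {
  "a b c".matchAll(/\w/); // no "g" flag
} catch (err) {
  console.log(err instanceof TypeError); // true
}
```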

y - sticky

If the "y" flag is set, matching may only start at the fixed position given by pureRegex.lastIndex.
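The difference from "g" is that a sticky regex does not search ahead from lastIndex. With the native RegExp (PureRegex assumed, not verified, to match):

```javascript
// Native RegExp: sticky matching only succeeds exactly at lastIndex.
const sticky = /foo/y;
sticky.lastIndex = 3;
const atThree = sticky.test("barfoo"); // true; lastIndex becomes 6

sticky.lastIndex = 1;
const atOne = sticky.test("barfoo"); // false; no search ahead, lastIndex resets to 0

// A global regex starting at index 1 searches ahead and finds "foo" at 3.
const searching = /foo/g;
searching.lastIndex = 1;
const searched = searching.test("barfoo"); // true; lastIndex becomes 6

console.log(atThree, atOne, searched); // true false true
```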

d - hasIndices (added in ES2022)

For example:

var pReg = new PureRegex("(a)b(c)", "d")
var matches = pReg.exec("abc")
console.log(matches.indices)
// [ [ 0, 3 ], [ 0, 1 ], [ 2, 3 ], groups: undefined ]

Security and Optimization

Built on an NFA, it is fundamentally immune to ReDoS thanks to a carefully designed runtime algorithm that differs from Thompson's construction: it still supports backtracking instead of giving it up. Before v1.1.0, however, backtracking was dropped entirely to obtain complete resistance.

This immunity is rooted in the distinctive algorithm. On top of that, PureRegex applies numerous optimizations, notably (but not only) at compile time; sometimes a regex reduces to a plain string search. A simple ReDoS example:

var str = "x".repeat(20)
var pat = "(x+)+y"
var pReg = new PureRegex(pat)
var nReg = new RegExp(pat)

console.time("PureRegex")
pReg.exec(str) //returns at once; the matching engine is never actually run
console.timeEnd("PureRegex")
console.time("RegExp")
nReg.exec(str) //blocks for a while
console.timeEnd("RegExp")

Since v1.2.0, the runtime engine copes with such complex patterns in a number of steps that grows linearly with the input, and notably it does so even without the compile-time optimizations. The same applies to various other patterns, such as "(x+)+(?:(x)+)+y" and "(x+)+(\\1)y".

var str = "x".repeat(40) + "!xy"
var pat = "(x+)+y"
var pReg = new PureRegex(pat)
var nReg = new RegExp(pat)

console.time("PureRegex")
pReg.exec(str)
console.timeEnd("PureRegex")
console.time("RegExp")
nReg.exec(str) //blocks for a long time
console.timeEnd("RegExp")
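The README states that PureRegex's runtime algorithm differs from Thompson's construction and does not spell it out, so the following is not its implementation. It is a minimal sketch of the classic state-set (Thompson NFA / Pike VM) simulation, included only to show why an engine that tracks a set of NFA states instead of backtracking handles "(x+)+y" in linear time. The helpers lit, cat, alt, plus, compile and search are hypothetical names for this sketch, which supports only literals, concatenation, alternation and "+", with no captures or backreferences.

```javascript
// Sketch only: a Pike-VM style NFA simulation, NOT PureRegex's algorithm.
// Supports literals, concatenation, alternation and "+" (no captures/backrefs).

// Hypothetical AST builders used instead of a pattern parser.
const lit = (ch) => ({ type: "lit", ch });
const cat = (a, b) => ({ type: "cat", a, b });
const alt = (a, b) => ({ type: "alt", a, b });
const plus = (a) => ({ type: "plus", a });

// Compile the AST into instructions: char, split (two epsilon edges), jmp, match.
function compile(ast) {
  const prog = [];
  function emit(node) {
    switch (node.type) {
      case "lit":
        prog.push({ op: "char", ch: node.ch });
        break;
      case "cat":
        emit(node.a);
        emit(node.b);
        break;
      case "alt": {
        // split -> a, then jmp to end; otherwise -> b
        const split = { op: "split", x: prog.length + 1, y: -1 };
        prog.push(split);
        emit(node.a);
        const jmp = { op: "jmp", x: -1 };
        prog.push(jmp);
        split.y = prog.length;
        emit(node.b);
        jmp.x = prog.length;
        break;
      }
      case "plus": {
        // body once, then split back to the body or fall through
        const start = prog.length;
        emit(node.a);
        prog.push({ op: "split", x: start, y: prog.length + 1 });
        break;
      }
      default:
        throw new Error(`unsupported node: ${node.type}`);
    }
  }
  emit(ast);
  prog.push({ op: "match" });
  return prog;
}

// Unanchored search. Each input position is visited once, and each
// instruction at most once per position, so the cost is O(text * program).
function search(prog, text) {
  // Follow jmp/split edges from pc and record the states that consume input.
  function addThread(list, seen, pc) {
    if (seen.has(pc)) return; // the same state is never tracked twice
    seen.add(pc);
    const inst = prog[pc];
    if (inst.op === "jmp") {
      addThread(list, seen, inst.x);
    } else if (inst.op === "split") {
      addThread(list, seen, inst.x);
      addThread(list, seen, inst.y);
    } else {
      list.push(pc);
    }
  }

  let current = [];
  let seen = new Set();
  for (let i = 0; i <= text.length; i++) {
    addThread(current, seen, 0); // a match may also start at position i
    const next = [];
    const nextSeen = new Set();
    for (const pc of current) {
      const inst = prog[pc];
      if (inst.op === "match") return true;
      if (inst.op === "char" && i < text.length && text[i] === inst.ch) {
        addThread(next, nextSeen, pc + 1);
      }
    }
    current = next;
    seen = nextSeen;
  }
  return false;
}

// "(x+)+y", built by hand.
const nested = compile(cat(plus(plus(lit("x"))), lit("y")));
console.log(search(nested, "x".repeat(40) + "!xy")); // true
console.log(search(nested, "x".repeat(100000))); // false, after one linear pass
```

Because duplicate states are merged at every position, the nested "+" cannot multiply the work the way backtracking does; the trade-off is that plain state-set simulation has no natural way to handle backreferences, which is the gap the README says PureRegex's own algorithm addresses.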

It has also stayed immune to ordinary ReDoS since the first version, without any compile-time optimization:

var str = "a".repeat(100)
var pat = "^(([a-z])+.)+[A-Z]([a-z])+$"
var pReg = new PureRegex(pat)
var nReg = new RegExp(pat)

console.time("PureRegex")
pReg.exec(str) //a few milliseconds
console.timeEnd("PureRegex")
console.time("RegExp")
nReg.exec(str) //always blocks
console.timeEnd("RegExp")

Since v1.3.0, the engine has been able to match in reverse. This achieves its goal of linear time for arbitrary lookbehind assertions, as for any other expression.
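"Arbitrary" lookbehind includes bodies with no fixed width, such as one containing \d+. The example below uses the native RegExp, which also supports this; the README does not describe PureRegex's reverse-matching internals, and we assume, without having verified it, that PureRegex gives the same result.

```javascript
// Native RegExp: a variable-length lookbehind (\d+ has no fixed width).
// Assumption: PureRegex returns the same match (not verified here).
const cents = /(?<=\$\d+\.)\d+/.exec("total: $1234.56");
console.log(cents[0], cents.index); // 56 13
```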

Since v1.4.0, backreference support has been made more adaptable, which involved reworking the internal implementation. Unfortunately, determining whether a backreference matched took polynomial time (at least linear time).

Since v1.5.0, it has worked toward completing the optimization mechanism for backreferences, making full use of the core algorithm.
