1.2.1 • Published 2 years ago

revexp v1.2.1

Weekly downloads
-
License
MIT
Repository
github
Last release
2 years ago

RevExp

This library calculates the possible inputs for a given regular expression.

The supported syntax is modelled on the standard Javascript RegExp object's syntax.

You can try it online or in the terminal:

npx revexp 'H(alp|el{2,}[o]) (Wor|wo)lds?' '??????W????'

Supported Features

Pattern FeatureExamplesSupport
Character classesa [aeiou] [^a-z] \d Full
Character escapes\u0020 \\ \n \( Full
Unicode property escapes\p{Letter} None
Branchingx\|y Full
Anchors^ $ Full
Word boundary assertions\b \BPartial
Standard quantifiersa* a+ a? Full
Range quantifiersa{2} a{2,} a{2,3} Full
Lazy quantifiersa+? a*? a?? a{2,}? Full
Possessive quantifiersa++ a*+ a?+ a{2,}+ None
Groups(abc) (?<name>abc) Full
Backreferences(abc)\1 (?<x>abc)\k<x>Partial
Non-capturing groups(?:abc) Full
Positive lookaheads(?=abc) None
Negative lookaheads(?!abc) None
Positive lookbehinds(?<=abc) None
Negative lookbehinds(?<!abc) None

: Two passes may be required to fully narrow an input when using this feature.

: Backreferences are currently handled as if the group were repeated (with an optional qualifier), meaning the syntax is accepted and any full or partial valid input will be accepted, but full or partial invalid inputs may not be rejected, and the library may not be able to fully narrow inputs when using patterns which include this feature.

FlagMeaningSupport
dOutput substring match indicesNot applicable
gGlobal searchNot applicable
iCase-insensitive searchNone (false)
mMulti-line search Full
s. matches newline Full
uUnicode modeNone (false)
ySticky searchNot applicable

Usage

Library

npm install --save revexp
import RevExp from 'revexp';

// compile the pattern
const pattern = new RevExp('a[bc]|x[yz]');

// provide already known information - the first character is unknown, the second must be "c"
const known = RevExp.string('?c', '?');

// resolve as much of the data as possible
const resolved = pattern.reverse(known);

// print it in a human-friendly format
console.log(String(resolved)); // "ac"

CLI

npx revexp 'a[bc]|x[yz]' '?c'

You can also optionally change the "unknown character":

npx revexp -u'.' 'a[bc]|x[yz]' '.c'

Examples

patterninputoutput
(hello\|hi)??hi
(hello\|hi)?????hello
(hello\|hi) *?????h[ei][ l][ l][ o]
(hello\|hi) *????hi
(a[bc]d\|e(f\|gh))???[ae][bcg][dh]
(a[bc]d\|e(f\|gh))?g?egh
*a{4} +a{2} *????????[ a]aaa[ a][ a]a[ a]

Performance and Security Considerations

Matching is performed using an algorithm which executes in polynomial time on the size of the input for a given pattern. This means it is not vulnerable to catastrophic backtracking (but note that untrusted patterns can take arbitrarily long to compile and use an unbounded amount of memory).

API

RevExp

This object stores a compiled regular expression. It is analogous to Javascript's native RegExp. Construction is expensive, as it must compile the regular expression into a graph form. Once created, the RevExp object can be used repeatedly.

constructor

const revexp = new RevExp('pattern');
const revexp = new RevExp(/existing regex/);
const revexp = RevExp.compile('pattern');

Creates a new RevExp object, compiling the pattern into an internal graph ready for use. The .compile form is just an alias; functionally both forms are the same.

If you pass in a RegExp or RevExp object, the pattern will be extracted and compiled. Note that this project is not 100% compatible with RegExp features (see above for details).

RevExp.string(input[, unknown])

const characterClassString = RevExp.string('foobar');
const characterClassString = RevExp.string('foob**', '*');
const characterClassString = RevExp.string(myCharClassArray);

Converts input into a character class array.

When invoked with a string input and no unknown parameter, the string is taken literally (i.e. it assumes all characters are known).

When invoked with a string input and an unknown parameter, any occurence of unknown within input is replaced with CharacterClass.ANY.

When invoked with an existing character class array, this will copy the array but make no changes to the values (this can be used to avoid mutating the original, for example).

The returned array has an overloaded toString which allows easy pretty-printing, and an isSingular method which returns true if all the contained character classes are singular.

.test(input[, unknown])

const success = myRevExp.test('foobar');
const success = myRevExp.test('foob**', '*');
const success = myRevExp.test(myCharClassArray);

Tests if input matches (or can match) the regular expression. Returns true if a match is possible, or false if no match is possible.

For an explanation of the input and unknown parameters, see the information for RevExp.string above.

For the most control, invoke with a character class array. This allows you to pass in exact ranges for each character. For example:

const input = [
  RevExp.CharacterClass.of('f'), // literal 'f'
  RevExp.CharacterClass.of('o'),
  RevExp.CharacterClass.of('o'),
  RevExp.CharacterClass.of('bz'), // 'b' or 'z'
  RevExp.CharacterClass.WORD, // a-zA-Z0-9_
  RevExp.CharacterClass.ANY, // anything
];
// input is equivalent to a pattern of 'foo[bz]\w.'
const success = myRevExp.test(input);
Examples
const revexp = new RevExp('(hello|hi+)');

console.log(revexp.test('hello')); // true
console.log(revexp.test('hi')); // true
console.log(revexp.test('nope')); // false
console.log(revexp.test('words hello words')); // false
console.log(revexp.test('h????')); // false
console.log(revexp.test('he???', '?')); // true
console.log(revexp.test('he??', '?')); // false

.reverse(input[, unknown])

const restrictions = myRevExp.reverse('foobar');
const restrictions = myRevExp.reverse('foob**', '*');
const restrictions = myRevExp.reverse(myCharClassArray);

Returns a character class array with the tightest possible restrictions on the characters based on the regular expression and the given input. If no matches are possible, returns null.

For an explanation of the input and unknown parameters, see the information for RevExp.string above.

The returned array has an overloaded toString which allows easy pretty-printing, and an isSingular method which returns true if all the contained character classes are singular.

Examples
const revexp = new RevExp('(hello|hi+)');

console.log(String(revexp.reverse('hello'))); // hello
console.log(String(revexp.reverse('nope'))); // null
console.log(String(revexp.reverse('he???'))); // null
console.log(String(revexp.reverse('he???', '?'))); // hello
console.log(String(revexp.reverse('h????', '?'))); // h[ei][il][il][io]

RevExp.CharacterClass

This object stores character class information (e.g. [a-z]). It also has various convenience constants defined for common character classes.

The CharacterClass object is immutable.

constants

RevExp.CharacterClass.ANY     // .
RevExp.CharacterClass.NONE    // [^]
RevExp.CharacterClass.NUMERIC // [0-9]
RevExp.CharacterClass.WORD    // [a-zA-Z0-9_]
RevExp.CharacterClass.NEWLINE // various unicode newline characters
RevExp.CharacterClass.SPACE   // various unicode space characters

constructor

There are various helpers available for building custom CharacterClasses:

RevExp.CharacterClass.of('c')
RevExp.CharacterClass.of('ax')
RevExp.CharacterClass.of(['a', 'x'])
RevExp.CharacterClass.ofCode(0x20)
RevExp.CharacterClass.range('a', 'z')
RevExp.CharacterClass.union(cc1, cc2, cc3, ...)

toString

As a convenience, CharacterClass has a toString which will render the class in the standard regular expression format;

console.log(String(RevExp.CharacterClass.of('c'))) // 'c'
console.log(String(RevExp.CharacterClass.of('ax'))) // '[ax]'
console.log(String(RevExp.CharacterClass.range('a', 'z'))) // '[a-z]'
console.log(String(RevExp.CharacterClass.of('abcdz'))) // '[a-dz]'
console.log(String(RevExp.CharacterClass.of('a').inverse())) // '[^a]'

includes(char)

Returns true if the character class contains the given character.

isEmpty()

Returns true if the character class contains no characters (i.e. is impossible).

isSingular()

Returns true if the character class contains exactly one character (i.e. is fully resolved).

singularChar()

Returns the single character represented by this character class. If the class represents more than one character (or no characters), this throws an error.

intersects(other)

Returns true if there is any overlap between the two character classes.

intersect(other)

Returns a new character class which contains only the characters shared by both character classes.

union(other)

Returns a new character class which contains all characters contained in either character class.

rangeTo(other)

Returns a new character class which represents the range of characters from the current character to the target character (inclusive at both sides). The order does not matter.

This method can only be invoked on singular character classes.

inverse()

Returns a new character class which is the inverse of the current class (i.e. any character in this class is not in the returned class, and vice-versa).

1.2.1

2 years ago

1.2.0

3 years ago

1.1.0

3 years ago

1.0.2

3 years ago

1.0.1

3 years ago

1.0.3

3 years ago

1.0.0

3 years ago