rex_regex v0.1.1
rex_regex-js
JavaScript manipulation of regular expressions
Want to use it? Want to get a glimpse? Want to participate?
Presentation
This module aims to simplify the task of creating BIG regexes with dynamic parts (variables can be set, then changed at any time).
It was originally written to build a parser (see the example).
I find the name kind of fun, since "rex reges" sounds close to the Latin "rex regum", king of kings.
USE
It works in the browser or in Node.js.
In the browser, load the file with a script tag and use the global object rex_regex.
In Node.js, use require. For simplicity, the examples below call the variable rex_regex too.
const rex_regex = require("rex_regex"); // or the path to the file
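One common way for a single file to work both from a script tag and from require is sketched below. This is an assumption about the general pattern, not the actual rex_regex source, and the object here is only a stand-in.

```javascript
// Hypothetical sketch: one file usable in Node.js and in the browser.
// `rex_regex` here is an empty stand-in, not the real library object.
const rex_regex = {};

if (typeof module !== "undefined" && module.exports) {
  // Node.js (CommonJS): other files can get it with require(...)
  module.exports = rex_regex;
} else {
  // Browser <script> tag: expose a global named rex_regex
  globalThis.rex_regex = rex_regex;
}
```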
elements
rex_regex creates elements. To create an element, for example a group matching the word "hello", call either
var group = rex_regex.group("hello")
//or
var group = new rex_regex.group("hello")
All elements share these properties and functions:
group.raw; // "hello"
group.text; // "(hello)"
group.regex("g"); // /(hello)/g
group.one(); // new element (hello)
group.any(); // new element (hello)*
group.many(); // new element (hello)+
group.some(3,5); // new element (hello){3,5}
group.some(3); // new element (hello){3}
group.some(3,Infinity); // new element (hello){3,}
// thanks to mfix22
// https://github.com/mfix22/rexrex
// if I had seen your module earlier, I could have used it instead of coding this
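To make the shared interface above concrete, here is a minimal sketch of how such an element could be written. The Element class, its quantify() helper and the group() factory are hypothetical illustrations, not the real rex_regex._core.Element; in particular, keeping the original raw on quantified elements is an assumption.

```javascript
// Hypothetical sketch of the common element interface (not rex_regex source).
class Element {
  constructor(raw, text) {
    this.raw = raw;   // what was passed in, e.g. "hello"
    this.text = text; // regex source for this element, e.g. "(hello)"
  }

  // Compile the text into a RegExp; flags such as "g" or "m" are optional.
  regex(flags) {
    return new RegExp(this.text, flags);
  }

  // Every quantifier returns a NEW element; `this` is never modified.
  quantify(suffix) {
    return new Element(this.raw, this.text + suffix);
  }

  one() { return this.quantify(""); }
  any() { return this.quantify("*"); }
  many() { return this.quantify("+"); }

  // some(3) -> {3}, some(3, 5) -> {3,5}, some(3, Infinity) -> {3,}
  some(min, max) {
    if (max === undefined) return this.quantify(`{${min}}`);
    if (max === Infinity) return this.quantify(`{${min},}`);
    return this.quantify(`{${min},${max}}`);
  }
}

// A plain function returning an object works with or without `new`,
// because `new` keeps an object returned by the function.
function group(raw) {
  return new Element(raw, `(${raw})`);
}

group("hello").some(3, Infinity).text; // "(hello){3,}"
new group("hello").regex("g");         // /(hello)/g
```

The factory-function choice is what would let both call styles shown earlier produce the same element.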
For now there are 3 element types:
rex_regex.chars()
The chars element is a sequence of characters.
Be careful: escaped backslashes do not work in chars, because the any, many and some functions ignore \ so that other characters can be escaped. So don't do rex_regex.chars("\\\\") and blame me.
The any, many and some functions apply to each character: for ab, many() gives a+b+, and so on.
var a = rex_regex.chars("ab")
a.many().text; // "a+b+"
a.some(5).regex("m"); // /a{5}b{5}/m
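The per-character rule, and why an escaped backslash breaks it, can be sketched like this. It assumes, as the text above reads, that any/many/some append the quantifier after every character and simply skip backslashes; quantifyEachChar is a hypothetical name, not a rex_regex function.

```javascript
// Hypothetical sketch of the chars rule: quantify each character,
// but never put a quantifier right after a backslash, so \d stays whole.
function quantifyEachChar(raw, suffix) {
  let out = "";
  for (const ch of raw) {
    out += ch;
    if (ch !== "\\") out += suffix; // a backslash gets no quantifier
  }
  return out;
}

quantifyEachChar("ab", "+");   // "a+b+"
quantifyEachChar("ab", "{5}"); // "a{5}b{5}"
quantifyEachChar("\\d", "+");  // the regex \d+
quantifyEachChar("\\\\", "+"); // the regex \\ with no quantifier at all
```

The last call is the chars("\\\\") pitfall: both backslashes are skipped, so many() silently adds nothing.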
rex_regex.set()
The set element is a set of characters, for example ab9.
When calling it, DON'T write the brackets.
any, many and some apply to the whole set: [ab9]*
Sets have one special operator, not, to be used carefully because it is poorly coded. It returns a new set element with ^ at the beginning. Improving it would require adding properties to the element, like negated:true, I don't know.
var a = rex_regex.set("ab9")
a.any().text; // "[ab9]*"
a.some(5,7).regex("g"); // /[ab9]{5,7}/g
a.not().text; // "[^ab9]"
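Here is a sketch of the negated:true idea mentioned above: the set keeps a flag and renders the ^ only when building its text. The CharSet class is hypothetical and not how rex_regex currently implements not().

```javascript
// Hypothetical sketch: a set that stores negated instead of editing its text.
class CharSet {
  constructor(raw, negated = false, quantifier = "") {
    this.raw = raw;               // content without brackets, e.g. "ab9"
    this.negated = negated;       // true renders as [^...]
    this.quantifier = quantifier; // applies to the whole set, e.g. "*"
  }

  get text() {
    return `[${this.negated ? "^" : ""}${this.raw}]${this.quantifier}`;
  }

  any() { return new CharSet(this.raw, this.negated, "*"); }
  many() { return new CharSet(this.raw, this.negated, "+"); }
  not() { return new CharSet(this.raw, !this.negated, this.quantifier); }
}

const set = new CharSet("ab9");
set.not().text;       // "[^ab9]"
set.not().not().text; // "[ab9]"
set.any().not().text; // "[^ab9]*"
```

With the flag, calling not() twice gives back the original set. If not() only prepended ^ to the raw content, the second call would give [^^ab9], which means "none of ^, a, b, 9".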
rex_regex.group()
The group element is a group of characters, for example (ab9).
When calling it, DON'T write the brackets.
There is no (? ) or (?! ) or anything of this kind, because I didn't need it, but if you want to participate, please feel free. Improving this would require adding properties to the element, like lazy:true, I don't know.
var a = rex_regex.group("ab9")
a.raw; // "ab9"
a.any().text; // "(ab9)*"
a.some(5,Infinity).regex(); // /(ab9){5,}/
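The missing group forms and the lazy:true / named:true ideas could be rendered roughly as below. groupText and its options (prefix, named, name, quantifier, lazy) are hypothetical and do not exist in rex_regex; the regex syntax itself is standard JavaScript.

```javascript
// Hypothetical sketch: groupText() and its options are NOT part of rex_regex.
function groupText(raw, options = {}) {
  const {
    prefix = "",     // "?:" non-capturing, "?=" lookahead, "?!" negative lookahead
    named = false,   // named:true renders (?<name>...)
    name = "",
    quantifier = "", // e.g. "+", "*", "{5,}"
    lazy = false,    // lazy:true appends "?" to the quantifier
  } = options;

  const open = named ? `(?<${name}>` : `(${prefix}`;
  const q = quantifier && lazy ? `${quantifier}?` : quantifier;
  return `${open}${raw})${q}`;
}

groupText("ab9");                                 // "(ab9)"
groupText("ab9", { prefix: "?!" });               // "(?!ab9)"
groupText("ab9", { named: true, name: "id" });    // "(?<id>ab9)"
groupText("ab9", { quantifier: "+", lazy: true }); // "(ab9)+?"
```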
chaining
Every function except .regex() returns a new rex_regex._core.Element.
Furthermore, when creating a new rex_regex element, you can pass either a string OR a rex_regex Element (its text is used), or several of them: multiple arguments are fine.
This lets you chain calls as in the example.
var regexp = rex_regex.chars(
)
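The "string OR element, one or several" rule above could work roughly like this: each argument is turned into its text and the results are concatenated. This is a sketch, assuming any object with a text property counts as an element; joinArgs, element, set and group here are simplified hypothetical stand-ins, not the library's code.

```javascript
// Hypothetical sketch of mixing strings and elements as arguments.
function joinArgs(args) {
  return args.map((arg) => (typeof arg === "string" ? arg : arg.text)).join("");
}

// Minimal elements: just enough to show chaining.
function element(text) {
  return { text, many: () => element(`${text}+`) };
}
function set(...args) {
  return element(`[${joinArgs(args)}]`);
}
function group(...args) {
  return element(`(${joinArgs(args)})`);
}

const hashtag = group("#", set("a-zA-Z").many());
hashtag.text;                               // "(#[a-zA-Z]+)"
group(hashtag, "|", group("[0-9]+")).text;  // "((#[a-zA-Z]+)|([0-9]+))"
```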
Why?
Personally, I needed to have flexible variables in a regex, so I just coded it and that's all.
example
Let's build a simple parser that separates hashtags, words, spaces and anything else into 4 groups, with only letters allowed in hashtags:
// variable definitions -------------------
var hashtagC = rex_regex.chars("#");
var hashtagAuthorizedS = rex_regex.set("a-zA-Z");
var wordS = rex_regex.set("\\w");
var spacesS = rex_regex.set("\\s\\t\\n")
// making groups -------------------------
var hashtagGroup = rex_regex.group(
hashtagC,
hashtagAuthorizedS.many()
);
var wordGroup = rex_regex.group(
wordS.many()
);
var spaceGroup = rex_regex.group(
spacesS.many()
);
var otherGroup = rex_regex.group(
".+" // I agree it's easier to do it like that sometimes
)
var bigRegex =
rex_regex.chars(
hashtagGroup,"|", // or
wordGroup,"|",
spaceGroup,"|",
otherGroup
).regex("g")
// gives /(#[a-zA-Z]+)|([\w]+)|([\s\t\n]+)|(.+)/g
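To actually use the result as a parser, each match can be labelled by the capturing group that matched. This sketch copies the generated regex as a literal so it runs without rex_regex; tokenize and the kind names are illustrative.

```javascript
// The regex built in the example, copied as a literal.
const bigRegex = /(#[a-zA-Z]+)|([\w]+)|([\s\t\n]+)|(.+)/g;
const kinds = ["hashtag", "word", "space", "other"];

// Label each match with the index of the capturing group that produced it.
function tokenize(input) {
  const tokens = [];
  for (const match of input.matchAll(bigRegex)) {
    const groupIndex = match.slice(1).findIndex((g) => g !== undefined);
    tokens.push({ kind: kinds[groupIndex], value: match[0] });
  }
  return tokens;
}

tokenize("hello #world");
// [ { kind: "word", value: "hello" },
//   { kind: "space", value: " " },
//   { kind: "hashtag", value: "#world" } ]
```

Note that (.+) is greedy: in "a, b" it takes ", b" as a single other token, so everything after the first unmatched character on a line ends up in the last group.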
Of course, typing the regex directly is much FASTER, but you would have to think a bit if one day you wanted to change the # to an @ (not that much, in fact... but it helps to see what you are doing, or at least it seems to me that it does).
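The idea of keeping the pieces as variables and rebuilding the regex from them can be shown in plain JavaScript, without rex_regex. parts and buildHashtagRegex are illustrative names; a marker with special regex meaning, such as $ or +, would need escaping first, which this sketch does not do.

```javascript
// Plain-JavaScript sketch: store the pieces, rebuild the regex on demand.
const parts = { marker: "#", letters: "a-zA-Z" };

function buildHashtagRegex() {
  return new RegExp(`(${parts.marker}[${parts.letters}]+)`, "g");
}

buildHashtagRegex().source; // "(#[a-zA-Z]+)"
parts.marker = "@";         // change the variable once...
buildHashtagRegex().source; // "(@[a-zA-Z]+)" ...and rebuild
```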
This example is very simple; I actually wrote rex_regex to produce the following regex:
([#@][^ \t\r\n:#>;\-]+;|[:#][^ \t\r\n:#>;\-,\\<=+*%°ç^_`\-&|([{~}\]\)§!?$£¤€.]*[>;\-])|
(:[/!@#'":]|[:#][^ \t\r\n:#>;\-,\\<=+*%°ç^_`\-&|([{~}\]\)§!?$£¤€.]+)|
([/!@#'":];|[^ \t\r\n:#>;\-,\\<=+*%°ç^_`\-&|([{~}\]\)§!?$£¤€.]+[>;\-])|
([ \t]+)|
(\r|\n|\r\n)|
([^ \t\r\n:#>;\-]+)|
([^ \t\r\n]+)
I would never have had the patience to write this without a tool like rex_regex. And imagine if one day I wanted to change some characters! I say: headache!
Making Of
Pull rules
If you want to participate, you are most welcome.
Here are a few rules to keep it coherent.
Priorities in writing
Let's keep some guidelines across the code:
readable first
Computers are powerful today, so let's write something easy to read, with spaces, line breaks and lots of comments, even if it costs something; those who need to can minify.
flexible first (ex aequo)
Let's store as many core parameters as possible and assemble them in logical order, so they can be adapted later (e.g. keep the regex creator instead of a regex string).
fast third
If we can make the code fast, that comes after readability, but it's cool too.
light last
Computers are POWERFUL today: I prefer big objects with clear property names, and parsing a few strings shouldn't kill your memory.
What's done
See USE.
What's next (to-do list)
Adding properties to sets and groups, like:
- negated:true
- lazy:true
- named:true
- name:"name"
- ...