compromise-strict v0.0.2
The compromise match syntax is a custom language for matching and querying tags and metadata in a document.
This plugin is an experimental rewrite of that syntax using a formal parser (chevrotain) and a strict spec.
It may be useful where recursive queries or error reporting are required.
This library implements a subset of the match syntax and has different edge cases.
It adds 135kb to the file size, so it is not meant as a replacement for the default match method.
This library can be used to generate railroad diagrams of match queries, or to test them for syntax errors. Pre-compiling matches may result in small but noticeable performance improvements over the native .match().
import nlp from "compromise";
import plugin from "compromise-strict";
nlp.extend(plugin);
let doc = nlp("hello world")
  .strictMatch("(?P<greeting>hi|hello|good morning) #Noun")
  .groups("greeting");
console.log(doc.text());
commonjs:
const nlp = require("compromise");
nlp.extend(require("compromise-strict"));
let doc = nlp("Good morning world")
  .strictMatch("(?P<greeting>hi|hello|good morning) #Noun")
  .groups("greeting");
console.log(doc.text());
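The introduction mentions testing queries for syntax errors. A minimal sketch of how that could be wrapped is shown here. It assumes the parse step throws on a malformed query, which is not confirmed by this README; `checkQuery` and `standInCompile` are hypothetical names, and the stand-in parser (it only checks that parentheses balance) is there so the snippet runs without the plugin installed.

```javascript
// Hypothetical helper: run a parse/compile function and report a failure
// as plain data instead of letting the exception propagate.
// ASSUMPTION: `compile` throws when the query is malformed.
function checkQuery(compile, query) {
  try {
    compile(query);
    return { ok: true, error: null };
  } catch (err) {
    return { ok: false, error: err.message };
  }
}

// Stand-in parser for illustration only: it checks that parentheses
// are balanced and knows nothing else about the match syntax.
function standInCompile(query) {
  let depth = 0;
  for (const ch of query) {
    if (ch === "(") depth += 1;
    if (ch === ")") depth -= 1;
    if (depth < 0) throw new Error("unexpected ')'");
  }
  if (depth !== 0) throw new Error("unclosed '('");
  return { source: query };
}

const goodQuery = checkQuery(standInCompile, "(hi|hello) #Noun");
const badQuery = checkQuery(standInCompile, "(hi|hello #Noun");
```

With the plugin loaded, the stand-in could be swapped for whatever function parses a query, provided it really does throw on bad input.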
Pre-Compiling
strict has the ability to pre-compile a match statement into a parsed format, which may improve performance of the match query. This plugin automatically adds this as a helper method on the main nlp constructor.
// ... rest from usage above
const m = nlp.preCompile("(?P<greeting>hi|hello|good morning) #Noun");
let doc = nlp("hello world").strictMatch(m).groups("greeting");
console.log(doc.text());
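When the same query runs against many documents, the parsed form only needs to be built once. The sketch below shows one way to cache it, assuming (as described above) that nlp.preCompile takes a query string and returns a reusable object that strictMatch accepts. `makePatternCache` and `fakeCompile` are hypothetical names; the stand-in compiler keeps the snippet runnable without the plugin.

```javascript
// Hypothetical helper: memoize a compile function so that each distinct
// query string is parsed at most once.
// ASSUMPTION: `compile` behaves like nlp.preCompile
// (query string in, reusable parsed object out).
function makePatternCache(compile) {
  const cache = new Map();
  return function getPattern(query) {
    if (!cache.has(query)) {
      cache.set(query, compile(query));
    }
    return cache.get(query);
  };
}

// Stand-in compiler that counts how often it is called.
let compileCalls = 0;
function fakeCompile(query) {
  compileCalls += 1;
  return { source: query };
}

const getPattern = makePatternCache(fakeCompile);
const firstLookup = getPattern("(?P<greeting>hi|hello) #Noun");
const secondLookup = getPattern("(?P<greeting>hi|hello) #Noun");
```

With the plugin loaded, the cache could be built as makePatternCache((q) => nlp.preCompile(q)) and its result passed to strictMatch. Whether this gives a measurable gain depends on how often queries repeat.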
Supported RegExp grammar
- StartOf: ^ - start of string
- Value: can be repeated
  - Any: . - match any word
  - Tag: #Noun - part of speech / tags
  - Word: hello - just the word
    - EscapedWord: \#Noun - matches the word #Noun
  - Group: (...) - match groups; also captures, which saves the group content. The values of ... will be matched.
    - Or: (value0|value1|value2 value3) - matches any one of the alternatives in the group
    - Named: (?P<name>...) - saves the group, which can later be accessed by name
    - NonCapturing: (?:...) - don't save the group's matched content
    - Positive Lookahead: (?=...) - does not consume, asserts that the group matches ahead
    - Negative Lookahead: (?!...) - does not consume, opposite of positive lookahead
  - Modifiers: go at the end of a value, ex: Hi+
    - Plus: + - matches one or more occurrences of value
    - Star: * - matches zero or more occurrences of value
    - Question: ? - matches zero or one occurrence of value
    - Non Greedy Matches: +?, *?, ?? - match as little as possible while still maintaining a match
    - Note: repeatedly matched groups will overwrite and save only the last value.
- EndOf: $ - end of string
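Several of these constructs mirror ordinary regular expressions, except that this grammar matches words and tags rather than characters. As a rough analogy only, the snippet below uses JavaScript's built-in RegExp to show named capture, lookahead, non-greedy repetition, and the last-value rule for repeated groups. RegExp is character-level and writes named groups as (?<name>...) rather than (?P<name>...). It does not exercise the plugin itself, whose edge cases may differ.

```javascript
// Named group: the captured text can be read back by name.
const namedMatch = /(?<greeting>hi|hello) world/.exec("hello world");
const greetingText = namedMatch.groups.greeting;

// Positive lookahead: asserts that " world" follows, without consuming it.
const lookaheadText = /hello(?= world)/.exec("hello world")[0];

// Negative lookahead: only matches when " world" does NOT follow.
const negativeMatched = /hello(?! world)/.test("hello world");

// Greedy repetition takes as much as it can; non-greedy takes as little.
const greedyText = /a+/.exec("aaa")[0];
const lazyText = /a+?/.exec("aaa")[0];

// A repeated capturing group keeps only the value of its last iteration.
const lastCapture = /(a|b|c)+/.exec("abc")[1];

console.log({ greetingText, lookaheadText, negativeMatched, greedyText, lazyText, lastCapture });
```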
Railroad diagrams
Chevrotain has the neat ability to generate diagrams that explain the match lookup. You can see an example of this working in ./lib/gen_diagram.js
GPL-3