Compromise-strict

The compromise match syntax is a custom language for matching and querying tags and metadata in a document.

This plugin is an experimental re-write of this syntax using a formal parser (chevrotain) and a strict spec.

This may be useful where recursive queries or error reporting are required.

This library implements a subset of the match-syntax, and has different edge-cases.

It adds 135kb to the file size, so it is not meant as a replacement for the default match method.

This library can be used to generate railroad diagrams of match queries, or to test them for syntax errors. Pre-compiling matches may result in small but noticeable performance improvements over the native .match().

import nlp from "compromise";
import plugin from "compromise-strict";

nlp.extend(plugin);

let doc = nlp("hello world")
  .strictMatch("(?P<greeting>hi|hello|good morning) #Noun")
  .groups("greeting");
console.log(doc.text());

commonjs:

const nlp = require("compromise");
nlp.extend(require("compromise-strict"));

let doc = nlp("Good morning world")
  .strictMatch("(?P<greeting>hi|hello|good morning) #Noun")
  .groups("greeting");
console.log(doc.text());

Pre-Compiling

compromise-strict can pre-compile a match statement into a parsed format, which may improve the performance of the match query. The plugin automatically adds this as a helper method on the main nlp constructor.

// ... rest from usage above
const m = nlp.preCompile("(?P<greeting>hi|hello|good morning) #Noun");
let doc = nlp("hello world").strictMatch(m).groups("greeting");
console.log(doc.text());
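
When the same pattern is applied to many documents, the parsed form only needs to be built once. The sketch below shows one way to do that with a small memoizing wrapper. Note that makeCompileCache is a hypothetical helper, not part of the plugin, and fakeCompile is a stand-in that exists only so the snippet runs without any dependencies.

```javascript
// Hypothetical helper (not part of compromise-strict): caches compiled
// patterns by their source string so each pattern is parsed only once.
function makeCompileCache(compile) {
  const cache = new Map();
  return function getCompiled(pattern) {
    if (!cache.has(pattern)) {
      cache.set(pattern, compile(pattern));
    }
    return cache.get(pattern);
  };
}

// Stand-in compiler (an assumption for illustration) that counts its calls.
let compileCalls = 0;
function fakeCompile(pattern) {
  compileCalls += 1;
  return { source: pattern };
}

const getCompiled = makeCompileCache(fakeCompile);
const first = getCompiled("(?P<greeting>hi|hello) #Noun");
const second = getCompiled("(?P<greeting>hi|hello) #Noun");

// The second lookup reuses the cached object instead of compiling again.
console.log(first === second, compileCalls); // true 1
```

With the plugin loaded, the stand-in could presumably be swapped for (pattern) => nlp.preCompile(pattern), and the cached result passed to .strictMatch() as in the usage above.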

Supported RegExp grammar

StartOf: ^ - start of string
Value: can be repeated
- Any: . - match any word
- Tag: #Noun - part of speech / tags
- Word: hello - just the word
  - EscapedWord: \#Noun matches the word #Noun
- Group: (...) - matches the values inside the parentheses and captures (saves) the matched content.
  - Or: (value0|value1|value2 value3) - matches any one of the alternatives in the group.
  - Named: (?P<name>...) - saves group which can later be accessed by name
  - NonCapturing: (?:...) - don't save group's matched content
  - Positive Lookahead: (?=...) - does not consume, asserts that group matches ahead
  - Negative Lookahead: (?!...) - does not consume, opposite of positive lookahead
- Modifiers: go at the end of a value, e.g. Hi+
  - Plus: + - matches one or more occurrences of value
  - Star: * - matches zero or more occurrences of value
  - Question: ? - matches zero or one occurrence of value
  - Non Greedy Matches: +?, *?, ?? match as little as possible while still maintaining a match.
  - Note: repeatedly matched groups will overwrite and save only the last value.
EndOf: $ - end of string
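
Several of these constructs share their names with ordinary regular expressions, so JavaScript's built-in RegExp can serve as a rough illustration of the intended semantics. This is an analogy only: RegExp works on characters while the match syntax works on words and tags, and the snippet does not exercise the plugin itself. JavaScript also writes named groups as (?<name>...) rather than (?P<name>...).

```javascript
// Analogy only: plain JavaScript RegExp, not compromise-strict.

// Named group: the captured text is available by name.
const named = /^(?<greeting>hi|hello|good morning) (\w+)$/.exec("good morning world");
console.log(named.groups.greeting); // "good morning"

// Greedy vs non-greedy: + takes as much as it can, +? as little as it can.
const greedy = /a+/.exec("aaa")[0];
const lazy = /a+?/.exec("aaa")[0];
console.log(greedy, lazy); // "aaa" "a"

// Lookahead asserts what follows without consuming it.
const positive = /hello(?= world)/.exec("hello world")[0];
const negative = /hello(?! world)/.test("hello world");
console.log(positive, negative); // "hello" false

// A repeated capturing group keeps only the value from its last iteration.
const repeated = /(?:(\w+) ?)+/.exec("one two three")[1];
console.log(repeated); // "three"
```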

Railroad diagrams

Chevrotain has the neat ability to generate diagrams that explain the match lookup. You can see an example of this in ./lib/gen_diagram.js

GPL-3
