0.0.2 • Published 3 years ago

compromise-strict v0.0.2

License
GPL-3.0-or-later

The compromise match syntax is a custom language for matching and querying tags and metadata in a document.

This plugin is an experimental rewrite of this syntax using a formal parser (Chevrotain) and a strict spec.

This may be useful where recursive queries or error reporting are required.

This library implements a subset of the match-syntax, and has different edge-cases.

It adds 135kb to the file size, so it is not meant as a replacement for the default match method.

This library can be used to generate railroad diagrams of match queries, or to test them for syntax errors. Pre-compiling matches may result in small but noticeable performance improvements over the native .match().

import nlp from "compromise";
import plugin from "compromise-strict";

nlp.extend(plugin);

let doc = nlp("hello world")
  .strictMatch("(?P<greeting>hi|hello|good morning) #Noun")
  .groups("greeting");
console.log(doc.text());

CommonJS:

const nlp = require("compromise");
nlp.extend(require("compromise-strict"));

let doc = nlp("Good morning world")
  .strictMatch("(?P<greeting>hi|hello|good morning) #Noun")
  .groups("greeting");
console.log(doc.text());

Pre-Compiling

strict has the ability to pre-compile a match statement into a parsed format, which may improve the performance of the match query. This plugin automatically adds this as a helper method on the main nlp constructor.

// ... rest from usage above
const m = nlp.preCompile("(?P<greeting>hi|hello|good morning) #Noun");
let doc = nlp("hello world").strictMatch(m).groups("greeting");
console.log(doc.text());
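
Pre-compiling pays off mainly when the same query is reused across many documents. The sketch below shows one way to memoize compiled queries and to check a query for syntax errors. Both createQueryCache and checkQuery are hypothetical helpers, not part of compromise-strict, and checkQuery assumes the compile function throws on invalid syntax, which the text above does not confirm.

```javascript
// Hypothetical helper (not part of compromise-strict):
// memoize compiled queries, keyed by their source string.
// `compile` is any function that turns a query string into a
// pre-compiled form, e.g. (q) => nlp.preCompile(q) with the plugin loaded.
function createQueryCache(compile) {
  const cache = new Map();
  return function getCompiled(query) {
    if (!cache.has(query)) {
      cache.set(query, compile(query));
    }
    return cache.get(query);
  };
}

// Hypothetical helper: report whether a query compiles.
// Assumption: `compile` throws when the query has a syntax error.
function checkQuery(compile, query) {
  try {
    compile(query);
    return { ok: true, error: null };
  } catch (err) {
    const message = err instanceof Error ? err.message : String(err);
    return { ok: false, error: message };
  }
}
```

With the plugin loaded, this could be wired up as createQueryCache((q) => nlp.preCompile(q)), and the returned value passed to .strictMatch() in the same way as m above.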

Supported RegexP grammar

  • StartOf: ^ - start of string
  • Value: can be repeated
    • Any: . - match any word
    • Tag: #Noun - part of speech / tags
    • Word: hello - just the word
      • EscapedWord: \#Noun matches the word #Noun
    • Group: (...) - matches the values given in place of ... and captures (saves) the matched content.
      • Or: (value0|value1|value2 value3) - matches any one of the alternatives in the group.
      • Named: (?P<name>...) - saves group which can later be accessed by name
      • NonCapturing: (?:...) - don't save group's matched content
      • Positive Lookahead: (?=...) - does not consume, asserts that group matches ahead
      • Negative Lookahead: (?!...) - does not consume, opposite of positive lookahead
    • Modifiers: go at the end of a value, e.g. Hi+
      • Plus: + - matches one or more occurrences of value
      • Star: * - matches zero or more occurrences of value
      • Question: ? - matches zero or one occurrence of value
      • Non Greedy Matches: +?, *?, ?? match as little as possible while still maintaining a match.
      • Note: repeatedly matched groups will overwrite and save only the last value.
  • EndOf: $ - end of string
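
Several of these rules mirror ordinary regular expressions, so they can be illustrated by analogy. The sketch below uses plain JavaScript RegExp over characters, not compromise-strict queries over words, so treat it as an illustration of the semantics rather than of this plugin's behaviour.

```javascript
// Analogy only: plain JavaScript regular expressions over characters,
// not compromise-strict queries over words.

// Greedy `+` takes as many repetitions as it can;
// non-greedy `+?` takes as few as it can while still matching.
const greedy = "aaa".match(/a+/)[0]; // "aaa"
const lazy = "aaa".match(/a+?/)[0]; // "a"

// A capture group under a repeat keeps only its last iteration.
const lastOnly = "abc".match(/([a-c])+/)[1]; // "c"

// Named group. JavaScript spells it (?<name>...) rather than (?P<name>...).
const named = "hello world".match(/(?<greeting>hi|hello) \w+/).groups.greeting; // "hello"

// Positive lookahead asserts without consuming: the match stops before " world".
const ahead = "hello world".match(/hello(?= world)/)[0]; // "hello"

console.log({ greedy, lazy, lastOnly, named, ahead });
```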

Railroad diagrams

Chevrotain has the neat ability to generate diagrams to explain the match lookup. You can see an example of this working in ./lib/gen_diagram.js.

GPL-3