0.0.3 • Published 3 years ago

compromise-hash v0.0.3

License
MIT

Demo

const nlp = require('compromise')
nlp.extend(require('compromise-hash'))

let doc = nlp('The Children are right to laugh at you, Ralph')

// generate an md5 hash for the document
doc.hash()
// 'KD83KH3L2B39_UI3N1X'

let b = doc.clone()
doc.isEqual(b)
// true

.hash()

This hash function incorporates each term's part-of-speech tags and its whitespace, so tagging or normalizing the document will change the hash.
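To illustrate why a tag change alters the hash, here is a minimal sketch using Node's built-in crypto module. It is not the plugin's actual implementation: the term shape (`pre`, `text`, `post`, `tags`) and the `sketchHash` helper are assumptions made up for this example.

```javascript
const crypto = require('crypto')

// Hypothetical term shape: { pre, text, post, tags }.
// This is an illustration, not how compromise-hash stores terms internally.
function sketchHash(terms) {
  const md5 = crypto.createHash('md5')
  for (const term of terms) {
    // sort the tags so their order does not affect the digest
    const tags = [...term.tags].sort().join(',')
    // JSON.stringify keeps the field boundaries unambiguous
    md5.update(JSON.stringify([term.pre, term.text, term.post, tags]))
  }
  return md5.digest('hex')
}

const plain = [
  { pre: '', text: 'hello', post: ' ', tags: ['Expression'] },
  { pre: '', text: 'there', post: '', tags: ['Noun'] },
]
// same words and whitespace, but the first term carries one extra tag
const tagged = [
  { ...plain[0], tags: ['Expression', 'Greeting'] },
  plain[1],
]

console.log(sketchHash(plain) === sketchHash(plain))
// true
console.log(sketchHash(plain) === sketchHash(tagged))
// false
```

Because the tags and surrounding whitespace are fed into the digest alongside the text, any change to either produces a different result.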

MD5 is not considered a secure hash, so heads-up if you're doing some top-secret work.

It can, though, be used to compare two documents without looping through their tags:

let docA = nlp('hello there')
let docB = nlp('hello there')
console.log(docA.hash() === docB.hash())
// true

docB.match('hello').tag('Greeting')
console.log(docA.hash() === docB.hash())
// false
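Where the MD5 caveat above matters, one option is to compute a stronger digest yourself with Node's built-in crypto module. The `sha256Hex` helper below is hypothetical and not part of compromise-hash. It hashes a plain string only (for example, the output of `doc.text()`), so unlike `.hash()` it does not account for tags.

```javascript
const crypto = require('crypto')

// Hypothetical helper, not part of compromise-hash:
// returns the SHA-256 digest of a string as 64 hex characters
function sha256Hex(text) {
  return crypto.createHash('sha256').update(text, 'utf8').digest('hex')
}

const a = sha256Hex('hello there')
const b = sha256Hex('hello there')
console.log(a === b)
// true
console.log(a.length)
// 64
```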

If you want the hash to be insensitive to punctuation or case, you can normalize or transform your document before making the hash:

let doc = nlp(`He isn't... working  `)
doc.normalize({
  case: true,
  punctuation: true,
  contractions: true,
})

nlp('he is not working').hash() === doc.hash()
// true

MIT