0.0.4 • Published 3 years ago

compromise-redact v0.0.4

Weekly downloads: 40
License: MIT
Repository: github
Last release: 3 years ago

a work-in-progress text anonymization plugin.

This is not a very secure way to anonymize text. Please don't use this library for any serious or unsupervised data anonymization. It is intended as a tool for low-risk text anonymization, or for use alongside a human proofreader.

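const nlp = require('compromise')
nlp.extend(require('compromise-dates'))
// needed for the money redaction below (see Considerations)
nlp.extend(require('compromise-numbers'))
nlp.extend(require('compromise-redact'))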

// create a document
let doc = nlp('i gave John Smith 900£ in December.')

// add options for our redaction
let m = doc.redact({
  dates: '▇',
  organizations: '*',
  places: false, // false means don't redact
  // accept a function for custom redactions
  money: val => {
    let num = val.toNumber()
    // +/- 50
    return num + Math.random() * 100 - 50
  },
  // custom
  people: person => {
    if (person.has('Smith')) {
      return 'Mr. T'
    }
    return person
  },
})
m.debug()
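
The `money` callback above adds a random offset, which can produce negative amounts or long decimal tails for small values. As a minimal sketch, assuming you already have the amount as a plain number, the offset could be clamped and rounded like this. `jitterAmount` and its parameters are hypothetical names for illustration and are not part of compromise-redact.

```javascript
// Hypothetical helper, not part of compromise-redact.
// Shifts an amount by up to +/- spread, never below zero, rounded to 2 decimals.
function jitterAmount(amount, spread = 50, random = Math.random) {
  // random() is in [0, 1), so offset is in [-spread, +spread)
  const offset = (random() * 2 - 1) * spread
  const shifted = Math.max(0, amount + offset)
  return Math.round(shifted * 100) / 100
}

console.log(jitterAmount(900)) // somewhere between 850 and 950
```

Passing the random source as a parameter keeps the helper easy to test with a fixed value.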

Considerations

compromise-redact requires compromise-dates to be installed if you need dates redacted, and compromise-numbers if you want money and numbers redacted.
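
The dependency rule above can be sketched as a small lookup that reports which plugins a given set of redact options would need. This is only an illustration: `requiredPlugins` is a hypothetical helper, and the `numbers` option key is an assumption based on the sentence above rather than something shown in the example.

```javascript
// Hypothetical helper, not part of compromise-redact.
// Maps redact option keys to the plugin each one depends on.
const PLUGIN_FOR_OPTION = new Map([
  ['dates', 'compromise-dates'],
  ['money', 'compromise-numbers'],
  ['numbers', 'compromise-numbers'], // assumed option name
])

function requiredPlugins(options) {
  const needed = new Set()
  for (const [key, value] of Object.entries(options)) {
    // `false` means "don't redact", so that key needs no plugin
    if (value !== false && PLUGIN_FOR_OPTION.has(key)) {
      needed.add(PLUGIN_FOR_OPTION.get(key))
    }
  }
  return [...needed]
}

console.log(requiredPlugins({ dates: '▇', money: '*', places: false }))
// [ 'compromise-dates', 'compromise-numbers' ]
```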

If you change the name of a person, their gender may still leak through subsequent pronouns.
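
One way a human proofreader might catch this is to list the gendered pronouns that remain in the redacted text. The sketch below is a plain-JavaScript check with a fixed English word list. `findGenderedPronouns` is a hypothetical helper, not part of compromise-redact, and it only flags words for review without rewriting anything.

```javascript
// Hypothetical review helper, not part of compromise-redact.
// Lists gendered English pronouns still present in a text.
const GENDERED = /\b(he|him|his|himself|she|her|hers|herself)\b/gi

function findGenderedPronouns(text) {
  // match() returns null when nothing is found, so fall back to []
  return (text.match(GENDERED) || []).map(word => word.toLowerCase())
}

console.log(findGenderedPronouns('I gave Mr. T 900£. He thanked me for his gift.'))
// [ 'he', 'his' ]
```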

MIT