compromise-redact v0.0.4
a work-in-progress text anonymization plugin.
This is not a very secure way to anonymize text. Please don't use this library for any serious or unsupervised data anonymization. It is intended as a tool for low-risk text anonymization, or for use alongside a human proofreader.
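const nlp = require('compromise')
// optional plugins: only needed for the categories you want redacted
nlp.extend(require('compromise-dates')) // dates
nlp.extend(require('compromise-numbers')) // money and numbers
nlp.extend(require('compromise-redact'))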
// create a document
let doc = nlp('i gave John Smith 900£ in December.')
// add options for our redaction
let m = doc.redact({
  dates: '▇',
  organizations: '*',
  places: false, // false means don't redact
  // accept a function for custom redactions
  money: val => {
    let num = val.toNumber()
    // shift the amount by a random offset of up to +/- 50
    return num + Math.random() * 100 - 50
  },
  // custom: rename anyone called Smith, leave everyone else unchanged
  people: person => {
    if (person.has('Smith')) {
      return 'Mr. T'
    }
    return person
  },
})
m.debug()
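The `money` callback above blurs an amount by adding random noise. As a minimal standalone sketch of that arithmetic, the `jitter` helper below (a hypothetical name, not part of compromise-redact) takes a plain number rather than a compromise match, so it can be run without the library. The `spread` parameter is also an assumption added for illustration.

```javascript
// Hypothetical helper, not part of compromise-redact.
// Shifts a number by a random offset between -spread and +spread,
// using the same arithmetic as the money callback (spread = 50).
function jitter(num, spread = 50) {
  // Math.random() is in [0, 1), so the offset covers a window
  // of width 2 * spread centred on zero.
  return num + Math.random() * (2 * spread) - spread
}

const original = 900
const noisy = jitter(original)
// The result stays within 50 of the original amount.
console.log(Math.abs(noisy - original) <= 50) // true
```

Note that the result is not rounded, so a redacted amount will usually carry a long fractional part. Wrapping the return value in `Math.round` may read more naturally in running text.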
Considerations
compromise-redact requires compromise-dates to be installed if you need dates redacted, and compromise-numbers if you need money or numbers redacted.
if you change a person's name, their gender may still leak through subsequent pronouns.
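One way a human proofreader might catch the pronoun leak described above is to list the gendered pronouns that remain after redaction. The sketch below is plain JavaScript with no dependencies. `findGenderedPronouns` and its word list are hypothetical, not part of compromise-redact, and the sample sentence is made up for illustration.

```javascript
// Hypothetical helper, not part of compromise-redact.
// Lists gendered pronouns left in a text so a reviewer can check them.
// The word list is an assumption and covers only common English forms.
const GENDERED = /\b(he|him|his|himself|she|her|hers|herself)\b/gi

function findGenderedPronouns(text) {
  // match() returns null when nothing is found, so fall back to []
  return (text.match(GENDERED) || []).map(word => word.toLowerCase())
}

// Made-up example: the name was replaced, but the pronouns still
// reveal the person's gender.
const redacted = 'i gave Mr. T 900£ in December. He thanked me for his gift.'
console.log(findGenderedPronouns(redacted)) // [ 'he', 'his' ]
```

A check like this only flags candidates. It cannot tell which pronouns refer to the redacted person, so the final decision still belongs to the reviewer.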
See also
MIT