compromise-penn-tags v0.0.2 (published 3 years ago)

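```js
// Setup (assumed, not in the original README): compromise v12/v13 registers plugins with nlp.extend()
const nlp = require('compromise')
nlp.extend(require('compromise-penn-tags'))

nlp('pour through a book').pennTags()
/*
[{
  text: 'pour through a book',
  terms: [
    { text: 'pour', penn: 'VBP', tags: [Array] },
    { text: 'through', penn: 'IN', tags: [Array] },
    { text: 'a', penn: 'WDT', tags: [Array] },
    { text: 'book', penn: 'NN', tags: [Array] }
  ]
}]
*/
```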
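Each sentence object carries a `terms` array with one `penn` field per term, so the result is easy to reshape. As a minimal sketch, the helper below (`toSlashFormat` is a hypothetical name, not part of the plugin) prints the slash-tagged `word/TAG` format many Penn Treebank tools use. The input is hard-coded from the output above rather than produced by calling the library.

```javascript
// Hypothetical helper, not part of the plugin: flatten .pennTags()-shaped
// output into one "word/TAG word/TAG ..." line per sentence.
function toSlashFormat(sentences) {
  return sentences
    .map(s => s.terms.map(t => `${t.text}/${t.penn}`).join(' '))
    .join('\n');
}

// Sample input copied from the documented output above (tags arrays omitted).
const result = [{
  text: 'pour through a book',
  terms: [
    { text: 'pour', penn: 'VBP' },
    { text: 'through', penn: 'IN' },
    { text: 'a', penn: 'WDT' },
    { text: 'book', penn: 'NN' },
  ],
}];

console.log(toSlashFormat(result)); // pour/VBP through/IN a/WDT book/NN
```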


This plugin provides a mapping from compromise's custom tagset to the standard Penn Treebank tagset.

This lets users evaluate the compromise POS tagger by comparing its output with other libraries or with tagged test data.
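A minimal sketch of such an evaluation, assuming gold-standard data is stored as one array of Penn tag strings per sentence. `pennAccuracy` is a hypothetical helper; the predicted tags are copied from the first example, and the gold tags are hand-written Penn Treebank tags for the same four tokens.

```javascript
// Hypothetical helper, not part of the plugin: token-level accuracy of
// .pennTags()-shaped output against gold Penn tags. It assumes both sides
// were tokenized identically (same number of terms per sentence).
function pennAccuracy(predicted, gold) {
  let correct = 0;
  let total = 0;
  predicted.forEach((sentence, i) => {
    const goldTags = gold[i];
    // A position-by-position comparison is meaningless if token counts differ.
    if (!goldTags || goldTags.length !== sentence.terms.length) {
      throw new Error(`token count mismatch in sentence ${i}`);
    }
    sentence.terms.forEach((term, j) => {
      total += 1;
      if (term.penn === goldTags[j]) correct += 1;
    });
  });
  return total === 0 ? 0 : correct / total;
}

const predicted = [{
  text: 'pour through a book',
  terms: [
    { text: 'pour', penn: 'VBP' },
    { text: 'through', penn: 'IN' },
    { text: 'a', penn: 'WDT' },
    { text: 'book', penn: 'NN' },
  ],
}];
const gold = [['VBP', 'IN', 'DT', 'NN']];

console.log(pennAccuracy(predicted, gold)); // 0.75
```

The explicit token-count check is deliberate: as the next paragraphs note, tokenization differs between libraries, and silently comparing misaligned positions would produce a misleading score.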

Please note that tokenization choices vary considerably between POS-tagging libraries, which makes this comparison more difficult.

Compromise makes some unique decisions when tokenizing punctuation and contractions.
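One common workaround, sketched here under stated assumptions, is to compare only the tokens whose character spans match exactly in both tokenizations. The helpers `spans` and `alignedPairs` are hypothetical, and both tokenizations of "we can't go" below are invented for illustration; they do not claim to show what compromise or any other library actually produces.

```javascript
// Hypothetical helper: [start, end) character spans for each token, found by
// searching the sentence text left to right.
function spans(text, tokens) {
  let cursor = 0;
  return tokens.map(tok => {
    const start = text.indexOf(tok, cursor);
    if (start === -1) throw new Error(`token "${tok}" not found`);
    cursor = start + tok.length;
    return [start, cursor];
  });
}

// Pair up tokens from two tokenizations that cover exactly the same span;
// tokens split differently (e.g. contractions) are dropped from the comparison.
function alignedPairs(text, a, b) {
  const spansB = new Map(
    spans(text, b.map(t => t.text)).map((s, i) => [s.join(':'), b[i]])
  );
  return spans(text, a.map(t => t.text))
    .map((s, i) => [a[i], spansB.get(s.join(':'))])
    .filter(([, other]) => other !== undefined);
}

const text = "we can't go";
const a = [{ text: 'we', tag: 'PRP' }, { text: "can't", tag: 'MD' }, { text: 'go', tag: 'VB' }];
const b = [{ text: 'we', tag: 'PRP' }, { text: 'ca', tag: 'MD' }, { text: "n't", tag: 'RB' }, { text: 'go', tag: 'VB' }];

const pairs = alignedPairs(text, a, b);
console.log(pairs.map(([x]) => x.text)); // [ 'we', 'go' ]
```

The trade-off is coverage: exact-span matching is simple and unambiguous, but every token the two libraries split differently is excluded from the score.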

Unlike most POS taggers, compromise assigns each term many tags, including descendent or assumed tags.
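Because a term carries several tags, a mapping to Penn has to pick one. A minimal sketch of one way to do that is a priority list checked from most to least specific. The table below is illustrative and partial; it is not the plugin's actual mapping, and the compromise tag names are assumptions based on compromise's tagset.

```javascript
// Illustrative, partial priority table (NOT the plugin's real mapping).
// Rules are checked in order, so specific compromise tags come before the
// general tags they imply (e.g. 'Plural' before 'Noun').
const PRIORITY = [
  ['ProperNoun', 'NNP'],
  ['Plural', 'NNS'],
  ['Noun', 'NN'],
  ['PastTense', 'VBD'],
  ['Gerund', 'VBG'],
  ['Verb', 'VB'],
  ['Adjective', 'JJ'],
];

// Return the Penn tag of the first rule whose compromise tag is present, or null.
function pickPenn(tags) {
  const tagSet = new Set(tags);
  for (const [tag, penn] of PRIORITY) {
    if (tagSet.has(tag)) return penn;
  }
  return null;
}

console.log(pickPenn(['Noun', 'Plural']));    // NNS
console.log(pickPenn(['Noun', 'Singular']));  // NN
console.log(pickPenn(['Verb', 'PastTense'])); // VBD
```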

Compromise is also less confident than most libraries about whether a noun is singular or plural: where the Penn tag would be NNPS, compromise may return NNP instead.
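When scoring against gold data, you may want to count that case separately. A minimal sketch, assuming a lenient mode is acceptable for your evaluation; `tagsMatch` and the `LENIENT` table are hypothetical names covering only the NNPS/NNP case described above.

```javascript
// Hypothetical comparison rule: in lenient mode, accept a predicted NNP
// when the gold tag is NNPS, since the singular/plural distinction may be missed.
const LENIENT = { NNPS: ['NNPS', 'NNP'] };

function tagsMatch(predicted, gold, lenient = true) {
  if (predicted === gold) return true;
  return lenient && (LENIENT[gold] || []).includes(predicted);
}

console.log(tagsMatch('NNP', 'NNPS'));        // true
console.log(tagsMatch('NNP', 'NNPS', false)); // false
console.log(tagsMatch('NN', 'NNS'));          // false
```

Reporting both the strict and the lenient score makes it clear how much of the gap comes from this known limitation.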

The `.pennTags()` method accepts the same options as the `.json()` method.

```js
nlp('in the town where I was born').pennTags({offset:true})
/*
[{
  text: 'in the town where I was born',
  terms: [
    { text: 'in', penn: 'IN', tags: [Array] },
    { text: 'the', penn: 'WDT', tags: [Array] },
    { text: 'town', penn: 'NN', tags: [Array] },
    { text: 'where', penn: 'CC', tags: [Array] },
    { text: 'I', penn: 'PRP', tags: [Array] },
    { text: 'was', penn: 'VB', tags: [Array] },
    { text: 'born', penn: 'VB', tags: [Array] }
  ],
  offset: { index: 0, start: 0, length: 28 }
}]
*/
```
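With `{offset: true}`, each sentence object gains an `offset` with `index`, `start` and `length`. Assuming `start` and `length` count characters in the input (consistent with the example above, where the sentence is 28 characters long), a small hypothetical helper can cut each sentence back out of the source text, which is handy when lining results up with an annotated corpus.

```javascript
// Hypothetical helper: recover each sentence's exact source text from the
// offset fields, assuming start/length are character positions.
function sentenceSpans(source, sentences) {
  return sentences.map(s =>
    source.slice(s.offset.start, s.offset.start + s.offset.length)
  );
}

// Sample input mirroring the documented output above (terms omitted).
const source = 'in the town where I was born';
const out = [{ text: source, terms: [], offset: { index: 0, start: 0, length: 28 } }];

console.log(sentenceSpans(source, out)); // [ 'in the town where I was born' ]
```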

This plugin is a work in progress.

MIT