Ath-text-processing-package NPM

Text processing package

Intro

It basically accepts any html text, but can also be markdown text and analyses the dimensions: 1. readTime - number of minutes it is required to read this text 2. keywords - phrases of 1, 2, 3 words that repeat themselves in the text. 3. vulgarityIndex - you need to scan for vulgar words in english and calculate an index for a story if it is vulgar or not. 4. nudityIndex - images need to be analysed if they contain adult content 5. images: need to be parsed from the text into a separate array (ordered by occurance in the text!) 6. language -> recognise language of the text. It needs to work great for english, japanese spanish and german. 7. plain - plain version of a text without html tags and images that could be for example sent out in an email 8. textImageRatio 9. compressed version of the plain text.

Inputs

Any HTML text

Outputs

{
 readTime: number,
 keywords: {
   1: string[]
   2: string[]
   3: string[]
 }

compressed: string
  nudityIndex: number (0:1)
  vulgarityIndex: number (0:1)
  images: [{ url: string }],
  language: "en" | "de" etc
  textImageRatio: number
  plain: string
}

Interface

interface TextAnalyzer {
  getReadTime: () => Text
  getPlainText: () => Text
  extractImages: () => Images
  analyzeLang: () => Lang
  extractKeywords: (noOfWordsInKeyword) => Keywords  
  analyze: () => TextAnalysis // get complete analysis
}