1.0.4 • Published 3 years ago

atext-wordz v1.0.4

License
MIT
Repository
github


Provides a list of the words found in a text, along with a few statistics.


Getting started

  1. Install the package
    $ npm i atext-wordz
  2. Require its functions
    const { getWStatsList , getWStatsObj , getWordList } = require( "atext-wordz" );
  3. Call its functions. Pick the function that returns the result in the format you need. NOTE: you have 3 choices.

    const result = getWStatsList( text , options );
    // ==> [ {}:wordstats(1), {}, {}, ... , {}:wordstats(N)    ]
    
    // OR
    
    const result = getWStatsObj( text , options );
    // ==> { wordstats1, wordstats2, ..., wordstatsN  }
    
    // OR
    
    const result = getWordList( text , options );
    // ==> ["word 1", "word 2", ... "word N"]

Light demo

Assuming you have a demo.txt file in a demo folder at the same level as this .js file and you want to get word stats.

const fs = require('fs');

const { getWStatsList , getWStatsObj , getWordList } = require( "../atext-wordz" );

fs.readFile( "./demo/demo.txt" , "utf8" , ( err , text ) => {

  // stop early if the file could not be read
  if ( err ) throw err;

  const sortString = ` by number of a > than b's `;
  const cbOnNewWord = ( word ) => {
    // TODO: make first sector actions on new word found
  };

  const options = {  sortString , cbOnNewWord };

  const result = getWStatsList( text , options );

  console.log( result );
  // outputs :
  // < an array of word statistics sorted by most used words >

});

Options

There are a few options to meet your requirements at this time. Here is the definition table.

| option | type | default |
|-|-|-|
| sortString | string | "" |
| minimumLength | number | 2 |
| cbOnNewWord | function | (word:string) => {} |

  1. sortString: You can sort your words and stats before the service wraps everything up, thanks to the integrated byStr~Sort npm module. You may find it useful to check its sortString section.

    const sortString = `
        by order of a greater than b's then
        by number of a < than b's
    `;

    NOTE: Every sort sentence starts with "by" and can end with "then" to chain further sort sentences.

  2. minimumLength: You can define the minimum length a word must have to be kept during the analysis phase.

  3. cbOnNewWord: A callback function that is called whenever a new word is encountered, which means only once per word.
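
To make the sortString idea more concrete, here is a minimal plain-JavaScript sketch of what the demo's sort sentence appears to do. It does not use the byStr~Sort module. It assumes, based on the demo's output comment, that "by number of a > than b's" puts the most used words first. The sample entries and the byNumberDesc name are hypothetical.

```javascript
// Hand-written sample entries shaped like the stats records (not real package output)
const stats = [
  { word: "cat", order: 2, number: 1, length: 3 },
  { word: "the", order: 1, number: 3, length: 3 },
  { word: "dog", order: 3, number: 2, length: 3 },
];

// Assumed equivalent of "by number of a > than b's": highest `number` first
const byNumberDesc = (a, b) => b.number - a.number;

// Copy before sorting so the original array is left untouched
const sorted = [...stats].sort(byNumberDesc);

console.log(sorted.map((s) => s.word));
// prints [ 'the', 'dog', 'cat' ]
```

A comparator like this is only a mental model for reading sort sentences. The actual parsing and chaining with "then" is handled by the sort module.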


Stats

The service returns stats matching an instance of IStatsOfWords or IStatsOfWordsObject, or a simple array of strings.

Here are the definitions for each of them:

IStatsOfWords

| field | type | notes |
|-|-|-|
| word | string | the word |
| order | number | the order of appearance |
| number | number | the number of appearances |
| length | number | the word's length |

IStatsOfWordsObject

Each word will be a key and its stats will be the value of that pair.

| | order | number | length |
|-|-|-|-|
| type | number | number | number |
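
The following sketch shows how the three result shapes relate to each other, going only by the definitions above. The sample values are written by hand, not produced by the package, and the variable names are hypothetical.

```javascript
// Hand-written sample shaped like a list of IStatsOfWords (not real package output)
const statsList = [
  { word: "hello", order: 1, number: 2, length: 5 },
  { word: "world", order: 2, number: 1, length: 5 },
];

// IStatsOfWordsObject shape: each word is a key, the remaining stats are the value
const statsObj = Object.fromEntries(
  statsList.map(({ word, ...rest }) => [word, rest])
);

// Simple array of strings: only the words
const wordList = statsList.map(({ word }) => word);

console.log(statsObj);
console.log(wordList);
```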


Word detection

Detecting words in a large, noisy text is not easy. It takes more than splitting on every space, because normal text also relies on punctuation.

Fortunately, French and English punctuation differ little, if at all.

Therefore, any character that is not a "special" character can be considered part of a word. Things get much more complicated with languages that do not strictly separate words, such as Japanese or Chinese, to name just a few.

Here is the regex used to detect special characters:

const special = 
    /[�\d\s\\[\]\x20-\x40\-`{-~\xA0-\xBF×Ø÷øʹ͵ͺ;!?♪╚-╬┘-▀\uFF3B\uFF40\uFF5B-\uFF65¥・()]/i;
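
The splitting idea described above can be sketched as follows. This is a minimal illustration, not the package's implementation. The SPECIAL pattern is a simplified stand-in for the regex above, sketchWordStats is a hypothetical helper, and both the 1-based order and the case-sensitive comparison are assumptions.

```javascript
// Simplified stand-in for the `special` regex (assumption: digits, whitespace,
// ASCII punctuation and a few Latin-1 symbols are enough for this demo)
const SPECIAL = /[\d\s\\\[\]\x20-\x40`{-~\xA0-\xBF×÷]+/;

// Hypothetical helper: splits on special characters and counts each word
function sketchWordStats(text, minimumLength = 2) {
  const stats = new Map();
  let order = 0;

  for (const word of text.split(SPECIAL)) {
    // Skip empty fragments and words shorter than the minimum length
    if (word.length < minimumLength) continue;

    const entry = stats.get(word);
    if (entry) {
      entry.number += 1; // word already seen: bump its count
    } else {
      order += 1; // new word: record its order of appearance (1-based here)
      stats.set(word, { word, order, number: 1, length: word.length });
    }
  }

  return [...stats.values()];
}

console.log(sketchWordStats("the cat, the dog; a bird!"));
```

Note that because the apostrophe and the hyphen fall inside the special range, a word such as "don't" is split into "don" and "t", and the single letter is then dropped by the minimum length check.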