Atext-wordz

Provides a list of the words found in a text, along with a few statistics about each of them.

Getting started

Install the package
```
$ npm i atext-wordz
```

Require its functions

const { getWStatsList , getWStatsObj , getWordList } = require( "atext-wordz" );

Call its functions

Depending on your needs, pick the format in which you want to get the result. NOTE: You have 3 choices.

const result = getWStatsList( text , options );
// ==> [ {}:wordstats(1), {}, {}, ... , {}:wordstats(N)    ]

// OR

const result = getWStatsObj( text , options );
// ==> { wordstats1, wordstats2, ..., wordstatsN  }

// OR

const result = getWordList( text , options );
// ==> ["word 1", "word 2", ... "word N"]

Light demo

This demo assumes you have a demo.txt file in a demo folder at the same level as this .js file, and that you want to get its word stats.

const fs = require('fs');

const { getWStatsList , getWStatsObj , getWordList } = require( "../atext-wordz" );

fs.readFile( "./demo/demo.txt" , "utf8" , ( err , text ) => {

  // Stop early if the file could not be read
  if ( err ) throw err;

  const sortString = ` by number of a > than b's `;
  const cbOnNewWord = ( word ) => {
    // TODO: act on each newly found word
  };

  const options = {  sortString , cbOnNewWord };

  const result = getWStatsList( text , options );

  console.log( result );
  // outputs :
  // < an array of word statistics sorted by most used words >

});

Options

A few options are available at this time to meet your requirements. Here is the definition table.

|option|type|default|
|-|-|-|
|sortString|string|""|
|minimumLength|number|2|
|cbOnNewWord|function|(word:string) => {}|

sortString: You can sort your words and stats before the service wraps everything up, thanks to the integrated byStr~Sort npm module. You may find it useful to check its sortString section.
```
const sortString = `
    by order of a greater than b's then
    by number of a < than b's
`;
```
NOTE: Every sort sentence starts with "by" and can end with "then" to chain other sort sentences.

minimumLength: You can define the minimum length of the words kept during the analysis phase.

cbOnNewWord: A callback function that is called whenever a new word is encountered, which means only once per word.
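
To make the last two options more concrete, here is a minimal sketch of the behaviour described above. It is not the package's implementation: collectWords is a hypothetical helper, the tokenizer is deliberately naive, and lowercasing words before comparing them is an assumption.

```javascript
// Hypothetical helper, NOT part of atext-wordz: it only illustrates how
// minimumLength and cbOnNewWord are described as behaving.
function collectWords( text , options = {} ) {
  const { minimumLength = 2 , cbOnNewWord = () => {} } = options;
  const seen = new Set();

  // Naive tokenizer (assumption): split on anything that is not a letter.
  for ( const raw of text.split( /[^\p{L}]+/u ) ) {
    const word = raw.toLowerCase(); // assumption: case-insensitive matching
    if ( word.length < minimumLength ) continue; // too short, skipped
    if ( seen.has( word ) ) continue;            // already reported once
    seen.add( word );
    cbOnNewWord( word );                         // fires once per distinct word
  }

  return [ ...seen ];
}

const found = [];
const words = collectWords( "The cat saw a cat and the dog" , {
  minimumLength : 3 ,
  cbOnNewWord : ( word ) => found.push( word ) ,
} );

console.log( found );
// [ 'the', 'cat', 'saw', 'and', 'dog' ]
```

In this sketch "a" is dropped because it is shorter than minimumLength, and the repeated "cat" and "the" do not trigger the callback a second time.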

Stats

The service gives you stats matching an instance of IStatsOfWords or IStatsOfWordsObject, or a simple array of strings.

Here are the definitions for each of them:

IStatsOfWordsObject: Each word will be a key and its stats will be the value of that pair.

| |order|number|length|
|-|-|-|-|
|type|number|number|number|
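
As an illustration of that shape, the sketch below builds an object keyed by word from a plain list of words. It is not taken from the package: buildStatsObject is a hypothetical helper, and the field meanings are assumptions (order as rank of first appearance, number as occurrence count, length as character count).

```javascript
// Hypothetical helper, NOT part of atext-wordz. Field meanings are assumed:
//   order  -> rank of the word's first appearance (1-based)
//   number -> how many times the word occurs
//   length -> number of characters in the word
function buildStatsObject( words ) {
  const byWord = new Map();

  for ( const word of words ) {
    const entry = byWord.get( word );
    if ( entry ) {
      entry.number += 1;
    } else {
      byWord.set( word , {
        order : byWord.size + 1 ,
        number : 1 ,
        length : word.length ,
      } );
    }
  }

  // Each word becomes a key, its stats become the value of that pair.
  return Object.fromEntries( byWord );
}

const stats = buildStatsObject( [ "the" , "cat" , "saw" , "the" , "cat" , "the" ] );

console.log( stats.the );
// { order: 1, number: 3, length: 3 }
```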

Word detection

It is not that easy to detect words in a large text that contains a lot of noise. Splitting on every space is not enough, because a normal text also relies on punctuation.

Fortunately, French and English punctuation differ very little, if at all.

Therefore, anything other than a "special" character should be considered part of a word. Things become very complicated when dealing with languages that are not that strict about isolating words, like Japanese or Chinese, to name just a few.

Here is the regex used to tell special characters apart from the characters that make up words:

const special = 
    /[�\d\s\\[\]\x20-\x40\-`{-~\xA0-\xBF×Ø÷øʹ͵ͺ;！？♪╚-╬┘-▀\uFF3B\uFF40\uFF5B-\uFF65￥・（）]/i;
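
To show the idea in action, here is a minimal sketch that splits a text on runs of special characters. It is an illustration only: splitWords is a hypothetical helper, and simplifiedSpecial is a reduced stand-in for the regex above (ASCII control characters, digits and punctuation, plus the \xA0-\xBF range), not the package's actual detection code.

```javascript
// Simplified stand-in for the regex above (assumption): it only covers ASCII
// control characters, whitespace, digits, punctuation and \xA0-\xBF.
const simplifiedSpecial = /[\x00-\x40\x5B-\x60\x7B-\x7E\xA0-\xBF]+/;

// Hypothetical helper, NOT part of atext-wordz: every run of special
// characters is a separator, whatever remains is treated as a word.
function splitWords( text ) {
  return text
    .split( simplifiedSpecial )
    .filter( ( part ) => part.length > 0 ); // drop empty leading/trailing parts
}

console.log( splitWords( "Hello, world! Déjà vu (2 times)." ) );
// [ 'Hello', 'world', 'Déjà', 'vu', 'times' ]
```

Accented letters such as "é" and "à" fall outside the special ranges, so French words stay in one piece, while digits and punctuation act as separators.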

