Frequent-phrases NPM

Frequent Phrases

Process large chunks of text into a node tree, which can then be traversed to grab phrases that match the given criteria.

To install:

npm install frequent-phrases

Basic Usage

The workflow is generally:

Construct FrequentPhrase instance
Define custom config (Optional)
Process text
Output frequent phrases

Construct

const FP = new FrequentPhrase();

Custom Config (more info HERE)

The default config object is as follows:

const defaultConfig = {
    maxPhraseLength: 6,
    selectionAlgorithm: 'dropOff',
    selectionConfig: {
        dropOff: {
            threshold: 0.5
        }
    },
    scoringAlgorithm: 'default',
    parserConfig: {
        chunkSentences: true,
        removeTypedSentences: true
    },
    preProcessing: {
        trim: 3
    },
    postProcessing: {
        uniqueWordAtCutoffDepth: 1
    }
}

Access the config property to modify this after instantiation, or construct a new config object and pass it in.

const FP = new FrequentPhrase();

FP.config = newConfigObject;
// or
FP.config.maxPhraseLength = 8; // etc.

Process Text And/Or Generate Phrases

The last bit can be separated out, or done altogether.

const speech = 'Five score years ago, a great American, in whose symbolic shadow' // ... MLK's I Have A Dream speech

To process text and then extract phrases:

await FP.process(speech);

// then get Frequent Phrases
await FP.getFrequentPhrases().then((res) => console.log(res))

To do both, just pass text in to getFrequentPhrases(). Note that this method overwrites previous tree data, and is best served if you are instantiating a new FrequentPhrase() everytime.

await FP.getFrequentPhrases(speech).then((res) => console.log(res));

Both methods will yield the same result:

// ^^^^^ console.log(res);
{
    ok: true
    msg: ''
    frequentPhrases: [
        { phrase: "", score: 0 },
        { phrase: "", score: 0 },
        { phrase: "", score: 0 },
        ...
    ]
    executionTime: '3.544ms'
}

***For More

Modifying the Library

To help understanding of best ways to modify for a specific use-case, the library works as follows: 1. Input corpus 2. Pre-process potential candidates 3. Select Candidates 4. Score selected Candidates 5. Post-process candidates 6. Output

Config

Pre-Processing

trim - Trims candidate pool to only originate from the top trim starter words. Trim defaults to 0, or no trim.

Candidate Selection

Selection Algorithm - Algorithm to use for selection algorithm. Default is a simple dropoff, which cuts off phrases based on their relative visits between child / parent.
Selection Config - Stores constants to modify how selection algorithms perform. See here.

Candidate Scoring:

Defines what scoring algorithm is used. Default algo is based solely on averaged visits, meaning a higher visit average yields higher scores.

Post-Processing

uniqueWordAtCutoffDepth - Trims scored candidates so that the highest-scored phrase from each starter word is represented.

Parser Config

chunkSentences - convert a string into an array of it's contained sentences
removeTypedSentences - find the unique, longest sentence amongst a gamut of typed copies of the same sentence.
- e.g.: We are only interested in the sentence 'How are you?' but we have:
  - 'H'
  - 'Ho'
  - 'How'
  - ...
  - 'How are you?'

FrequentPhrase.getFrequentPhrases(body)

Return Frequent Phrases from data already processed.

Returns: Promise.<FP> - Frequent phrases present in the text

Param	Description
body	OPTIONAL - string of text, if passed it will be processed and then phrases will be extracted. If not passed, phrases will be extracted from existing data.

FrequentPhrase.process(body)

Process a string of sentences. Frequent phrases can only be extracted from processed text.

Returns: Promise<string[] | FPNode[]> - registry, rootNode

Param	Description
body	string of text, if passed it will be processed and then phrases will be extracted. If not passed, phrases will be extracted from existing data.

FrequentPhrase.reset()

Cleans out the sentence registry and destroys the node tree

Returns: Promise<string[] | FPNode[]> - registry, rootNode

frequent-phrases phrases frequent nlp

@infinitebrahmanuniverse/nolb-freq @everything-registry/sub-chunk-1702

0.1.2-alpha

3 years ago

0.1.1-alpha

3 years ago

0.1.0-alpha

3 years ago