1.2.0 • Published 1 year ago

methodius-cli v1.2.0

License
Hippocratic-2.1

Methodius CLI (an N-gram utility)

Methodius is a utility for analyzing the frequency of text in chunks (n-grams).

This CLI lets you run that analysis from the command line.
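To make "frequency of text in chunks" concrete, here is a minimal sketch of counting letter n-grams. It is not the Methodius implementation: the `ngramFrequencies` helper is hypothetical, and splitting on runs of letters and counting n-grams within each word are assumptions made for illustration.

```javascript
// Hypothetical helper for illustration; not part of the methodius API.
// Counts letter n-grams inside each word of a text.
function ngramFrequencies(text, n) {
  const counts = new Map();
  // Assumption: lowercase the text and treat each run of letters as a word.
  const words = text.toLowerCase().match(/\p{L}+/gu) || [];
  for (const word of words) {
    // Slide a window of width n across the word.
    for (let i = 0; i + n <= word.length; i++) {
      const gram = word.slice(i, i + n);
      counts.set(gram, (counts.get(gram) || 0) + 1);
    }
  }
  return counts;
}

// Bigrams of "the then": "th" and "he" occur twice, "en" once.
const bigrams = ngramFrequencies("the then", 2);
console.log([...bigrams.entries()]);
```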

Hippocratic License HL3-LAW-MEDIA-MIL-SOC-SV


Installation

Prerequisites

  • Node LTS (as of September 2023, Node 18.16.0)

Running on-demand

Download this package, then install its dependencies from the package directory:

npm install

Globally via npm

npm i -g methodius-cli

Usage

Basic Scanning

Get all the details:

methodius -f "great-expectations.txt"

Choose which properties you'd like to see by passing -p once for each property:

methodius -f "great-expectations.txt" -p "uniqueWords" -p "uniqueBigrams" -p letterFrequencies

Do the same on multiple files by passing -f once for each file:

methodius -f "great-expectations.txt" -f "a-tale-of-two-cities.txt" -p "uniqueWords" -p "uniqueBigrams" -p letterFrequencies

Write the results for multiple files to a directory:

methodius -f "great-expectations.txt" -f "a-tale-of-two-cities.txt" -p "uniqueWords" -p "uniqueBigrams" -p letterFrequencies -o "dickens/"

Set your own output file name:

methodius -f "great-expectations.txt" -f "a-tale-of-two-cities.txt" -p "uniqueWords" -o uniqueWords.json 
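Once a results file exists, it can be inspected from Node. The sketch below assumes the output is a JSON object with one top-level key per requested property; this README does not document the file's shape, so both helper names and that layout are assumptions.

```javascript
const fs = require("fs");

// Hypothetical helper: read a results file and parse it as JSON.
function loadAnalysis(filePath) {
  return JSON.parse(fs.readFileSync(filePath, "utf8"));
}

// Hypothetical helper: list each top-level property with a rough
// description of its value. Assumes one top-level key per property.
function describeAnalysis(analysis) {
  return Object.entries(analysis).map(([property, value]) => {
    let kind;
    if (Array.isArray(value)) {
      kind = `array(${value.length})`;
    } else if (value !== null && typeof value === "object") {
      kind = `object(${Object.keys(value).length} keys)`;
    } else {
      kind = typeof value;
    }
    return { property, kind };
  });
}
```

For example, `console.table(describeAnalysis(loadAnalysis("uniqueWords.json")))` would print a quick summary of the file produced by the command above.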

Options

| Option | Alias | Description | Defaults |
| --- | --- | --- | --- |
| --files | -f | Fully qualified path to a file. Required. | samples/alice.txt |
| --topLimit | -l | For any methods, this sets the number of top n-grams to get. Optional. | 15 |
| --properties | -p | Which properties to return. Optional. Get the list off of the repo. | 'bigramFrequencies', 'trigramFrequencies', 'letterFrequencies', 'meanWordSize', 'medianWordSize', 'wordFrequencies', 'bigramPositions', 'trigramPositions', 'uniqueWords' |
| --topMethods | -s | Which "top" methods to use. Optional. | 'topBigrams', 'topTrigrams', 'topWords' |
| --outputFileName | -o | Name of the output file. Optional. This could also be a directory: analysis/en/ | analysis.json, or <inputfilename>.analysis.json if multiple files |
| --mergeResults | -m | Merges the results files. Output will be .merged.json. Optional. | false |
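The output naming rules for --outputFileName can be sketched as a small function. This is an illustration of the documented defaults, not the CLI's actual code: `resolveOutputPath` is a hypothetical name, and three behaviours are assumptions (a trailing slash marks a directory, the input file's extension is dropped, and an explicit file name is used as given).

```javascript
const path = require("path").posix;

// Hypothetical sketch of how an output path could be chosen.
function resolveOutputPath(inputFile, outputOption, multipleInputs) {
  // Assumption: "alice.txt" becomes "alice.analysis.json".
  const base = path.basename(inputFile, path.extname(inputFile));
  const perFileName = `${base}.analysis.json`;

  // Assumption: a trailing slash means the option names a directory.
  if (outputOption && outputOption.endsWith("/")) {
    return path.join(outputOption, perFileName);
  }
  // Assumption: an explicit file name is used as given.
  if (outputOption) {
    return outputOption;
  }
  // Documented defaults: per-file names for multiple inputs,
  // otherwise analysis.json.
  return multipleInputs ? perFileName : "analysis.json";
}

console.log(resolveOutputPath("great-expectations.txt", "dickens/", true));
```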

Merging results

The --mergeResults (-m) option and the methodius-merge command both analyze all of the results files and create a single file that contains all of the results. How values are merged depends on the type of each property's value:

  • If a property in a results file is an Object or a Map, what's merged are the keys. Duplicates are removed.
  • If a property in a results file is an array or Set (which would be weird because JSON can't output a Set), the arrays are concatenated. Duplicates are removed.
  • If a property in a results file is a number, the numbers are averaged.
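The three rules above can be sketched as follows. This is not the methodius-merge source: `mergeResults` is a hypothetical function, it works on plain parsed JSON objects, and letting the last file win when two files share an object key is an assumption, since the rules only say that duplicate keys are removed.

```javascript
// Hypothetical sketch of the merge rules; not the actual implementation.
function mergeResults(results) {
  // Collect every value seen for each property across all results.
  const grouped = {};
  for (const result of results) {
    for (const [property, value] of Object.entries(result)) {
      if (!grouped[property]) grouped[property] = [];
      grouped[property].push(value);
    }
  }

  const merged = {};
  for (const [property, values] of Object.entries(grouped)) {
    const first = values[0];
    if (typeof first === "number") {
      // Numbers are averaged.
      merged[property] = values.reduce((sum, v) => sum + v, 0) / values.length;
    } else if (Array.isArray(first)) {
      // Arrays are concatenated; a Set drops duplicate primitive values.
      merged[property] = [...new Set(values.flat())];
    } else if (first !== null && typeof first === "object") {
      // Objects are merged by key. Assumption: the last file wins.
      merged[property] = Object.assign({}, ...values);
    } else {
      // Anything else: keep the first value (assumption).
      merged[property] = first;
    }
  }
  return merged;
}

console.log(
  mergeResults([
    { uniqueWords: ["a", "b"], meanWordSize: 4 },
    { uniqueWords: ["b", "c"], meanWordSize: 6 },
  ])
);
```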

Merging results with methodius-merge

If you want to merge results after the fact, use the methodius-merge command. It takes all of the same arguments as methodius, and it exists so that you can pick which files to merge.

methodius-merge -f "alice.analysis.json" "huck-fin.analysis.json" -o "merged.json"

You can also designate which properties from the analysis files you want to merge:

methodius-merge -f "alice.analysis.json" "huck-fin.analysis.json" -o "merged.json" -p uniqueWords