1.2.1 • Published 1 month ago

ltr v1.2.1

License
MIT
Repository
github

ltr

A simple command-line text segmenter that uses the Intl.Segmenter API to split text into characters, words and sentences.

It takes cues from standard Unix command-line tools such as wc, uniq, and sort.

Getting started

ltr runs in Node.js and can be installed globally with npm:

npm install -g ltr

You can also run it without installing it first, using npx:

npx ltr --help

Usage

ltr [command] [file1, [file2, …]]

ltr accepts one or more input files, or uses the standard input (stdin) when no files are provided. You can also concatenate stdin to other input files by using the - (dash) operand.

General options:

  • -h, --help.
  • -v, --version.

Available commands:

  • ltr chars — extract graphemes;
  • ltr words — extract words;
  • ltr sentences — extract sentences.

The tool returns one value per line.
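As a rough illustration of what the three commands extract, here is a minimal sketch built directly on Intl.Segmenter. It is not ltr's source code: the segment helper is a hypothetical name, and skipping segments that are not word-like (whitespace, punctuation) is an assumption about how the words command behaves.

```javascript
// Sketch only: approximates the three commands, not ltr's actual implementation.
// `segment` is a hypothetical helper name.
function segment(text, granularity, locale) {
  const segmenter = new Intl.Segmenter(locale, { granularity });
  const values = [];
  for (const part of segmenter.segment(text)) {
    // Assumption: for words, drop whitespace and punctuation segments.
    if (granularity === 'word' && !part.isWordLike) continue;
    values.push(part.segment);
  }
  return values;
}

const sample = 'Hello there. How are you?';

// One value per line, mirroring the output convention described above.
console.log(segment(sample, 'grapheme', 'en').join('\n'));
console.log(segment(sample, 'word', 'en').join('\n'));
console.log(segment(sample, 'sentence', 'en').join('\n'));
```

Note that grapheme segmentation keeps a base letter and its combining marks together, which is why the chars command is described as extracting graphemes rather than code points.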

Options

-l, --locale

By default, ltr works with the current locale. To use a different one, pass an explicit locale, such as ro for Romanian:

ltr sentences --locale=ro my-doc.txt
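Intl.Segmenter takes a BCP 47 language tag as its first argument and falls back to the runtime's default locale when that argument is undefined. The sketch below shows how a --locale value could be forwarded to it; the sentences helper is hypothetical, and trimming the trailing whitespace of each sentence is an assumption rather than documented ltr behavior.

```javascript
// Sketch only: `sentences` is a hypothetical helper, not part of ltr.
function sentences(text, locale) {
  // Passing `undefined` lets the runtime pick its default locale.
  const segmenter = new Intl.Segmenter(locale, { granularity: 'sentence' });
  // Assumption: trailing whitespace is trimmed from each sentence.
  return Array.from(segmenter.segment(text), (part) => part.segment.trim());
}

const romanian = sentences('Bună ziua. Ce mai faci?', 'ro');
console.log(romanian.join('\n'));
```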

-u, --unique

Return unique values, removing any duplicates.

ltr words --unique my-doc.txt
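Deduplication of this kind can be sketched with a Set, which keeps the first occurrence of each value in its original position. Whether ltr preserves that order is an assumption, and unique is a hypothetical helper name.

```javascript
// Sketch only: removes duplicates while keeping first-occurrence order.
function unique(values) {
  // A Set remembers insertion order, so the first copy of each value wins.
  return [...new Set(values)];
}

const deduplicated = unique(['the', 'cat', 'the', 'hat', 'cat']);
console.log(deduplicated.join('\n'));
```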

-i, --ignore-case

Ignore case when performing operations. Causes values to be returned in lowercase.

ltr words --ignore-case my-doc.txt
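One way to get this behavior is to lowercase each value before comparing, which also explains why the values come back in lowercase. The sketch below assumes locale-aware lowercasing with toLocaleLowerCase; uniqueIgnoringCase is a hypothetical name and ltr may fold case differently.

```javascript
// Sketch only: case-insensitive deduplication by lowercasing first.
function uniqueIgnoringCase(values, locale) {
  // Lowercasing happens before comparison, so the output is lowercase too.
  const lowered = values.map((value) => value.toLocaleLowerCase(locale));
  return [...new Set(lowered)];
}

const folded = uniqueIgnoringCase(['The', 'the', 'THE', 'Cat'], 'en');
console.log(folded.join('\n'));
```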

-I, --ignore-accents

Ignore diacritical marks when performing operations. Causes values to be returned without diacritical marks.

ltr words --ignore-accents my-doc.txt
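A common technique for dropping diacritical marks is to decompose the text with Unicode normalization (NFD) and then remove the combining marks. This is a sketch of that technique, not necessarily what ltr does internally; stripAccents is a hypothetical name.

```javascript
// Sketch only: strips diacritics via canonical decomposition.
function stripAccents(value) {
  return value
    .normalize('NFD')          // split letters from their combining marks
    .replace(/\p{M}/gu, '')    // drop every combining mark
    .normalize('NFC');         // recompose whatever is left
}

console.log(stripAccents('café'));
console.log(stripAccents('știință'));
```

Letters that have no canonical decomposition, such as ø, pass through this approach unchanged.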

-c, --count

Count occurrences of each unique value.

ltr words --count my-doc.txt
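Counting occurrences per unique value amounts to building a frequency table. The sketch below uses a Map for that; countOccurrences is a hypothetical name, and the way the entries are printed here is only illustrative, not ltr's output format.

```javascript
// Sketch only: builds a value -> occurrences table.
function countOccurrences(values) {
  const counts = new Map();
  for (const value of values) {
    // Start at 0 for unseen values, then add one.
    counts.set(value, (counts.get(value) ?? 0) + 1);
  }
  return counts;
}

const counts = countOccurrences(['the', 'cat', 'the', 'hat']);
for (const [value, count] of counts) {
  console.log(value, count);
}
```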

-t, --total

Count total occurrences. The option implies --count.

ltr words --total my-doc.txt

-s, --sort

Sort the values.

ltr words --sort my-doc.txt

When --count is present, values are sorted by occurrences, from most frequent to least. Otherwise values are sorted alphabetically in ascending order.
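The two sorting modes described above can be sketched as follows. Using Intl.Collator for the alphabetical case is an assumption (it gives locale-aware ordering), and both function names are hypothetical.

```javascript
// Sketch only: the two sort orders described above.

// Alphabetical, ascending, using locale-aware comparison (assumed).
function sortAlphabetically(values, locale) {
  const collator = new Intl.Collator(locale);
  return [...values].sort(collator.compare);
}

// By occurrences, most frequent first. `counts` is a Map of value -> count.
function sortByFrequency(counts) {
  // Array.prototype.sort is stable, so ties keep their original order.
  return [...counts.entries()].sort((a, b) => b[1] - a[1]);
}

console.log(sortAlphabetically(['pear', 'apple', 'fig'], 'en'));
console.log(sortByFrequency(new Map([['a', 1], ['b', 3], ['c', 2]])));
```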

-r, --reverse

Reverse the order of the values. It can be used to reverse the sorting order, but can also be used on its own to list values in the reverse order of occurrence.

ltr words --sort --reverse my-doc.txt

Working with HTML and Markdown

Although you can feed HTML and Markdown to ltr, the returned values will include noise from markup constructs.

You can convert HTML or Markdown to plain text with trimd before calling ltr:

# Using Markdown:
trimd demarkdown my-post.md | ltr words --count --total

# Using HTML:
trimd demarkup my-page.html | ltr words --count --total

Furthermore, with HTML documents you may want to focus on the main content and leave out ancillary page content. You can use hred to extract the content of a single element:

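# Using HTML, just the <main> content
# (the hred selector and --raw flag shown here are assumed; check hred --help):
cat my-page.html | hred "main@.innerHTML" --raw | trimd demarkup | ltr words --count --total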