1.0.1 • Published 7 months ago

lfs2-compression v1.0.1

Weekly downloads
-
License
MIT
Repository
-
Last release
7 months ago

LFS2 grammar compression

Copyright 2023 Laurens Holst

Project information

Implementation of LFS2 grammar compression described in the paper Linear-Time Text Compression by Longest-First Substitution by Nakamura et al.

Grammar compression identifies and extracts repeated subsequences from a string or a collection of strings, producing a context-free grammar (a directed graph). Since finding the smallest grammar is NP-hard, greedy strategies were developed as an approximation. Of these, LFS2 is one which runs in O(n) time.

Example

Input:

new Grammar([[Grammar.start, 'abcababc'.split('')]]).compress();

Output:

Grammar {
  rules: Map(3) {
    Symbol(S) => [ Symbol(1), Symbol(2), Symbol(1) ],
    Symbol(1) => [ Symbol(2), 'c' ],
    Symbol(2) => [ 'a', 'b' ]
  }
}

References

1: https://doi.org/10.3390/a2041429 (R. Nakamura, S. Inenaga, H. Bannai, T. Funamoto, M. Takeda, A. Shinohara. "Linear-Time Text Compression by Longest-First Substitution." Algorithms 2.4 (2009) 1429-1448)