0.5.8 • Published 3 years ago

nnsplit v0.5.8

NNSplit

A tool to split text using a neural network. The main application is sentence boundary detection, but other tasks, e.g. compound splitting for German, are also supported.

Features

  • Robust: Not reliant on proper punctuation, spelling and case. See the metrics.
  • Small: NNSplit uses a byte-level LSTM, so weights are small (< 4 MB) and models can be trained for any language that can be encoded in Unicode.
  • Portable: NNSplit is written in Rust with bindings for Rust, Python, and JavaScript (browser and Node.js). See how to get started in the usage section.
  • Fast: Up to 2x faster than spaCy sentencization. See the benchmark.
  • Multilingual: NNSplit currently has models for 9 different languages (German, English, French, Norwegian, Swedish, Simplified Chinese, Turkish, Russian and Ukrainian). Try them in the demo.
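
The byte-level design mentioned under "Small" can be illustrated with a minimal Python sketch. This is not NNSplit's own code: the helper names `encode_bytes` and `split_at_byte_boundaries` are hypothetical, and the boundary offset in the usage note is hand-picked rather than predicted by a model. It only shows why a byte-level model needs an input alphabet of just 256 values for any language, and how boundaries expressed as byte offsets map back to text.

```python
from typing import List


def encode_bytes(text: str) -> List[int]:
    """Encode text as UTF-8 byte values (0..255).

    A byte-level model only ever sees these 256 possible values,
    regardless of the language or script of the input.
    """
    return list(text.encode("utf-8"))


def split_at_byte_boundaries(text: str, boundaries: List[int]) -> List[str]:
    """Split text at the given byte offsets (assumed sorted, exclusive ends).

    Assumption for this sketch: every offset falls on a UTF-8 character
    boundary, so each slice decodes cleanly.
    """
    data = text.encode("utf-8")
    parts = []
    start = 0
    for end in boundaries:
        parts.append(data[start:end].decode("utf-8"))
        start = end
    if start < len(data):
        # Keep whatever follows the last boundary as the final part.
        parts.append(data[start:].decode("utf-8"))
    return parts
```

For example, `encode_bytes("Schön")` yields six values for five characters because "ö" takes two bytes in UTF-8, and `split_at_byte_boundaries("Schön hier Gehen wir", [12])` cuts the unpunctuated text after the twelfth byte, giving `"Schön hier "` and `"Gehen wir"`.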

Documentation has moved to the NNSplit website: https://bminixhofer.github.io/nnsplit.

License

NNSplit is licensed under the MIT license.
