0.5.8 • Published 4 years ago
NNSplit
A tool to split text using a neural network. The main application is sentence boundary detection, but other tasks, e.g. compound splitting for German, are also supported.
Features
- Robust: Not reliant on proper punctuation, spelling and case. See the metrics.
- Small: NNSplit uses a byte-level LSTM, so weights are small (< 4 MB) and models can be trained for any language that can be encoded in Unicode.
- Portable: NNSplit is written in Rust with bindings for Rust, Python, and JavaScript (browser and Node.js). See how to get started in the usage section.
- Fast: Up to 2x faster than spaCy sentencization; see the benchmark.
- Multilingual: NNSplit currently has models for 9 different languages (German, English, French, Norwegian, Swedish, Simplified Chinese, Turkish, Russian and Ukrainian). Try them in the demo.
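To illustrate why a byte-level model is not tied to any one script, here is a minimal Python sketch. It is not NNSplit's own code: `to_byte_ids` is a hypothetical helper, and it assumes the text is encoded as UTF-8 before being fed to a model. It only shows that any Unicode string reduces to integer IDs in the range 0-255, so the input vocabulary stays at 256 symbols regardless of language.

```python
from typing import List


def to_byte_ids(text: str) -> List[int]:
    """Encode text as UTF-8 and return one integer ID (0-255) per byte.

    Hypothetical helper for illustration; assumes UTF-8 input encoding.
    """
    return list(text.encode("utf-8"))


samples = {
    "English": "Hello world",
    "German": "Grüße",
    "Ukrainian": "Привіт",
    "Chinese": "你好",
}

for language, text in samples.items():
    ids = to_byte_ids(text)
    # Every ID fits in a single byte, whatever the script.
    assert all(0 <= i <= 255 for i in ids)
    print(f"{language}: {len(text)} characters -> {len(ids)} byte IDs")
```

The trade-off is that non-ASCII characters expand to several bytes each, so a byte-level model sees longer sequences for scripts such as Cyrillic or Chinese than a character-level model would.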
Documentation has moved to the NNSplit website: https://bminixhofer.github.io/nnsplit.
License
NNSplit is licensed under the MIT license.