
stopwords-json v1.2.0



Stopwords for various languages in JSON format. Per Wikipedia:

Stop words are words which are filtered out prior to, or after, processing of natural language data ... these are some of the most common, short function words, such as the, is, at, which, and on.
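To make the definition concrete, here is a minimal TypeScript sketch of filtering stop words out of text before further processing. It uses only the five example words from the quotation above, not this package's English list. The `tokenize` helper is hypothetical and deliberately naive: it splits on anything outside ASCII a-z, so it would not suit languages such as Chinese, Japanese, or Thai.

```typescript
// Only the five example words quoted above; not the package's full English list.
const exampleStopwords: ReadonlySet<string> = new Set(["the", "is", "at", "which", "on"]);

// Hypothetical naive tokenizer: lowercase, split on runs of non a-z characters,
// and drop the empty strings produced by leading/trailing punctuation.
function tokenize(text: string): string[] {
  return text
    .toLowerCase()
    .split(/[^a-z]+/)
    .filter((t) => t.length > 0);
}

// Keep only tokens that are not in the stopword set.
function removeStopwords(tokens: string[], stopwords: ReadonlySet<string>): string[] {
  return tokens.filter((t) => !stopwords.has(t));
}

const tokens = tokenize("The cat is sitting on the mat, which is at home.");
const contentWords = removeStopwords(tokens, exampleStopwords);
console.log(contentWords); // value: ["cat", "sitting", "mat", "home"]
```

Building the `Set` once and reusing it keeps each membership check cheap, which matters for lists with several hundred entries.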

All stopwords are available in a single file, stopwords-all.json, keyed by ISO 639-1 language code. Individual per-language files are listed in the table below.
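A sketch of per-language lookup in the combined file follows. It assumes stopwords-all.json is a single JSON object whose keys are ISO 639-1 codes and whose values are arrays of strings. The inline `sampleJson` is a hypothetical stand-in for the real file, and `indexByLanguage` and `stopwordsFor` are illustrative names, not part of the package.

```typescript
// Assumed shape of stopwords-all.json: ISO 639-1 code -> array of stopwords.
type StopwordsByLanguage = Record<string, string[]>;

// Hypothetical tiny excerpt standing in for the real file's contents.
const sampleJson = `{
  "en": ["the", "is", "at", "which", "on"],
  "de": ["der", "die", "das", "und"]
}`;

// Build one Set per language so later membership checks are fast.
function indexByLanguage(all: StopwordsByLanguage): Map<string, Set<string>> {
  const index = new Map<string, Set<string>>();
  for (const code of Object.keys(all)) {
    index.set(code, new Set(all[code]));
  }
  return index;
}

// Look up a language by ISO 639-1 code (case-insensitive); throw for an
// unsupported code instead of silently filtering nothing.
function stopwordsFor(index: Map<string, Set<string>>, code: string): Set<string> {
  const words = index.get(code.toLowerCase());
  if (words === undefined) {
    throw new Error(`No stopword list for language code "${code}"`);
  }
  return words;
}

const index = indexByLanguage(JSON.parse(sampleJson) as StopwordsByLanguage);
const german = stopwordsFor(index, "de");
console.log(german.has("und")); // true
```

Throwing on an unknown code is a design choice: a missing language is usually a configuration mistake, and an empty filter would hide it.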

Languages

There are a total of 50 supported languages:

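| Language | Stopword count | Filename |
| --- | --- | --- |
| Afrikaans | 51 | af.json |
| Arabic | 162 | ar.json |
| Armenian | 45 | hy.json |
| Basque | 98 | eu.json |
| Bengali | 116 | bn.json |
| Breton | 126 | br.json |
| Bulgarian | 259 | bg.json |
| Catalan | 218 | ca.json |
| Chinese | 542 | zh.json |
| Croatian | 179 | hr.json |
| Czech | 346 | cs.json |
| Danish | 101 | da.json |
| Dutch | 275 | nl.json |
| English | 570 | en.json |
| Esperanto | 173 | eo.json |
| Estonian | 35 | et.json |
| Finnish | 772 | fi.json |
| French | 606 | fr.json |
| Galician | 160 | gl.json |
| German | 596 | de.json |
| Greek | 75 | el.json |
| Hausa | 39 | ha.json |
| Hebrew | 194 | he.json |
| Hindi | 225 | hi.json |
| Hungarian | 781 | hu.json |
| Indonesian | 355 | id.json |
| Irish | 109 | ga.json |
| Italian | 619 | it.json |
| Japanese | 109 | ja.json |
| Korean | 679 | ko.json |
| Latin | 49 | la.json |
| Latvian | 161 | lv.json |
| Marathi | 99 | mr.json |
| Norwegian | 172 | no.json |
| Persian | 332 | fa.json |
| Polish | 260 | pl.json |
| Portuguese | 408 | pt.json |
| Romanian | 282 | ro.json |
| Russian | 539 | ru.json |
| Slovak | 110 | sk.json |
| Slovenian | 446 | sl.json |
| Somali | 30 | so.json |
| Southern Sotho | 31 | st.json |
| Spanish | 577 | es.json |
| Swahili | 74 | sw.json |
| Swedish | 401 | sv.json |
| Thai | 115 | th.json |
| Turkish | 279 | tr.json |
| Yoruba | 60 | yo.json |
| Zulu | 29 | zu.json |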
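Each per-language file is named after its ISO 639-1 code. The sketch below loads one such file with Node's `fs` module. It assumes the file holds a flat JSON array of strings and that the caller knows which directory contains it (the package's install layout is not described here). `loadLanguageFile` is a hypothetical helper, and the demo writes a three-word stand-in en.json to a temporary directory rather than reading the real 570-word list.

```typescript
import { readFileSync, writeFileSync, mkdtempSync } from "fs";
import { join } from "path";
import { tmpdir } from "os";

// Hypothetical loader: reads <dir>/<code>.json, assumed to be a JSON array of strings.
function loadLanguageFile(dir: string, code: string): Set<string> {
  const path = join(dir, `${code}.json`);
  const parsed: unknown = JSON.parse(readFileSync(path, "utf8"));
  // Check the assumed shape rather than trusting the file blindly.
  if (!Array.isArray(parsed) || !parsed.every((w) => typeof w === "string")) {
    throw new Error(`${path} is not a JSON array of strings`);
  }
  return new Set(parsed as string[]);
}

// Demo: write a small stand-in en.json to a temporary directory, then load it.
const dir = mkdtempSync(join(tmpdir(), "stopwords-"));
writeFileSync(join(dir, "en.json"), JSON.stringify(["the", "is", "at"]));
const english = loadLanguageFile(dir, "en");
console.log(english.size); // 3
```

Validating the parsed value means a malformed or unexpected file fails loudly instead of quietly producing an empty or wrong filter.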

Sources

License and Copyright

Copyright (c) 2017 Peter Graham, contributors. Released under the Apache-2.0 license.