mallet-topics v2.3.0
Mallet Topics
A JavaScript wrapper for the MALLET command-line tool for topic modelling. Really? Yeah.
Dependencies
MALLET 2.0.8
Java
Installation
yarn add mallet-topics
Example
const { importData, trainTopics } = require('mallet-topics')
const malletExecutable = '/path/to/mallet-2.0.8/bin/mallet'
const dataDir = '/path/to/dir/containing/textfiles'
importData(
  malletExecutable,
  dataDir,
)
  .then(({ malletDataFile }) => {
    console.log(`Successfully imported data into ${malletDataFile}`)
    return malletDataFile
  })
  .then(malletDataFile => trainTopics(
    malletExecutable,
    malletDataFile,
  ))
  .then(({ topicKeysFile, docTopicsFile }) => {
    console.log(`Successfully trained topics. Have a look at ${topicKeysFile} and ${docTopicsFile}`)
  })
  .catch(err => {
    console.log(err.message)
  })
Docs
importData(mallet, dataDir, options)
Returns a promise that resolves once the data has been imported into MALLET format. The resolved value is an object with a malletDataFile property, which points to the newly created .mallet file.
- mallet - absolute path to the executable, e.g. /path/to/mallet-2.0.8/bin/mallet
- dataDir - absolute path to the directory of text files to classify (one file per document)
- options
  - malletDataFile - filepath to write data in MALLET format (default ./${Date.now()}_data.mallet)
  - stopFile - path to a file containing newline-separated stopwords to omit from classification
  - onStdData(stdType, msg) - function to handle data sent to stdout or stderr from the MALLET child process (default (stdType, msg) => console.log(msg.toString()))
  - singleFile - boolean that determines whether import-dir or import-file is used (default false, i.e. one instance per file)
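The default onStdData handler prints everything MALLET writes, which can be noisy. As a rough sketch of one alternative, the snippet below builds a handler that stores the output instead. createOutputCollector is a hypothetical helper, not part of mallet-topics, the file paths are placeholders, and it assumes stdType arrives as the string 'stdout' or 'stderr'.

```javascript
// Hypothetical helper (not part of mallet-topics): builds an onStdData
// handler that stores MALLET output instead of printing it.
function createOutputCollector() {
  const chunks = { stdout: [], stderr: [] }

  // Signature matches the documented onStdData(stdType, msg) option.
  // Assumption: stdType is the string 'stdout' or 'stderr'.
  const onStdData = (stdType, msg) => {
    const bucket = chunks[stdType] || (chunks[stdType] = [])
    bucket.push(msg.toString())
  }

  return { chunks, onStdData }
}

const collector = createOutputCollector()

// Options object for importData, using only the documented option names.
const importOptions = {
  malletDataFile: './corpus.mallet', // placeholder path
  stopFile: './stopwords.txt',       // placeholder path
  singleFile: false,
  onStdData: collector.onStdData,
}

// Simulate what the wrapper does when the child process emits data.
importOptions.onStdData('stderr', Buffer.from('some MALLET output\n'))
console.log(collector.chunks.stderr.length) // number of captured chunks
```

The options object would then be passed as the third argument, as in importData(malletExecutable, dataDir, importOptions), and collector.chunks can be inspected once the promise settles.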
trainTopics(mallet, malletDataFile, options)
Returns a promise that resolves once the topics have been generated. The resolved value is an object with the properties topicKeysFile and docTopicsFile, which are the paths to the generated topics and the per-document topic scores respectively.
- mallet - absolute path to the executable, e.g. /path/to/mallet-2.0.8/bin/mallet
- malletDataFile - filepath to the data file created by importData
- options
  - numTopics - number of topics to generate (default 10)
  - numIterations - number of sampling iterations (default 100)
  - topicKeysFile - filepath to write topics in tab-separated format (default ./${Date.now()}_topics.tsv)
  - docTopicsFile - filepath to write topic scores for each document in tab-separated format (default ./${Date.now()}_doc_topics.tsv)
  - optimizeInterval - number of iterations between hyperparameter optimizations (default undefined)
  - optimizeBurnIn - number of iterations before hyperparameter optimization begins (default 2 * optimizeInterval)
  - onStdData(stdType, msg) - function to handle data sent to stdout or stderr from the MALLET child process (default (stdType, msg) => console.log(msg.toString()))
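Once training finishes, the topic keys file usually needs to be turned into something a program can work with. The sketch below is one possible way to do that. parseTopicKeys is a hypothetical helper, not part of mallet-topics, and it assumes each line has three tab-separated columns (topic index, weight, space-separated words), which is worth checking against the actual output of your MALLET version. The sample text is made up.

```javascript
// Hypothetical helper (not part of mallet-topics): parses the text of a
// topic keys file. Assumes each line is "<topic>\t<weight>\t<words>",
// with the words separated by spaces.
function parseTopicKeys(text) {
  return text
    .split('\n')
    .map(line => line.trim())
    .filter(line => line.length > 0) // skip blank lines
    .map(line => {
      const [topic, weight, words = ''] = line.split('\t')
      return {
        topic: Number(topic),
        weight: Number(weight),
        words: words.split(' ').filter(Boolean),
      }
    })
}

// Made-up sample content, for illustration only.
const sample = '0\t0.5\tcat dog bird\n1\t0.5\tcar road wheel\n'
const topics = parseTopicKeys(sample)
console.log(topics[1].words.join(', '))
```

In practice the text would come from the file named by topicKeysFile, for example via fs.readFileSync(topicKeysFile, 'utf8') inside the final .then handler.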