aida-nlp v0.1.8
Check the demo
It's a chatbot that runs in the browser using TensorFlow.js, with the Web Speech API for speech-to-text and text-to-speech.
Train online
You can train from the browser using JavaScript and TensorFlow.js (using your local GPU resources), or from the browser using Python and TensorFlow with Keras thanks to Google Colaboratory's free TPUs. There is no need to set up a local environment, and the trained models can be saved for later use.
Local NPM package setup
- Install the npm package: `yarn add aida-nlp`
- Create your chatito definition files. Here you define your intents and your possible sentence models in multiple `.chatito` files, and save them to a directory, e.g. `./chatito`.
- Create a config file like `aida_config.json` where you define the path to your chatito definition files, the chatito dataset output path and the output path for the trained NLP models:
{
  "chatito": {
    "inputPath": "./chatito",
    "outputPath": "./dataset"
  },
  "aida": {
    "outputPath": "./model",
    "language": "en"
  }
}
- Generate and encode the dataset for training: `npx aida-nlp aida_config.json --action dataset`. The dataset will be available at the configured output path.
- Start training: `npx aida-nlp aida_config.json --action train`. The models will be saved at the configured output path.
- Run `npx aida-nlp aida_config.json --action test` to try the generated testing dataset.
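The steps above can be scripted. The following is a minimal sketch in Python, assuming the config layout shown in `aida_config.json`. The helper names `load_config` and `build_commands` are hypothetical and not part of aida-nlp, and treating every key from the example config as required is an assumption. Only the three `--action` values mentioned above are used.

```python
import json
from pathlib import Path

# The three actions named in the steps above, in the order they are run.
ACTIONS = ("dataset", "train", "test")

# Assumption: every key that appears in the example config is required.
REQUIRED_KEYS = {
    "chatito": ("inputPath", "outputPath"),
    "aida": ("outputPath", "language"),
}


def load_config(config_path):
    """Read the config file and check that the example's keys are present."""
    config = json.loads(Path(config_path).read_text(encoding="utf-8"))
    for section, keys in REQUIRED_KEYS.items():
        for key in keys:
            if key not in config.get(section, {}):
                raise KeyError(f"missing '{section}.{key}' in {config_path}")
    return config


def build_commands(config_path):
    """Return one `npx aida-nlp` command (as an argument list) per action."""
    return [
        ["npx", "aida-nlp", str(config_path), "--action", action]
        for action in ACTIONS
    ]
```

Each returned command can be passed to `subprocess.run(command, check=True)`, so that a failing step stops the sequence before the next action starts.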
Local setup cloning the project
As an alternative to training online or using the npm package, you can set up the project locally. Clone the GitHub project and install the dependencies for Node.js and Python (assuming Node.js with yarn and Python 3 are installed):
- Run `yarn install` from the `./typescript` directory
- Run `pip3 install -r requirements.txt` from the `./python` directory
Create a dataset
Edit or create the chatito files inside `./typescript/examples/en/intents` to customize the dataset generation as you need. You can read more about Chatito.
Then, from the `./typescript` directory, run `npm run dataset:en:process`. This will generate several files in the `./typescript/public/models` directory: the dataset, the dataset parameters, the testing parameters and the embeddings dictionary. (Note: Aida also supports Spanish. If you need another language, you can add it if you first download the fastText embeddings for that language.)
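To give an idea of what a chatito definition file could contain, here is a minimal sketch in Python that writes one sample file into a directory of your choice. The syntax follows the Chatito DSL as described in its documentation. The `greet` intent, the `hi` alias, the `name` slot, the file name and the helper `write_sample_definition` are all made up for illustration, and the project's own example files may be organised differently.

```python
from pathlib import Path

# Hypothetical sample definition: one "greet" intent built from an alias
# (~[hi]) and an optional slot (@[name?]). All names are illustrative only.
SAMPLE_CHATITO = """\
%[greet]('training': '2', 'testing': '1')
    ~[hi] @[name?]

~[hi]
    hi
    hello

@[name]
    Janis
    Bob
"""


def write_sample_definition(directory, filename="greet.chatito"):
    """Create the directory if needed and write the sample definition into it."""
    target = Path(directory) / filename
    target.parent.mkdir(parents=True, exist_ok=True)
    target.write_text(SAMPLE_CHATITO, encoding="utf-8")
    return target
```

Calling `write_sample_definition("./typescript/examples/en/intents")` would place the sample next to the existing intent files. Whether the dataset script picks up every file in that directory is something to confirm against the project.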
Training
You can train from 3 local environments:
- For Python: open `./python/main.ipynb` with Jupyter Notebook or JupyterLab. Python will load the custom settings generated when you created the dataset and save the models in a TensorFlow.js-compatible format in the output directory.
- For web browsers: from `./typescript` run `npm run web:start`. Then navigate to `http://localhost:8000/train` for the training web UI. After training, download the model to the `./typescript/public/pretrained/web` directory (NOTE: this will also generate and download a new dataset).
- For Node.js: from `./typescript` run `npm run node:start`. This will load the previously generated dataset files from `./typescript/public/models`.
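After training, it can be useful to check that a saved model directory is complete before loading it. The sketch below assumes the standard TensorFlow.js layers-model layout, in which a JSON file carries a `weightsManifest` whose groups list the binary weight files under `paths`. The file name `model.json` and the helper `missing_weight_files` are assumptions. The files Aida actually writes may be named differently, so adjust accordingly.

```python
import json
from pathlib import Path


def missing_weight_files(model_dir, model_file="model.json"):
    """Return weight files listed in the model JSON that are absent on disk.

    Assumption: model_file follows the TensorFlow.js layers-model layout.
    """
    model_dir = Path(model_dir)
    manifest = json.loads((model_dir / model_file).read_text(encoding="utf-8"))
    missing = []
    # Each manifest group lists the binary shard files that hold its weights.
    for group in manifest.get("weightsManifest", []):
        for relative_path in group.get("paths", []):
            if not (model_dir / relative_path).is_file():
                missing.append(relative_path)
    return missing
```

An empty list means every weight file referenced by the JSON file was found in the same directory.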
Technical Overview
Read the technical overview documentation.
Future ideas
- Add tests
- Add an example that predicts from AWS Lambda
- Experiment with multi-layer language models based on character features like bigrams or trigrams for transfer learning, probably using a custom BiLSTM or LSTM architecture similar to but simpler than Universal Language Model Fine-tuning for Text Classification (blog post).
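To make the last idea more concrete, here is a minimal sketch of what character bigram and trigram features look like. It covers only feature extraction, not the language model itself, and it is not taken from the Aida code base. The function names `char_ngrams` and `ngram_counts`, and the choice to lowercase the text, are assumptions made for illustration.

```python
from collections import Counter


def char_ngrams(text, n):
    """Return every contiguous run of n characters in text, in order."""
    return [text[i:i + n] for i in range(len(text) - n + 1)]


def ngram_counts(text, sizes=(2, 3)):
    """Count bigrams and trigrams (by default) after lowercasing the text."""
    normalized = text.lower()
    counts = Counter()
    for n in sizes:
        counts.update(char_ngrams(normalized, n))
    return counts
```

For example, `char_ngrams("hello", 2)` returns `['he', 'el', 'll', 'lo']`. Counts like those from `ngram_counts` could then be mapped to integer ids and fed to an embedding layer in front of the LSTM.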
Author
Rodrigo Pimentel