0.0.2 • Published 8 years ago

tjd-dol-5500-sample-utterances v0.0.2

License
MIT

sampleutterances

Node.js program that creates a list of slot values for a custom slot type in the Amazon Echo service.

Installation

npm install tjd-dol-5500-sample-utterances

Usage

This program connects to a MongoDB instance, queries a database for plans that meet the participant threshold, and writes the resulting slot values to an output file.

It accepts the following command line parameters:

--output

Provides an output file name that will contain the slot values. Any existing file with the same name will be deleted.

Default

./Sponsor Name Slot Values.txt

Example

node app.js --output="myoutputfile.txt"

--dburl

Provides the connection URL for the MongoDB instance, including the database name.

Default

mongodb://10.0.0.27/LawyerServices

Example

node app.js --dburl=mongodb://mywebaddress/mymongodbname

--maxentries

Sets the maximum number of entries this program will produce. Amazon caps custom slot types at 50,000 entries, so there is no reason to specify this argument unless you are just testing, or Amazon changes its limit and I have not yet updated the default to match.

Default

50000

Example

node app.js --maxentries=100

--maxcharacters

Sets the maximum number of characters for all the entries combined. Amazon caps this at 600,000 characters, so there is no reason to specify this argument unless you are just testing, or Amazon changes its limit and I have not yet updated the default to match.

Default

600000

Example

node app.js --maxcharacters=32768
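As a rough illustration of how the two limits above could be enforced together, here is a minimal sketch. The function name applyLimits is hypothetical, and it assumes that only the characters of the names themselves count toward the total (not line breaks); the actual program may count differently.

```javascript
// Hypothetical helper: keep names until either limit would be exceeded.
// Assumption: only the characters in each name count toward maxCharacters.
function applyLimits(names, maxEntries = 50000, maxCharacters = 600000) {
  const kept = [];
  let characters = 0;
  for (const name of names) {
    if (kept.length >= maxEntries) break;                   // entry limit reached
    if (characters + name.length > maxCharacters) break;    // character limit would be exceeded
    kept.push(name);
    characters += name.length;
  }
  return { kept, characters };
}

const result = applyLimits(["ABX AIR", "ACME", "ZETA CORP"], 2);
console.log(result.kept.length, result.characters); // 2 11
```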

--minparticipants

Each document in the database describes an employer benefit plan as drawn from the U.S. Department of Labor. Each such document contains a property that discloses the number of participants in the plan. There are more plan sponsors than can be accommodated by Amazon's custom slot feature, so we have to limit the plans we pull. I limit it based on the number of participants, on the theory that if I extract the plans with the highest number of participants, I will have extracted the plans that users are most likely to request.

You have to tune this number carefully so that you squeeze out all that you can from the database into Amazon's custom slot. After the program runs, it will give you some idea about whether it thinks you should change this value and rerun the program.

Default

275

Example

node app.js --minparticipants=250

minParticipants = getArgument("--minparticipants", minParticipants) * 1;
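The line above relies on a getArgument helper that is not shown here. A minimal sketch of what such a helper might look like follows; the body, the optional third parameter, and the quote stripping are my assumptions for illustration, not the program's actual code.

```javascript
// Hypothetical sketch of getArgument: find "--name=value" in an argument
// list and fall back to a default when the flag is absent.
function getArgument(name, defaultValue, argv = process.argv.slice(2)) {
  const prefix = name + "=";
  const match = argv.find((arg) => arg.startsWith(prefix));
  if (match === undefined) {
    return defaultValue;
  }
  // Strip optional surrounding double quotes, as in --output="myoutputfile.txt".
  return match.slice(prefix.length).replace(/^"(.*)"$/, "$1");
}

// Command line values arrive as strings; multiplying by 1 coerces to a number.
const exampleMin = getArgument("--minparticipants", 275, ["--minparticipants=250"]) * 1;
console.log(exampleMin); // 250
```

Passing the argument list as a parameter keeps the helper easy to test; when it is omitted, the sketch reads from process.argv.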

Example Invocation

The following command line

node app.js --minparticipants=300

will produce output like this:

Connecting to MondoDb
Connected OK
Querying for plan names having more than 300 participants
Found 30426 plans. Normalizing and converting names . . .
Eliminated 4494 duplicates in this phase.
Sorting 25932 plan names to remove additional duplicates
Writing results.
Eliminated 131 duplicate entries.
Created 25801 sample utterances containing 598468 characters.
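The output above reports duplicates removed in two phases. One way a two-pass flow like that could arise is sketched below. The normalization rules (upper-casing, replacing punctuation with spaces) and the function names normalize and dedupe are assumptions for illustration; the actual program may normalize and count differently.

```javascript
// Assumed normalization: upper-case, turn runs of non-alphanumerics into
// a single space, and trim the ends.
function normalize(name) {
  return name.toUpperCase().replace(/[^A-Z0-9]+/g, " ").trim();
}

function dedupe(rawNames) {
  // Pass 1: normalize, skipping empty names and any name that repeats
  // the one kept immediately before it.
  const normalized = [];
  for (const raw of rawNames) {
    const name = normalize(raw);
    if (name !== "" && name !== normalized[normalized.length - 1]) {
      normalized.push(name);
    }
  }
  // Pass 2: sort so the remaining duplicates become adjacent, then drop them.
  const sorted = [...normalized].sort();
  const unique = sorted.filter((name, i) => i === 0 || name !== sorted[i - 1]);
  return {
    unique,
    droppedInPass1: rawNames.length - normalized.length,
    droppedInPass2: normalized.length - unique.length,
  };
}

console.log(dedupe(["Abx Air, Inc.", "ABX AIR INC", "Zeta Corp", "abx air inc"]).unique);
// [ 'ABX AIR INC', 'ZETA CORP' ]
```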

Notes

I can't imagine that this application has any general use. It is specifically designed to take a list of employee benefit plans and create a file that can be copied and pasted into Amazon's Echo service as part of a custom slot type relating to a custom skill. In particular, it has been developed for my use in implementing a custom "Alexa" skill using Amazon's Echo Voice Services.

To Do

The program works. A further refinement would be to eliminate near-duplicates. For example, currently it would pass through the following two entries:

ABX AIR
ABX AIR PROFIT SHARING

Those are two different plans, for sure, but they have the same sponsor, and a user is likely to ask for "ABX AIR" anyway and get both results. Collapsing these and similar names would increase the number of distinct slot values we can provide to Amazon and thereby improve the recognition rate for some of the more obscure plans.
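One possible approach to this refinement, sketched here as an untested idea rather than anything the program currently does: after sorting, drop any entry that merely extends the previously kept entry by additional words. The function name collapseNearDuplicates is hypothetical.

```javascript
// Hypothetical sketch: treat a name as a near-duplicate when it equals the
// last kept name or starts with that name followed by a space.
// Assumes names are already normalized (upper-case letters, digits, spaces),
// so sorting places "ABX AIR" directly before "ABX AIR PROFIT SHARING".
function collapseNearDuplicates(names) {
  const sorted = [...names].sort();
  const kept = [];
  for (const name of sorted) {
    const last = kept[kept.length - 1];
    if (last !== undefined && (name === last || name.startsWith(last + " "))) {
      continue; // same sponsor prefix as the entry already kept
    }
    kept.push(name);
  }
  return kept;
}

console.log(collapseNearDuplicates(["ABX AIR PROFIT SHARING", "ABX AIR", "ABX AIRLINES"]));
// [ 'ABX AIR', 'ABX AIRLINES' ]
```

A prefix rule this simple would also let a very short name absorb unrelated sponsors that happen to begin with the same word, so a minimum length or word count for the prefix would probably be needed before relying on it.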

Copyright

Copyright (c) 2016 by Thomas J. Daley <tjd@powerdaley.com>