Botium Connector for Amazon Alexa with AVS

This is a Botium connector for testing your Amazon Alexa Skills.

Did you read the Botium in a Nutshell articles? Be warned, without prior knowledge of Botium you won't be able to use this library properly!

How it works

A virtual Alexa device is registered to be used in testing, and it works the same way as a physical Alexa device. It is not bound to any Alexa Skill, so you have to activate your Alexa Skill with its activation utterance.

The steps for Botium to run a conversation with an Alexa Skill are: the text from the test case is converted to audio with a Text-To-Speech (TTS) engine, the audio is sent to the virtual Alexa device via AVS, and the audio reply coming back from Alexa is converted to text again with a Speech-To-Text (STT) engine and compared with the expected answer.
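A minimal BotiumScript convo file for such a conversation could look like this (the convo name, activation utterance and expected answer are placeholders for illustration only):

open skill and greet

#me
open my example skill

#bot
welcome to my example skill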

Warning 1: TTS and STT can get a translation wrong, and then the test will fail even though Alexa works well. Google STT handles this problem in a more sophisticated way, see the ALEXA_AVS_STT_GOOGLE_CLOUD_SPEECH_SEND_TEXT_AS_PHRASE_HINT capability. Another option is to use a homophones list for utterances that are consistently recognized wrong ("let us" vs "lettuce" ...), see the ALEXA_AVS_STT_HOMOPHONES capability below.

Warning 2: When using the cloud-based APIs, there are costs involved. Please check the pricing of the chosen cloud APIs.

It can be used like any other Botium connector with all Botium Stack components, for example the Botium CLI or Botium Box.

The Alexa skill to test doesn't have to be published - it can be tested while still in "development mode", making this connector the perfect choice for Continuous Testing in CI Pipelines.
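For example, when using the Botium CLI in a CI job, a test run over a directory of convo files could be started like this (paths are placeholders; check botium-cli run --help for the options of your version):

> botium-cli run --config ./botium.json --convos ./spec/convos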

Requirements

For Text-To-Speech and Speech-To-Text, this connector currently supports Botium Speech Processing as well as cloud services by Amazon and Google. You only have to configure one of them.

Node.js v10

Node.js v8 is required, but Node.js v10 is recommended because of the HTTP/2 module of Node.js (AVS uses HTTP/2).

Botium Speech Processing

Please see Botium Speech Processing repository for installation instructions.

Google Cloud Text-to-Speech API

  1. Select or create a Cloud Platform project
  2. Enable billing for your project (free tier available).
  3. Enable the Google Cloud Text-to-Speech API.
  4. Set up authentication with a service account so you can access the API from your local workstation.
    1. Make sure to select Owner or Cloud Speech Service Agent as role
  5. Save the JSON credentials file; you will need it later (see the gcloud sketch below for a command-line alternative).
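If you prefer the command line over the Cloud Console, the same setup can be sketched with the gcloud CLI (the project and service account names below are placeholders, and roles/owner can be replaced by a narrower role):

> gcloud services enable texttospeech.googleapis.com
> gcloud iam service-accounts create botium-speech
> gcloud projects add-iam-policy-binding my-project --member=serviceAccount:botium-speech@my-project.iam.gserviceaccount.com --role=roles/owner
> gcloud iam service-accounts keys create googleConfig.json --iam-account=botium-speech@my-project.iam.gserviceaccount.com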

Google Cloud Speech-to-Text API

  • Same steps as for the Google Cloud Text-to-Speech API, just for the other API
  • It is recommended that the Speech-to-Text and Text-to-Speech APIs share the same project and the same credentials

Amazon Polly

See steps

In short:

  • Create an IAM user (see here for help)
    • Important: choose Programmatic access as access type
    • Note the access key and secret, you will need them later
  • Choose Attach existing policies to user directly to grant the AmazonPollyFullAccess policy
    • Feel free to use finer grained policies if you know what you are doing (a sketch follows below)
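If you want such a finer grained policy instead of AmazonPollyFullAccess, a minimal sketch allowing only speech synthesis could look like this (adapt it to your own security requirements):

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": "polly:SynthesizeSpeech",
      "Resource": "*"
    }
  ]
}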

Amazon Transcribe

Amazon Transcribe is very slow for our use case; use Botium Speech Processing or the Google Cloud Speech-to-Text API if possible.

Amazon Transcribe only worked for the English language in our tests.

  • Create an S3 Bucket
    • Botium uses the default bucket name botium-connector-alexa-avs (see the CLI sketch below)
  • Amazon Polly and Amazon Transcribe share the same IAM user by default (this can be changed in botium.json later)
  • Add the existing policies AmazonTranscribeFullAccess and AmazonS3FullAccess to this user
    • Feel free to use finer grained policies if you know what you are doing
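If you have the AWS CLI installed and configured, the bucket can be created like this (the region is just an example):

> aws s3 mb s3://botium-connector-alexa-avs --region eu-west-1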

Amazon AVS API of the Product to test

In the Alexa Voice Service Developer console, you have to register a virtual Alexa device and enable code-based linking.

  1. Follow Step 1: Enable CBL
  2. Note your "Client ID", "Client secret" and "Product ID"; you will need them later

Preparing Botium Capabilities

The connector repository includes a tool to compose the Botium capabilities (including private keys, access tokens etc). Create a project directory of your choice, and follow the steps below.

1. Prepare amazonConfig.json

Note: If you use Botium Box, this step is optional. You can import the created amazonConfig.json in Botium Box, or enter the values (AVS Product ID, Client ID...) in the Botium Box connector wizard.

  • Copy the AVS "Client ID", "Client secret" and "Product ID" from the steps above (Amazon AVS API of the Product to test) to a file named amazonConfig.json (see sample in the cfg folder of this repository):
  • When using Amazon Transcribe / Amazon Polly:
    • Set the region you want to use (be aware that not every region offers all APIs; eu-west-1, for example, has Polly, Transcribe and S3)
    • Copy the Access Key ID and Secret Access Key from the steps above (Amazon Polly, Amazon Transcribe)
  • When using Amazon Transcribe:
    • Create a bucket in S3, or use an existing one.
{
  "deviceInfo": {
    "clientId": "xxx_required_for_AVS",
    "clientSecret": "xxx_required_for_AVS",
    "productId": "xxx_required_for_AVS"
  },
  "region": "xxx_optional_Polly_or_Transcribe",
  "accessKeyId": "xxx_optional_Polly_or_Transcribe",
  "secretAccessKey": "xxx_optional_Polly_or_Transcribe",
  "bucketName": "xxx_optional_Polly_or_Transcribe"
}

2. Prepare googleConfig.json (Optional)

Copy and rename the Google Cloud JSON credentials to a file named googleConfig.json (assuming the Speech-to-Text and Text-to-Speech APIs share the same project).

{
  "type": "service_account",
  "project_id": "xxx",
  ...
}

3. Run the "Botium Connector Alexa AVS Initialization Tool"

There are several ways of running this tool, depending on how you installed it:

When you are using the Botium CLI, then just run

> botium-cli init-alexa-avs

When you installed the NPM package for this repository, then run

> botium-connector-alexa-avs-init

When you cloned or downloaded this repository, and you are in the "samples" folder, then run

> npm run init-alexa-avs
or
> ./node_modules/.bin/botium-connector-alexa-avs-init

If you use Botium Box, you don't have to use this tool - follow the suggested steps, and you will be presented with a hyperlink to open in your browser to connect the Botium virtual device to your Amazon account.

4. Use the generated botium.json

A file named botium.json is generated containing the required capabilities to be used with Botium.
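As a rough sketch (the actual content depends on the chosen TTS/STT engines and your credentials), the generated file follows the usual Botium structure and contains the capabilities documented below:

{
  "botium": {
    "Capabilities": {
      "PROJECTNAME": "My Alexa Skill Test",
      "CONTAINERMODE": "alexa-avs",
      "ALEXA_AVS_AVS_CLIENT_ID": "xxx",
      "ALEXA_AVS_AVS_CLIENT_SECRET": "xxx",
      "ALEXA_AVS_AVS_REFRESH_TOKEN": "xxx",
      "ALEXA_AVS_AVS_LANGUAGE_CODE": "en_US"
    }
  }
}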

You can remove amazonConfig.json and googleConfig.json now.

Supported Capabilities

Set the capability CONTAINERMODE to alexa-avs to activate this connector.

ALEXA_AVS_AVS_CLIENT_ID

See the JSON downloaded from AVS

ALEXA_AVS_AVS_CLIENT_SECRET

See the JSON downloaded from AVS

ALEXA_AVS_AVS_REFRESH_TOKEN

The simplest way to acquire it is the initialization tool described above.

ALEXA_AVS_AVS_LANGUAGE_CODE

Language setting for Alexa. Example: en_US

ALEXA_AVS_STT_HOMOPHONES

This connector uses speech recognition for turning the output speech coming back from Alexa into text. This process is not perfect, even with the best STT engines available - to compensate for this, homophones can be specified for errors that occur when a reply from Alexa is misunderstood. The text will be replaced in actual responses from the virtual Alexa device.

Homophones lists can be specified as JSON lists or as CSV, embedded in botium.json or as an external file.

Specify in botium.json

  "ALEXA_AVS_STT_HOMOPHONES": {
    "lettuce": ["let us"],
    "fairy": ["ferry", "fair I"]
  }

Specify in botium.json as CSV text

  "ALEXA_AVS_STT_HOMOPHONES": "lettuce,let us\r\nfairy,ferry,fair I"

Point to homophones file in botium.json

  "ALEXA_AVS_STT_HOMOPHONES": "./homophones.txt"

homophones.txt:

lettuce,let us
fairy,ferry,fair I

Capabilities for Speech-To-Text

ALEXA_AVS_STT_URL

ALEXA_AVS_STT_PARAMS

ALEXA_AVS_STT_METHOD

ALEXA_AVS_STT_BODY

ALEXA_AVS_STT_HEADERS

ALEXA_AVS_STT_TIMEOUT
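These capabilities describe the HTTP endpoint used for Speech-To-Text, for example a Botium Speech Processing installation. A hypothetical sketch (the URL and method are placeholders; check your installation for the actual endpoint and parameters):

  "ALEXA_AVS_STT_URL": "http://localhost:56000/api/stt/en",
  "ALEXA_AVS_STT_METHOD": "POST"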

ALEXA_AVS_STT_SEND_TEXT_AS_PHRASE_HINT

Default: true

After the speech response from Alexa has been received and is sent to Google STT, the expected answer can be sent along with it. Google will use it as a phrase hint.

If this flag is true, the test is not strict: small differences between Alexa's answer and the expectation in the test case are accepted, because Google STT corrects them.

If this flag is false, the test is strict: a test can fail even if Alexa's answer and the expectation in the test case are the same, just because Google STT doesn't transcribe it well.
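To enforce strict matching, the hint can be disabled in botium.json:

  "ALEXA_AVS_STT_SEND_TEXT_AS_PHRASE_HINT": false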

ALEXA_AVS_STT_SEND_TEXT_AS_PHRASE_HINT_USE_NEGATED

Default: true

If the expected utterance in the test case is negated, for example:

#me
hi!

#bot
!goodbye

then "goodbye" will still be sent as the expected answer (phrase hint) to Google STT, unless this flag is set to false.

Capabilities for Text-To-Speech

ALEXA_AVS_TTS_URL

ALEXA_AVS_TTS_PARAMS

ALEXA_AVS_TTS_METHOD

ALEXA_AVS_TTS_BODY

ALEXA_AVS_TTS_HEADERS

ALEXA_AVS_TTS_TIMEOUT
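Analogous to the Speech-To-Text capabilities above, these describe the HTTP endpoint used for Text-To-Speech. A hypothetical sketch against a local Botium Speech Processing installation (URL and method are again placeholders):

  "ALEXA_AVS_TTS_URL": "http://localhost:56000/api/tts/en",
  "ALEXA_AVS_TTS_METHOD": "GET"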

Open Issues and Restrictions

  • If a text is very long (more than a thousand characters), the connector dies because of an AVS error. Long messages should be sent in chunks.
  • Stream/connection create/close optimization: should the connection be created and closed globally, per dialog, or per question-answer pair?
  • The auth token lifetime is 1 hour. This has to be dealt with, either by acquiring it frequently or by catching the error and refreshing the token.