s4-cli v2.2.0
s4-cli
Overview
The s4-cli is a command line tool for the s4 service that allows users to send audio data to the service and obtain text or source separated audio in response. Data can be sent to the service from multiple sources (pre recorded/real time), with different processing options (batch/stream) and different responses (text/cleaned up audio).
Data Sources
NOTE: Currently, all pre recorded data sources for ASR must be encoded as 16-bit, 16kHz .wav or PCM. This applies to ASR only, in both batch and stream mode.
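A file can be checked against this requirement before it is sent. The following is a minimal sketch, not part of s4-cli: `readWavFormat` and `isAsrCompatible` are hypothetical helper names, and the check only covers .wav files (headerless PCM carries no format information to inspect).

```javascript
// Hypothetical helpers, not part of s4-cli.
// Reads the "fmt " chunk of a RIFF/WAVE buffer and returns its key fields.
function readWavFormat(buf) {
  if (buf.length < 12 ||
      buf.toString('ascii', 0, 4) !== 'RIFF' ||
      buf.toString('ascii', 8, 12) !== 'WAVE') {
    return null; // not a RIFF/WAVE container
  }
  var offset = 12;
  // Walk the chunk list until the "fmt " chunk is found.
  while (offset + 8 <= buf.length) {
    var id = buf.toString('ascii', offset, offset + 4);
    var size = buf.readUInt32LE(offset + 4);
    if (id === 'fmt ') {
      if (offset + 8 + 16 > buf.length) {
        return null; // truncated format chunk
      }
      return {
        audioFormat: buf.readUInt16LE(offset + 8),    // 1 means PCM
        channels: buf.readUInt16LE(offset + 10),
        sampleRate: buf.readUInt32LE(offset + 12),
        bitsPerSample: buf.readUInt16LE(offset + 22)
      };
    }
    // Chunks are padded to an even number of bytes.
    offset += 8 + size + (size % 2);
  }
  return null;
}

// True when the buffer looks like 16-bit, 16kHz PCM in a .wav container.
function isAsrCompatible(buf) {
  var fmt = readWavFormat(buf);
  return fmt !== null &&
    fmt.audioFormat === 1 &&
    fmt.sampleRate === 16000 &&
    fmt.bitsPerSample === 16;
}
```

A typical call would pass the result of `require('fs').readFileSync(path)` to `isAsrCompatible`.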
Pre Recorded Files
The tool can be used to send one or more pre recorded files (in .wav format) to the service, either as individual files, or as a group of files in a directory. When sending a collection of files from a directory, regular expression matching patterns can be used to filter the files that are chosen for processing.
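The effect of a file name filter can be previewed with a few lines of JavaScript. This sketch assumes the CLI applies the pattern as an unanchored JavaScript regular expression to each file name, which the text above does not state explicitly; `filterByPattern` is a hypothetical helper.

```javascript
// Hypothetical helper: keeps only the names that match the given pattern.
function filterByPattern(fileNames, pattern) {
  var re = new RegExp(pattern);
  return fileNames.filter(function (name) {
    return re.test(name);
  });
}

var names = ['a.wav', 'b.txt', 'notes.wav.bak', 'c.wav'];

// The escaped dot and the trailing $ restrict the match to names ending in ".wav".
console.log(filterByPattern(names, '.*\\.wav$')); // keeps a.wav and c.wav
```

Leaving the dot unescaped (`.*.wav$`) would also accept a name such as `clipXwav`, since an unescaped dot matches any character.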
Real Time Audio
The tool also supports the capture of real time audio data from the default microphone on the device. It relies on Sox to access the microphone, and currently assumes a default microphone configuration (8 channel, 16KHz sample rate, 32 bit signed integer).
Processing Options
Data can be processed by the service in one of two modes: batch or stream. The key difference is that batch mode combines all of the input data into a single file before processing, while stream mode processes audio chunks as they are received by the server. Note that batch mode is currently not supported when using real time audio.
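The distinction can be pictured with a small conceptual sketch. It does not reflect the actual server implementation: `processBatch`, `processStream` and the `handler` callback are hypothetical names used only to contrast the two modes.

```javascript
// Conceptual sketch only; `handler` stands in for whatever processing is applied to audio.

// Batch: join every chunk into one buffer, then process it once.
function processBatch(chunks, handler) {
  return [handler(Buffer.concat(chunks))];
}

// Stream: process each chunk on its own, in arrival order.
function processStream(chunks, handler) {
  return chunks.map(function (chunk) {
    return handler(chunk);
  });
}
```

With two chunks as input, the batch variant invokes the handler once on the combined data, while the stream variant invokes it twice.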
Response Types
The service returns one of two possible responses: text or audio. When text mode is chosen, the service performs ASR on the cleaned up audio (and raw input), returning the resultant text from the separated audio. When audio mode is chosen, the service does not attempt any ASR, but instead returns the cleaned up audio stream as a response.
The following table summarizes the different options available when using the CLI tool:
| | Stream Mode | Batch Mode |
|---|---|---|
| Pre Recorded Audio | text/audio | text/audio* |
| Real Time Audio | text/audio | Not Supported |
*Batch mode does not provide cleaned up audio as a direct response. However, cleaned up streams are stored in the cloud, and can be downloaded using the request id.
Installation
Prerequisites
The following is a list of prerequisites required to run the s4-cli:
- NodeJS v0.12.0 - Required to run the tool
- Sox v14.4.1 - Required to capture real time audio
- Git - Required to download and install the tool from npm
NodeJS
Install version 0.12.0 of NodeJS.
Windows
An installation package for Node can be downloaded from the NodeJs downloads page. Download and install the appropriate installation package for your operating system.
Mac OSX
An installation package .dmg file for Mac OSX can be downloaded and installed from the NodeJs downloads page.
Alternate Approach:
NodeJs can be installed via HomeBrew by running the following in a command shell:
brew update
brew install node
Ubuntu
Instructions on installing NodeJS on Linux using a package manager can be found here: https://github.com/joyent/node/wiki/Installing-Node.js-via-package-manager
Sox
This program is only required if the tool will be used with real time audio capture.
Windows
The simplest way to install this package is to download and run the installation package for your platform from the Sox downloads page.
The downloaded package is a .zip file that contains the sox executable and related files. These files can be extracted to any convenient location on the file system. Once extracted, ensure that the sox folder has been added to the PATH variable.
This can be done by updating the path variable within a terminal shell as follows:
PATH=%PATH%;c:\sox-14.4.1\;
This change only applies to the command shell in which it is executed. If a global setting is preferred, update your path variable under Environment Variables. This panel can be found here: My Computer --> Properties --> Advanced --> Environment Variables.
Mac OSX
An installation package .dmg file for Mac OSX can be downloaded and installed from the Sox downloads page.
Alternate Approach:
Sox can be installed via HomeBrew by running:
brew update
brew install sox
Ubuntu
Sox can be installed on Linux using a package manager. The following is an example that uses apt-get to install Sox on Ubuntu.
sudo apt-get update
sudo apt-get install sox libsox-fmt-all
Git
Git is a version control tool that is used, among other things, to create copies of code from a remote source control repository. In this case, node's package manager tool npm uses Git to download and install the CLI on the local computer.
Windows
The simplest way to install this program is to download and run the installation package for your platform from the Git downloads page.
Mac OSX
An installation package .dmg file for Mac OSX can be downloaded and installed from the Git downloads page.
Alternate Approach:
Git can be installed via HomeBrew by running:
brew update
brew install git
Ubuntu
Git can be installed on Linux using a package manager. The following is an example that uses apt-get to install Git on Ubuntu.
sudo apt-get update
sudo apt-get install git
Installation
Once all the prerequisites have been installed, s4-cli can be installed via npm by running the following in a command shell:
NOTE: The command shell refers to the terminal program in Mac OSX/Linux, or the cmd.exe program on Windows operating systems.
npm install -g s4-cli
NOTE: The above command may sometimes fail because elevated privileges may be required (this depends on how nodejs/npm has been set up). If that is the case, the problem can be resolved by prefixing the above command with sudo (Linux/Mac OSX), or by running the command in a terminal window running with administrator privileges (Windows).
You can test if the CLI has been installed correctly by typing:
s4-cli --help
The above command should display the available command line options for the s4-cli tool.
Usage
This section outlines the common use cases for using the s4-cli tool on the command line. The basic usage of the s4-cli tool is as follows:
NOTE: The command shell refers to the terminal program in Mac OSX/Linux, or the cmd.exe program on Windows operating systems.
s4-cli [ACTION] [OPTIONS]
Where:
[ACTION]: This argument specifies the type of action to perform on the input to the service. This argument can be one of the following values:
- asr-batch: Requests the service to clean up the data in batch mode, then perform ASR on the cleaned up data, and return the text obtained by performing ASR on the separated audio.
- asr-stream: Requests the service to clean up the data in streaming mode, then perform ASR on the cleaned up data, and return the text obtained by performing ASR on the separated audio.
- audio-stream: Requests the service to clean up the data in streaming mode, and return the cleaned up stream.
[OPTIONS]: Other arguments that can be passed to the command line tool. The following is a brief summary of supported options:
- --help: Displays help information that includes usage and options details.
- --url or -u: The base url of the service, including protocol type. If not specified, this parameter defaults to: http://s4front-end.elasticbeanstalk.com/
- --api-key or -a: This is the API key that uniquely identifies the entity making the request. Please contact your ADI representative if you do not have a key, and would like to obtain one.
- --mic-config: This is a microphone configuration parameter that is sent to the service. This parameter will be used by the service when processing the input. If not specified, this parameter defaults to default microphone config
- --algorithm: This parameter identifies the algorithm used to clean up the input data. If not specified, this parameter defaults to: ntf-v1
- --tag: This is a string parameter that will be used as the folder name under which input/output artifacts are stored in cloud storage. This value is especially useful when multiple files are being processed simultaneously, and it is desirable to tag the files so that they may be reviewed as a group.
- --input-file: When specified, this parameter identifies a single input file that will be sent to the service for processing.
- --input-dir: When specified, this parameter identifies a directory whose entire file contents will be sent to the service for processing. Files within the directory may be filtered using the --pattern option.
- --audio-device: When specified, this parameter indicates that real time audio will be captured from the default audio device, and sent to the service for processing.
- --pattern: This parameter can be used in conjunction with the --input-dir option to specify a regular expression filter that will be applied to the names of the files within the input directory. Only files that match the regular expression pattern will be selected for further processing.
- --output-dir: Specifies the directory in which output artifacts generated by the CLI will be stored. If not specified, this parameter defaults to ./out. Note that this directory must exist on the file system if raw audio is requested from the server.
- --output-summary: An optional file name that will contain a report of execution. The report will be stored in report.json if this parameter is omitted. The file will be created in the output directory, as specified by the --output-dir parameter.
Some things to remember:
- At least one action parameter has to be specified (asr-batch, asr-stream or audio-stream).
- At least one input source has to be specified (--input-file, --input-dir or --audio-device).
- If the action specified requires the server to return an audio stream, the output directory specified by --output-dir must exist on the file system.
- If a tag value (--tag) is specified, files will be stored under a directory with the same name as the tag value.
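These rules can be expressed as a small pre-flight check, for example in a wrapper script that calls the CLI. The sketch below is illustrative and not part of s4-cli: `checkInvocation` is a hypothetical function, its `opts` object mirrors the CLI options in camelCase, and the restriction on batch mode with real time audio is taken from the Processing Options section.

```javascript
var fs = require('fs');

var ACTIONS = ['asr-batch', 'asr-stream', 'audio-stream'];

// Hypothetical pre-flight check; returns a list of problems (empty when none are found).
function checkInvocation(action, opts) {
  var problems = [];
  if (ACTIONS.indexOf(action) === -1) {
    problems.push('unknown action: ' + action);
  }
  if (!opts.inputFile && !opts.inputDir && !opts.audioDevice) {
    problems.push('no input source given');
  }
  if (action === 'asr-batch' && opts.audioDevice) {
    problems.push('batch mode is not supported for real time audio');
  }
  // audio-stream returns audio, so the output directory has to exist already.
  var outputDir = opts.outputDir || './out';
  if (action === 'audio-stream' && !fs.existsSync(outputDir)) {
    problems.push('output directory does not exist: ' + outputDir);
  }
  return problems;
}
```

For instance, `checkInvocation('asr-batch', { audioDevice: true })` reports that batch mode cannot be combined with real time audio.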
Considerations for Real Time Audio
Configuring the Default Microphone
When recording real time audio, the CLI attempts to capture data directly from the default microphone on the computer. It is important to ensure that the default microphone has been set correctly before using the CLI.
For example, this can be done on Mac OSX by running the following in the terminal:
export AUDIODEV=hw:1
Note that this is not a global setting; it applies only to s4-cli executions within that terminal session.
Waiting for Recording to Start
On some computers, there could be a slight delay between when the s4-cli starts execution, and when actual recording commences. It is recommended that the user pause until the following message is displayed:
Audio is being captured from the default audio device. Press <ESC> to stop
Recording can be stopped by pressing the ESC key.
Execution Behavior
This section provides an overview of the execution behavior of the service. While the documentation provided here is geared towards using the CLI, the behavior of the service remains the same irrespective of how it is accessed.
- When a request is received by the service, it generates a unique id for the request, called the requestId. This id is globally unique, and is used to tag all input to and output generated by the service.
- All input sent to the service will be stored in the cloud (AWS S3). The following are the rules used when storing data:
  - All files in S3 are partitioned by API key. This means that each API key has a separate S3 partition allocated to it.
  - Each request is assigned a separate folder that will in turn hold three artifacts for every input file: (1) the unprocessed input file, (2) the cleaned up audio data, (3) the noisy audio data.
  - The folder will have the same name as the requestId, unless a tag value (--tag) is specified. If the tag value is specified, it will be used to name the S3 folder.
  - Note that reusing the same tag value for multiple requests will result in previous results being overwritten by the latest request.
- The service will process the request data, and send responses back to the client (in this case the CLI).
- The CLI shows responses from the service on the terminal, and also optionally saves summary information in a .json file. For audio processing requests where the response is cleaned up audio, the service response will be saved in the output directory with the same name as the requestId.
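The folder naming rule, and the overwrite risk that comes with reusing a tag, can be sketched as follows. The helper names `folderNameFor` and `reusedFolders` are hypothetical, and the request objects are made-up examples; only the rule itself (tag if given, otherwise requestId) comes from the description above.

```javascript
// Hypothetical helper: the folder is named after the tag when one is given,
// and after the requestId otherwise.
function folderNameFor(request) {
  return request.tag ? request.tag : request.requestId;
}

// Returns the folder names that more than one request would write to.
function reusedFolders(requests) {
  var counts = {};
  requests.forEach(function (request) {
    var name = folderNameFor(request);
    counts[name] = (counts[name] || 0) + 1;
  });
  return Object.keys(counts).filter(function (name) {
    return counts[name] > 1;
  });
}
```

Two requests tagged `nightly` map to the same folder, so the second would replace the artifacts of the first, while untagged requests never collide because each requestId is unique.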
Summary Report Format:
The summary report for requests is stored in a .json file. The following is an example of the summary report format:
[
  {
    "inputType":"file",
    "input":"data/SoundTest.wav",
    "message":"OK",
    "success":true,
    "output":{
      "0":{ "status":"success", "text": "this" },
      "1":{ "status":"success", "text": "this" },
      "2":{ "status":"success", "text": "this" },
      "3":{ "status":"success", "text": "this is" },
      "4":{ "status":"success", "text": "this is" },
      "5":{ "status":"success", "text": "this is" },
      "6":{ "status":"success", "text": "this is a" },
      "7":{ "status":"success", "text": "this is a" },
      "8":{ "status":"success", "text": "this is a" },
      "9":{ "status":"success", "text": "this is a test" },
      "metadata":{
        "requestId":"e31cd313-9eef-4d36-9938-a3936c19c9de",
        "s3Folder":"e31cd313-9eef-4d36-9938-a3936c19c9de",
        "micConfig":"az7",
        "algorithm":"ntf-v1"
      }
    }
  },
  {
    ...
  }
]
The file contains a JavaScript array with one object for every input received as a part of the request. Each object in turn contains information about the type of request, and also the response from the server, including metadata such as the id of the request, folder name in S3, etc.
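A report in this format can be post-processed with a short script. The sketch below is an illustration rather than part of the tool: `finalText` and `summarize` are hypothetical helpers, and they assume that the numbered entries under output are successive partial results, so that the highest numbered entry carries the most complete text, as in the example above.

```javascript
// Hypothetical helper: returns the text of the highest numbered output entry.
function finalText(entry) {
  var output = entry.output || {};
  var keys = Object.keys(output).filter(function (key) {
    return /^\d+$/.test(key); // skip non-numeric keys such as "metadata"
  });
  if (keys.length === 0) {
    return null;
  }
  keys.sort(function (a, b) {
    return Number(a) - Number(b);
  });
  return output[keys[keys.length - 1]].text;
}

// Reduces a parsed report to one short record per input.
function summarize(report) {
  return report.map(function (entry) {
    var meta = (entry.output && entry.output.metadata) || {};
    return {
      input: entry.input,
      success: entry.success,
      requestId: meta.requestId || null,
      text: finalText(entry)
    };
  });
}
```

The report itself can be loaded with `JSON.parse(require('fs').readFileSync(path, 'utf8'))`, where `path` points at the summary file in the output directory.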
Examples
Perform clean up and ASR in batch mode on a single pre recorded file:
s4-cli asr-batch --api-key=_apikey_ --mic-config=_micconfig_ --input-file=./data/sound-recording-1.wav
Perform clean up and ASR in stream mode on all .wav files in a given directory:
s4-cli asr-stream --api-key=_apikey_ --mic-config=_micconfig_ --input-dir=./data --pattern='.*\.wav$'
Perform clean up in stream mode on real time data captured from the default audio device:
s4-cli audio-stream --api-key=_apikey_ --mic-config=_micconfig_ --audio-device