@davi-ai/speechmarkdown-davi-js v1.0.6

License: MIT

speechmarkdown-davi-js

Speech Markdown grammar, parser, and formatters for use with JavaScript.

Supported platforms:

  • microsoft-azure

Partial / no support:

  • amazon-alexa
  • amazon-polly
  • amazon-polly-neural
  • google-assistant
  • samsung-bixby

how to use

import { SpeechMarkdown } from '@davi-ai/speechmarkdown-davi-js'

const options = {
  platform: 'microsoft-azure',
  includeSpeakTag: false,
  globalVoiceAndLang: {
    voice: 'en-US-JennyMultiLingualNeural',
    lang: 'fr-FR'
  }
}

const speechMarkdownParser = new SpeechMarkdown(options)

You can pass several options; the most useful ones are :

  • platform : 'microsoft-azure' to generate SSML for Azure neural voices
  • includeSpeakTag : whether to add an opening <speak> tag at the beginning and a closing </speak> tag at the end of the output
  • globalVoiceAndLang : { voice?: string, lang?: string } : added for Microsoft voices and the retorik-framework architecture. If you use a selected voice as the main voice, put it in the 'voice' field
    (format language-CULTURE-VoiceName, ex: en-US-GuyNeural, en-US-JennyNeural). When using a multilingual voice (ex: JennyMultilingualNeural), if the text has to be spoken in a language other than the voice's default, add
    the 'lang' field with the desired language, formatted language-CULTURE (ex: fr-FR, en-US, de-DE, ...)
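To illustrate what the globalVoiceAndLang option is expected to produce, here is a minimal sketch (not part of the library) of the Azure SSML wrapping based on the standard `<voice>` and `<lang>` elements; the real output comes from the library's formatter, and the exact shape it emits is an assumption here:

```typescript
// Sketch only: mirrors the SSML wrapping that globalVoiceAndLang
// is expected to apply around the spoken text.
interface GlobalVoiceAndLang {
  voice?: string
  lang?: string
}

function wrapWithVoiceAndLang(text: string, opts: GlobalVoiceAndLang): string {
  let ssml = text
  // <lang xml:lang="..."> switches the spoken language of a multilingual voice
  if (opts.lang) {
    ssml = `<lang xml:lang="${opts.lang}">${ssml}</lang>`
  }
  // <voice name="..."> selects the main voice
  if (opts.voice) {
    ssml = `<voice name="${opts.voice}">${ssml}</voice>`
  }
  return ssml
}

console.log(
  wrapWithVoiceAndLang('Bonjour tout le monde', {
    voice: 'en-US-JennyMultiLingualNeural',
    lang: 'fr-FR'
  })
)
// <voice name="en-US-JennyMultiLingualNeural"><lang xml:lang="fr-FR">Bonjour tout le monde</lang></voice>
```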

With these parameters, you will receive a complete SSML string, except for the <speak> tag, which has to be added manually around it. We don't use the includeSpeakTag = true
parameter because it only adds a bare <speak> tag, whereas Microsoft voices require a complete <speak> tag as follows :

  <speak version="1.0" xmlns="http://www.w3.org/2001/10/synthesis" xmlns:mstts="https://www.w3.org/2001/mstts" xmlns:emo="http://www.w3.org/2009/10/emotionml" xml:lang="fr-FR">
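Since the parser output has to be wrapped manually, a small helper (not part of the library, just a sketch) can build that complete <speak> tag around the returned SSML body:

```typescript
// Helper sketch: wraps an SSML body in the complete <speak> tag
// Microsoft voices require, with a configurable xml:lang.
function wrapInSpeakTag(ssmlBody: string, lang: string = 'fr-FR'): string {
  const attrs = [
    'version="1.0"',
    'xmlns="http://www.w3.org/2001/10/synthesis"',
    'xmlns:mstts="https://www.w3.org/2001/mstts"',
    'xmlns:emo="http://www.w3.org/2009/10/emotionml"',
    `xml:lang="${lang}"`
  ].join(' ')
  return `<speak ${attrs}>${ssmlBody}</speak>`
}

const ssml = wrapInSpeakTag('<voice name="fr-FR-DeniseNeural">Bonjour</voice>')
console.log(ssml)
```

The body passed in would be the string returned by the parser with includeSpeakTag set to false.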

available speechmarkdown tags

There are many different tags and most of them have restrictions. To get the current documentation, go to docs.microsoft.com

As of 2023/07/28, the available tags are :

  • voice :
    • (text to be read with that voice)[voice:"voice name"]
    • the text can contain other tags except 'voice'
    • the voice name can be as follows :
      • language-CULTURE-VoiceName (ex: en-US-GuyNeural, en-US-JennyNeural)
      • full Microsoft name (ex: Microsoft Server Speech Text to Speech Voice (en-US, JennyMultilingualNeural))
    • example : (Bonjour, comment ça va ?)[voice:"fr-FR-DeniseNeural"]
  • lang :
    • (text to be read in this language)[lang:"language name"]
    • the text can contain other tags except 'voice' and 'lang'
    • the lang name must be formatted as language-CULTURE (ex: fr-FR, en-US)
    • example : (Bonjour, comment ça va ?)[lang:"en-US"]
  • break :
  • silence :
    • silence:"type value"
    • type and value are required
    • type can be :
      • Leading : beginning of text
      • Tailing : end of text
      • SentenceBoundary : between adjacent sentences
    • value is an integer giving time in seconds or milliseconds, lower than 5000ms
    • example : [silence:"Leading 1s"]
  • prosody :
  • emphasis :
    • emphasis:"value" or ++text will be strong++
    • value can be one of the following / corresponding symbols around the text :
      • reduced / -text reduced-
      • none / ~text without change~
      • moderate / +text stronger+
      • strong / ++text much stronger++
    • example : [emphasis:"moderate"] / +bonjour+
  • say-as :
    • (text to be said as)[modifier]
    • modifier can be :
      • address
      • number
      • characters
      • fraction
      • ordinal
      • telephone
      • time
      • date
    • example : I need this answer (ASAP)[characters] / My phone number is (0386300000)[telephone]
  • ipa :
    • the International Phonetic Alphabet (ipa) allows you to force the pronunciation of a word / sentence
    • example : I love (paintball)[ipa:"peɪntbɔːl"]
  • emotions :
    • emotion:"style role/styledegree"
    • the style is mandatory, and depends on the voice speaking at that time (ex: fr-FR-DeniseNeural can only use 'sad' and 'cheerful' while ja-JP-NanamiNeural can use
      'chat', 'cheerful' and 'customerservice')
    • role and styledegree are optional. Role is a string, while styledegree is a number. Note that 'role' is restricted to very few voices
    • example : (It's so cool ! We are going to a great park today !)[voice:"en-US-JennyNeural";emotion:"excited 2"]
  • audio :
    • "src"
    • example : "https://cdn.retorik.ai/retorik-framework/audiofiles/audiotest.mp3"
  • backgroundaudio :
    • backgroundaudio:"src volume fadein fadeout"
    • src is mandatory; the other fields are optional, but all fields on the left must be provided before using one on the right (ex: to use fadein,
      you must have provided values for src and volume)
    • only one backgroundaudio tag possible
    • example : [backgroundaudio:"https://cdn.retorik.ai/retorik-framework/audiofiles/audiotest.mp3 0.5 2000 1500"]
  • lexicon :
    • lexicon:"url to the lexicon xml file"
    • the lexicon file is restricted to one language (en-US, fr-FR, ...) so it won't be used if the voice uses another language
    • it does nothing when using a multilingual voice (ex: JennyMultilingualNeural), even if the lang tag of this voice is the same as the one in the lexicon file
    • lexicon entries are case-sensitive; for example, 'hello' and 'Hello' are treated as separate entries
    • example : [lexicon:"https://cdn.retorik.ai/retorik-framework/lexicon-en-US.xml"] Hi everybody ! BTW how are you today ?
  • bookmark :
    • bookmark:"bookmark text"
    • example : Bookmark after city name : first Paris [bookmark:"city1"], then Berlin [bookmark:"city2"]
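The text-scoped tags above all follow the shape (text)[tag:"value"], while standalone tags follow [tag:"value"]. A small sketch (these helpers are not part of the library) can build that syntax programmatically:

```typescript
// Illustration of the two Speech Markdown tag shapes used throughout
// this document: text-scoped modifiers and standalone tags.
function modifier(text: string, tag: string, value?: string): string {
  // Modifiers without a value, like say-as shortcuts, are just (text)[tag]
  return value === undefined
    ? `(${text})[${tag}]`
    : `(${text})[${tag}:"${value}"]`
}

function standalone(tag: string, value: string): string {
  return `[${tag}:"${value}"]`
}

console.log(modifier('Bonjour, comment ça va ?', 'voice', 'fr-FR-DeniseNeural'))
// (Bonjour, comment ça va ?)[voice:"fr-FR-DeniseNeural"]
console.log(modifier('ASAP', 'characters'))
// (ASAP)[characters]
console.log(standalone('silence', 'Leading 1s'))
// [silence:"Leading 1s"]
```

Strings built this way can then be passed to the parser configured in the "how to use" section.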

License

Licensed under the MIT License. See the LICENSE file for details.
