@davi-ai/speechmarkdown-davi-js v1.0.6

License: MIT

speechmarkdown-davi-js

Speech Markdown grammar, parser, and formatters for use with JavaScript.

Supported platforms:

  • microsoft-azure

Partial / no support:

  • amazon-alexa
  • amazon-polly
  • amazon-polly-neural
  • google-assistant
  • samsung-bixby

how to use

import { SpeechMarkdown } from '@davi-ai/speechmarkdown-davi-js'

const options = {
  platform: 'microsoft-azure',
  includeSpeakTag: false,
  globalVoiceAndLang: {
    voice: 'en-US-JennyMultiLingualNeural',
    lang: 'fr-FR'
  }
}

const speechMarkdownParser = new SpeechMarkdown(options)

You can pass several options; the most useful ones are :

  • platform : 'microsoft-azure' to generate SSML for Azure neural voices
  • includeSpeakTag : whether to add an opening <speak> tag at the beginning and a closing </speak> tag at the end of the output
  • globalVoiceAndLang : { voice?: string, lang?: string } : added for Microsoft voices and the retorik-framework architecture. If you use a selected voice as the main voice, put it in the 'voice' field
    (format language-CULTURE-VoiceName, ex: en-US-GuyNeural, en-US-JennyNeural). When using a multilingual voice (ex: JennyMultilingualNeural), if the text has to be spoken in a language other than the voice's default, add
    the 'lang' field with the desired language, formatted language-CULTURE (ex: fr-FR, en-US, de-DE, ...)
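To illustrate what the globalVoiceAndLang option is expected to produce, here is a minimal sketch (not part of the library) of the Azure SSML wrapping based on the standard `<voice>` and `<lang>` elements; the real output comes from the library's formatter, and the exact shape it emits is an assumption here:

```typescript
// Sketch only: mirrors the SSML wrapping that globalVoiceAndLang
// is expected to apply around the spoken text.
interface GlobalVoiceAndLang {
  voice?: string
  lang?: string
}

function wrapWithVoiceAndLang(text: string, opts: GlobalVoiceAndLang): string {
  let ssml = text
  // <lang xml:lang="..."> switches the spoken language of a multilingual voice
  if (opts.lang) {
    ssml = `<lang xml:lang="${opts.lang}">${ssml}</lang>`
  }
  // <voice name="..."> selects the main voice
  if (opts.voice) {
    ssml = `<voice name="${opts.voice}">${ssml}</voice>`
  }
  return ssml
}

console.log(
  wrapWithVoiceAndLang('Bonjour tout le monde', {
    voice: 'en-US-JennyMultiLingualNeural',
    lang: 'fr-FR'
  })
)
// <voice name="en-US-JennyMultiLingualNeural"><lang xml:lang="fr-FR">Bonjour tout le monde</lang></voice>
```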

With these parameters, you will receive a complete SSML string, except for the <speak> tag, which has to be added manually around it. We don't use the includeSpeakTag = true
parameter because it only adds a bare <speak> tag, whereas Microsoft voices require a complete <speak> tag as follows :

  <speak version="1.0" xmlns="http://www.w3.org/2001/10/synthesis" xmlns:mstts="https://www.w3.org/2001/mstts" xmlns:emo="http://www.w3.org/2009/10/emotionml" xml:lang="fr-FR">
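Since the parser output has to be wrapped manually, a small helper (not part of the library, just a sketch) can build that complete <speak> tag around the returned SSML body:

```typescript
// Helper sketch: wraps an SSML body in the complete <speak> tag
// Microsoft voices require, with a configurable xml:lang.
function wrapInSpeakTag(ssmlBody: string, lang: string = 'fr-FR'): string {
  const attrs = [
    'version="1.0"',
    'xmlns="http://www.w3.org/2001/10/synthesis"',
    'xmlns:mstts="https://www.w3.org/2001/mstts"',
    'xmlns:emo="http://www.w3.org/2009/10/emotionml"',
    `xml:lang="${lang}"`
  ].join(' ')
  return `<speak ${attrs}>${ssmlBody}</speak>`
}

const ssml = wrapInSpeakTag('<voice name="fr-FR-DeniseNeural">Bonjour</voice>')
console.log(ssml)
```

The body passed in would be the string returned by the parser with includeSpeakTag set to false.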

available speechmarkdown tags

There are many different tags and most of them have restrictions. To get the current documentation, go to docs.microsoft.com

As of 2023/07/28, the available tags are :

  • voice :
    • (text to be read with that voice)[voice:"voice name"]
    • the text can contain other tags except 'voice'
    • the voice name can be as follows :
      • language-CULTURE-VoiceName (ex: en-US-GuyNeural, en-US-JennyNeural)
      • full Microsoft name (ex: Microsoft Server Speech Text to Speech Voice (en-US, JennyMultilingualNeural))
    • example : (Bonjour, comment ça va ?)[voice:"fr-FR-DeniseNeural"]
  • lang :
    • (text to be read in this language)[lang:"language name"]
    • the text can contain other tags except 'voice' and 'lang'
    • the lang name must be formatted as language-CULTURE (ex: fr-FR, en-US)
    • example : (Bonjour, comment ça va ?)[lang:"en-US"]
  • break :
  • silence :
    • silence:"type value"
    • type and value are required
    • type can be :
      • Leading : beginning of text
      • Tailing : end of text
      • SentenceBoundary : between adjacent sentences
    • value is an integer giving time in seconds or milliseconds, lower than 5000ms
    • example : [silence:"Leading 1s"]
  • prosody :
  • emphasis :
    • emphasis:"value" or ++text will be strong++
    • value can be one of the following / corresponding symbols around the text :
      • reduced / -text reduced-
      • none / ~text without change~
      • moderate / +text stronger+
      • strong / ++text much stronger++
    • example : [emphasis:"moderate"] / +bonjour+
  • say-as :
    • (text to be said as)[modifier]
    • modifier can be :
      • address
      • number
      • characters
      • fraction
      • ordinal
      • telephone
      • time
      • date
    • example : I need this answer (ASAP)[characters] / My phone number is (0386300000)[telephone]
  • ipa :
    • the International Phonetic Alphabet (ipa) allows you to force the pronunciation of a word / sentence
    • example : I love (paintball)[ipa:"peɪntbɔːl"]
  • emotions :
    • emotion:"style role/styledegree"
    • the style is mandatory, and depends on the voice speaking at that time (ex: fr-FR-DeniseNeural can only use 'sad' and 'cheerful' while ja-JP-NanamiNeural can use
      'chat', 'cheerful' and 'customerservice')
    • role and styledegree are optional. Role is a string, while styledegree is a number. Note that 'role' is restricted to very few voices
    • example : (It's so cool ! We are going to a great park today !)[voice:"en-US-JennyNeural";emotion:"excited 2"]
  • audio :
    • "src"
    • example : "https://cdn.retorik.ai/retorik-framework/audiofiles/audiotest.mp3"
  • backgroundaudio :
    • backgroundaudio:"src volume fadein fadeout"
    • src is mandatory; the other fields are optional, but all fields on the left must be provided before using one on the right (ex: to use fadein,
      you must have provided values for src and volume)
    • only one backgroundaudio tag possible
    • example : [backgroundaudio:"https://cdn.retorik.ai/retorik-framework/audiofiles/audiotest.mp3 0.5 2000 1500"]
  • lexicon :
    • lexicon:"url to the lexicon xml file"
    • the lexicon file is restricted to one language (en-US, fr-FR, ...) so it won't be used if the voice uses another language
    • it does nothing when using a multilingual voice (ex: JennyMultilingualNeural), even if the lang tag of this voice is the same as the one in the lexicon file
    • lexicon entries are case-sensitive; for example, 'hello' and 'Hello' are treated as separate entries
    • example : [lexicon:"https://cdn.retorik.ai/retorik-framework/lexicon-en-US.xml"] Hi everybody ! BTW how are you today ?
  • bookmark :
    • bookmark:"bookmark text"
    • example : Bookmark after city name : first Paris [bookmark:"city1"], then Berlin [bookmark:"city2"]
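The text-scoped tags above all follow the shape (text)[tag:"value"], while standalone tags follow [tag:"value"]. A small sketch (these helpers are not part of the library) can build that syntax programmatically:

```typescript
// Illustration of the two Speech Markdown tag shapes used throughout
// this document: text-scoped modifiers and standalone tags.
function modifier(text: string, tag: string, value?: string): string {
  // Modifiers without a value, like say-as shortcuts, are just (text)[tag]
  return value === undefined
    ? `(${text})[${tag}]`
    : `(${text})[${tag}:"${value}"]`
}

function standalone(tag: string, value: string): string {
  return `[${tag}:"${value}"]`
}

console.log(modifier('Bonjour, comment ça va ?', 'voice', 'fr-FR-DeniseNeural'))
// (Bonjour, comment ça va ?)[voice:"fr-FR-DeniseNeural"]
console.log(modifier('ASAP', 'characters'))
// (ASAP)[characters]
console.log(standalone('silence', 'Leading 1s'))
// [silence:"Leading 1s"]
```

Strings built this way can then be passed to the parser configured in the "how to use" section.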

License

Licensed under the MIT License. See the LICENSE file for details.
