Dialogue Cloud

Speech Engines for Anywhere365

Introduction

Anywhere365 supports different Speech Providers for Text-To-Speech (TTS) and Speech Recognition (SR). By default, Anywhere365 uses the locally installed ‘MicrosoftSpeechSynthesizer’, which is always part of Anywhere365. Additionally, modern cloud-connected speech engines with ever-evolving cognitive capabilities (e.g. new words, languages and dialects are constantly being added) can be configured for a more intelligent and natural-sounding voice interaction experience. Only one additional cloud-connected speech provider can be configured per UCC. If a cloud-connected speech provider is configured for use but (temporarily) unavailable or misconfigured, the ‘MicrosoftSpeechSynthesizer’ will always act as a fallback.

A Unified Contact Center, or UCC, is a queue of interactions (voice, email, IM, etc.) that are handled by Agents. Each UCC has its own settings, IVR menus and Agents. Agents can belong to one or several UCCs and can have multiple skills (competencies). A UCC can be visualized as a contact center “micro service”. Customers can utilize one UCC (e.g. a global helpdesk), a few UCCs (e.g. for each department or regional office) or hundreds of UCCs (e.g. for each bed at a hospital). They are interconnected and can all be managed from one central location.

For a graphical representation of the speech engine interconnections, see the Architecture pages.

The currently supported speech engines are:

  • MicrosoftSpeechSynthesizer

  • MicrosoftCognitiveServices

  • GoogleCloudTextToSpeechV1

Note: Not all messages configured in a UCC SharePoint site can use the cloud-connected speech providers; some can only be played with the legacy 'MicrosoftSpeechSynthesizer' for text-to-speech messages. See below which message types can be played with which engine when relying on SharePoint configuration alone.
To have a full IVR flow experience based on cognitive cloud service voices, upgrade to a License with Dialogue Studio and configure the entire IVR flow in Dialogue Studio.
An Interactive Voice Response, or IVR, is a telephone application that takes orders via the telephone keypad or by voice through a computer. By choosing menu options the caller receives information without the intervention of a human operator, or is forwarded to the appropriate Agent.

MicrosoftSpeechSynthesizer

This is the Text-To-Speech provider installed by default as part of Anywhere365. This provider can be used for all text-to-speech operations.

Note: The MicrosoftSpeechSynthesizer is the Microsoft Server based speech SDK, which has not been further developed by Microsoft since 2011. For Text-To-Speech (TTS) it supports just 26 language dialects of varying, fixed quality, with only one female voice per language dialect. It also includes a rudimentary, short-phrase Speech Recognition (SR) engine for the same 26 language dialects. For details, see the Microsoft documentation.

Supported message types

How to enable

No settings need to be added to SharePoint to enable this provider, as it is the default. The settings below may still be configured in the global "Settings" list for testing purposes (for example, to compare it with the cloud speech engines described further below):

UCC Settings list

| Setting | Description | Value | Remark |
|---|---|---|---|
| SpeechProvider | The Anywhere365 technical name for the Microsoft server speech service. | MicrosoftSpeechSynthesizer | Optional, as this is the default provider. |
| SpeechPreferredVoiceName | The full Anywhere365 name for the Microsoft voice (use the full syntax as shown in the example). | Example: Microsoft Server Speech Text to Speech Voice (en-GB, Hazel) | Optional. If not specified, the global setting "CultureInfo" is used to determine the voice. As only one voice per language is supported for the local server Speech Synthesizer, this setting is largely redundant; the advice is to rely on the CultureInfo setting only. |

See image below for a list of all TTS voices available for the MicrosoftSpeechSynthesizer:

MicrosoftCognitiveServices​

This speech provider uses the Azure cloud cognitive (i.e. always learning and improving) service for TTS operations. This provider offers better quality compared to the default provider. This advanced text to speech provider can only be used for specific message types:

Supported message types

  • Waiting Queue messages configured in the IVRQuestions list on SharePoint

  • Callback Queue messages configured in the IVRQuestions list on SharePoint

  • All Dialogue Studio voice nodes (like Ask, Say, QM)

How to enable

To enable this speech provider, configure the global Settings list with the settings below.

A Cognitive Services API key can be obtained through the Azure Portal. After you have added a "Cognitive Services" resource to your Azure Subscription, copy key 1 from the "Keys and Endpoint" section. If you host the service in a region other than West Europe, make sure the endpoint addresses in the settings below use that region instead of "westeurope".

UCC Settings list

| Setting | Description | Value | Remark |
|---|---|---|---|
| SpeechMicrosoftCognitiveApiKey | The API key for Microsoft Cognitive Services. | <Long String> | Mandatory |
| SpeechMicrosoftCognitiveApiEndpoint | The API endpoint for Microsoft Cognitive Services. | Example: https://westeurope.tts.speech.microsoft.com/cognitiveservices/v1 | Mandatory |
| SpeechMicrosoftCognitiveApiAuthorizationEndpoint | The authorization endpoint for Microsoft Cognitive Services. | Example: https://westeurope.api.cognitive.microsoft.com/sts/v1.0/issueToken | Mandatory |
| SpeechProvider | The Anywhere365 technical name for the Microsoft Cognitive cloud speech services. | MicrosoftCognitiveServices | Mandatory |
| SpeechPreferredVoiceName | The full Anywhere365 name for the Microsoft voice. Use the full syntax as shown in the example: the platform provider name, followed in brackets by the culture, a comma and the voice name. For a current list of selectable voices, see the Microsoft documentation. | Example: Microsoft Server Speech Text to Speech Voice (en-GB, OliverNeural) | Optional. If not specified, the global setting "CultureInfo" is used to approximate an appropriate voice (the first voice for the language retrieved from the cloud provider). |
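As a rough illustration of how the settings above fit together, the Python sketch below derives the two endpoint values and the voice name from a region, culture and voice. The helper names are hypothetical and not part of Anywhere365. The sketch assumes that other Azure regions follow the same hostname pattern as the "westeurope" examples above, so always verify the result against the "Keys and Endpoint" section of your own resource.

```python
# Sketch only: these helpers are hypothetical and not part of Anywhere365.
# Assumption: other regions use the same URL pattern as the westeurope examples.

def cognitive_endpoints(region: str) -> dict:
    """Return the two endpoint settings for an Azure region such as 'westeurope'."""
    region = region.strip().lower()
    if not region.isalnum():
        raise ValueError("region must be a short name such as 'westeurope'")
    return {
        "SpeechMicrosoftCognitiveApiEndpoint":
            f"https://{region}.tts.speech.microsoft.com/cognitiveservices/v1",
        "SpeechMicrosoftCognitiveApiAuthorizationEndpoint":
            f"https://{region}.api.cognitive.microsoft.com/sts/v1.0/issueToken",
    }


def preferred_voice_name(culture: str, voice: str) -> str:
    """Format a voice name as: platform provider (culture, voice name)."""
    return f"Microsoft Server Speech Text to Speech Voice ({culture}, {voice})"


def cognitive_settings(region, api_key, culture=None, voice=None) -> dict:
    """Collect the values to enter in the global Settings list."""
    settings = {
        "SpeechProvider": "MicrosoftCognitiveServices",
        "SpeechMicrosoftCognitiveApiKey": api_key,
    }
    settings.update(cognitive_endpoints(region))
    # SpeechPreferredVoiceName is optional; without it CultureInfo decides.
    if culture and voice:
        settings["SpeechPreferredVoiceName"] = preferred_voice_name(culture, voice)
    return settings


if __name__ == "__main__":
    # "<key 1>" is a placeholder for the key copied from the Azure Portal.
    values = cognitive_settings("westeurope", "<key 1>", "en-GB", "OliverNeural")
    for name, value in values.items():
        print(f"{name} = {value}")
```

The printed name/value pairs can then be entered manually in the global Settings list of the UCC.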

Note: The highly advanced, and even customizable, Microsoft Cognitive Services cloud Text-to-Speech is subject to additional costs. These costs may vary per selected voice and quality, and are billed on the customer's Azure subscription. For rates see https://azure.microsoft.com/en-us/pricing/details/cognitive-services/speech-services/.

GoogleCloudTextToSpeech​

Google Cloud Text to Speech service provides the most comprehensive set of voices. For a complete overview of all voices check https://cloud.google.com/text-to-speech/docs/voices. This advanced text to speech provider can only be used for specific message types:

Supported message types

  • Waiting Queue messages configured in the IVRQuestions list on SharePoint

  • Callback Queue messages configured in the IVRQuestions list on SharePoint

  • All Dialogue Studio voice nodes (like Ask, Say, QM)

How to enable

To enable this speech provider, configure two lists: the global Settings list and the PluginSettings list.

The PluginSettings list will contain the Credentials JSON of the Google service.

UCC Settings List

| Setting | Description | Value | Remark |
|---|---|---|---|
| SpeechProvider | The Anywhere365 technical name for the Google Cognitive cloud speech services. | GoogleCloudTextToSpeechV1 | Mandatory |
| SpeechPreferredVoiceName | The full Anywhere365 name for the Google cloud voice (use the full syntax as shown in the example). For a list of selectable voices, see the Google documentation. | Example: nl-NL-Wavenet-C | Optional. If not specified, the global setting "CultureInfo" is used to approximate an appropriate voice (the first voice for the language retrieved from the cloud provider). |

PluginSettings List

| Setting | Scope | Value | Remark |
|---|---|---|---|
| GoogleAppCredentialsJson | TextToSpeech | The credentials JSON generated in Google Cloud. | Mandatory |
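Before pasting the credentials into the PluginSettings list, it can help to check that the JSON is complete. The Python sketch below is a hypothetical helper, not part of Anywhere365. It assumes the credentials are a standard Google Cloud service-account key file and checks only a few of that file's well-known fields; which fields Anywhere365 actually reads is not stated here.

```python
import json

# Assumption: the credentials are a Google Cloud service-account key file.
# These are well-known fields of that format, not an exhaustive list.
REQUIRED_FIELDS = ("type", "project_id", "private_key", "client_email")


def check_credentials_json(text: str) -> list:
    """Return a list of problems found; an empty list means none were found."""
    try:
        data = json.loads(text)
    except json.JSONDecodeError as exc:
        return [f"not valid JSON: {exc.msg}"]
    if not isinstance(data, dict):
        return ["top-level value must be a JSON object"]
    problems = [
        f"missing field: {name}" for name in REQUIRED_FIELDS if not data.get(name)
    ]
    # A present but different 'type' means this is not a service-account key.
    if data.get("type") not in (None, "", "service_account"):
        problems.append("field 'type' is not 'service_account'")
    return problems


if __name__ == "__main__":
    # Placeholder values only; a real key file is downloaded from Google Cloud.
    sample = json.dumps({
        "type": "service_account",
        "project_id": "<your-project-id>",
        "private_key": "<private key text>",
        "client_email": "<service-account-email>",
    })
    print(check_credentials_json(sample) or "no problems found")
```

An empty result only means the JSON looks structurally plausible; it does not prove that the key is valid or that the Text-to-Speech API is enabled for the project.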

Note: The highly advanced, and even customizable, Google Cloud Text-to-Speech is subject to additional costs. These costs may vary per selected voice and quality, and are billed on the customer's Google Cloud subscription. For rates see: https://cloud.google.com/text-to-speech/pricing

Configuring Voice Selection in UCC

By default, the UCC selects the voice whose name matches the value of the setting SpeechPreferredVoiceName (a setting in the global "Settings" list of the UCC; its value should match the "Voicename" of the corresponding text-to-speech provider).

If there is no exact match, the UCC tries to select a voice based on the value of the setting "CultureInfo". If multiple voices match (for example, for the CultureInfo en-AU), the first voice from the cloud speech provider that matches the criteria is selected.

When a speech provider cannot initialize or is misconfigured, the MicrosoftSpeechSynthesizer is used as a fallback.
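The selection order described above can be sketched in Python as follows. This is an illustration of the documented behavior, not the actual Anywhere365 implementation: the Voice type and function names are hypothetical, the en-AU voice names are illustrative, and returning None when nothing matches is an assumption, since that case is not described here.

```python
from dataclasses import dataclass
from typing import Optional, Sequence

FALLBACK_PROVIDER = "MicrosoftSpeechSynthesizer"


@dataclass(frozen=True)
class Voice:
    """Hypothetical voice record as returned by a speech provider."""
    name: str
    culture: str


def select_voice(voices: Sequence[Voice],
                 preferred_name: Optional[str],
                 culture_info: str) -> Optional[Voice]:
    """Exact name match first, then the first voice matching CultureInfo."""
    if preferred_name:
        for voice in voices:
            if voice.name == preferred_name:
                return voice
    for voice in voices:
        if voice.culture.lower() == culture_info.lower():
            return voice
    # Assumption: the no-match case is not documented, so None is returned.
    return None


def resolve_provider(configured: str, initialized_ok: bool) -> str:
    """Use the configured provider unless it failed to initialize."""
    return configured if initialized_ok else FALLBACK_PROVIDER


if __name__ == "__main__":
    available = [
        Voice("en-AU-Wavenet-A", "en-AU"),
        Voice("en-AU-Wavenet-B", "en-AU"),
        Voice("nl-NL-Wavenet-C", "nl-NL"),
    ]
    print(select_voice(available, "nl-NL-Wavenet-C", "en-AU"))
    print(select_voice(available, None, "en-AU"))
    print(resolve_provider("GoogleCloudTextToSpeechV1", initialized_ok=False))
```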

Configuring Voice Selection in Dialogue Studio

Via Dialogue Studio it is possible to configure the voice in three ways:

  1. Default (UCC Configured); see above "Configuring Voice Selection in UCC".

  2. Custom Voice; configure the Culture and Gender, and the UCC will select a voice of the configured text to speech provider matching the criteria.

  3. SSML; this makes it possible to select multiple voices for each individual node in your flow (useful for multilingual messages).
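To illustrate the SSML option, the Python sketch below builds a single message that switches voices using the standard SSML voice element. The function name is hypothetical and the voice names are illustrative; use names offered by your configured provider. Whether Dialogue Studio expects a complete speak document or only the inner elements is an assumption to verify in your environment.

```python
from xml.sax.saxutils import escape, quoteattr

SSML_NAMESPACE = "http://www.w3.org/2001/10/synthesis"


def multi_voice_ssml(parts, lang="en-GB"):
    """Build an SSML document from (voice_name, text) pairs.

    Each pair becomes one <voice> element, so a single message can
    switch voice (and language) between sentences.
    """
    body = "".join(
        f"<voice name={quoteattr(name)}>{escape(text)}</voice>"
        for name, text in parts
    )
    return (
        f'<speak version="1.0" xmlns="{SSML_NAMESPACE}" '
        f"xml:lang={quoteattr(lang)}>{body}</speak>"
    )


if __name__ == "__main__":
    # Voice names are examples only; pick names your provider offers.
    ssml = multi_voice_ssml([
        ("en-GB-Wavenet-A", "Welcome. Please hold the line."),
        ("nl-NL-Wavenet-C", "Welkom. Een ogenblik geduld alstublieft."),
    ])
    print(ssml)
```

Text is escaped before it is placed in the markup, so characters such as "&" in a message do not break the SSML.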