What’s New in Windows Phone 8 (3 out of 8) – Voice Recognition

November 1, 2012

This post is dedicated to another interesting feature in Windows Phone 8: voice recognition and speech synthesis.

Speech Synthesis

Let’s start with speech synthesis. The functionality lives in the Windows.Phone.Speech.Synthesis namespace. Using the classes in this namespace you can build text-to-speech (TTS) scenarios in your application. The API can read an SSML document identified by a URI, a string of text with SSML markup, or plain text. My sample focuses on the plain text scenario.

The class responsible for speech synthesis is SpeechSynthesizer. To speak plain text, use the code snippet below:

SpeechSynthesizer synth = new SpeechSynthesizer();

await synth.SpeakTextAsync("Hello from Windows Phone 8");
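
The same synthesizer can also speak SSML markup via SpeakSsmlAsync. A minimal sketch, reusing the synth instance from above; the SSML fragment itself is illustrative, not part of the original sample:

// Illustrative SSML: the prosody element (standard SSML 1.0) slows the speaking rate.
string ssml = "<speak version=\"1.0\" " +
              "xmlns=\"http://www.w3.org/2001/10/synthesis\" xml:lang=\"en-US\">" +
              "Hello from <prosody rate=\"slow\">Windows Phone 8</prosody></speak>";

await synth.SpeakSsmlAsync(ssml);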

The speech synthesizer supports switching the voice to any of the system-installed voices. The following code snippet selects a random voice and speaks a greeting with it:

Random rnd = new Random();

// Pick a random voice from the voices installed on the phone.
var voices = Windows.Phone.Speech.Synthesis.InstalledVoices.All;
int voiceNumber = rnd.Next(voices.Count);
synth.SetVoice(voices[voiceNumber]);

// Derive a readable language name, e.g. "English (United States)" -> "English".
// CultureInfo requires a using directive for System.Globalization.
CultureInfo ci = new CultureInfo(voices[voiceNumber].Language);
string lang = ci.EnglishName;
int pos = ci.EnglishName.IndexOf("(");
if (pos > 0)
    lang = lang.Substring(0, pos).Trim();

await synth.SpeakTextAsync("Hello from voice of " + voices[voiceNumber].DisplayName + 
                            ". I'm a " + voices[voiceNumber].Gender + 
                            ". I'm specializing in " + lang + " language");

Sometimes, as in the snippet above, speaking can take a while. If your application logic needs to be able to cancel speech in progress, hold on to the returned async operation instead of awaiting it immediately. The following code demonstrates this approach: the user is presented with a message box while the same text is synthesized by SpeechSynthesizer, and when the user closes the MessageBox the speech stops:

SpeechSynthesizer synth = new SpeechSynthesizer();

string theMessage = "Pages updated. From outside the app, press and hold the Start button and say 'Blog Sample show me last page'";

try
{
    // Keep the async operation so it can be cancelled later.
    var task = synth.SpeakTextAsync(theMessage);
    MessageBox.Show(theMessage);

    // The user closed the MessageBox - stop speaking.
    task.Cancel();
}
catch (TaskCanceledException)
{
    // Ignore the error - expected when cancelling the task.
}
catch (Exception ex)
{
    // Some other error...
    //...
}

Now, let’s move to speech recognition.

Speech Recognition

Speech recognition support for applications is based on defined lists of voice commands and grammars; built-in grammars are also available. Voice commands and user-defined grammars can be used to build in-app navigation, while the built-in grammars can be used for short message dictation or in-app/web search queries.

A voice commands file is an XML file which defines the list of commands the application listens to. Before using speech recognition, the application must initialize the voice commands from the file, or from code using the APIs described later. A voice commands file looks like the following snippet:

<?xml version="1.0" encoding="utf-8" ?>

<VoiceCommands xmlns="http://schemas.microsoft.com/voicecommands/1.0">

  <CommandSet xml:lang="en-us" Name="NavigationCommands">

    <CommandPrefix>Blog Sample</CommandPrefix>

    

    <Example>Show (Page Number) page</Example>

 

    <Command Name="showPage">

      <Example>Show me (some) page</Example>

      <ListenFor>[Show] [me] {pageNames} page</ListenFor>

      <ListenFor>Go to {pageNames} page </ListenFor>

      <Feedback>Navigating to {pageNames} page...</Feedback>

      <Navigate Target="/ThePage.xaml"/>

    </Command>

 

    <Command Name="showAboutPage">

      <Example>Show me the About Page</Example>

      <ListenFor>show [me] the about page</ListenFor>

      <ListenFor>the about page</ListenFor>

      <Feedback>Showing you the About page...</Feedback>

      <Navigate Target="About.xaml"/>

    </Command>

    

    <PhraseList Label="pageNames">

      <Item>First</Item>

      <Item>Second</Item>

      <Item>Third</Item>

    </PhraseList>

  </CommandSet>

 

  <!-- Other CommandSets for other languages -->

 

</VoiceCommands>

A few “special” elements in this file:

{} – References a value from the corresponding PhraseList.

[] – Designates an optional word or words.

Note: For a complete reference of the available options in voice command files, refer to the documentation.

Grammar files also define the words and phrases that an application will recognize in speech input. Like voice commands, grammars use an XML file format, defined by the Speech Recognition Grammar Specification (SRGS) Version 1.0. This file, too, must be loaded before use, or the grammar can be initialized from code-behind. A sample grammar file looks like:

<?xml version="1.0" encoding="utf-8" ?>

<grammar version="1.0" xml:lang="en-US" root="sampleCommand" tag-format="semantics/1.0" xmlns="http://www.w3.org/2001/06/grammar" 

         xmlns:sapi="http://schemas.microsoft.com/Speech/2002/06/SRGSExtensions">

 

  <rule id="sampleCommand" scope="public">

    Navigate to

 

    <one-of>

      <item> page </item>

      <item> item </item>

    </one-of>

 

    <item repeat="0-1"> in the </item>

 

    <ruleref uri="#pages"/>

 

    <one-of>

      <item repeat="0-1"> pages </item>

      <item repeat="0-1"> items </item>

    </one-of>

  </rule>

 

  <rule id="pages" scope="public">

    <one-of>

      <item> Interesting </item>

      <item> Boring </item>

      <item> New </item>

      <item> Old </item>

      <item> Personal </item>

    </one-of>

  </rule>

</grammar>

Note: For complete descriptions of the SRGS grammar elements, refer to the documentation.

Now let’s see how to use those files from within the application.

To use voice recognition, the application must declare the following capabilities in the application manifest:

<Capability Name="ID_CAP_SPEECH_RECOGNITION" />

<Capability Name="ID_CAP_MICROPHONE" />

Let’s see how to initialize the speech recognizer with a voice commands file:

try

{

    // Set the path to the Voice Command Definition (VCD) file.

    String path = Windows.ApplicationModel.Package.Current.InstalledLocation.Path + "\\VoiceCommands.xml";

    Uri uri = new Uri(path, UriKind.Absolute);

 

    // Load the VCD file.

    await Windows.Phone.Speech.VoiceCommands.VoiceCommandService.InstallCommandSetsFromFileAsync(uri);

 

    //...

}

catch (Exception ex)

{

    //...

}

From now on the application’s commands are available phone-wide. This means the user can long-press the Start (Windows) button and say something like “Blog Sample show second page” or “Blog Sample Go to Third page”. When the text is recognized, the page defined in the Navigate element of the voice command file is invoked. The user can also ask the phone “What can I say?” and receive the list of available commands, including the applications registered via voice commands:

[Screenshot: the “What can I say?” help screen listing the app’s registered voice commands]

In response to the recognized text, the system provides visual and spoken feedback:

[Screenshot: the system’s visual confirmation of the recognized command]

As mentioned before, a page in the app is invoked. The navigation context holds information which supports voice recognition scenarios. For voice-initiated navigation the query string contains the recognized voice command (“voiceCommandName”), the recognition result (“reco”), and the recognized phrase under a key named after the PhraseList label. For the voice command file defined above, the query string contains the following values:

voiceCommandName = showPage

reco = Blog Sample Show Third page

pageNames = Third

Note that these values are derived from the VCD file.

This information can be used while processing the navigation arguments in the OnNavigatedTo method as follows:

protected override void OnNavigatedTo(System.Windows.Navigation.NavigationEventArgs e)

{

    base.OnNavigatedTo(e);

 

    if (NavigationContext.QueryString.ContainsKey("voiceCommandName"))

    {

        string voiceCommandName = NavigationContext.QueryString["voiceCommandName"];

 

        switch (voiceCommandName)

        {

            case "showPage":

                string pageNumber = NavigationContext.QueryString["pageNames"];

 

                PageTitle.Text = pageNumber + " page";

                break;

 

            // case "showAboutPage":
            //     ... handle other commands here ...
            //     break;

 

            default:

                break;

        }

    }

}

Last, but not least: modifying voice commands after they have been loaded. In some cases the application works with dynamic data which should be exposed through the VCD file. In that case the phrase list can be updated dynamically as follows:

try

{

    IReadOnlyDictionary<String, VoiceCommandSet> collectionOfCommandSets = VoiceCommandService.InstalledCommandSets;

    VoiceCommandSet targetCommandSet = collectionOfCommandSets["NavigationCommands"];

 

    String[] updatedPagesList = { "First", "Second", "Third", "Fourth", "Fifth", 

                                    "Other", "Interesting", "Boring", "Last" };

 

    await targetCommandSet.UpdatePhraseListAsync("pageNames", updatedPagesList);

 

    //...

}

catch (Exception ex)

{

    //...

}

Note: “NavigationCommands” is the Name of the CommandSet in the loaded VCD file, and “pageNames” is the PhraseList label in that file.

Now let’s see how to recognize speech from within the application.

The speech recognizers live in the Windows.Phone.Speech.Recognition namespace. There are two of them: SpeechRecognizerUI and SpeechRecognizer. SpeechRecognizerUI provides the built-in listening experience and can show a system confirmation after recognition. SpeechRecognizer enables fully custom speech scenarios, as it doesn’t show any UI. Before its first use a speech recognizer must be initialized, and custom grammars/voice commands should be loaded; otherwise the built-in system grammar is used.
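
For example, here is a minimal sketch of the UI-less recognizer; the “confirmation” grammar key and its phrases are illustrative, not part of the original sample:

SpeechRecognizer reco = new SpeechRecognizer();
reco.Grammars.AddGrammarFromList("confirmation", new List<string> { "yes", "no" });

// RecognizeAsync listens without showing any system UI.
SpeechRecognitionResult result = await reco.RecognizeAsync();
string heard = result.Text;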

Note: The built-in dictation and web search grammars are large and reside online (not on the phone), so performance may not be as fast as with custom, on-phone grammars.
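
If a built-in grammar is what you want, you can also attach it explicitly. A short sketch; the key name "search" is illustrative:

SpeechRecognizerUI recoUI = new SpeechRecognizerUI();

// Explicitly attach the built-in web search grammar.
recoUI.Recognizer.Grammars.AddGrammarFromPredefinedType(
    "search", SpeechPredefinedGrammar.WebSearch);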

While initializing the speech recognizer, the application can set various options, such as the text shown on the system UI (in the case of SpeechRecognizerUI), whether to show recognition confirmations (voice and textual), and so on:

SpeechRecognizerUI recognizer;

bool speechInitialized = false;

List<string> pagesList = new List<string>();

//...

private void InitializePagesList()

{

    pagesList.Add("First");

    pagesList.Add("Second");

    pagesList.Add("Third");

    pagesList.Add("Fourth");

    pagesList.Add("Fifth");

}

 

private void InitializeSpeechRecognition()

{

    if (this.speechInitialized)

        return;

 

    try

    {

        speechInitialized = true;

 

        recognizer = new SpeechRecognizerUI();

        recognizer.Settings.ListenText = "Navigate to which page?";

        recognizer.Settings.ExampleText = "For example, 'First' or 'Second'";

        recognizer.Settings.ShowConfirmation = true;

        recognizer.Recognizer.Grammars.AddGrammarFromList("pages", pagesList);

    }

    catch (Exception ex)

    {

        //...

    }

}

Alternatively, the application can initialize speech recognition from an SRGS file (instead of using AddGrammarFromList):

string path = Package.Current.InstalledLocation.Path + "\\CustomSRGSGrammars.xml";
Uri customGrammar = new Uri(path, UriKind.Absolute);

recognizer.Recognizer.Grammars.AddGrammarFromUri("appCommands", customGrammar);
await recognizer.Recognizer.PreloadGrammarsAsync();

Once the speech recognizer is initialized and configured, it can be started as follows:

var recoResult = await recognizer.RecognizeWithUIAsync();

The recognition result provides a ResultStatus and a RecognitionResult, which among other information holds the recognized Text, the TextConfidence, and the RuleName. These values can be used in the application logic. The next code snippet checks the status and then uses Text and TextConfidence for navigation:

// Only navigate when recognition actually succeeded; RecognitionResult
// may be unusable if the user dismissed the recognition UI.
if (recoResult.ResultStatus == SpeechRecognitionUIStatus.Succeeded)
{
    var url = string.Format("/ThePage.xaml?recognizedPage={0}&confidence={1}", 
                            recoResult.RecognitionResult.Text, 
                            recoResult.RecognitionResult.TextConfidence);
    NavigationService.Navigate(new Uri(url, UriKind.Relative));
}

It is important to mention that an application can use more than one speech recognizer object, and each one can be initialized with its own grammars, as sketched below.
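
A minimal sketch, assuming two independent recognizers; the variable names and the confirmation phrases are illustrative:

// Hypothetical example: two recognizers, each with its own grammar list.
SpeechRecognizer navigationReco = new SpeechRecognizer();
navigationReco.Grammars.AddGrammarFromList("pages", pagesList);

SpeechRecognizer confirmationReco = new SpeechRecognizer();
confirmationReco.Grammars.AddGrammarFromList("confirmation", new List<string> { "yes", "no" });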

That is all about voice recognition and speech synthesis in Windows Phone 8.

 

Next time I will show how to associate an application with a file extension and how to handle a custom protocol.

 

Stay tuned,

Alex
