VUIs and Mobile Applications (Chapter 7)
Introduction
As their name indicates, voice user interfaces (VUIs) are interfaces that allow users to
interact with computing systems through the use of voice. Although our voice can be used in different
ways, VUIs typically refer to communication through the use of language. This narrows down the
problem at hand, as communication through VUIs is a subset of communication through aural user
interfaces. It is possible to communicate with computers through sounds other than pronounced
words and sentences: different sounds can convey information through their frequency, amplitude,
duration, and other properties that make them unique. However, language is how we naturally
communicate, be it through voice or text; therefore, when referring to VUIs, we refer to
communicating with machines using pronounced language.
A voice user interface (VUI) is a system that allows spoken human interaction with computers.
A VUI typically uses speech recognition to understand spoken requests from the user and is able
to answer those requests through text or voice output.
VUIs have recently increased dramatically in popularity. A VUI makes use of speech recognition
technology to enable users to interact with devices using just their voices. VUIs allow efficient
interactions that are more ‘human’ than other forms of user interface, such as the mouse or
keyboard, since “speech is the primary means of human communication.” A VUI will generally be
much faster than a traditional UI.
A voice command device (VCD) is a device, such as a computer, that is controlled through a VUI.
Voice user interfaces are now appearing everywhere, especially on newer smartphones. They are
also integrated into recent automobiles, domotics (home automation), computers, home appliances
(washing machines, etc.), and TV remote controls.
There are three basic technologies at the heart of systems that interact with users through the
use of voice:
1- Voice recognition,
2- Voice transcription, and
3- Speech synthesis (a subset of which is referred to as text-to-speech).
Qualities of Speech
The qualities of speech are those things that differentiate speech from other types of aural
input. To build good VUIs, a general understanding of the physical qualities of speech not only
gives us better high-level insight into the operation of voice recognition engines but also leads us
toward building better VUIs. So, let us survey what these qualities of speech are:
Amplitude
Speech, like any other type of aural input, has a loudness level. The loudness of a sound
is determined by the amplitude of the sound wave that produces it. The amplitude of speech is
important in that input devices are designed to receive sounds within certain amplitude thresholds.
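As a rough point of reference (standard acoustics, not part of the original text), loudness is usually expressed on a logarithmic decibel scale relative to a reference sound pressure:

    L_p = 20 \log_{10}\!\left(\frac{p}{p_0}\right)\ \mathrm{dB}, \qquad p_0 = 20\,\mu\mathrm{Pa}

Conversational speech measured at about one meter is typically on the order of 60 dB SPL, which gives a feel for the amplitude range that microphone input stages are built to handle.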
Frequencies and Pitch
Another quality of speech is the set of frequencies that make up the sounds which, in turn,
make up speech. Along with the amplitude of our voice, the combination of these frequencies is what
produces the different pitches and sounds that we use to speak.
Meaning and Context
Sounds make up words, words make up phrases and sentences, and a combination of spoken
sounds, words, phrases, and sentences makes up spoken language. However, the meaning of
different combinations of sounds, words, phrases, and sentences in spoken language depends
largely on the context in which the spoken language is used.
Language
Language is a way of combining audible sounds, signs, and gestures to allow people to
communicate. When dealing with VUIs, we are concerned with spoken language, which is the
subset of aural expression that can represent a language. Interpreting this spoken language is the
final goal of both voice recognition and voice transcription systems.
Voice transcription
Transcription converts recorded speech into a written format; it can be performed by a
machine or by a human. Automated voice transcription systems exist today, but they are speaker
dependent. This means that the voice transcription system has to learn the speech patterns of the
user through a training process. This training process can be long, taking several hours, before the
system is able to recognize the user’s voice reasonably well.
Voice recognition
Voice recognition allows the recognition of a word, a phrase, a sentence, or multiple sentences
pronounced by voice against a finite set of possible matches. In other words, the computing system
tries to match something said by the user to a given set of possibilities that it already understands.
For example, telephone companies have used this technology for years in their directory systems,
asking the user to “Please say one to reach Bob and two to reach Phil.” If the user says “three,” the
system cannot find a match and tells the user, “That’s not a valid option; please say one to reach
Bob and two to reach Phil.” Some major distinctions between voice transcription and voice
recognition, as defined in this text, are the following:
1- Voice recognition systems rely on predefined interactions with the user. These interactions
can be composed of predefined words, phrases, or sentences.
2- Voice recognition systems attempt to map these predefined words, phrases, sentences, or
composites to something that is understandable by the computing system.
3- The performance of voice recognition systems is inversely related to the size of the grammar
vocabulary for a given transaction. This is not necessarily true of voice transcription systems, as
their performance is much less dependent on the specifics of a particular user interaction.
4- Voice recognition systems are typically speaker independent. Whereas most voice
transcription systems are tuned to a given user’s voice and speech patterns, voice
recognition systems are designed to be independent of the user’s voice.
The functionality offered by today’s voice recognition systems can take one or more of the
following forms:
1- Directed Dialogue: As the term suggests, this is the case where every response by the user is
preceded by a list of acceptable responses as well as an explanation of them. An example of
such a dialogue is shown in the following figure:
As you can see in this example, the dialogue is directed by the system. The analog of this type
of dialogue in the GUI world would be an interface composed entirely of list boxes that are bound
to predetermined selections.
2- Natural Language: The ultimate goal of voice recognition is to have the computing system
understand the commands that you give it even if they are paraphrased or put in a different
order.
This is an example of a natural language interaction between a user and the computing
system. As you can see in this example, although the system can only understand sentences
and words that concern turning lights on and off or adjusting the temperature in the house,
the user is able to put those words and sentences in any order desired.
3- Mixed-Mode Dialogues: Mixed-mode dialogues are often referred to as mixed initiative or
natural language as well (though the latter is a bit of a misnomer, as mixed-mode dialogues
are not as flexible as natural language dialogues). Mixed-mode dialogues allow natural
language interactions while directing the user so as to keep the possible responses to a given
question to a minimum.
This example shows a mixed-mode dialogue between our fictitious system and Bob. Note that
although the system drives the conversation with the user and tries to limit the expected
responses, it understands the user’s natural language response.
Obviously, directed dialogue:
1- Limits the interaction to a few specific questions and answers, sometimes with a list of
possible responses.
2- Works in situations with few potential customer responses.
3- Is easy to create, but that is about the end of its benefits in terms of customer experience.
Natural language addresses these common concerns:
1- It lets the caller speak freely, as if speaking to a live person.
2- Natural language processing uses AI to interpret whatever the customer says.
3- Callers may respond with full sentences, and the system will pick out the most important
information and generate a helpful response.
4- This approach is well suited to situations with many possible responses.
5- It is more complex to develop, but natural language is the best option for most VUI systems
today.
The following is a natural language VUI example:
Java Speech APIs
As in the case of other technologies, the Java platform offers a canonical API; JSAPI, or the Java
Speech API, is this canonical API. Although this API is no different from any other API,
considering it has two benefits.
First, because it has been agreed on by more than one commercial entity, there is less bias in it
toward any particular platform implementation.
Second, it gives us a good high-level view of what any API may implement in providing access to
the underlying technologies for a VUI. There are three main packages in JSAPI:
1. javax.speech: This package provides the infrastructure to connect to the voice channels for input
and output and to manage dictionary vocabularies dynamically. It also provides the interfaces
that are later used by the other two packages in JSAPI.
2. javax.speech.synthesis: As its name may suggest, this package provides an API suitable for
providing an interface to speech-synthesis systems. This package provides the utilities to adjust
the different values for the quality of speech provided by the speech-synthesis engine for tighter
control of the synthesis. It also provides JSML (Java Speech Markup Language) hooks into the
system so that the synthesis can be done based on JSML.
3. javax.speech.recognition: This package provides the interfaces for managing grammars, rules,
recognition results, and the settings of the recognition engine. As may be suspected, it takes
advantage of JSGF (the Java Speech Grammar Format).
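To make these packages more concrete, the following is a minimal sketch loosely modeled on the "hello world" samples in the JSAPI specification. It assumes a JSAPI-compliant speech engine is installed and registered with Central (the specification itself ships no engine); the class name, the grammar name, and the two-word vocabulary are illustrative, not part of the original text.

import java.io.StringReader;
import java.util.Locale;
import javax.speech.Central;
import javax.speech.EngineModeDesc;
import javax.speech.recognition.*;
import javax.speech.synthesis.Synthesizer;
import javax.speech.synthesis.SynthesizerModeDesc;

public class JsapiSketch {
    public static void main(String[] args) throws Exception {
        // javax.speech.synthesis: speak a prompt with a text-to-speech engine.
        Synthesizer synth =
                Central.createSynthesizer(new SynthesizerModeDesc(Locale.US));
        synth.allocate();
        synth.resume();
        synth.speakPlainText("Please say one to reach Bob and two to reach Phil.", null);
        synth.waitEngineState(Synthesizer.QUEUE_EMPTY);

        // javax.speech.recognition: constrain the input with an inline JSGF grammar.
        Recognizer rec = Central.createRecognizer(new EngineModeDesc(Locale.US));
        rec.allocate();
        String jsgf = "#JSGF V1.0;\ngrammar directory;\npublic <choice> = one | two;";
        RuleGrammar grammar = rec.loadJSGF(new StringReader(jsgf));
        grammar.setEnabled(true);

        // Print whichever of the two allowed words the engine recognizes.
        rec.addResultListener(new ResultAdapter() {
            public void resultAccepted(ResultEvent e) {
                Result result = (Result) e.getSource();
                ResultToken[] tokens = result.getBestTokens();
                System.out.println("Caller said: " + tokens[0].getSpokenText());
            }
        });
        rec.commitChanges();
        rec.requestFocus();
        rec.resume();
    }
}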
VXML
VXML (Voice Extensible Markup Language) is the W3C's standard XML format for
specifying interactive voice dialogues between a human and a computer. VXML is used for
- telephone-based speech applications, and
- voice browsing of the web.
VXML allows voice applications to be developed and deployed in a similar way to HTML
for visual applications: whereas HTML documents are interpreted by a visual web browser, VXML
documents are interpreted by a voice browser. VXML has two main components:
- tags, which control what the program does, and
- grammars, which control what speech is recognized, i.e., what the user can say.
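As an illustration (not taken from the original text), a minimal VXML document for the earlier directory example might look roughly like the following. Element details and grammar formats vary between voice platforms, so treat this as a sketch of the tag-plus-grammar structure rather than a definitive document.

<?xml version="1.0" encoding="UTF-8"?>
<vxml version="2.0" xmlns="http://www.w3.org/2001/vxml">
  <form id="directory">
    <field name="choice">
      <!-- prompt: what the system says -->
      <prompt>Please say one to reach Bob and two to reach Phil.</prompt>
      <!-- inline grammar: what the user is allowed to say -->
      <grammar version="1.0" root="choice" mode="voice">
        <rule id="choice">
          <one-of>
            <item>one</item>
            <item>two</item>
          </one-of>
        </rule>
      </grammar>
      <nomatch>That is not a valid option. <reprompt/></nomatch>
      <filled>
        <prompt>Connecting you now.</prompt>
      </filled>
    </field>
  </form>
</vxml>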
Advantages of VoiceXML
1. To provide a simple mechanism for building VUIs.
2. To separate user interaction code (in VoiceXML) from service logic (e.g., scripts).
3. To provide a language that is portable across multiple voice recognition platforms.
4. To enable one VXML document to hold multiple voice interactions. This lessens the number of
interactions between the application running on the voice platform and the applications, databases,
or other interfaces providing the business logic necessary to generate the VXML document.
5. To offer a small but sufficient set of features for basic telephony call control and text-to-
speech interactions. Although the functionality offered through VXML for these functions is
sufficient for smaller efforts, more comprehensive efforts, particularly for mobile applications, may
require additional capabilities.
Using VXML for Mobile Applications
There is a great deal of effort today to implement embedded voice recognition systems
on mobile devices. VXML can be produced from existing GUIs using transcoding mechanisms
that convert another user interface markup language to VXML, using XSL transformations or other
technologies. In theory, the primary advantage of using VXML in this manner is to reduce the
necessary development effort, to create consistency among different interfaces, and to reduce the
cost of changes and maintenance during the lifetime of an application. In practice, things do not
quite work that way because of a number of factors. Some of these are as follows:
1. There is a large performance cost in using VXML on the server side. This problem can typically
be easily solved by increasing CPU and memory. Because CPU and memory are now mostly
inexpensive commodities, the problem is not significant. Nevertheless, it is important to be aware
of the performance loss when moving from a legacy VUI to a VXML-based VUI.
2. Because VXML has been designed around a browser model that utilizes ECMAScript to reduce
traffic with other applications, it is stateful. And because it is stateful, the task of transforming a
generic interface to VXML can be a fairly complicated one.
(ECMAScript is a standard scripting language, developed with the cooperation of Netscape and
Microsoft and mainly derived from Netscape's JavaScript, the widely used scripting language
employed in Web pages to affect how they look or behave for the user.)
3. Generating VXML with typical server-side technologies such as JSP or ASP has the drawback
of increasing network traffic, although it has the advantage of allowing us to apply the same
techniques used for generating HTML-based Web pages to generate VXML pages (a sketch of this
approach follows this list).
4. Automated conversion mechanisms that consume HTML or XHTML to produce VXML are
typically flawed. Whereas the same technologies do a fair job with text and GUIs, VUIs are much
more sensitive to errors. One wrong or poorly designed interaction with the user will turn the user
off from using the system.
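To illustrate the server-side generation approach mentioned in point 3, the sketch below produces a VXML document from a plain Java servlet, in the same way an HTML page would be generated. The servlet class name, the "caller" request parameter, and the prompt text are hypothetical, and a standard javax.servlet container is assumed.

import java.io.IOException;
import java.io.PrintWriter;
import javax.servlet.http.HttpServlet;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;

// Hypothetical servlet that generates a VXML page dynamically, reusing the
// same server-side pattern normally used to generate HTML pages.
public class DirectoryVxmlServlet extends HttpServlet {
    protected void doGet(HttpServletRequest req, HttpServletResponse resp)
            throws IOException {
        String caller = req.getParameter("caller");
        if (caller == null) {
            caller = "caller";
        }
        // The registered MIME type for VoiceXML documents.
        resp.setContentType("application/voicexml+xml");
        PrintWriter out = resp.getWriter();
        out.println("<?xml version=\"1.0\" encoding=\"UTF-8\"?>");
        out.println("<vxml version=\"2.0\" xmlns=\"http://www.w3.org/2001/vxml\">");
        out.println("  <form>");
        out.println("    <block>");
        out.println("      <prompt>Welcome, " + caller
                + ". This menu was generated on the server.</prompt>");
        out.println("    </block>");
        out.println("  </form>");
        out.println("</vxml>");
    }
}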