
The Voice Browser DFP Framework
Informative note from the Voice Browser Working Group, 8 February 2006
The DFP (Data Flow Presentation)
framework, developed by the Voice Browser Activity, explains
how Voice Browser specifications can be used together to create
modular voice applications.
The framework is composed of three layers:
- Data--A component in the data layer manages data for the
application. The data representation language is a hierarchical
XML representation covering various kinds of data including
(but not limited to) (1) environment data, (2) domain-specific
data used for canonicalization to EMMA for holding
interpretations, and (3) interaction history to track user
inputs.
- Flow--A component in the flow layer controls the application
flow. It does so by interacting with data and presentation
layers. A flow component does not directly interact with the
user. Rather, it requests user interaction by invoking a
component on the presentation layer. The invocation may include
data derived from the data component. When information is
returned from a presentation component, the flow component can
then update the data representation in a data layer component. The
flow component is responsible for marshalling this data into the
canonical format used by the data component. Application flows
may be structured in terms of state machines or other
appropriate techniques such as rules, scripts, etc. Voice Browser
languages to describe application flow include
CCXML and
SCXML.
- Presentation--Components on the presentation layer interact with
the user; for example, by playing media files and synthesized speech, and
by accepting speech and DTMF input from the user. A flow component
invokes a presentation component with data from the data layer. A presentation
may have a local data representation which persists for the duration of
the active presentation. During the presentation, the presentation, flow
and data components may exchange further information. The flow component
may also cancel an active presentation. Once a presentation is complete,
the presentation component indicates this to the flow component; this
indication may include data collected from, or derived from, user input and
interaction. Voice browser languages for user presentation include
VoiceXML 2.0,
VoiceXML 2.1
and VoiceXML 3.0 described in this document.
The interface between flow and presentation components is defined in terms of invocation requests and their responses, as well as asynchronous notifications. In each case, these can be modelled as events, where the event has an event name and a data payload. The payload is modelled in terms of property name/value pairs, where the name is a string, and the value can be an atomic type (e.g. string, integer or boolean) or a complex type (e.g. a nested properties structure). The precise format of the data payload is not yet decided.
A flow component can invoke a presentation by sending a 'start' event. The event needs to include sufficient information to start the presentation; for example, it may include a URI referencing a VoiceXML script and may also include information which is passed to this script upon initiation.
Once the presentation is started, a flow component may cancel the presentation by issuing a 'stop' event. Otherwise, the presentation runs until completion and a 'stopped' event is returned from the presentation to the flow component. The stopped event may include data collected during the presentation with the user. Prior to the presentation being stopped, the flow and presentation components may send each other 'update' events.
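The event interface described above can be illustrated with a minimal Python sketch. It is not part of the framework and does not reflect any normative payload format (which is not yet decided). The class names, the 'uri' and 'params' payload keys, and the choice to merge 'update' payloads into the collected data are all assumptions made for illustration. The sketch also assumes that a cancelled presentation returns no 'stopped' event, which the text above leaves open.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass
class Event:
    """An event: a name plus a payload of property name/value pairs."""
    name: str
    payload: Dict[str, Any] = field(default_factory=dict)


class Presentation:
    """Hypothetical presentation component driven only by events."""

    def __init__(self) -> None:
        self.running = False
        self.collected: Dict[str, Any] = {}
        self.outbox: List[Event] = []  # events returned to the flow component

    def receive(self, event: Event) -> None:
        """Handle an event sent by the flow component."""
        if event.name == "start" and not self.running:
            self.running = True
            # Assumption: initial parameters travel under a 'params' key.
            self.collected = dict(event.payload.get("params", {}))
        elif event.name == "update" and self.running:
            # Assumption: update payloads are merged into the local data.
            self.collected.update(event.payload)
        elif event.name == "stop" and self.running:
            # Assumption: cancellation ends the presentation silently.
            self.running = False

    def complete(self, **results: Any) -> None:
        """Simulate the presentation running to completion."""
        if self.running:
            self.collected.update(results)
            self.running = False
            self.outbox.append(Event("stopped", {"data": dict(self.collected)}))
```

A flow component would then send `Event("start", {"uri": "collect_card.vxml", "params": {...}})`, optionally exchange 'update' events, and read the 'stopped' event from `outbox` once the presentation completes. The URI here is a made-up example.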
An example of this interface is where a CCXML component is a flow component and VoiceXML 2.1 is a presentation component. At some stage in the application flow, the CCXML script starts a VoiceXML presentation by executing a <dialogstart> element with a src attribute indicating the script to run. Once the presentation has completed, a dialog.exit event is returned to the CCXML component.
More advanced interaction with the presentation is possible in the DFP framework than is currently permitted with VoiceXML 2.0/2.1. Consequently, VoiceXML 3.0 may be enhanced with capabilities such as:
With the DFP framework, developers are able to structure their application in a modular manner, where data, flow and presentations are expressed in components at the appropriate layer.
An application's flow can be expressed in terms of states in a flow component: for a given state, a presentation component is invoked and the results returned from the presentation component trigger state transitions in the flow component. This enables a clear separation of flow from presentation within the application, and facilitates development of reusable presentation components (such as parameterized VoiceXML <form>s for credit-card collection, scrollable lists, etc.) which can be invoked from a variety of flow components.
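The state-driven structure described above can be sketched in Python as follows. This is an illustration only, not CCXML or SCXML: the function `run_flow`, the modelling of a presentation as a callable that returns a result name and collected data, and the state and result names in the usage note are all hypothetical.

```python
from typing import Any, Callable, Dict, List, Optional, Tuple

# Assumption: a presentation is modelled as a callable that receives the
# current data and returns (result_name, collected_data).
PresentationFn = Callable[[Dict[str, Any]], Tuple[str, Dict[str, Any]]]


def run_flow(
    states: Dict[str, PresentationFn],
    transitions: Dict[Tuple[str, str], str],
    initial: str,
    data: Dict[str, Any],
) -> List[str]:
    """Invoke one presentation per state; its result selects the next state.

    The flow ends when no transition matches the (state, result) pair.
    Returns the list of states visited, in order.
    """
    visited: List[str] = []
    state: Optional[str] = initial
    while state in states:
        visited.append(state)
        result, collected = states[state](data)
        data.update(collected)  # the flow component updates the data layer
        state = transitions.get((state, result))
    return visited
```

For example, with a "collect" state whose presentation returns `("filled", {"card": ...})`, a "confirm" state whose presentation returns `("yes", {})`, and the transition `("collect", "filled") -> "confirm"`, the flow visits "collect" and then "confirm" before ending. The presentations themselves contain no navigation logic, so the same callables can be reused under a different transition table.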
Application developers can also take advantage of flow components which support parallel invocation of presentations. For example, an SCXML flow component may start three presentation components executing at the same time; one presentation component presents background music, another continuously listens for an attention word, and the third component presents the application whose name is spoken by the user after speaking the attention word.
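The three-presentation scenario above can be approximated in Python with asyncio tasks. This sketch uses cooperative concurrency rather than SCXML parallel states. The attention word "computer", the queue-based hand-off between components, and all function names are assumptions for illustration, and user speech is simulated by a list of words.

```python
import asyncio
from typing import List


async def background_music(log: List[str]) -> None:
    """Plays until the flow component cancels it."""
    log.append("music started")
    try:
        await asyncio.Event().wait()  # never set: runs until cancelled
    except asyncio.CancelledError:
        log.append("music stopped")
        raise


async def attention_listener(heard: asyncio.Queue, requests: asyncio.Queue) -> None:
    """Forwards the word spoken after the attention word as an app request."""
    armed = False
    while True:
        word = await heard.get()
        if armed:
            await requests.put(word)
            armed = False
        elif word == "computer":  # hypothetical attention word
            armed = True
        heard.task_done()


async def application_presenter(requests: asyncio.Queue, log: List[str]) -> None:
    """Presents each application requested by name."""
    while True:
        app = await requests.get()
        log.append(f"presenting {app}")
        requests.task_done()


async def flow(words: List[str]) -> List[str]:
    """Flow component: starts three presentations, then cancels them."""
    log: List[str] = []
    heard: asyncio.Queue = asyncio.Queue()
    requests: asyncio.Queue = asyncio.Queue()
    for word in words:
        heard.put_nowait(word)
    tasks = [
        asyncio.create_task(background_music(log)),
        asyncio.create_task(attention_listener(heard, requests)),
        asyncio.create_task(application_presenter(requests, log)),
    ]
    await asyncio.sleep(0)   # let all three presentations start
    await heard.join()       # all simulated speech has been processed
    await requests.join()    # all requested applications were presented
    for task in tasks:
        task.cancel()        # plays the role of the 'stop' event
    await asyncio.gather(*tasks, return_exceptions=True)
    return log
```

Running `asyncio.run(flow(["hello", "computer", "weather"]))` starts the music, ignores "hello", and presents "weather" because it follows the attention word. The flow component then cancels all three presentations.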
Finally, the framework promotes, but does not mandate, various application practices. The strong implication is that markup on each layer should only express what is appropriate at that layer. For example, presentation layer components should not express 'flow' concepts such as 'goto'. So instead of writing a single large VoiceXML presentation which uses <goto> to navigate between application states expressed as <form>s, the application could be written as a flow component and a set of 'micro-dialog' presentation components. For example, an SCXML flow component could have a set of states corresponding to application states, together with a set of (reusable) VoiceXML presentations, each composed of a single VoiceXML <form> that interacts with the user and returns results to the flow component's states. This modular approach facilitates application development, maintenance, debugging and reuse.