The Voice Browser DFP Framework

1 DFP Overview

The DFP (Data Flow Presentation) framework, developed by the Voice Browser Activity, explains how Voice Browser specifications can be used together to create modular voice applications.

The framework is composed of three layers:

Data--A component in the data layer manages data for the application. The data representation language is a hierarchical XML representation of various kinds of data, including (but not limited to): (1) environment data; (2) domain-specific data, canonicalized to EMMA, for holding interpretations; and (3) interaction history to track user inputs.
Flow--A component in the flow layer controls the application flow. It does so by interacting with the data and presentation layers. A flow component does not directly interact with the user. Rather, it requests user interaction by invoking a component on the presentation layer. The invocation may include data derived from the data component. When information is returned from a presentation component, the flow component can then update the data representation in a data layer component. The flow component is responsible for marshalling this data into the canonical format used by the data component. Application flows may be structured as state machines or with other appropriate techniques such as rules or scripts. Voice Browser languages to describe application flow include CCXML and SCXML.
Presentation--Components on the presentation layer interact with the user; for example, by playing media files and synthesized speech, and by accepting speech and DTMF input from the user. A flow component invokes a presentation component with data from the data layer. A presentation may have a local data representation which persists for the duration of the active presentation. During the presentation, the presentation, flow and data components may exchange further information. The flow component may also cancel an active presentation. Once a presentation is complete, the presentation component indicates this to the flow component; this indication may include data collected from, or derived from, user input and interaction. Voice Browser languages for user presentation include VoiceXML 2.0, VoiceXML 2.1 and VoiceXML 3.0, which is described in this document.
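The division of responsibilities among the three layers can be sketched in Python. This is a minimal, hypothetical illustration, not part of any Voice Browser specification: nested dictionaries addressed by slash-separated paths stand in for the hierarchical XML data representation, a plain function stands in for a VoiceXML presentation, and the names DataComponent, FlowComponent, run_step and ask_city are invented for the example.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict

# Hypothetical data layer: nested dicts stand in for the hierarchical XML representation.
@dataclass
class DataComponent:
    tree: Dict[str, Any] = field(default_factory=dict)

    def get(self, path: str) -> Any:
        node: Any = self.tree
        for key in path.split("/"):
            node = node[key]
        return node

    def set(self, path: str, value: Any) -> None:
        *parents, leaf = path.split("/")
        node = self.tree
        for key in parents:
            node = node.setdefault(key, {})
        node[leaf] = value

# Hypothetical presentation layer: the only place that interacts with the user.
Presentation = Callable[[Dict[str, Any]], Dict[str, Any]]

# Hypothetical flow layer: invokes presentations, never talks to the user directly.
@dataclass
class FlowComponent:
    data: DataComponent

    def run_step(self, presentation: Presentation,
                 inputs: Dict[str, str], result_path: str) -> Dict[str, Any]:
        # Build the invocation from data-layer values (inputs maps parameter -> path).
        params = {name: self.data.get(path) for name, path in inputs.items()}
        result = presentation(params)
        # Marshal the returned result back into the data layer.
        self.data.set(result_path, result)
        return result

def ask_city(params: Dict[str, Any]) -> Dict[str, Any]:
    # Stand-in for a VoiceXML dialog; a real one would prompt and recognize speech.
    return {"city": "Boston", "lang": params["lang"]}

data = DataComponent({"environment": {"lang": "en-US"}})
FlowComponent(data).run_step(ask_city, {"lang": "environment/lang"}, "history/turn1")
print(data.get("history/turn1/city"))  # Boston
```

The point of the shape is that only ask_city would touch the user, while FlowComponent only reads from and writes to the data layer.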

2 Relationship to other Approaches

The DFP framework is an instance of the Model-View-Controller (MVC) design pattern. The data layer instantiates MVC's model, the flow layer instantiates the controller, and the presentation layer instantiates the view.

The DFP framework is also intended as a voice-centric instance of the Multimodal Architecture [MMIARCH] developed by the W3C Multimodal Interaction Activity. The data layers are identical. The MMI runtime framework corresponds to the DFP flow layer, and MMI modality components correspond to DFP presentation components. Ongoing collaboration between the activities will further refine and clarify the alignment between these approaches.

3 Interface between Flow and Presentation Layers

The interface between flow and presentation components is defined in terms of invocation requests and their responses, as well as asynchronous notifications. In each case, these can be modelled as events, where the event has an event name and a data payload. The payload is modelled in terms of property name-value pairs, where the name is a string, and the value can be an atomic type (e.g. string, integer or boolean) or a complex type (e.g. a nested properties structure). The precise format of the data payload is not yet decided.
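Because the payload format is explicitly undecided, the following Python sketch is only one possible reading of the event model: an event is a name plus a payload of named properties whose values are either atomic (restricted here to the string, integer and boolean examples the text gives) or nested property structures. The Event class and the is_valid_payload helper are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, Union

# A property value is atomic (string, integer, boolean) or a nested property structure.
Value = Union[str, int, bool, "Properties"]
Properties = Dict[str, Value]

def is_valid_payload(props: object) -> bool:
    """Check that props is a (possibly nested) map from string names to allowed values."""
    if not isinstance(props, dict):
        return False
    for name, value in props.items():
        if not isinstance(name, str):
            return False
        if isinstance(value, dict):
            if not is_valid_payload(value):  # complex type: recurse into nested properties
                return False
        elif not isinstance(value, (str, int, bool)):  # atomic types only
            return False
    return True

@dataclass
class Event:
    name: str
    payload: Properties = field(default_factory=dict)

    def __post_init__(self) -> None:
        if not is_valid_payload(self.payload):
            raise ValueError(f"invalid payload for event {self.name!r}")
```

Other value types (floats, lists) are rejected only because this sketch sticks to the examples named in the text; a real payload format may well allow them.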

A flow component can invoke a presentation by sending a 'start' event. The event needs to include sufficient information to start the presentation; for example, it may include a URI referencing a VoiceXML script and may also include information which is passed to this script when it starts.

Once the presentation is started, a flow component may cancel the presentation by issuing a 'stop' event. Otherwise, the presentation runs until completion and a 'stopped' event is returned from the presentation to the flow component. The 'stopped' event may include data collected from the user during the presentation. Before the presentation stops, the flow and presentation components may send each other 'update' events.
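The start/update/stop/stopped exchange can be read as a small state machine on the presentation side. The Python sketch below is hypothetical: the payload keys src and params, the method names, merging flow 'update' payloads into the parameters, and the choice that a cancelled presentation sends no 'stopped' event (the text does not say whether it does) are all assumptions made for illustration.

```python
from enum import Enum
from typing import Any, Dict, Tuple

Event = Tuple[str, Dict[str, Any]]  # (event name, payload)

class State(Enum):
    IDLE = "idle"
    RUNNING = "running"
    STOPPED = "stopped"

class PresentationSession:
    """Hypothetical presentation-side bookkeeping for start/update/stop/stopped."""

    def __init__(self) -> None:
        self.state = State.IDLE
        self.src = ""
        self.params: Dict[str, Any] = {}
        self.collected: Dict[str, Any] = {}

    def receive(self, name: str, payload: Dict[str, Any]) -> None:
        """Handle an event sent by the flow component."""
        if name == "start" and self.state is State.IDLE:
            self.src = payload["src"]                 # e.g. a VoiceXML document URI
            self.params = dict(payload.get("params", {}))
            self.state = State.RUNNING
        elif name == "update" and self.state is State.RUNNING:
            self.params.update(payload)               # assumption: updates merge into params
        elif name == "stop" and self.state is State.RUNNING:
            self.state = State.STOPPED                # cancelled; no 'stopped' sent (assumption)
        else:
            raise RuntimeError(f"{name!r} not allowed in state {self.state.value}")

    def record_input(self, key: str, value: Any) -> Event:
        """Record user input and notify the flow component with an 'update' event."""
        if self.state is not State.RUNNING:
            raise RuntimeError("presentation is not running")
        self.collected[key] = value
        return ("update", {key: value})

    def complete(self) -> Event:
        """Finish normally and return the 'stopped' event with the collected data."""
        if self.state is not State.RUNNING:
            raise RuntimeError("presentation is not running")
        self.state = State.STOPPED
        return ("stopped", dict(self.collected))
```

Rejecting out-of-order events with an exception keeps the sketch strict; a production component might instead report an error event to the flow layer.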

An example of this interface is where a CCXML component is a flow component and VoiceXML 2.1 is a presentation component. At some stage in the application flow, the CCXML script starts a VoiceXML presentation by executing a <dialogstart> element with a src attribute indicating the script to run. Once the presentation has completed, a dialog.exit event is returned to the CCXML component.
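The CCXML/VoiceXML 2.1 example corresponds to the generic events roughly as follows. In this hypothetical Python sketch, <dialogstart> and dialog.exit come from the text above; pairing 'stop' with CCXML's <dialogterminate> element is an assumption drawn from CCXML 1.0 rather than from this document, and the function name and the payload keys src and params are invented.

```python
from typing import Any, Dict, Tuple

# CCXML construct -> generic DFP event name.
# <dialogstart> and dialog.exit are from the text; <dialogterminate> is an assumption.
CCXML_TO_DFP = {
    "<dialogstart>": "start",       # flow -> presentation
    "<dialogterminate>": "stop",    # flow -> presentation (cancel)
    "dialog.exit": "stopped",       # presentation -> flow
}

def dialogstart_to_start(src: str, params: Dict[str, Any]) -> Tuple[str, Dict[str, Any]]:
    """Express a CCXML <dialogstart src="..."> as a DFP 'start' event (hypothetical keys)."""
    return (CCXML_TO_DFP["<dialogstart>"], {"src": src, "params": dict(params)})

print(dialogstart_to_start("menu.vxml", {"lang": "en"}))
```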

More advanced interaction with the presentation is possible in the DFP framework than is currently permitted with VoiceXML 2.0/2.1. Consequently, VoiceXML 3.0 may be enhanced with capabilities such as:

VoiceXML dialogs are cancellable.
VoiceXML dialogs can receive events from the flow layer during execution. These events are exposed in the presentation markup.
VoiceXML dialogs can send events to the flow layer during execution. These events are specified in the presentation markup.

4 Benefits

With the DFP framework, developers are able to structure their application in a modular manner, where data, flow and presentations are expressed in components at the appropriate layer.

An application's flow can be expressed in terms of states in a flow component: for a given state, a presentation component is invoked, and the results returned from the presentation component trigger state transitions in the flow component. This enables a clear separation of flow from presentation within the application, and facilitates development of reusable presentation components (such as parameterized VoiceXML <form>s for credit-card collection, scrollable lists, etc.) which can be invoked from a variety of flow components.
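A state-based flow of this kind can be sketched in Python as a table in which each state names a presentation and its parameters, and the outcome returned by the presentation selects the next state. The run_flow function, the (outcome, data) result shape, the max_steps guard and the collect_digits stand-in for a parameterized VoiceXML <form> are all hypothetical; the same presentation is reused with different parameters in two states.

```python
from typing import Any, Callable, Dict, List, Optional, Tuple

Result = Tuple[str, Dict[str, Any]]              # (outcome, collected data)
Presentation = Callable[[Dict[str, Any]], Result]
# Each flow state: (presentation name, parameters, outcome -> next state or None).
FlowState = Tuple[str, Dict[str, Any], Dict[str, Optional[str]]]

def run_flow(states: Dict[str, FlowState],
             presentations: Dict[str, Presentation],
             initial: str,
             max_steps: int = 50) -> List[Tuple[str, Result]]:
    """Invoke one presentation per state; its outcome selects the next state."""
    trace: List[Tuple[str, Result]] = []
    state: Optional[str] = initial
    while state is not None and len(trace) < max_steps:
        name, params, transitions = states[state]
        result = presentations[name](params)     # the presentation talks to the user
        trace.append((state, result))
        state = transitions[result[0]]           # None marks a final state
    return trace

def collect_digits(params: Dict[str, Any]) -> Result:
    # Stand-in for a parameterized VoiceXML <form>; returns canned input.
    canned = {"card number": "4111111111111111", "expiry": "1227"}
    value = canned[params["field"]]
    outcome = "ok" if len(value) == params["length"] else "error"
    return (outcome, {params["field"]: value})

states: Dict[str, FlowState] = {
    "card": ("collect_digits", {"field": "card number", "length": 16},
             {"ok": "expiry", "error": "card"}),
    "expiry": ("collect_digits", {"field": "expiry", "length": 4},
               {"ok": None, "error": "expiry"}),
}
trace = run_flow(states, {"collect_digits": collect_digits}, "card")
print([s for s, _ in trace])  # ['card', 'expiry']
```

Keeping transitions in the flow table rather than inside the presentation is what lets collect_digits be reused unchanged by other flows.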

Application developers can also take advantage of flow components which support parallel invocation of presentations. For example, an SCXML flow component may start three presentation components executing at the same time: one presents background music, another continuously listens for an attention word, and the third presents the application whose name the user speaks after the attention word.
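Parallel invocation can be simulated with Python's asyncio, with coroutines standing in for the three presentation components. This is a hypothetical sketch: the delays, the application name "weather", and the use of task cancellation to play the role of a 'stop' event are assumptions, and for brevity the listener stops after one detection rather than listening continuously.

```python
import asyncio
from typing import List

async def background_music(log: List[str]) -> None:
    # Plays until the flow component cancels it.
    try:
        while True:
            await asyncio.sleep(0.01)
    except asyncio.CancelledError:
        log.append("music stopped")
        raise

async def attention_listener(heard: asyncio.Event, log: List[str]) -> None:
    # Simulated recognition: the attention word is "heard" after a short delay.
    await asyncio.sleep(0.03)
    log.append("attention word heard")
    heard.set()

async def app_presenter(heard: asyncio.Event, log: List[str]) -> str:
    await heard.wait()            # idle until the user says the attention word
    app = "weather"               # simulated: application name spoken by the user
    log.append(f"presenting {app}")
    return app

async def flow() -> List[str]:
    log: List[str] = []
    heard = asyncio.Event()
    # The flow starts all three presentations at the same time.
    music = asyncio.create_task(background_music(log))
    listener = asyncio.create_task(attention_listener(heard, log))
    app = asyncio.create_task(app_presenter(heard, log))
    await app
    for task in (music, listener):
        task.cancel()             # plays the role of a 'stop' event
    await asyncio.gather(music, listener, return_exceptions=True)
    return log

print(asyncio.run(flow()))
# ['attention word heard', 'presenting weather', 'music stopped']
```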

Finally, the framework promotes, but does not mandate, various application practices. The strong implication is that markup on each layer should express only what is appropriate at that layer. For example, presentation layer components should not express 'flow' concepts such as 'goto'. So instead of writing a single large VoiceXML presentation which uses <goto> to navigate between application states expressed as <form>s, the application could be written as a flow component and a set of 'micro-dialog' presentation components: for example, an SCXML flow component with a set of states corresponding to application states, together with a set of reusable VoiceXML presentations, each composed of a single VoiceXML <form> that interacts with the user and returns results to the flow component's states. This modular approach facilitates application development, maintenance, debugging and reuse.