Best Practices for Open Sound Control

Andrew Schmeder and Adrian Freed and David Wessel


Center for New Music and Audio Technologies (CNMAT), UC Berkeley
1750 Arch Street
Berkeley CA, 94720
USA,
{andy,adrian,wessel}@cnmat.berkeley.edu

Abstract

The structure of the Open Sound Control (OSC) content format is introduced with historical context. The needs for temporal synchronization and dynamic range of audio control data are described in terms of accuracy, precision, bit-depth, bit-rate, and sampling frequency. Specific details are given for the case of instrumental gesture control, spatial audio control and synthesis algorithm control. The consideration of various transport mechanisms used with OSC is discussed for datagram, serial and isochronous modes. A summary of design approaches for describing audio control data is shown, and the case is argued that multi-layered information-rich representations that support multiple strategies for describing semantic structure are necessary.

Keywords

audio control data, signal quality assurance, best practices, open sound control

1 Introduction

1.1 Definition

Open Sound Control (OSC) is a digital media content format for streams of real-time audio control messages. By audio control we mean any time-based information related to an audio stream other than the audio component itself. This definition also separates control data from stream meta-data that is essentially not time-dependent (e.g. [Wright et al., 2000]). Naturally such a format has application outside of audio technology, and OSC has found use in domains such as show control and robotics.

1.2 What is Open

It should be noted that OSC is not a standard as it does not provide any test for certification of conformance. The openness of Open Sound Control is that it has no license requirements, does not require patented algorithms or protected intellectual property, and makes no strong assertions about how the format is to be used in applications.

1.3 Format Structure

OSC streams are sequences of frames defined with respect to a point in time called a timetag. The frames are called bundles. Inside a bundle are some number of messages, each of which represents the state of a sub-stream at the enclosing reference timetag. The sub-streams are labelled with a human-readable character string called an address. In a message, the address is associated with a vector of primitive data types that include common 32-bit binary encodings for integers, real numbers and text (Figure 1).

Unlike audio data, which is sampled regularly on a fixed temporal grid, bundles may contain mixtures of sub-streams that are sampled at different or variable rates. Therefore, the number of messages appearing in a bundle may vary depending on what data is being sampled in that moment.

[Diagram: an OSC stream is a sequence of bundles. An OSC bundle consists of a bundle identifier (#bundle), an NTP timestamp (seconds and fraction of seconds) and the encapsulated messages, each preceded by its length. An OSC message consists of an address (e.g. /foo/bar) and data, given as typetags (e.g. ,ifs) and arguments (e.g. 1, 3.14, "baz").]

Figure 1: Structure of the OSC content format

1.4 History

OSC was invented in 1997 by Adrian Freed and Matt Wright at The Center for New Music and
Audio Technologies (CNMAT), where its first use was to control sound synthesis algorithms in the CAST system (the CNMAT Additive Synthesizer Toolkit) using network messaging [Adrian Freed, 1997]. CAST was implemented for the SGI Irix platform, which was one of the first general purpose operating systems to provide reliable realtime performance to user-space applications [Freed, 1996] [Cortesi and Thomas, 2001]. This capability was a key influence in the design of the OSC format, in particular with respect to the inclusion of timestamped bundles that enable high quality time synchronization of discrete events distributed over a network. Following the success of the CAST messaging system, the protocol was refined and published online as the OSC 1.0 Specification in 2002 [Wright, 2002].

1.5 Best Practices

Considering OSC as a content format alone does not account for how it can and should be used in applications. A larger picture exists around the needs and requirements of sound control in general with respect to the signal quality and description of control data. In the past these needs have been underestimated by hardware and software designs, leading to less than ideal results. Research into the needs of audio control data is summarized in this paper along with recommendations for how to best apply OSC features so that the requirements are satisfied.

1.6 Systems Integration

In the context of the Open System Interconnection Basic Reference Model (OSI Model), OSC is classified as a Layer 6 or Presentation Layer entity.

However, in the larger scope of how OSC is used, other layers are considered as part of the practice. Related topics and their associated layer are listed in Table 1.

OSI Layer         Topic in OSC Practice
7 Application     Semantics, Choreography
6 Presentation    OSC Format
5 Session         Enumeration, Discovery
4 Transport       Latency, Reliability
3 Network         Stream Routing
2 Frame           Hardware Clocks, Timing
1 Bit             Cabling, Wireless, Power

Table 1: OSC related topics in the context of associated OSI Model layers

2 Temporal Audio Control

2.1 Instrumental Gestures

Musical instrumental gestures are actuations of a musical instrument by direct human control (typically by kinetic neuro-muscular actuation). The transduction of a physical gesture into a digital representation requires measurement of the temporal trajectory of all relevant dimensions of the physical system such as spatial position, applied force, damping and friction. An assessment of the performable dynamic range with respect to each physical dimension is beyond the scope of this document; however, in a somewhat general way it is possible to estimate the quantity of temporal information contained in an isolated sub-stream of a musical performance.

2.1.1 Temporal Information Rate

It is estimated that the smallest controllable temporal precision by human kinetic actuation is 1 millisecond (msec), based on an example of an instrumental gesture called the flam that is known to have a very fine temporal structure [Wessel and Wright, 2002]. This limit of 1 msec coincides with the threshold for just noticeable difference in onset time between two auditory events.

The flam technique in drumming is a method of striking the surface of a drum with two sticks so that the relative arrival time of each stick modulates the timbre (spectral quality) of the resulting sound. Because of the very close temporal proximity of the events, a human listener perceives them to be a single event where the spectral centroid of the timbre is correlated to the temporal fine structure. This inter-onset time can be reliably controlled by a trained performer between 1-10 msec.

The flam is an example of an instrumental gesture with temporal precision of 1 msec. However, the temporal accuracy of instrumental gestures is at least an order of magnitude larger. The tolerable latency for performance of music requiring very tight rhythmic synchronization between two players has been measured to be 10 msec [Chafe et al., 2004]. Ignoring the complications of fixed versus variable delays, it seems reasonable to assume 10 msec as an estimate of temporal accuracy in an instrumental gesture event. And finally, it is also reasonable to suppose that trained musicians can perform event rates up to 10 events per second in a single sub-stream (polyphonic streams are not
considered here).

The temporal information present in musical events can be estimated from these numbers. The information in bits, also called the index of difficulty in Fitts' Law, is calculated from the ratio between effective target distance and standard error of the effective target width:

$$I = \log_2\left(1 + \frac{\text{effective target distance}}{\text{standard error of the effective target width}}\right)\ \text{bits} \qquad (1)$$

Suppose a musician performs the double-strike flam at a rate of 10 Hz. This is equivalent to two tasks: 1) placement of events with an average separation of 100 msec and standard error of 10 msec, 2) placement of two sub-events with a separation of 10 msec and error of 1 msec. Assuming the dual-task is repeated at 10 Hz then the total information rate is,

$$I = \log_2\left(1 + \frac{100}{10}\right) + \log_2\left(1 + \frac{10}{1}\right)\ \text{bits}, \qquad (2)$$

$$I \cdot \frac{10}{\text{sec}} = 68\ \frac{\text{bits}}{\text{sec}}. \qquad (3)$$

This estimate informs us that if the gesture as described was transformed to a digital representation without loss of information, the temporal dimension alone would require 68 bits/sec to encode.

For sake of comparison the highest reported information transfer rates for target-selection with a mouse in the ISO 9241-9 test are around 3 bits/sec [MacKenzie et al., 2001].

A distinguishing feature of the musical context of gesture is that the human performer uses a combination of extensive training, anticipations of musical structure, and sensory feedback to continuously adjust and refine the gesture. Therefore, musical gestures cannot be directly compared to reaction-time studies or task-based assessments as they are used in the study of human-machine ergonomics. However, it is clear from this example that musical gestures contain a far greater density of temporal information than is typical for human-computer interactions in other contexts.

2.2 Spatial Audio Control

Spatial audio effects such as early reflections and reverberation are broad-band temporal-spectral transformations; however, the maximum useful rate at which a spatial audio effect can be controlled is limited to the sub-audible frequency band between 0-50 Hz. This is due to the simple fact that if a spatial parameter is modulated with a high frequency, perceptual fusion takes place yielding a transformation of the source in some way other than the intended outcome. For example a virtual source with rotating dipole directivity pattern ceases to be perceived as rotating for frequencies above 10 Hz [Schmeder, 2009a]. Similarly if the location of a virtual source alternates between two positions at a high rate, the observer perceives a stationary source with a wider apparent source width.

However, spatial audio control data requires very high temporal precision to avoid phase artifacts in multi-loudspeaker arrays. AES11-2003 recommends a between-channel synchronization error of +/- 5% per sample frame [Audio Engineering Society, 2003] [Bouillot and et al, 2009]. An audio signal stream at 96 kHz requires that the temporal synchronization error does not exceed 0.5 microseconds.

The effect of synchronization error in the control stream for a spatial audio rendering engine may have an impact on the final reproduction quality. For example in a phase-mode beamforming array ([Rafaely, 2005b]), synchronization inaccuracy is similar to a positioning error and synchronization jitter is functionally similar to transducer noise [Rafaely, 2005a].

2.3 Sound Synthesis

In a pure signal processing context audio control data can be considered as the component of the signal that is non-stationary. The representation of control information is a topic specific to the design of any given algorithm, and so it is impossible to state a universal set of requirements for audio control. It is worth noting that some audio synthesis algorithms can have significant bandwidth requirements, especially the data driven methods such as sinusoidal additive synthesis and concatenative synthesis.

3 Temporal Quality Assurance

3.1 Event Synchronization

Assuming clock synchronization is available, the timetag can be used to schedule events with fixed delays that account for the network transport delay in communication between devices. The delays must be known and bounded. This is called forward synchronization and can be efficiently implemented with the priority queue data structure [Schmeder and Freed,
2008] [Brandt and Dannenberg, 1998] (Figure 2).

[Flowchart: an input OSC bundle x is tested against the current time. If x.Timestamp is now, the bundle is executed; if x.Timestamp is in the future, it is deferred; if x.Timestamp is in the past, a fault is raised.]

Figure 2: Forward synchronization scheduling for presentation of messages

Applications using OSC timestamps for synchronization should make clear in their documentation what type of clock synchronization is to be used, if any, as well as limits on tolerable network delay.

3.2 Effect of Jitter

Jitter is randomness in time. This may be found in the transport delay, or in the clock synchronization error. Unless it is removed, the effect of jitter on a signal is to corrupt it with noise. The noise is temporal so its magnitude depends on the rate of change of the signal, or its frequency content. Even for relatively low-rate gesture signals, jitter noise can play a significant role. In Figure 3 we see that a 2 msec jitter causes a significant reduction in the channel headroom. The fact that temporal jitter has a strong influence on signal quality is well known in the audio engineering community where a typical sampling converter operating at 96 kHz requires a clock with jitter measured in the picosecond range.

Figure 3: Effective channel headroom after jitter induced noise on a 10 Hz carrier signal

In Figure 4 we see what combinations of jitter and carrier frequency will degrade a gesture stream with 8-bit dynamic range.

Carrier      Jitter (standard deviation of delay error)
frequency    0.01 msec   0.1 msec   1 msec     2 msec     4 msec
0.5 Hz       100.806     80.942     60.5853    54.4588    48.2834
1 Hz         89.4672     69.2973    49.5129    42.7719    37.1899
2 Hz         83.5256     64.1865    44.4936    37.811     32.166
4 Hz         77.8606     58.3905    38.2024    32.4498    25.4497
8 Hz         72.3401     52.0053    31.2989    25.7653    20.1786
16 Hz        66.1133     45.8497    25.8291    19.7408    14.3312
32 Hz        60.2471     39.6844    19.7202    13.546     8.26448
64 Hz        53.9285     33.8882    13.9203    7.90135    1.7457

Figure 4: Signal headroom as a function of carrier frequency and standard deviation of delay error. BOLD where effective headroom is less than 8-bits dynamic range (8-bits = 48 dB).

3.3 Jitter Attenuation

From a simple inspection of the table shown, it is apparent that in order to transmit without loss of information an instrumental gesture data stream with frequency content up to 10 Hz and 8-bit dynamic range the jitter must be less than 1/10th of a millisecond. Greater dynamic range requires proportionally less jitter where an error reduction by 50% improves dynamic range by 6 dB or 1-bit. To transmit a 10 Hz signal with 16-bit dynamic range requires jitter to be less than 0.5 microseconds.

On contemporary consumer operating systems typical random delays of 1-10 milliseconds between hardware and software interrupts are unacceptably large [Wright et al., 2004], and this source of temporal noise ultimately inhibits the information transmission-rates for real-time control streams. If the data is isochronously sampled it is possible to use filters to smooth jitter [Adriensen and Space, 2005]. However this is not a typical expectation in audio control so something else must be done. If the clock synchronization error is smaller than the transport jitter (which is often the case) and the data stream uses timestamps, then it is possible to use forward synchronization to remove jitter from a control signal (shown in Figure 5). This operation trades lower jitter for a longer fixed delay. Provided that total delays after rescheduling are less than 10 msec, a satisfactory music performance experience is possible.

3.4 Atomicity

Within each bundle exists a point-in-time sample of a collection of sub-stream messages. The scope over which the data is valid is defined both
with respect to the message addresses as well as some temporal window at the reference timetag.

In implementation practice for application design, message data needs to be double-buffered or queued in a FIFO that is updated according to the associated timetag. This prevents unintended temporal skew between sub-streams.

Figure 5: Typical transport jitter of 1-5 msec and its recovery by forward synchronization

4 Transport Considerations

A common transport protocol used with the OSC format is UDP/IP, but OSC can be encapsulated in any digital communication protocol. The specific features of each transport can affect the quality and availability of stream data at the application layer.

4.1 Datagram Transports

A datagram transport (UDP being a canonical example) is a non-assured transport. Each packet is either delivered in its entirety or not delivered at all. Packets may arrive out of order, in which case OSC bundle timestamps can be used to recover the correct order. Datagram transports provide a natural encapsulation boundary for each packet. In the case of UDP, if the packet exceeds the maximum transmission unit (MTU) the packet may be fragmented over multiple pieces. The fragmentation can introduce extra delay as the UDP/IP stack must then reassemble the pieces before delivering the packet to an application.

4.2 Serial Stream Transport

Serial transports (TCP/IP being an example) provide a continuous data stream between endpoints. The OSC content format does not define a means for representing the beginning and end of a packet. In particular the OSC bundle can contain any number of encapsulated messages and there is no way for a parser to determine the total number until the end of packet is reached. Therefore the major need for sending OSC on a serial transport is that the packets must be encoded with some extra data to indicate where the packet boundaries are, called a framing protocol.

Two options have been proposed for packet framing on serial links. The idea proposed in the OSC 1.0 specification is an integer length-count prefixed on the start of each packet that indicates how many bytes to expect. This encoding requires a totally assured transport such as TCP/IP or USB-Serial. A serial transport with possible errors (such as RS-232) will be broken if there is any error in the encoded length.

The SLIP method for framing packets (RFC 1055 [Romkey, 1988]) is an alternative that is robust to transmission errors and stream interruption. In general it is preferred over the former for its simple error recovery.

Assured transports must be used for any mission critical application of OSC and generally this means TCP/IP or something with a similar feature set is needed.

4.3 Isochronous Stream Transport

Isochronous protocols have guaranteed bandwidth, in-order delivery of data, but are not assured (no retries are made on failure). They may or may not provide natural packet boundaries.

Ethernet AVB and the isochronous modes of USB and Firewire are examples of this transport type. Ethernet AVB has additional features in that it also provides a network clock for synchronization and its own timestamp for synchronization of events with total latency as low as 2 msec and synchronization error of less than 0.5 microseconds. Class A streams in the AVB framework are unique among all the transports discussed here in that they are guaranteed to meet or exceed all synchronization and latency requirements needed for audio control data as described in this paper [Marner, 2009].

4.4 File Streams and Databases

For the archival recording and recall of audio control data streams, file systems and databases can be treated as serial stream transports with high block-based jitter in the retrieval phase. By considering a file to be a type of serial stream, OSC can use the same framing protocol
for serial stream encoding as a file format. Again the SLIP method is recommended, and its error recovery features enable robustness against file corruption and truncation.

When a recorded stream of OSC messages is replayed, the original timestamps are rewritten according to a simple linear transformation, and can then be reconstructed temporally using the forward synchronization scheduler. The rewriting of timestamps does not require relative time encodings. Since relative time can always be extracted from absolute time but not the converse, recording of OSC streams for archival purposes should always use absolute time values.

[Diagram with the labelled components: OSC Stream DB; OSC Application with a Real-time Interface; Commands Interface (/play, /filter); Informational Access Control (/index); Read Stream (#bundle query results) with a Forward Synchronization Scheduler; Write Stream (#bundle to record) from Real-time Gesture Streams.]

Figure 6: Multi-stream recording and playback interface to a database

A multi-stream approach for interfacing OSC streams to a database and efficient queries over the recorded data is demonstrated in the OSC-StreamDB project [Schmeder, 2009b] (Figure 6).

4.5 Bandwidth Constrained Transports

When bandwidth constrained transports are required such as wireless radios, OSC may be modified with some effort to enable lower bit-rates. Depending on the nature of the data stream, adaptive sampling dependent on the information-rate can be used to reduce the number of messages on the network. Interpolation between frames can be used to recover a smooth control signal in reconstruction. This is likely to work well for instrumental gesture data as the actual effective number of new bits of information per second is relatively low.

Another major source of bit-sparsity in OSC streams is the message address field. Because it is typically a human-readable string, and English text has a bit rate of 1-1.5 bits per character, about 80% of the bits are redundant. A dictionary-type compression scheme could be used to compress the address strings if necessary.

4.6 Network Topology and Routing

The OSC 1.0 Specification included some language regarding OSC client and server endpoints. In fact this distinction is not necessary and OSC may be used on unidirectional transports, and more complex network topologies including multicast, broadcast and peer-to-peer. The OSCgroups project provides a method for achieving multicast routing and automatic NAT traversal on the internet [Bencina, 2009].

A shortcoming of many current OSC implementations using UDP/IP is missing support for bidirectional messaging. As a best practice, implementations should try to leverage as much information as the transport layer can provide, as this simplifies the configuration of endpoint addresses in applications as well as stateful inspection at lower layers.

5 Describing Control Data

The address field in an OSC message is where descriptive information is placed to identify the semantics of the message data. The set of all possible addresses within an application is called an address space.

5.1 Descriptive Styles

Existing OSC practice includes a wide variety of strategies for structuring address spaces. Here we intend to clarify the differences between the styles rather than to promote any particular method as preferred. Four common styles have emerged over decades of software engineering practice: RPC, REST, OOP and RDF. Examples of OSC messages in each style accomplishing the same task (setting the gain on a channel) are given here.

5.1.1 RPC

The RPC (Remote Procedure Call) style employs functional enumeration, and lends itself to small address spaces of fixed functions:

/setgain (channel number = 3) (gain value = x)

5.1.2 REST

The REST (Representational State Transfer) style encourages state-machine free representations by transmitting the entire state of an entity in each transaction. Web application programmers are familiar with this style wherein every time a page is loaded, the application has two phases: setup (recreating the entire application state from the transferred representation)
and teardown (throwing it all away). The state-free property is what enables web-browsers to always load pages outside the context of a browsing session by recalling a bookmark.

OSC address spaces using the REST style of resource enumeration have a familiar appearance since it is the most common style used in the construction of hyperlink addresses on the web.

/channel/3/gain (x)

5.1.3 OOP

The OOP (Object Oriented Programming) style is based on an intuitive concept of objects that are self-contained entities containing both attributes and specialized functions called methods that are procedures transforming their own attributes.

/channel/3@gain (x)
/channel/3/setgain (x)

OOP enables abstraction and layering in large systems. It may also need notations beyond the basic / delimiter used in path-style addresses since the OOP structure requires differentiation of the object, attribute and method entity types. In the above example we have used @ following the XPath notation to indicate an attribute. In Jamoma a : character is used to similar effect [Place et al., 2008].

5.1.4 RDF

The RDF (Resource Description Framework) style employs ontological tagging to describe data with arbitrarily complex grammars. This style of control is the most powerful of the alternatives shown here; however, its use also requires greater verbosity since it makes no semantic assumptions about the data structure. The interpretation of the delimiter / in OSC as a hierarchical containment operator, as is implied in the REST and OOP paradigms, is not used in this style. Instead it is interpreted as an unordered delimiter between tags, and each tag is a comma-separated triple of subject, predicate, and object entities.

/channel,num,3
/op,is,set
/lvalue,is,gain
/rvalue,units,dB (x)

5.2 Leveraging Pattern Matching

In OSC there is a type of query operator called address pattern matching. Similar to the use of wildcard operators in command-line file system operations, patterns enable one-to-many mappings between patterns and groups of messages. However this technique is only useful if the target messages have a structure that enables the provided pattern syntax to make useful groupings. This structure is usually present when the address space follows the REST design paradigm. Designers of address spaces for applications should consider how the resulting addresses might make use of grouped-control by patterns.

/channel/*/gain (common gain value x)

Some effort is needed to retain efficiency of query operators in very large address spaces. This is possible using database structures such as the RDTree [Schmeder, 2009b].

5.3 Problems with Stateful Encodings

A stateful encoding of a control data stream is one where the meaning of a message has some dependence on a previously transmitted message. The interpretation of the message by the receiver requires some memory of previously received messages in addition to the necessary logic to correctly fuse the information. This logic is typically a finite state machine, although in general it can be more complex.

Suppose that there is a switch, called /button, with two possible states, off or on, represented by the numbers 0 and 1 respectively. A designer wishing to conserve network bandwidth decides only to transmit a message when the switch changes from one state to the other and so sends the number +1 to indicate a transition from 0 to 1 and the number -1 to indicate a transition from 1 to 0.

[State diagram: states 0 and 1, with a +1 transition from 0 to 1 and a -1 transition from 1 to 0.]

Figure 7: Finite state machine for parsing the transitions between two states

The following is a valid sequence of messages that can be verified by a receiver using the finite state machine shown in Figure 7.

/button +1
/button -1
/button +1
/button -1

A potential problem with this representation is that it is not robust to any errors in the
transmission of the data. Suppose that a message is lost due to the use of a non-assured transport such as UDP/IP, the sequence is then:

/button +1
/button +1
/button -1

In this example it is possible to make a more complex state machine that is capable of recovering from the missing data, but we can immediately see that the complexity of the program has doubled (Figure 8).

[State diagram: states 0 and 1 with the +1 and -1 transitions of Figure 7, plus an additional -1 transition at state 0 and an additional +1 transition at state 1.]

Figure 8: Finite state machine for parsing the transitions between two states with error recovery logic

An alternative solution eliminates the need for a parsing engine entirely, by simply transferring the entire state of the switch in every message.

/button 0
/button 0
/button 0
/button 1
/button 1
/button 0
...

Furthermore, if these messages are continuously transmitted even when the state does not change, then the receiver needs to make no special effort to recover from a missed message.

While this example is contrived to the point of being trivial, it does demonstrate that stateful encodings require more complex programs especially in the case of error handling. On modern network transports with typical bandwidth capacity of 1 Gbit/sec, the simplicity of state-free representations is often more valuable than saving network bandwidth.

5.4 Layering Control Data

Between the source user action performed on a human input device and the high level application control stream, there are several intermediate layers of representation [Follmer et al., 2009] (Figure 9). The OSC format can be used at every layer where a digital representation of the data stream is present. Even though such streams may ultimately not be used in high level abstractions it is useful to retain the OSC address labels at each step.

[Diagram: layers from bottom to top: User Action, Physical Materials, Electronic Sensing, Signal Processing, Mapping Transforms, Device Abstraction.]

Figure 9: Intermediate layers of representation between user interface controller and an application

5.4.1 Complications of Mapping

The use of transformational mapping of gesture data is an important aspect in the design of interactive musical systems [Hunt et al., 2003]. In many cases useful mapping transformations carry out some type of information fusion that is a non-linear transformation of the data. However non-linear transformations require extra attention because they transform uniform noise into a non-uniform noise with a complicated spatial structure. It is possible to design adaptive filters that are optimal for a non-linear transformation; however, this requires a more complex processing graph than what is shown in Figure 9.

Consider the non-linear transformation function,

$$f(x, y) = \frac{xy}{x + y}. \qquad (4)$$

If x and y are corrupted with any noise (which is inevitable) then the transformed variable f(x, y) will greatly amplify that noise when x and y approach zero. This is evident from the fact that its derivative is unbounded as (x, y) → (0, 0).

$$\nabla_{x,y} f(x, y) = \frac{xy}{(x + y)^2} - \frac{1}{x + y} \qquad (5)$$

Therefore thresholding (outlier rejection) and noise filtering must take place after the mapping transform, even though they are ostensibly signal processing layer operations (see Figure 10). In other words, the layer model of Figure 9 is not entirely correct as the layers are not strictly ordered. To enable out-of-order processing across layers, designers should retain versions of data streams before and after mapping transformations under different address labels.

[Diagram: Raw Sensor Data at the Electronic Sensing layer passes through a Nonlinear Function, then Noise Filtering, then Outlier Rejection, spanning the Signal Processing and Mapping Transforms layers.]

Figure 10: Cross-layer dependencies exist in the processing chain for input control data.

6 Conclusion

Here we present a brief summary of each point made in the document.

6.1 Transport and Synchronization

6.1.1 Instrumental Gesture Control

For data streams from measurement of human kinetic gestures, the temporal synchronization error should be not more than 0.1 milliseconds to ensure lossless transmission of periodic signals up to 10 Hz with 8 bit dynamic range, with each extra bit of dynamic range requiring half the temporal error (50 usec for 9 bits, 25 usec for 10 bits, etc.). To resolve events with 1 msec relative temporal precision the sampling frequency of measurement should be 2000 Hz.

6.1.2 Spatial Audio Control

The transport should be capable of updating the spatial audio parameters at rates of 100 Hz in order to resolve the full range of perceivable spatial effects that may extend up to 50 Hz. For spatial audio control data used in beamforming or wavefield synthesis rendering, following the AES recommended limits the syn-

bandwidth. Except for perhaps the most esoteric applications, a 0.5 microsecond temporal precision is sufficient for control of any audio synthesis algorithm.

6.1.4 Atomicity

In all cases careful use of double-buffering, lock-free queues and local memory-barrier operations should be used to ensure that a best effort is made to minimize the synchronization skew between bundle-encapsulated sub-stream messages.

6.1.5 Latency and Jitter

The latency and jitter of secondary software interrupts typical of human-input device streams are detrimental to the quality of control data. Bounded latency and minimal jitter should be ensured for audio control data.

6.2 Control Meta-data

6.2.1 Interface Design Patterns

Many styles of meta-data description are possible including procedural (RPC), resource-oriented (REST), object-oriented (OOP) and ontology-oriented (RDF). Application designers should feel free to choose the most appropriate style; however, designers creating generic tools for OSC processing should support as many styles as possible.

6.2.2 Stateful Representation

When stateful representations of control data streams are used (with care), then assured transports should also be employed to reduce software errors that may be triggered by transmission errors.

6.2.3 Multi-Layered Representation

Retaining data stream representations at multiple levels of abstraction is useful as some operations on control data streams cannot be performed in a strictly sequential order. The general process structure for transformation of control data is a directed graph.
chronization error should be less than 5% of
a sample frame for the highest controlled fre- 7 Acknowledgements
quency. For example control over coefficients in
We are grateful to Meyer Sound Laboratories
a phase-mode beamforming array operating up
Inc. of Berkeley CA for financial support of
to 10 khz requires a synchronization accuracy
this work. The anonymous reviewers provided
of 5 microseconds.
helpful suggestions to improve this document.
6.1.3 Digital Audio Synthesis Control The users of OSC in the community includ-
For audio synthesis algorithms in general the ing computer music application designers and
needs of control data are dependent on the researchers have played an important role in
nature of the algorithm and may vary widely bringing to light important issues related to
depending on the level of detail and control audio control.
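Returning to the timing figures summarized in Sections 6.1.1 and 6.1.2, the following Python sketch offers a rough numerical sanity check. It is one possible reading and not the authors' derivation: it assumes a full-scale sinusoid whose worst-case slope is 2πf times its amplitude, and it assumes that "lossless" means a timing error may shift a sample by no more than one quantization step of the peak-to-peak range. The function names are hypothetical.

```python
import math


def max_sync_error_s(freq_hz: float, bits: int) -> float:
    """Largest timing error (in seconds) that keeps the amplitude error of a
    full-scale sinusoid within one quantization step (assumed criterion)."""
    # Peak-to-peak range is 2A, so one quantization step is 2A / 2**bits.
    # The worst-case slope of A*sin(2*pi*f*t) is 2*pi*f*A, which gives
    # dt_max = (2A / 2**bits) / (2*pi*f*A) = 1 / (2**bits * pi * f).
    return 1.0 / (2 ** bits * math.pi * freq_hz)


def period_fraction_s(freq_hz: float, fraction: float) -> float:
    """Duration (in seconds) of a given fraction of one period at freq_hz."""
    return fraction / freq_hz


if __name__ == "__main__":
    # Gesture control: 10 Hz signals at 8, 9 and 10 bits of dynamic range.
    for bits in (8, 9, 10):
        usec = max_sync_error_s(10.0, bits) * 1e6
        print(f"{bits} bits at 10 Hz: {usec:.1f} usec")
    # Spatial audio: 5% of one period at 10 kHz.
    print(f"5% of a 10 kHz period: {period_fraction_s(10e3, 0.05) * 1e6:.1f} usec")
```

Under these assumptions the bound comes out at roughly 124, 62 and 31 usec for 8, 9 and 10 bits. This is the same order of magnitude as the 100, 50 and 25 usec figures quoted above, slightly looser, and it halves with each extra bit in the same way. The 5% figure at 10 kHz evaluates to 5 usec, matching Section 6.1.2.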
References

Matthew Wright and Adrian Freed. 1997. CAST: CNMAT Additive Synthesis Tools. http://archive.cnmat.berkeley.edu/CAST/.

Fons Adriensen and A Space. 2005. Using a DLL to Filter Time. In Proceedings of the Linux Audio Conference.

Audio Engineering Society. 2003. AES Recommended Practice for Digital Audio Engineering - Synchronization of Digital Audio Equipment in Studio Operations. Technical Report 11, AES.

Ross Bencina. 2009. OSCgroups. http://www.audiomulch.com/~rossb/code/oscgroups/.

Nicolas Bouillot et al. 2009. AES White Paper: Best Practices in Network Audio, volume 57 of JAES. AES.

Eli Brandt and Roger Dannenberg. 1998. Time in Distributed Real-Time Systems. In Proceedings of the ICMC, pages 523-526, San Francisco, CA.

Chris Chafe, Michael Gurevich, Grace Leslie, and Sean Tyan. 2004. Effect of time delay on ensemble accuracy. In Proceedings of the International Symposium on Musical Acoustics.

David Cortesi and Susan Thomas. 2001. REACT Real-Time Programmer's Guide. Document Number 007-2499-011. Silicon Graphics, Inc.

Sean Follmer, Bjorn Hartmann, and Pat Hanrahan. 2009. Input Devices are like Onions: A Layered Framework for Guiding Device Designers. In Workshop of CHI.

Adrian Freed. 1996. Audio I/O Programming on SGI Irix. http://cnmat.berkeley.edu/node/8775.

Andy Hunt, Marcelo M. Wanderley, and Matthew Paradis. 2003. The Importance of Parameter Mapping in Electronic Instrument Design. Journal of New Music Research, 32(4):429-440, December.

I. Scott MacKenzie, Tatu Kauppinen, and Miika Silfverberg. 2001. Accuracy measures for evaluating computer pointing devices. In CHI '01: Proceedings of the SIGCHI conference on Human factors in computing systems, pages 9-16, New York, NY, USA. ACM.

Geoffrey M Marner. 2009. Time Stamp Accuracy needed by IEEE 802.1AS. Technical report, IEEE 802.1 AVB TG.

Timothy Place, Trond Lossius, Alexander Jensenius, Nils Peters, and Pascal Baltazar. 2008. Addressing Classes by Differentiating Values and Properties in OSC. In NIME.

B. Rafaely. 2005a. Analysis and design of spherical microphone arrays. Speech and Audio Processing, IEEE Transactions on, 13(1):135-143, Jan.

B. Rafaely. 2005b. Phase-mode versus delay-and-sum spherical microphone array processing. Signal Processing Letters, IEEE, 12(10):713-716, Oct.

J. Romkey. 1988. RFC1055 - Nonstandard for transmission of IP datagrams over serial lines: SLIP. Technical report, IETF.

Andy Schmeder and Adrian Freed. 2008. Implementation and Applications of Open Sound Control Timestamps. In Proceedings of the ICMC, pages 655-658, Belfast, UK. ICMA.

Andrew Schmeder. 2009a. An Exploration of Design Parameters for Human-Interactive Systems with Compact Spherical Loudspeaker Arrays. In Ambisonics Symposium.

Andrew Schmeder. 2009b. Efficient Gesture Storage and Retrieval for Multiple Applications using a Relational Data Model of Open Sound Control. In Proceedings of the ICMC.

David Wessel and Matthew Wright. 2002. Problems and Prospects for Intimate Musical Control of Computers. Computer Music Journal, 26:11-22.

Matthew Wright, Amar Chaudhary, Adrian Freed, Sami Khoury, David Wessel, and Ali Momeni. 2000. An XML-based SDIF stream relationships language. In International Computer Music Conference, pages 186-189, Berlin, Germany. International Computer Music Association.

Matthew Wright, Ryan Cassidy, and Michael Zbyszynski. 2004. Audio and Gesture Latency Measurements on Linux and OSX. In Proceedings of the ICMC, pages 423-429.

Matthew Wright. 2002. Open Sound Control 1.0 Specification. http://opensoundcontrol.org/spec-1_0.