Sound
Why do you think people use
SOUND
in multimedia projects?
How do we hear sound?
Sound Waves: Sounds start as vibrations in the air.
Ear Catching: Your ear's outer part catches these
vibrations.
Vibrating Drum: The vibrations reach your
eardrum and make it vibrate too.
Tiny Bones: Three tiny bones in your ear pass on
these vibrations and make them stronger.
Fluid in Ear: The strong vibrations move into fluid
in your inner ear.
Hair Cells: In the inner ear, there are tiny hair
cells. They move and send signals to your brain.
Brain Magic: Your brain turns these signals into
what you hear – like music or someone talking.
So, in simple terms, sound starts as vibrations,
goes through your ear parts, gets stronger, turns
into signals in your ear, and your brain turns those
signals into the sounds you recognize!
How is Sound Created? Sound is created by vibrations.
• Vibration: Sound begins when something vibrates. This could be a guitar string, vocal cords, a drumhead, or any object that can move
back and forth. The vibrating object radiates sound waves outward in all directions.
• Creation of Waves: The vibrations of the object disturb the air particles around it, creating waves of compression and rarefaction.
• Propagation: These waves travel through the air, carrying the energy of the vibrations with them.
• Reception by Ear: When these waves reach your ear, they cause your eardrum to vibrate.
• Transmission to Brain: The vibrating eardrum sets off a chain of events in the ear that eventually sends signals to the brain.
• Perception: Your brain interprets these signals as the sound that corresponds to the original vibrations.
• So, in essence, sound is a result of vibrating objects creating waves in the air, which our ears detect and our brains interpret as sound.
Properties
of sound waves:
Every sound will vary in terms of
1. Amplitude(or loudness)
Amplitude refers to the maximum extent of a
vibration or oscillation, measured from the
position of equilibrium. In the context of sound
waves, amplitude is a crucial characteristic that
influences the loudness or volume of a sound.
2. Frequency(or pitch)
Frequency refers to the number of oscillations
or cycles of a wave that occur in a unit of time.
In the context of sound waves, frequency is
associated with the pitch of the sound.
Amplitude
• Determines the energy of the wave
• Determines the loudness (volume) of the sound
Frequency
• Is the speed of the vibration
• Determines the pitch of the sound
• The unit of measurement for
frequency is Hertz, which represents
cycles per second
Digitization of sound
• Sound is a continuous wave that travels through the air
• The wave is made up of pressure differences.
• Sound is detected by measuring the pressure level at a location
• Sound waves have normal wave properties (reflection, refraction,
diffraction)
Digital Sound
• Digital sound is made up of discrete values of '0' and '1' and is an
approximation of analog sound.
• The digital nature of sound makes it possible for digital sound files to
be edited, processed and stored on a PC.
• Digitization of sound involves converting analog audio signals, which
are continuous and varying, into digital format, represented by
discrete binary code.
• The process begins with an analog sound source, such as a
microphone capturing voice or a musical instrument generating
sound.
• During recording, these analog waves are converted into digital
signals through a process called sampling.
• An analog-to-digital converter (ADC) is used to transform the
continuous analog signal into a digital representation. The ADC
samples the analog signal at regular intervals, measuring its
amplitude at each point. These measurements are then converted
into binary code.
Sampling
• The sampling process entails chopping up the analog wave
into many parts and taking each part (called a sample) at
regular intervals to approximate the original sound.
• The sampling rate measures the number of samples taken
per second.
• Sampling rates typically range from 11,025 to 44,100 samples
per second and are measured in Hertz.
• A higher sampling rate implies that more samples are taken
during the given time interval and ultimately, the quality of
reconstruction is better.
• Sampling Rate
Is the number of samples taken per second (measured in kHz)
• Example
A digital audio file with a sampling rate of 44.1 kHz has 44,100
samples per second
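A minimal sketch of this sampling step, assuming NumPy is available; the 440 Hz test tone and one-second duration are illustrative choices, not from the slides:

```python
import numpy as np

SAMPLE_RATE = 44_100   # samples per second (44.1 kHz)
DURATION = 1.0         # seconds of audio
FREQ = 440.0           # illustrative 440 Hz test tone

# One amplitude measurement every 1/44100 of a second
t = np.arange(int(SAMPLE_RATE * DURATION)) / SAMPLE_RATE
samples = np.sin(2 * np.pi * FREQ * t)

print(len(samples))    # 44100 samples for one second of sound
```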
Quantization
• Quantization is a process of representing the amplitude of each
sample as integers or numbers.
• The number of bits used to represent the value of each
sample is known as the sample size, bit depth, or resolution.
• Commonly used sample sizes are either 8 bits or 16 bits. The
larger the sample size, the more accurately the data will describe
the recorded sound.
• An 8-bit sample size provides 256 equal measurement units to
describe the level and frequency of the sound in that slice of time.
• A 16-bit sample size provides 65,536 equal units to describe the
sound in that sample slice of time.
• The value of each sample is rounded off to the nearest integer
(quantization), and if the amplitude is greater than the intervals
available, the top and bottom of the wave are clipped
• Sampling Size (Resolution)
Is the number of bits (bit depth) used to describe the sample
• Example
A 16-bit digital audio file has 65,536 values to choose from for
every sample that is taken
• Quantization Error/Noise - The difference between a
sample and the value assigned to it is known as
quantization error or noise.
• Signal to Noise Ratio (SNR) - The signal-to-noise ratio refers
to signal quality versus quantization error. The higher the
signal-to-noise ratio, the better the voice quality.
Working with very small levels often introduces more
error.
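A minimal sketch of uniform quantization and the resulting error and SNR, regenerating the 440 Hz test tone from the sampling sketch above; the 8- and 16-bit depths are the sample sizes discussed here:

```python
import numpy as np

def quantize(samples: np.ndarray, bits: int) -> np.ndarray:
    """Snap each sample in [-1, 1] to the nearest of 2**bits levels."""
    levels = 2 ** bits              # 256 levels at 8 bits, 65,536 at 16 bits
    step = 2.0 / levels             # width of one quantization interval
    q = np.round(samples / step) * step
    return np.clip(q, -1.0, 1.0 - step)   # out-of-range values are clipped

samples = np.sin(2 * np.pi * 440 * np.arange(44_100) / 44_100)
for bits in (8, 16):
    noise = samples - quantize(samples, bits)         # quantization error
    snr_db = 10 * np.log10(np.mean(samples**2) / np.mean(noise**2))
    print(f"{bits}-bit SNR: {snr_db:.1f} dB")         # more bits, higher SNR
```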
Making Digital audio files
• There are 2 ways to make a digital audio file:
1 Record your sound through a microphone
(Plug a microphone into the microphone jack on your
computer)
2 Digitize analog audio (e.g. from a cassette tape)
(Connect the analog playback device to your
computer, then use audio digitizing software)
Editing digital recordings
➢Trimming:
To remove "dead air" or blank space from the front of a recording and
any unnecessary extra time at the end of a recording
➢Splicing:
To remove the extraneous noises that inevitably creep into a recording
by cutting them out
➢Assembly:
To paste together recordings into longer recordings
➢Volume Adjustment (normalization):
To raise or lower the volume level of each recording to provide a
consistent level for the audio as a whole
Editing digital recordings
➢Format conversion:
To save files in your choice of format to allow the recording to be
played on a particular platform
➢Resampling or Downsampling:
To reduce the number of samples in the recording, giving the
recording a lower resolution (when compressing audio to save space,
this method is used; see the sketch after this list)
➢Fade-Ins and Fade-Outs:
To smooth out the very beginning and the very end of a sound file
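A minimal sketch of naive downsampling by decimation (an assumed simplification; real resamplers low-pass filter first to avoid aliasing):

```python
def downsample(samples, factor=2):
    # Keep every `factor`-th sample: 44.1 kHz becomes 22.05 kHz at factor 2.
    # Production tools low-pass filter before decimating to avoid aliasing.
    return samples[::factor]

half_rate = downsample(list(range(10)))
print(half_rate)   # [0, 2, 4, 6, 8] - half as many samples
```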
Editing digital recordings
➢Equalization:
To modify the recording's frequency content so that it sounds brighter
or darker
➢Time stretching:
To alter the length (in time) of a sound file without changing its pitch
➢Reversing Sound:
To reverse all or a portion of a digital audio recording
Mono and stereo
recording
➢Monophonic (Mono)
Recording:
Uses only a single channel
Both speakers will produce the
same copy of the audio signal
➢Stereo recording:
Uses two channels
Each speaker will produce a
different audio signal
(see the interleaving sketch below)
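As a small illustration of the two-channel idea, stereo samples are commonly stored interleaved, alternating left and right (the convention WAV files use; the sample values here are made up):

```python
left  = [0.10, 0.20, 0.30]          # channel 1 samples
right = [0.00, -0.10, -0.20]        # channel 2 samples

# Interleave: L0, R0, L1, R1, ... as a stereo stream stores them
stereo = [s for pair in zip(left, right) for s in pair]
print(stereo)   # [0.1, 0.0, 0.2, -0.1, 0.3, -0.2]

mono = [(l + r) / 2 for l, r in zip(left, right)]   # mono: one shared channel
```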
Audio resolution
• Determines the accuracy with which sound can be digitized
• Higher resolution means more bits stored per sample and a more
accurate recording of the sound
• Is measured in bits per sample
(8 bits = 1 byte)
How to calculate the size of a digital recording
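The slides give no formula here, but the standard calculation follows directly from the terms above:

file size (bytes) = sampling rate × duration (s) × (bit depth ÷ 8) × channels (1 for mono, 2 for stereo)

For example, 10 seconds of 44.1 kHz, 16-bit stereo audio is 44,100 × 10 × 2 × 2 = 1,764,000 bytes (about 1.68 MB). A minimal sketch of the same calculation:

```python
def recording_size_bytes(rate_hz, seconds, bit_depth, channels):
    # bytes = samples/second * seconds * bytes/sample * channels
    return int(rate_hz * seconds * (bit_depth / 8) * channels)

print(recording_size_bytes(44_100, 10, 16, 2))   # 1764000 bytes (~1.68 MB)
```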
MIDI Audio
➢MIDI stands for Musical Instrument Digital
Interface
➢It is not digitized sound
➢It actually does not contain any sound; instead
it contains performance data (a set of
instructions)
➢It tells the computer how to generate a
certain sound
➢MIDI File:
Is a list of time-stamped commands that are
recordings of musical actions
➢So when this is sent to a MIDI playback device,
a sound is created
Digital Audio vs MIDI
• Digital audio refers to the reproduction and transmission of sound stored in a
digital format. MIDI is a format for representing musical information in a
digital format.
• Digital audio is a digital representation of physical sound waves. MIDI is an
abstract representation of musical sound and sound effects.
• Digital audio comprises analog sound waves that are converted into a series of
0s and 1s. MIDI comprises a series of commands that represent musical notes,
volume, and other musical parameters.
• Actual sound is stored in a digital audio file. No actual sound is stored in a
MIDI file.
• Digital audio files are large in size; MIDI files are small and compact.
• In digital audio, the quality of sound is in proportion to the file size. In MIDI,
the quality of sound is not in proportion to the file size.
• Digital audio reproduces the exact sound in a digital format. MIDI playback
may sound a little different from the original sound.
• Digital audio is used for recording and playback of music, sound effects, and
voiceovers. MIDI is used for creating and controlling electronic music, such as
synthesizers and drum machines.
Basic MIDI concepts
• MIDI (Musical Instrument Digital Interface) is a standard that
manufacturers of electronic musical instruments have agreed upon.
• It is a set of specifications they use in building their instruments so
that the instruments of different manufacturers can, without
difficulty, communicate musical information with one another.
• MIDI is a protocol that enables computers, synthesizers, keyboards and
other musical devices to communicate with each other.
• It is not a musical instrument. It is just a digital interface for
musical equipment.
The MIDI system is a powerful ecosystem for creating and manipulating music. It involves
four key components:
1. Controllers:
• These are instruments or devices that generate MIDI data, like keyboards,
drum pads, wind controllers, or even guitars equipped with MIDI pickups.
• Controllers capture your musical performance, translating it into MIDI
messages containing information about notes played, their duration,
velocity, and other expressive details.
• Think of them as your "musical input" source.
Examples of controllers:
• MIDI keyboard
• MIDI drum pad
• MIDI wind controller
• 2. Synthesizers:
• These are sound generators that receive and interpret MIDI data from controllers.
• Synthesizers use this data to create sounds based on their internal sound engines. They
can generate a vast array of sounds, from classic analog waveforms to complex digital
samples and textures.
• Think of them as the "sound engine" that transforms your MIDI data into audible music.
Examples of synthesizers:
• Analog synthesizer
• Digital synthesizer
• Virtual synthesizer (software)
• Modular synthesizer
3. Sequencers:
• These are software or hardware tools that record and edit MIDI data.
• You can use sequencers to arrange notes, control parameters, automate
processes, and build complex musical compositions.
• Think of them as the "musical score editor" that allows you to assemble
and manipulate your musical ideas.
Examples of sequencers:
• Hardware sequencer
• Software sequencer (e.g. the FL Studio interface)
• Groovebox (sequencer with built-in sounds)
4. MIDI Network:
• This is the communication backbone that connects all the components.
• MIDI data is transmitted between devices via cables, USB connections, or
even wirelessly.
• Think of it as the "musical conversation" that allows the controller, synthesizer,
and sequencer to interact and collaborate.
Examples of MIDI network connections:
• MIDI cables
• USB MIDI interface
• Wireless MIDI adapter
• MIDI network hub or router
A MIDI interface has two different
components:
• Hardware connects the equipment. It specifies the physical
connection between musical instruments and deals with the electronic
signals that travel over the cable.
• A data format encodes the information travelling through the
hardware. The encoding includes the notion of beginning and end of
a note, basic frequency and sound volume.
• Through the MIDI interface, a computer can
control output of individual instruments. On
the other hand, the computer can receive,
store or process coded musical data through
the same interface.
MIDI Devices
• The heart of any MIDI system is the MIDI
synthesizer device. A typical synthesizer
looks like a simple piano keyboard with a
panel full of buttons.
Most synthesizers have the following common components:
• Sound Generator
The principal purpose of the generator is to produce an
audio signal that becomes sound when fed into a
loudspeaker. By varying the voltage oscillation of the
audio signal, a sound generator changes the quality of
the sound – its pitch, loudness and tone – to create a wide
variety of sounds and notes.
• Microprocessor
The microprocessor communicates with the keyboard to
know what notes the musician is playing, and with the
control panel to know what commands the musician
wants to send to the microprocessor. The microprocessor
then specifies note and sound commands to the sound
generators; in other words, the microprocessor sends and
receives MIDI messages.
• Keyboard
• The keyboard affords the musician direct control of
the synthesizer. Pressing keys on the keyboard lets
the microprocessor know what notes to play and
how long to play them.
• Control Panel
• The control panel controls those functions that are
not directly concerned with notes and durations. It
includes: a slider that sets the overall volume of the
synthesizer, a button that turns the synthesizer on
and off, and a menu that calls up different patches
for the sound generators to play.
• Auxiliary Controller
• They are available to give more control over
the notes played on the keyboard.
• Memory
• Synthesizer memory is used to store patches
for the sound generators and settings on the
control panel.
MIDI Messages
• MIDI messages transmit information between MIDI devices and
determine what kinds of musical events can be passed from device to
device. The format of MIDI messages consists of the status byte (the
first byte of any MIDI message), which describes the kind of message,
and data bytes (the following bytes).
Structure of a Status Byte:
• First bit (MSB): Always set to 1, distinguishing it from data bytes.
• Next 3 bits: Identify the type of MIDI message (e.g., Note On, Note Off, Program Change).
• Last 4 bits: Specify the MIDI channel (1 to 16) the message is directed to.
• Common Status Byte Types:
• Note On (1001nnnn): Tells the receiving instrument to play a specific note on the
specified channel.
• Note Off (1000nnnn): Tells the receiving instrument to stop playing a specific note on the
specified channel.
• Polyphonic Key Pressure (1010nnnn): Indicates pressure applied to a specific key.
• Control Change (1011nnnn): Modifies various parameters like volume, modulation, etc.
• Program Change (1100nnnn): Selects a new sound program on the receiving instrument.
Structure of a Data Byte:
• First bit (MSB): Always set to 0, so data values range from 0 to 127.
Data Byte 1:
• Represents the note number (for Note On/Off messages)
• Each step of one represents a semitone
• 0 represents the lowest note (C0)
• 127 represents the highest note (G8)
• Value 00111100 (decimal 60) corresponds to middle C (note number 60)
Data Byte 2:
• Represents the velocity of the key press
• Higher values result in louder sounds
• Values typically range from 0 (no pressure) to 127 (full pressure)
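A minimal sketch that packs the three bytes of a Note On message using the bit layout above (the helper name is mine, not part of the MIDI standard):

```python
def note_on(channel: int, note: int, velocity: int) -> bytes:
    """Status byte 1001nnnn, then two data bytes (note, velocity)."""
    assert 1 <= channel <= 16 and 0 <= note <= 127 and 0 <= velocity <= 127
    status = 0b1001_0000 | (channel - 1)   # MSB=1, type=Note On, channel bits
    return bytes([status, note, velocity])

# Middle C (note 60, binary 00111100) at velocity 64 on channel 1:
print(note_on(1, 60, 64).hex(" "))   # 90 3c 40
```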
MIDI messages are divided into two types:
• 1. Channel Messages
• Channel messages go only to specified devices. There are two types of
channel messages:
• - Channel voice messages send actual performance data between MIDI
devices, describing keyboard action, controller action and control panel changes.
They describe music by defining pitch, amplitude, duration and other
sound qualities.
• - Channel mode messages determine the way that a receiving MIDI
device responds to channel voice messages.
• 2. System Messages
• System messages go to all devices in a MIDI system because no
channel numbers are specified. There are three types of system
messages:
• - System real-time messages are very short and simple,
consisting of only one byte. They are typically used for system
reset, timing clock etc.
• - System common messages are commands that prepare
sequencers and synthesizers to play a song. They are used for
song selection, tuning the synthesizers etc.
• - System exclusive messages allow MIDI manufacturers to create
customized MIDI messages to send between their MIDI devices.
Feature          Channel Message                        System Message
Channel number   Included                               Not included
Data byte(s)     Up to 2                                Variable
Structure        Fixed                                  Varies by type
Function         Controls specific sound generation     Manages system functions and
                 and performance                        communication
MIDI and SMPTE Timing Standards
• MIDI reproduces traditional note length using MIDI clocks, which are
represented through timing clock messages.
• Using a MIDI clock, a receiver can synchronize with the clock cycles of
the sender.
• For example, a MIDI clock helps keep separate sequencers in the
same MIDI system playing at the same tempo.
• In order to keep a standard timing reference, the MIDI specification
states that 24 MIDI clocks equal one quarter note.
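A quick worked example of that 24-clocks-per-quarter-note rule (the helper is illustrative):

```python
def midi_clocks_per_second(bpm: float) -> float:
    # bpm quarter notes per minute, 24 timing-clock messages per quarter note
    return bpm * 24 / 60

print(midi_clocks_per_second(120))   # 48.0 clock messages per second
```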
SMPTE Timing Standards
• The SMPTE (Society of Motion Picture and Television Engineers) timing standard can also be used.
• The SMPTE timing standard was originally developed by NASA as a way to mark incoming
data from different tracking stations so that receiving computers could tell exactly what time
each piece of data was created.
• In the film and video version promoted by the SMPTE, the SMPTE timing standard acts as a
very precise clock that stamps a time reading on each frame and fraction of a frame,
counting from the beginning of a film or video.
• To make the time readings precise, the SMPTE format consists of
hours:minutes:seconds:frames:bits (e.g., at 30 frames per second); it uses a 24-hour clock
whose hours count from 0 to 23.
• To divide time even more precisely, SMPTE breaks each frame into 80 bits.
• This divides time into segments as small as one twenty-four hundredth of a second: at
30 frames per second with 80 bits per frame, there are 2,400 subdivisions per second, so
SMPTE can represent time intervals as small as 1/2400th of a second. This level of accuracy
is crucial in situations where extremely precise synchronization is required.
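A minimal sketch converting an hours:minutes:seconds:frames:bits reading into seconds, assuming 30 frames per second and 80 bits per frame as above (the function name is mine):

```python
def smpte_to_seconds(h, m, s, frames, bits=0, fps=30, bits_per_frame=80):
    # Each frame lasts 1/fps s; each bit subdivides the frame into 80 parts.
    return (h * 3600 + m * 60 + s) + (frames + bits / bits_per_frame) / fps

print(smpte_to_seconds(0, 1, 30, 15, 40))   # 90.51666... seconds
```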
MIDI Software
• Once a computer is connected to a MIDI system, a variety of MIDI
applications can run on it. Digital computers afford the composer or
sound designer unprecedented levels of control over the evolution
and combination of sonic events.
The software applications generally fall into four major categories:
• Music recording and performance applications
This category of applications provides functions such as recording of
MIDI messages as they enter the computer from other MIDI devices,
and possibly editing and playing back the messages in performance.
• Musical notations and printing applications
This category allows writing music using traditional musical notation.
The user can then play back the music using a performance program or
print the music on paper for live performance or publication.
• Synthesizer patch editors and librarians
These programs allow storage of different synthesizer
patches in the computer's memory and disk drives, and editing of
patches on the computer.
• Music education applications
These software applications teach different aspects of music using the
computer monitor, keyboard and other controllers of attached MIDI
instruments.
Speech Generation
1. Basic Notions
2. Reproduced Speech output
3. Time-dependent sound concatenation
4. Frequency-dependent sound concatenation
Speech Generation
> Speech is a form of sound that can be understood and
generated by humans and also by machines.
> The human ear is most sensitive in the range from 600 Hz to
6000 Hz.
> A machine can also support speech generation and recognition.
> Today, workstations and personal computers can recognize
25,000 possible words.
Basic Notions
• An important requirement for speech generation is real-time signal
generation. With this requirement met, a speech output system can
transform text into speech automatically without any lengthy
preprocessing.
• An example is the spoken time announcement of a telephone answering
service.
• Speech sounds fall into two classes:
• Vowels – a speech sound created by the relatively free passage of breath
through the larynx (voice box) and oral cavity, usually
forming the most prominent and central sound of a syllable.
• Consonants – a speech sound produced by a partial or complete
obstruction of the air stream by any of the various constrictions of the
speech organs.
Reproduced Speech output
• The easiest method of speech generation/output is to use
prerecorded speech and play it back in a timely fashion. The speech
can be stored as PCM (Pulse Code Modulation) samples.
• Further data compression methods, which do not exploit language-
specific properties, can be applied to recorded speech.
1. Time-dependent sound concatenation
2. Frequency-dependent sound concatenation
Reproduced Speech output
Time dependent sound concatenation
• Individual speech units are composed like building blocks, e.g. phones where
the composition can occur at different levels.
• Transitions between individual phones prove to be extremely problematic.
• The phones in their environment, i.e., the allophones (coarticulation ), are
considered in the second level.
• Creation of syllables as building blocks for words and sentences
• Prosody, i.e. the stress and melody course of a spoken phrase.
• Problem: Prosody is often context dependent.
(Figures: the word "stairs" as a concatenation of phones, and the word "stairs" stored as a whole.)
2. Frequency‐dependent Sound
Concatenation
• Simulation of the vocal tract, e.g. formant synthesis,
i.e. concatenating energy concentrations in the speech
signal's spectrum.
• Formant filtering of pulse and noise generators.
• Fixed sets of formant parameters for the generation of phones.
• Dynamic adjustment of parameters for coarticulation (2 or more
sounds together).
• Components of a speech synthesis system:
• Step 1: Generation of a Sound Script
Transcription from text to a sound script using a library containing (language-
specific) letter-to-phone rules. A dictionary of exceptions is used for words with
a non-standard pronunciation.
• Step 2: Generation of Speech
The sound script is used to drive the time- or frequency-dependent sound
concatenation process.
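A toy sketch of Step 1, with a hypothetical rule table and exception dictionary (every entry here is illustrative, not a real rule set):

```python
# Exceptions are checked first; otherwise letters map through simple rules.
EXCEPTIONS = {"one": ["W", "AH", "N"]}                   # non-standard word
RULES = {"s": "S", "t": "T", "e": "EH", "p": "P"}        # letter-to-phone rules

def to_sound_script(word: str) -> list[str]:
    if word in EXCEPTIONS:                               # exceptions dictionary
        return EXCEPTIONS[word]
    return [RULES.get(ch, ch.upper()) for ch in word]    # rule lookup

print(to_sound_script("step"))   # ['S', 'T', 'EH', 'P']
print(to_sound_script("one"))    # ['W', 'AH', 'N'] via the exceptions list
```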
Speech Analysis
❖Purpose of Speech Analysis:
-who is speaking: speaker identification for security purposes
-what is being said: automatic transcription of speech into text
-how was a statement said: understanding psychological factors of a speech pattern
(was the speaker angry or calm etc.)
❖The primary goal of speech analysis in multimedia systems is to correctly determine
individual words (speech recognition).
The system applies the following principle several times:
• In the first step, the principle is applied to a sound pattern and/or word model. An
acoustical and phonetic analysis is performed.
• In the second step, certain speech units go through syntactical analysis; thereby, the
errors of the previous step can be recognized. In this case, syntactical analysis
provides additional decision help, and the result is recognized speech.
• The third step deals with the semantics of the previously recognized language.
Here the decision errors of the previous step can be recognized and corrected
with other analysis methods. The result of this step is understood speech.
Speech Transmission
The process of sending speech/audio from a sender to a receiver, with the fundamental goal
of providing the same speech/audio (sound quality) as was generated at the sender side.
Some principles that are connected to speech generation and recognition:
Signal Form Coding
A technique to achieve the most efficient coding of the audio signal without considering
speech properties and parameters.
Source coding in parametrized systems
✓ Parametrized systems work with source coding algorithms.
✓ They use speech/audio characteristics for data rate reduction.
E.g.: channel vocoder
Recognition/Synthesis Methods
➢Speech analysis (recognition) takes place on the sender side of the speech
transmission system
➢Speech synthesis (generation) takes place on the receiver side
➢Only the characteristics of the speech elements are transmitted.
➢Used for data rate reduction.
Audio File Format
• A method of organising or compressing digitized sound data into a
data file
• In Windows:
Digitized sounds are usually stored as WAV
• On Macintosh:
Digitized sounds are usually stored as AIFF or AIFC
• The most common file format for storing audio is MP3
(It has a compression algorithm to help save space)
• To store audio and video together, MP4 is used
• The CD-ROM/XA format allows us to put several recordings of audio
onto a single CD-R disc
Adding sound to Project
Whenever you want to insert sounds into your multimedia project, you need to:
➢ Determine the file formats that are compatible with your multimedia
authoring software and the delivery medium you will be using
➢ Study the sound playback capabilities offered by the end user's system
➢ Decide what kind of sound is needed (e.g. background music, sound effects) and
where these audio events will occur (include this in your storyboard)
➢ Decide when and where you want to use digital or MIDI audio
➢ Acquire the source material either by creating it from scratch or purchasing it
➢ Edit the sounds to fit the project
➢ Test the sounds to be sure they are timed properly with your project
Tempo
• Tempo refers to the speed or pace at which a piece of music is played. It is a crucial element
in music that influences the overall feel and mood of a composition. Tempo is typically
measured in beats per minute (BPM), indicating the number of beats or pulses in one
minute.
• Here are some key points about tempo:
• BPM (Beats Per Minute): The tempo of a piece is often specified using BPM, which
represents the number of beats in a minute. A higher BPM indicates a faster tempo, while a
lower BPM suggests a slower pace.
• Musical Terms for Tempo:
• Allegro: Fast, quick, and lively.
• Moderato: Moderate or medium tempo.
• Andante: At a walking pace, usually slower than moderato.
• Adagio: Slow and stately.
• Presto: Very fast or quickly.
Tempo
• Metronome: Musicians often use a metronome, a device that produces a steady beat, to practice
and maintain a consistent tempo. The metronome is set to a specific BPM.
• Importance in Music:
• Tempo significantly influences the emotional impact of a piece. Faster tempos often create a sense
of energy, excitement, or urgency, while slower tempos may convey calmness, sadness, or
contemplation.
• Dynamic Changes: In some compositions, the tempo may change, adding variety and expressiveness. This
is often indicated by Italian terms in the sheet music, such as accelerando (getting faster) or ritardando
(slowing down).
• Example:
• A lively pop song might have a tempo around 120-140 BPM, creating an upbeat and energetic feel.
• A slow ballad or romantic piece may have a tempo of 60 BPM or even slower, evoking a more relaxed
and reflective mood.
• In summary, tempo is a fundamental aspect of music that defines its speed and character. Whether it's a fast-
paced dance track or a slow, melodic ballad, the tempo sets the rhythmic foundation for the
musical experience.
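As a small worked example of the BPM arithmetic (a hypothetical helper):

```python
def seconds_per_beat(bpm: float) -> float:
    # 60 seconds per minute divided by beats per minute
    return 60.0 / bpm

print(seconds_per_beat(120))   # 0.5 s between beats (lively pop tempo)
print(seconds_per_beat(60))    # 1.0 s between beats (slow ballad)
```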
Here's how the components of a MIDI system work together:
• You play a note on your controller, generating MIDI data.
• This data is sent through the MIDI network to the synthesizer.
• The synthesizer interprets the data and triggers its sound engine to produce the corresponding sound.
• You can record your performance in a sequencer, creating a MIDI sequence.
• You can then edit and manipulate the sequence in the sequencer, adding notes, changing parameters, and arranging the music.
• You can play back the sequence to control the synthesizer again, creating a complete musical piece.
Additional elements:
• MIDI Effects: These are software or hardware units that manipulate MIDI data before it reaches the synthesizer. They can add
effects like reverb, delay, or distortion to your music.
• Sound Modules: These are specialized synthesizers that focus on specific sounds, like drums, strings, or pianos. They can be
integrated into your MIDI setup to expand your sonic palette.
Benefits of MIDI:
• Versatility: MIDI can control a wide range of instruments and devices, making it a flexible tool for various musical styles.
• Editability: MIDI data can be easily edited and manipulated, allowing for precise control over your music.
• Sharing & Collaboration: MIDI sequences can be easily shared and exchanged, facilitating collaboration between musicians.
• Cost-effectiveness: Many MIDI devices are relatively affordable, making them accessible to a wide range of musicians.
Overall, the MIDI system is a powerful and versatile tool that allows musicians to create, perform, and manipulate music in
countless ways. Understanding the components and their interactions is key to unlocking the full potential of this musical
ecosystem.