Prof. Nitesh Guinde Associate Professor, Electronics & Telecommunication Department Goa College of Engineering
FARMAGUDI, GOA
DEPARTMENT OF ELECTRONICS & TELECOMMUNICATION
ENGINEERING
2021 - 2022
A project submitted
in partial fulfilment of the requirements
for the degree of
Bachelor of Engineering
in
Electronics & Telecommunication Engineering
GOA UNIVERSITY
submitted by
———————————– ———————————
Internal Examiner External Examiner
Dr. Nitesh Guinde (Guide)
———————————–
Head of Department,
Dr. H. G. Virani,
Professor, ETC Dept.
by
Yeslin Sequeira (P.R.No.:201807745)
Saideep Pradeep Naik (P.R.No.:201807328)
completed in the year 2019-2020 is approved as a partial fulfilment of the
requirements for the degree of BACHELOR OF ENGINEERING in Electronics &
Telecommunication Engineering and is a record of bona fide work carried out
successfully under our guidance.
———————————–
Project Guide,
Dr. Nitesh Guinde
Associate Professor,
ETC Dept.
———————————– ———————————–
Head of Department Principal
Dr. H. G. Virani Dr. R.B. Lohani.
Professor, ETC Dept. Goa College of Engineering
We declare that the project work entitled GENERATING 3D AUDIO FOR HEADPHONES
submitted to Goa College of Engineering, in partial fulfilment of the requirement for
the award of the degree of B.E. in Electronics and Telecommunication Engineering is
a record of bona fide project work carried out by us under the guidance of Prof.
Nitesh Guinde. We further declare that the work reported in this project has not
been submitted and will not be submitted, either in part or in full, for the award of
any other degree or diploma in this institute or any other institute or university.
Date
Acknowledgement
The success of our work is incomplete unless we mention the names of our
respected teachers who made it possible, whose guidance and encouragement
served as a beacon of light and crowned our efforts with success.
It is our privilege to express our sincerest regards to our project guide, Prof. Nitesh
Guinde, Associate Professor, Goa College of Engineering, for his valuable inputs, able
guidance, encouragement, whole-hearted cooperation and constructive criticism
throughout the duration of the project. We extend our sincere thanks to our
Principal, Dr. R. B. Lohani, and to Dr. H. G. Virani, Professor and Head, ETC
Department, Goa College of Engineering, for encouraging and supporting us.
We are thankful to the teaching and non-teaching staff of the Electronics and
Telecommunication Department for their generous help, direct or indirect. We take
this opportunity to thank all our teachers who have helped our project in any way.
We pay our respects and love to our parents and all other family members and
friends for their love and encouragement throughout our career.
CONTENTS:
1. Chapter 1
1.1. Abstract
1.2. Introduction
1.3. Problem Statement
1.4. Objective
1.5. Literature Survey
2. Chapter 2
2.1. HRIR
2.2. Implementation of the project
2.3. Block Diagram
2.4. Code
1. 1. Abstract
In this report we introduce concepts related to 3D audio and detail our signal
processing system. We analyze head-related impulse responses (HRIRs) and use
practically measured HRIRs in our proof-of-concept MATLAB code to generate 3D
audio for headphones.
1. 2. Introduction
Binaural audio is audio recorded from microphones inserted into human ears or
into the ears of a lifelike dummy head. This allows the recording to be affected by
the outer ear, which gives the human brain important cues for discerning the
direction of sound sources. When played back through headphones, the sound feels
like it is coming from the real world. The aim of 3D audio is to produce audio similar
to binaural audio through digital signal processing.
We use an HRIR dataset that contains impulse responses measured from a
head-and-torso simulator (Bruel & Kjær Type 4128C), downloaded from
http://medi.uni-oldenburg.de/hrir/index.html, to implement our proof-of-concept
MATLAB code. Note that we use the term HRIR interchangeably here to refer to the
whole dataset or to an individual impulse response.
The work we have done so far is a MATLAB program for generating 3D audio. The
audio is placed as desired in the virtual space when heard through headphones.
When a subject is told the position of the sound, the effect is convincing.
Unfortunately, in our blind tests, subjects had difficulty discerning front and back
positioning, so they could not always accurately identify the sound position. The
code uses time-domain convolution.
1. 3. Problem Statement
Our end goal is to build a system in Verilog that generates 3D audio from N
channels, where N is yet to be decided. Each channel will take monophonic audio
and a sound position as input. An HRIR dataset will have to be fed to the system,
which will need mechanisms for querying the dataset and retrieving the requested
impulse responses. The system will be completely real-time, and the positions of
sounds in the virtual audio space should be adjustable in real time. Our current
approach for generating 3D audio is convolution in the time domain, but we are
looking to shift to the frequency domain, as we suspect it will be more efficient,
i.e., require fewer calculations per sample generated.
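As a rough illustration of the frequency-domain direction, the sketch below compares plain time-domain convolution with MATLAB's fftfilt function, which performs FIR filtering via overlap-add FFTs. The variables mono and hrir_l are hypothetical placeholders for a monophonic signal and a left-ear impulse response (both column vectors); this is a sketch of the idea, not our implementation.

% Hedged sketch: time-domain vs. FFT-based convolution.
% mono and hrir_l are placeholder column vectors.
left_td = conv(mono, hrir_l);      % time-domain convolution
left_fd = fftfilt(hrir_l, mono);   % overlap-add FFT filtering
% fftfilt returns only length(mono) samples (the convolution tail
% is truncated), so compare the overlapping region; the difference
% should be near machine precision.
err = max(abs(left_td(1:length(mono)) - left_fd));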
1. 4. Objective
Our aim is to make a system that generates 3D audio. In our opinion, this type of
audio is experientially better for movies, TV and video games: there is a sense of
immersion and enjoyment when the audio matches the positioning of characters
and objects seen on screen. 3D audio generation systems are already available
commercially, such as Tempest 3D AudioTech in Sony's PS5 and Dolby Atmos in
various TVs, headphones, home theater systems, Xbox consoles, etc. This project is
intended as a learning experience for us and is not meant to make any advancement
in the field.
1. 5. Literature Survey
Yang, Dayu, "3D Sound Synthesis using the Head Related Transfer Function."
This project is very similar to ours. The author performs the DSP processing in the
frequency domain, whereas we perform it in the time domain, as that is simpler
given our knowledge as of now. Like the author, we are looking into implementing
our system in the frequency domain for efficiency.
2. 1. HRIR
HRIR stands for Head-Related Impulse Response. Here it refers to a dataset
containing left- and right-ear impulse responses for various azimuth angles,
elevation angles and distances. As we said earlier, we use the term interchangeably
to refer to the dataset as a whole or to an individual impulse response.
The following figure illustrates azimuth and elevation angles. An azimuth of 0
degrees is directly in front of one's face. Positive angles from 0 to 180 degrees lie to
the right of the head; negative angles from 0 to -180 degrees lie to the left. Consider
a plane at ear level, parallel to the ground. An object on that plane has an elevation
of 0 degrees; objects above the plane have a positive elevation angle, and objects
below it have a negative elevation angle.
We now examine a few HRIR plots. Below are plots of the left- and right-ear HRIRs
for a few sound positions, captured from MATLAB.
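The plots were produced with ordinary MATLAB plotting commands. A minimal sketch follows, assuming one left/right HRIR pair has already been loaded into the hypothetical vectors hrir_l and hrir_r using the loader described in the dataset's documentation.

fs = 48000;                               % sampling rate of the dataset
t_ms = (0:length(hrir_l)-1) / fs * 1000;  % time axis in milliseconds
plot(t_ms, hrir_l, t_ms, hrir_r);
legend('Left ear', 'Right ear');
xlabel('Time (ms)');
ylabel('Amplitude');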
We can see that at a distance of 80 cm the impulse responses have a larger
amplitude for both ears than at 300 cm. The impulse responses are stronger on the
left side if the object is on the left, and vice versa when the object is on the right.
Above is a zoomed-in version of the first graph (dist = 300 cm, elev = 10 deg, azim =
45 deg). It shows the slight delay between the two impulse responses, which occurs
because the sound reaches the right ear before the left ear, the source being closer
to the right ear. The brain uses this slight delay, the interaural time difference, to
tell the position of an object.
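This interaural delay can also be estimated from the data itself. The following is a minimal sketch, assuming a left/right HRIR pair in the hypothetical vectors hrir_l and hrir_r; it uses cross-correlation to find the lag at which the two responses best align.

fs = 48000;                          % sampling rate of the dataset
[c, lags] = xcorr(hrir_l, hrir_r);   % cross-correlate the two HRIRs
[~, idx] = max(abs(c));              % lag with maximum correlation
itd_ms = lags(idx) / fs * 1000;      % interaural time difference in ms
fprintf('Estimated ITD: %.3f ms\n', itd_ms);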
To use the HRIR dataset to generate 3D audio with a single sound source positioned
at some (azimuth, elevation, distance), we query the dataset for the left- and
right-ear impulse responses with those parameters. We then convolve the given
monophonic audio with the left- and right-ear impulse responses, giving us the left
and right channels.
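A minimal sketch of this single-source case is shown below. The variable names mono, hrir_l and hrir_r are illustrative placeholders (column vectors), not names taken from our code.

left  = conv(mono, hrir_l);              % left channel
right = conv(mono, hrir_r);              % right channel
stereo = [left, right];                  % two-column stereo matrix
stereo = stereo / max(abs(stereo(:)));   % normalize to avoid clipping
sound(stereo, 48000);                    % play back through headphones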
Listening to the resulting audio through headphones should make us feel like the
sound is coming from that position. The effect is not always successful in conveying
positions accurately: in our blind tests, listeners had difficulty discerning front and
back positioning.
2. 2. Implementation of the project
We have written MATLAB code that generates 3D audio from some number N of
monophonic audio channels; N can be decided by the user. Each channel needs a
sound position specified as (azimuth angle, elevation angle, distance). The dataset
we used allows only specific combinations of these three values, as mentioned in
the dataset's documentation. The channel audio is specified by N audio files (mp3,
wav, etc.), which must be provided by the user.
2. 3. Block Diagram
The block diagram above describes a system that generates 3D audio with only a
single audio source in the virtual audio space. f(t) is a monophonic audio signal. The
sound position is specified as (azimuth angle, elevation angle, distance); the
convention for these parameters is illustrated in the HRIR section of this report.
Two blocks in the system are responsible for retrieving the left-ear and right-ear
HRIRs. The sound position is used to query the dataset, and the appropriate impulse
responses, hL(t) and hR(t), are retrieved. These impulse responses are convolved
with f(t) in the two convolution blocks. The output of the system is the left channel
L(t) and the right channel R(t) of the 3D audio.
For N such audio sources, we would need N copies of the above system. The left
channels would all have to be added together, and likewise the right channels. The
resulting audio will have N audio sources in the virtual space.
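A minimal sketch of this summation follows. It assumes the k-th source's mono signal and HRIR pair are stored in the hypothetical cell arrays monos, hrirs_l and hrirs_r, and that all mono signals and all HRIRs have equal lengths so the convolution outputs align.

L = 0;
R = 0;
for k = 1:N
    L = L + conv(monos{k}, hrirs_l{k});  % accumulate left channels
    R = R + conv(monos{k}, hrirs_r{k});  % accumulate right channels
end
stereo = [L, R] / max(abs([L; R]));      % combine and normalize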
2. 4. Code
The following is the MATLAB code we used to convert a demo song provided with
the popular music production software FL Studio to 3D audio. The song is "Future
Bass" by Asher Postman. Individual elements of the song were extracted from the
FL Studio project and exported as wav files; each element is placed in the virtual
audio space as an individual sound source.
clear;
clc;
close all;

% Load one monophonic source file and resample it to the 48 kHz
% sampling rate used by the HRIR dataset.
[orig_audio, f_s_orig] = audioread(path_to_sound);
orig_audio = resample(orig_audio, 48000, f_s_orig);

% ... (the dataset is queried and the audio is convolved with the
% left/right HRIRs here; see the full code at the link below) ...

% Play back the generated 3D audio, scaled up for audibility.
sound(spacial_sound*10, 48000);
( https://drive.google.com/file/d/1OQq-1wXZJ3IWZe_jqGP2kFlRWQeiEQJB/view?usp=sharing )