
GOA COLLEGE OF ENGINEERING

FARMAGUDI, GOA
DEPARTMENT OF ELECTRONICS & TELECOMMUNICATION
ENGINEERING
2021 - 2022

GENERATING 3D AUDIO FOR HEADPHONES

by

Yeslin Sequeira (P.R.No.:201807745)


Saideep Pradeep Naik (P.R.No.:201807328)

A project submitted
in partial fulfilment of the requirements
for the degree of
Bachelor of Engineering
in
Electronics & Telecommunication Engineering
GOA UNIVERSITY

under the guidance of

Prof. Nitesh Guinde


Associate Professor,
Electronics & Telecommunication Department
Goa College of Engineering
CERTIFICATE
This is to certify that the project entitled

GENERATING 3D AUDIO FOR HEADPHONES

submitted by

Yeslin Sequeira (P.R.No.:201807745)


Saideep Pradeep Naik (P.R.No.:201807328)

has been successfully completed in the academic year 2021-2022 as a partial
fulfilment of the requirement for the degree of BACHELOR OF ENGINEERING in
Electronics & Telecommunication Engineering, at Goa College of Engineering,
Farmagudi.

———————————– ———————————
Internal Examiner External Examiner
(Guide Name)

———————————–
Head of Department,
Dr. H. G. Virani,
Professor, ETC Dept.

Place: Farmagudi, Ponda, Goa


Date:
PROJECT APPROVAL SHEET

The project entitled


GENERATING 3D AUDIO FOR HEADPHONES

by
Yeslin Sequeira (P.R.No.:201807745)
Saideep Pradeep Naik (P.R.No.:201807328)
completed in the year 2021-2022 is approved as a partial fulfilment of the
requirements for the degree of BACHELOR OF ENGINEERING in Electronics &
Telecommunication Engineering and is a record of bona fide work carried out
successfully under our guidance.

———————————–
Project Guide,
Prof. Nitesh Guinde
Associate Professor,
ETC Dept.

———————————– ———————————–
Head of Department Principal
Dr. H. G. Virani Dr. R. B. Lohani
Professor, ETC Dept. Goa College of Engineering

Place: Farmagudi, Ponda, Goa


Date:
Declaration

We declare that the project work entitled GENERATING 3D AUDIO FOR HEADPHONES,
submitted to Goa College of Engineering in partial fulfilment of the requirement for
the award of the degree of B.E. in Electronics and Telecommunication Engineering, is
a record of bona fide project work carried out by us under the guidance of Prof.
Nitesh Guinde. We further declare that the work reported in this project has not
been submitted, and will not be submitted, either in part or in full, for the award of
any other degree or diploma in this institute or any other institute or university.

Signature of Candidate/s:

Name of the Candidate/s:

Date:
Acknowledgement

The success of our work would be incomplete without mentioning our respected
teachers who made it possible, whose guidance and encouragement served as a
beacon light and crowned our efforts with success.

It is our privilege to express our sincerest regards to our project guide, Prof. Nitesh
Guinde, Associate Professor, Goa College of Engineering, for his valuable inputs, able
guidance, encouragement, whole-hearted cooperation and constructive criticism
throughout the duration of the project. We also extend our sincere thanks to our
Principal, Dr. R. B. Lohani, and to Dr. H. G. Virani, Professor and Head of the ETC
Department, Goa College of Engineering, for encouraging and supporting us.

We are thankful to the teaching and non-teaching staff of the Electronics and
Telecommunication Department for their generous help. We take this opportunity to
thank all our teachers who have directly or indirectly helped our project. We pay our
respects and love to our parents, all other family members and friends for their love
and encouragement throughout our career.
CONTENTS:

1. Chapter 1
1.1. Abstract
1.2. Introduction
1.3. Problem Statement
1.4. Objective
1.5. Literature Survey

2. Chapter 2
2.1. HRIR
2.2. Implementation of the project
2.3. Block Diagram
2.4. Code
1. 1. Abstract

In this report we introduce concepts related to 3D audio and describe our signal
processing system. We analyze head-related impulse responses (HRIRs) and use
practically measured HRIRs in our proof-of-concept MATLAB code to generate 3D
audio for headphones.

1. 2. Introduction

Binaural audio is audio recorded by microphones inserted into the ears of a human
or of a lifelike dummy head. This allows the recording to be shaped by the outer ear,
which gives the human brain important cues for discerning the direction of sound
sources. When played back through headphones, the sound feels like it is coming
from the real world. The aim of 3D audio is to produce audio similar to binaural
audio through digital signal processing.

We use an HRIR dataset containing impulse responses measured from a
head-and-torso simulator (Brüel & Kjær Type 4128C), downloaded from
http://medi.uni-oldenburg.de/hrir/index.html, to implement our proof-of-concept
MATLAB code. Note that we use the term HRIR interchangeably to refer to the whole
dataset or to an individual impulse response.

The simulator is a dummy with a human-like head and realistic, rubbery pinnae. The
measured impulse responses capture what the outer ear does to a sound signal's
spectrum, so that the effect can be reproduced on any desired signal through
convolution of two HRIRs (one for the left ear, one for the right ear) with a
monophonic audio signal. That is the working principle of 3D audio.
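
A minimal MATLAB sketch of this principle follows. The variable names are
illustrative, and hL and hR are assumed to already hold the left- and right-ear
impulse responses for the desired source position:

% Sketch only: hL and hR are assumed left-/right-ear HRIRs for one position;
% 'voice.wav' is a placeholder for any monophonic recording.
[mono, fs] = audioread('voice.wav');
mono  = mono(:,1);                 % keep a single channel
left  = conv(mono, hL);            % filter with the left-ear HRIR
right = conv(mono, hR);            % filter with the right-ear HRIR
sound([left, right], fs);          % binaural playback over headphones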

So far, we have created a MATLAB program for generating 3D audio. The audio is
placed as desired in the virtual space when heard through headphones. When a
subject is told the position of the sound, the effect is convincing. Unfortunately, in
our blind tests, subjects had difficulty discerning front and back positioning, so they
could not always accurately identify the sound position. The code uses time-domain
convolution.
1. 3. Problem Statement

Our end goal is to build a system in Verilog that generates 3D audio from N channels,
where N is yet to be decided. Each channel will take monophonic audio and a sound
position as input. An HRIR dataset will have to be fed to the system, which will need
mechanisms for interacting with the dataset and retrieving requested impulse
responses. The system will be completely real-time, and the positions of sounds in
the virtual audio space should be adjustable in real-time. Our current approach for
generating 3D audio is convolution in the time domain, but we are looking to shift to
the frequency domain as we suspect it will be more efficient, i.e., require fewer
calculations per sample generated: time-domain convolution costs on the order of
M multiply-accumulates per output sample for an M-tap HRIR, whereas FFT-based
block convolution costs only on the order of log M per sample.
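
As a sketch of the frequency-domain alternative we are considering (not part of
our current code), MATLAB's fftfilt from the Signal Processing Toolbox performs
FFT-based overlap-add FIR filtering; mono, hL, hR and fs are as in the earlier sketch:

% Sketch only: overlap-add convolution in the frequency domain.
% fftfilt returns an output the same length as its input, so the
% convolution tail is truncated compared to conv().
left  = fftfilt(hL, mono);
right = fftfilt(hR, mono);
sound([left, right], fs);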

1. 4. Objective

Our aim is to make a system that generates 3D audio. In our opinion, this type of
audio is experientially better for movies, TV and video games: there is a sense of
immersion and enjoyment when the audio matches the positioning of characters
and objects seen on screen. Commercial 3D audio systems already exist, such as
Tempest 3D AudioTech in Sony's PS5 and Dolby Atmos in various TVs, headphones,
home theater systems, Xbox consoles, etc. This project is intended as a learning
experience for us and is not meant to advance the field.

1. 5. Literature Survey

Yang, Dayu, "3D Sound Synthesis Using the Head Related Transfer Function."
This project is very similar to ours. The author performed the DSP in the frequency
domain, whereas we perform it in the time domain, as that is simpler given our
current knowledge. We are looking into implementing our system in the frequency
domain, like the author, for efficiency.
2. 1. HRIR

HRIR stands for Head-Related Impulse Response. Here it refers to a dataset
containing left- and right-ear impulse responses for various azimuth angles,
elevation angles and distances. As noted earlier, we use the term interchangeably
to refer to the dataset as a whole or to individual impulse responses.

The following figure illustrates azimuth and elevation angles. An azimuth of 0
degrees is directly in front of one's face; positive angles between 0 and 180 degrees
lie to the right of the head, and negative angles between 0 and -180 degrees lie to
the left.

Consider a plane at ear level, parallel to the ground. An object on that plane has an
elevation of 0 degrees. Objects above that plane have a positive elevation angle;
objects below have a negative elevation angle.

Below are a few HRIR plots for the left and right ears at a few sound positions,
captured from MATLAB. For a distance of 80 cm, the impulse responses are stronger
for both ears than at 300 cm. The impulse response is stronger on the left side when
the object is on the left, and vice versa when the object is on the right.

Above is a zoomed version of the first graph (dist = 300 cm, elev = 10 deg, azim = 45
deg). It shows a slight delay between the two impulse responses, which arises
because the sound reaches the right ear before the left ear, the source being closer
to the right ear. The brain relies on this slight delay, the interaural time difference,
to tell the position of an object.
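
This interaural time difference (ITD) can be quantified by cross-correlating the two
impulse responses. The following is one way to do so in MATLAB, as a sketch (hL and
hR are the plotted left and right HRIRs; fs is the dataset's 48 kHz sampling rate):

% Sketch only: estimate the ITD between the left and right HRIRs.
[c, lags]   = xcorr(hL, hR);        % cross-correlation over all lags
[~, idx]    = max(abs(c));          % lag with the strongest match
itd_samples = lags(idx);            % positive => hL lags hR (right ear hears first)
itd_us      = itd_samples / fs * 1e6;   % ITD in microseconds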

To use the HRIR dataset to generate 3D audio with a single sound source positioned
at some (azimuth, elevation, distance), we query the dataset for the left- and
right-ear impulse responses with those parameters. We then convolve the given
monophonic audio with the left and right impulse responses, giving us the left and
right channels.
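
With the dataset tools used later in this report, this query-and-convolve step looks
like the following (loadHRIR is the loader shipped with the Oldenburg database; the
position values here are illustrative):

% Sketch only: query the dataset, then convolve with both ear responses.
hrir  = loadHRIR('anechoic', 300, 0, 45, 'in-ear');  % dist 300 cm, elev 0, azim 45
left  = conv(mono, hrir.data(:,1));                  % left-ear channel
right = conv(mono, hrir.data(:,2));                  % right-ear channel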

Listening to the result through headphones should make the sound feel like it is
coming from that position. In our blind tests the effect did not always convey the
position accurately; in particular, there was difficulty in discerning front from back.

A paper we read (DOI: 10.1109/ICASSP39728.2021.9414448) reports that this is a
common occurrence in 3D audio when the HRIR is a general one and not tuned to
the specific listener, so this downside is unfortunately expected. The effect can be
improved by personalized tuning of the HRIR, as each person's HRIR is unique. The
only method we are aware of is measurement in labs using expensive, intricate
setups, which is not feasible for our project. Hence, we have not experimented with
personalized HRIRs.
2. 2. Implementation of the Project

We have written MATLAB code that generates 3D audio from N monophonic audio
channels, where N is chosen by the user. Each channel requires the sound position to
be specified as (azimuth angle, elevation angle, distance). The dataset we used
allows only specific combinations of these three values, as mentioned in the
dataset's documentation. The channel audio is supplied as N audio files (mp3, wav,
etc.), which must be provided by the user.

2. 3. Block Diagram

The block diagram above describes a system that generates 3D audio with a single
audio source in the virtual audio space. f(t) is a monophonic audio signal. The sound
position is specified as (azimuth angle, elevation angle, distance); the convention for
these parameters is illustrated in the HRIR section of this report.

Two blocks in the system are responsible for retrieving the left- and right-ear HRIRs.
The sound position is used to query the dataset, and the appropriate impulse
responses hL(t) and hR(t) are retrieved. These impulse responses are convolved with
f(t) in the two convolution blocks. The output of the system is the left channel L(t)
and the right channel R(t) of the 3D audio.

For N such audio sources, we would need N copies of the above system. The left
channels would all have to be added up, and likewise the right channels. The
resulting audio will contain N sound sources in the virtual space, as sketched below.
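
For example, the summation could be written as a loop over the sources. This is a
sketch assuming the dataset's 48 kHz sampling rate and the getSpacialSound helper
defined in the next section; the file names and positions are placeholders:

% Sketch only: sum N spatialized sources into one stereo mix.
files     = {'src_1.wav', 'src_2.wav', 'src_3.wav'};  % placeholder file names
positions = [300 0 20; 300 0 -20; 300 0 145];         % (dist, elev, azim) per source
duration  = 40;                                       % pad length in seconds
mix = zeros(48000 * duration, 2);                     % stereo accumulator
for k = 1:numel(files)
    p = positions(k, :);
    mix = mix + getSpacialSound(files{k}, p(1), p(2), p(3), duration);
end
sound(mix, 48000);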
2. 4. Code

Following is the MATLAB code we used to convert a demo song provided with the
popular music production software FL Studio, "Future Bass" by Asher Postman, to 3D
audio. Individual elements of the song were extracted from the project and exported
as wav files; each element is placed in the virtual audio space as an individual sound
source.

clear;
clc;
close all;

% Place each exported element of the song at its own position in the virtual
% audio space: getSpacialSound(path, distance in cm, elevation, azimuth, duration in s).
object_1 = getSpacialSound('song\lead.wav', 300, 0, 20, 40);
object_2 = getSpacialSound('song\chords.wav', 300, 0, -20, 40);
object_3 = getSpacialSound('song\drums.wav', 300, 0, -40, 40);
object_4 = getSpacialSound('song\hats.wav', 300, 0, -40, 40);
object_5 = getSpacialSound('song\hats_2.wav', 300, 0, -40, 40);
object_6 = getSpacialSound('song\saw_bass.wav', 300, 0, -180, 40);
object_7 = getSpacialSound('song\sub_bass.wav', 300, 0, 0, 40);
object_8 = getSpacialSound('song\toms.wav', 300, 0, 145, 40);
object_9 = getSpacialSound('song\toms_2.wav', 300, 0, 145, 40);
object_10 = getSpacialSound('song\vocals.wav', 300, 0, -135, 40);
object_11 = getSpacialSound('song\white_noise.wav', 300, 0, 0, 40);
object_12 = getSpacialSound('song\random_sounds.wav', 300, 0, 115, 40);
object_13 = getSpacialSound('song\sfx.wav', 300, 0, -30, 40);
object_14 = getSpacialSound('song\sfx_2.wav', 300, 0, 100, 40);
object_15 = getSpacialSound('song\track_15.wav', 300, 0, -100, 40);
object_16 = getSpacialSound('song\track_16.wav', 300, 0, 130, 40);

% Mix all sources by summing their stereo outputs (all padded to equal length).
spacial_sound = object_1 + object_2 + object_3 + object_4 + object_5 + ...
    object_6 + object_7 + object_8 + object_9 + object_10 + object_11 + ...
    object_12 + object_13 + object_14 + object_15 + object_16;

sound(spacial_sound*10, 48000);   % boost level and play back at 48 kHz

% Spatialize one mono file: convolve it with the left/right HRIRs for the
% requested (distance, elevation, azimuth) and zero-pad to a fixed duration.
function spacial_sound = getSpacialSound(path_to_sound, dist, elev, angle, duration)

    % Query the Oldenburg HRIR database (anechoic, in-ear measurements).
    sound_src = loadHRIR('anechoic', dist, elev, angle, 'in-ear');

    [orig_audio, f_s_orig] = audioread(path_to_sound);
    orig_audio = resample(orig_audio, 48000, f_s_orig);   % match the HRIR rate

    % Convolve the mono signal with the left- and right-ear impulse responses.
    spacial_left_chan  = conv(orig_audio(:,1), sound_src.data(:,1));
    spacial_right_chan = conv(orig_audio(:,1), sound_src.data(:,2));

    % Zero-pad both channels to 'duration' seconds so the sources can be summed.
    spacial_sound = [[spacial_left_chan, spacial_right_chan];
        zeros(sound_src.fs * duration - length(spacial_left_chan), 2)];
end

You can listen to the result by scanning the QR code below, or via the following link:

( https://drive.google.com/file/d/1OQq-1wXZJ3IWZe_jqGP2kFlRWQeiEQJB/view?usp=sharing )

Please email [email protected] if you have trouble accessing the file.
