Conference of the Acoustical Society of New Zealand
Investigating the Lombard effect in a speaker’s voice in
noisy virtual environments under varying room acoustics
Alyssa D’Souza (1), Yusuke Hioka (1), Malcolm Dunn (2) and James Whitlock (2)
(1) Acoustics and Vibration Research Centre, The University of Auckland, New Zealand
(2) Marshall-Day Acoustics, Auckland, New Zealand
[email protected]
ABSTRACT
When multiple people converse in an indoor environment, achieving satisfactory communication is often challenging due to high level
noise caused by poor acoustic design. Communication in noisy environments gives rise to the Lombard effect, an involuntary reflex that
causes one to raise their voice in the presence of noise. This may produce more intelligible speech for listeners, but the increase in speaker
sound level eventually contributes to the Café effect occurring in the environment. To control the tendency of noise build-up, it is of interest
to investigate the Lombard effect under different acoustic conditions; however, it can be difficult to control variables in real environments,
which may affect test reproducibility. This study investigates if the Lombard effect can be simulated by replicating the dynamic changes
in sound level of speakers in a real environment, and to what extent speakers change their voice level when immersed into the simulated
environment. The study uses spatial sound reproduction to simulate environments with varying acoustics and examines speakers’ sound
level when the build-up occurs. It will provide a novel method that allows controlled study of speakers’ behaviour in noisy environments
and provides an opportunity to investigate one speaker’s effect on the overall sound level within the simulated environment. The results
observed show that subjects performed similarly in different virtual acoustic environments. Further statistical analysis will inform
development of the simulator for accurately conveying room acoustic design.
1 INTRODUCTION

In noisy environments, an inappropriate noise level may cause adverse effects such as discomfort and
disturbance of speech communication [1]. When a speaker is communicating in a noisy space, they tend to
experience the Lombard effect, an involuntary vocal reflex that causes the speaker to raise their voice. It was
first discovered by Étienne Lombard in 1909, when he measured voice levels while speaking in noise and found
higher sound levels than in quiet. Speech produced at these raised levels was termed Lombard speech [2].

The expression of the Lombard effect depends on factors such as the masking noise (i.e. sound level, type,
frequency) [3] [4] [5], communication scenario [6] and visual cues available to the listener [7]. The effect is
referred to as a communicative phenomenon [8], as producing intelligible speech for others influences
speakers similarly to the ability to hear their own voice in noise [9].

When multiple people in the same environment produce Lombard speech, the problem is compounded. The
build-up of their voices generates the phenomenon known as the Café effect, a vicious cycle of noise
breeding more noise [10]. The Café effect is known to be more likely to occur in rooms with longer
reverberation times, which are common in rooms with poorly designed acoustics. To optimise the acoustics of
these spaces, it is beneficial to understand how people communicate in these environments and how their voice
level varies over time. Therefore, it is of interest to acoustic engineers to quantify the Lombard effect by
measuring the change in voice sound level of a speaker in different acoustic environments. Such studies would
involve human participants who take part in an experiment measuring their voice level while speaking in a
noisy space.

Conducting testing in real environments can be challenging due to the inability to reliably control
variables (i.e. occupancy, external noise) within the space. This can compromise the repeatability of tests,
and consequently the reproducibility of the collected data. Additionally, it can be logistically costly
and inefficient to perform tests in real environments.

The use of virtual reality (VR) offers a solution to overcome these challenges by simulating the acoustics of
real rooms in a controlled laboratory environment.

Previous attempts at validating acoustic VR have shown a clear indication of the production of Lombard speech
within a simulated environment [11]. To utilise this solution, the simulation requires further validation
against real environments as a baseline, as well as in acoustically different environments, neither of which
has yet been explored in previous Lombard effect simulator research.

By developing an acoustic VR system that simulates noisy spaces with varying acoustics and immersing
participants in the virtual spaces, this study will investigate if the Lombard effect can be replicated by
simulating dynamic changes in sound level of speakers in a real environment. It will also provide information
about the effect of reverberation on the simulator and explore the extent to which
2nd – 4th September 2024, Christchurch
speakers change their voice level in noisy virtual
environments.
2 THE LOMBARD EFFECT SIMULATOR
Simulator design
The Lombard effect (LE) simulator was developed using the programming platform Cycling '74 Max/MSP.
Figure 1 shows the overall workflow of the LE simulator. The simulator utilises built-in functions for logical
expressions and playback and uses external plugins for numerical processing. The simulator design follows the
assumption that when multiple talkers communicate in a noisy environment, every talker speaks at the same
sound level. This assumption is applied to all talkers in a room: each talker is taken to experience the same
noise level, causing them to speak at an identical sound level. The system was run at a sampling rate of
48 kHz.

Figure 1. Workflow of the Lombard effect simulator, detailing voice measurement, audio processing and
playback

Figure 2. Schematic of Café environment (not to scale); the star represents the microphone/listener and
purple symbols represent talker positions
Impulse response measurement
The virtual talkers are simulated from different positions using impulse responses (IRs) measured in two
acoustically different environments, Café and Foyer, as shown in Table 1. A third-order Ambisonics microphone
(Zylia ZM-1) was placed near the centre of each environment, representing a static listener/speaker
(the test participant). A loudspeaker was placed at six different locations in each environment, each
representing one virtual talker (a speaker producing noise), to measure the impulse responses from virtual
talker to listener, as shown in Figure 2 and Figure 3. The locations of the virtual talkers and listener were
selected by observing typical seating arrangements in both environments.

Figure 3. Schematic of Foyer environment (not to scale, cropped); the star represents the microphone/listener
and purple symbols represent talker positions

Table 1. Environments tested on the Lombard effect simulator

                          Café      Foyer
Reverberation time (RT)   0.7 s     2.5 s
Volume                    700 m³    2000 m³
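
As a rough illustration of how a measured impulse response can place a virtual talker in a room, the sketch below convolves a dry talker signal with each channel of a multichannel IR. It assumes the usual convolution-based rendering and is not the Max/MSP implementation used in the study; the function names `convolve` and `render_virtual_talker` and the toy sample values are hypothetical (a third-order Ambisonics IR would carry 16 channels of measured data).

```python
from typing import List


def convolve(signal: List[float], ir: List[float]) -> List[float]:
    """Direct-form linear convolution of a dry signal with one IR channel."""
    if not signal or not ir:
        return []
    out = [0.0] * (len(signal) + len(ir) - 1)
    for i, s in enumerate(signal):
        for j, h in enumerate(ir):
            out[i + j] += s * h
    return out


def render_virtual_talker(dry: List[float],
                          ir_channels: List[List[float]]) -> List[List[float]]:
    """Convolve one dry talker signal with every channel of a multichannel IR."""
    return [convolve(dry, channel) for channel in ir_channels]


# Toy data (hypothetical): a 3-sample dry signal and a 2-channel IR.
dry_speech = [1.0, 0.5, 0.25]
toy_ir = [
    [1.0, 0.0, 0.5],  # channel 0: direct sound plus one later reflection
    [0.0, 0.3],       # channel 1: delayed, attenuated arrival
]
rendered = render_virtual_talker(dry_speech, toy_ir)
```

Each output channel is longer than the dry signal by the IR length minus one sample, which is the reverberant tail added by the room.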
Virtual talker functionality
The sound level of each virtual talker is adjusted via a live
gain every five seconds, with respect to the speaker’s
voice level averaged over a five second interval. The
speaker’s voice level that is sent to the virtual talkers is
restricted between 60 – 80 dB(A) to maintain the baseline
level of the virtual talkers and prevent the system from
reaching unsafe sound levels. The change in sound level
of the virtual talkers is exponentially increased and
decreased over a two second period using the slide
function given by Equation (1):

$$ y(n) = y(n-1) + \frac{x(n) - y(n-1)}{\mathit{slide}} \tag{1} $$

Figure 4. The 16-channel loudspeaker array schematic [13].
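
The level control described in this subsection can be sketched as follows: the participant's voice level is averaged over a five-second block, limited to 60 to 80 dB(A), and the resulting target is smoothed with the recursion of Equation (1). This is a minimal Python sketch rather than the authors' Max/MSP patch. The function names, the choice of an energy average, and the example levels are assumptions, and the smoother uses a toy slide value purely to keep the example short.

```python
import math
from typing import Iterable, List

MIN_LEVEL_DB = 60.0  # lower limit stated in the text, dB(A)
MAX_LEVEL_DB = 80.0  # upper limit stated in the text, dB(A)


def energy_average_db(levels_db: Iterable[float]) -> float:
    """Energy (not arithmetic) mean of short-term levels given in dB."""
    values = list(levels_db)
    mean_energy = sum(10.0 ** (level / 10.0) for level in values) / len(values)
    return 10.0 * math.log10(mean_energy)


def clamp_level_db(level_db: float) -> float:
    """Restrict the level sent to the virtual talkers to 60-80 dB(A)."""
    return max(MIN_LEVEL_DB, min(MAX_LEVEL_DB, level_db))


def slide(x: Iterable[float], slide_value: float, y0: float = 0.0) -> List[float]:
    """Equation (1): y(n) = y(n-1) + (x(n) - y(n-1)) / slide."""
    y_prev = y0
    out = []
    for sample in x:
        y_prev = y_prev + (sample - y_prev) / slide_value
        out.append(y_prev)
    return out


# Hypothetical one-second voice levels within a five-second block, dB(A).
block_levels = [58.0, 62.0, 66.0, 85.0, 90.0]
target_db = clamp_level_db(energy_average_db(block_levels))

# Smooth a step from the 60 dB(A) baseline towards the new target.
# slide_value=4 is a toy value; the paper reports 96,000 at 48 kHz.
smoothed = slide([target_db] * 8, slide_value=4.0, y0=60.0)
```

Because each step covers a fixed fraction of the remaining difference, the output approaches the target exponentially. With a slide value of N, roughly 63% of a step change is covered after N updates.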
In Equation (1), x(n) and y(n) denote the input and output signals, respectively, and slide is a constant that
determines how quickly the effect of the current input decays. In the current study, the value was set to
96,000 (two seconds' worth of samples at the 48 kHz sampling rate), which was heuristically found to produce
the most natural envelope for increasing and decreasing noise.

Virtual talker level calibration
Each virtual talker’s baseline SPL is calibrated with respect to the distance between the listener/speaker
(microphone) and the virtual talker (loudspeaker). Equation (2) below calculates the total (direct +
reverberant) sound pressure level:

$$ L_p = L_w + 10\log_{10}\left(\frac{Q}{4\pi r^2} + \frac{4}{R_c}\right) \tag{2} $$

where r, Q and R_c denote the distance between the speaker and virtual talker, the amplification factor and
the room constant, respectively. L_p and L_w denote sound pressure level and sound power level, respectively.
Equation (2) is used with an amplification factor of Q = 2, unless the measured IR is directed towards a wall
(in which case Q is set to 4). When the distance r is much greater than the reverberation radius, Equation (3)
is used instead, which calculates the reverberant sound pressure level alone:

$$ L_p = L_w + 10\log_{10} T - 10\log_{10} V + 14 \tag{3} $$

where T and V denote the reverberation time (T60) and room volume, respectively.

All virtual talkers start at a speech level of 60 dB(A) (L_{S,A,1m}) at their position, in accordance with the
speech level of normal vocal effort specified in ISO 9921 [12].

Sound Reproduction System
The simulator uses third-order Ambisonics, decoded onto a 16-channel loudspeaker array configured as shown
in Figure 4 using SPARTA Suite (see footnote 1). The loudspeaker array was installed in the listening room at
the University of Auckland Acoustics Laboratory (T60 = 0.3 s).

System calibration
Each loudspeaker was digitally calibrated through Max/MSP to 60 dB(A) using a calibrated omnidirectional
microphone (MiniDSP UMIK-2). The virtual talkers were calibrated by calculating levels using either
Equation (2) or (3), manually adjusting the baseline gain values to these levels, and ensuring there was
enough headroom to allow for a larger gain (at least 20 dB).

3 EXPERIMENTAL METHODOLOGY
Aim
The aim of the experiment was to investigate the dynamic sound level changes in a participant’s voice when
immersed in noisy virtual environments with varying acoustics. Participants also provided insight into their
perception of the virtual environments through a questionnaire. The study was approved by the University
of Auckland Human Participants Ethics Committee (Reference Number UAHPEC27218).

Participants
Twelve female and four male participants (mean age = 25.5 years, SD = 3.2 years) participated in the
experiment. All were native English speakers above the age of 18 and self-reported no known hearing
impairment. Participants received a gift voucher worth NZD 20 for their participation.

Stimuli
The virtual talker noise delivered to participants was the L1 English QNA Set of the ALLSSTAR Corpus from
SpeechBox (see footnote 2). Babble noise of 5 – 10 people was recorded through MATLAB at the Marshall-Day
Acoustics Auckland office (T60 = 0.5 s, V = 200 m³) with an omnidirectional microphone (MiniDSP UMIK-2) at a
distance of 2 m from the group talking. Periods of silence were removed from the audio recordings in a digital
audio workstation (Cockos REAPER). The babble recording was used as static noise. All virtual talker noise
recordings were normalised.
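
Returning to the level calibration of Section 2, Equations (2) and (3) can be evaluated numerically. The sketch below is a minimal Python illustration, not the authors' calibration procedure: the function names, the sound power level of 70 dB, the 1 m distance and the room constant of 100 m² are hypothetical values chosen only to exercise the formulas, while the reverberation times and volumes are those listed in Table 1.

```python
import math


def total_spl(lw_db: float, r_m: float, q: float, room_constant_m2: float) -> float:
    """Equation (2): direct plus reverberant sound pressure level in dB."""
    direct = q / (4.0 * math.pi * r_m ** 2)
    reverberant = 4.0 / room_constant_m2
    return lw_db + 10.0 * math.log10(direct + reverberant)


def reverberant_spl(lw_db: float, t60_s: float, volume_m3: float) -> float:
    """Equation (3): reverberant sound pressure level alone in dB."""
    return lw_db + 10.0 * math.log10(t60_s) - 10.0 * math.log10(volume_m3) + 14.0


LW_DB = 70.0  # hypothetical sound power level, not taken from the paper

# Equation (3) with the Table 1 values.
cafe_db = reverberant_spl(LW_DB, t60_s=0.7, volume_m3=700.0)
foyer_db = reverberant_spl(LW_DB, t60_s=2.5, volume_m3=2000.0)

# Equation (2) with a hypothetical distance and room constant.
open_db = total_spl(LW_DB, r_m=1.0, q=2.0, room_constant_m2=100.0)
wall_db = total_spl(LW_DB, r_m=1.0, q=4.0, room_constant_m2=100.0)
```

For the same sound power, Equation (3) with the Table 1 values gives a reverberant level in the Foyer about 1 dB above that in the Café, and raising Q from 2 to 4 in Equation (2) increases only the direct term.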
Procedure
Participants completed a demographics questionnaire and
were seated at the centre of the 16-channel loudspeaker
array at a height of 1.51 m aligned with their ear level and
fitted with a headset microphone (Countryman E6 Earset).
1 Plugin information can be found at https://leomccormack.github.io/sparta-site/docs/plugins/sparta-suite/
2 Speech data can be found at https://speechbox.linguistics.northwestern.edu/#!/?goto=allsstar
They were told at the start of the experiment to “talk to [the researcher] as though you want [the researcher]
to understand what you are saying” [14]. Participants were made aware that the researcher would not
participate in the conversation and were instructed to maintain eye contact with the researcher.

The chosen task had to give participants the opportunity to produce spontaneous speech, as this is the type of
speech most likely to be produced in noisy environments. Therefore, the task used for this experiment was to
speak about any topic(s) of their choice for three minutes. Participants were offered the option of answering
a set of questions relating to personal likes and dislikes and retelling past events. Participants spoke to
the researcher, who was standing 1.5 m in front of them. The researcher did not show emotion in response to
the participant’s conversational material but nodded at the end of each sentence. The participant was
instructed to stop speaking at the three-minute mark. The recording procedure was repeated twice for each
acoustic environment tested (Table 1).

The noise level of the simulated rooms was recorded alongside the participant’s voice level (calibrated to SPL
at 1 m). Participants were instructed to begin speaking whenever they felt comfortable. The simulated room
noise level when participants were first introduced to the room was set to ~50 – 55 dB(A). This did not change
for ~30 seconds to allow the participant’s voice to settle into the virtual environment. The adjustable
virtual talkers were then introduced into the system, after which the participant’s voice controlled their
sound level over the three-minute recording period.

4 RESULTS AND DISCUSSION
A linear mixed effect model with a two-way interaction between noise and room type was used to analyse the
results. The participant ID was added to the model as a random effect. A significant two-way interaction was
found from the model analysis using the likelihood ratio comparison (χ²(1) = 15.3, p < 0.001).

Figure 5 shows the distribution of the maximum measured speech level (LAmax) of participants in each
simulated environment. Participants’ maximum speech levels were distributed more widely in the Foyer than in
the Café, but the distributions otherwise look similar. A similar trend can be seen in Figure 6, which shows
the distribution of the A-weighted equivalent continuous sound level (LAeq) of participants in each simulated
environment. It shows that the overall noise level of the Foyer was more widely distributed. Figure 7 shows
the linear regression fit, which estimates a participant’s voice level based on the noise level of the
virtual talkers in each room.

Figure 5. Maximum measured speech level of both rooms

Figure 6. A-weighted equivalent continuous sound level of both rooms

Figure 7. Scatter plot of all participant speech levels relative to virtual talker noise level experienced in
both rooms.

This study hypothesised that speech and room noise levels in the Foyer would be greater than those in the Café
because longer reverberation time is known to contribute to degrading speech intelligibility [10]. To test the
hypothesis, post-hoc analysis was conducted in RStudio using a one-way ANOVA. The collected data regarding
the overall room noise levels and measured voice levels suggest the following preliminary findings:

● The difference in the measured LAmax between environments of 2 dB(A) is not significant (p = 0.179)
(Figure 5).

● The difference in the measured LAeq between environments of 1.4 dB(A) is not significant (p = 0.31)
(Figure 6).

5 CONCLUSION
This study investigated whether the Lombard effect can be replicated by creating a simulation of the dynamic
changes in sound level of speakers in real environments. The results of this study show evidence of the Lombard
effect in both virtual environments; however, further statistical analysis is required to compare the real
room data against the virtual results to validate whether the Lombard effect of the real room has been
replicated. The subjective perception of the participants’ experience in the virtual environments will also be
analysed. It was hypothesised that the room with a higher reverberation time would cause participants to exert
more vocal effort to speak in that virtual environment, and therefore result in a higher Lombard slope and
LAeq. Contrary to the hypothesis, the results show that participants performed similarly in both virtual
environments that were recreated by the Lombard effect simulator.

This study focused only on a single participant’s Lombard speech and its effect on noise in a virtual
environment. To further immerse participants in the simulation, visual VR could be introduced. Additionally,
another participant could be added to the simulator, with their voice level measured alongside that of the
first participant. This would provide insight into the communicative aspect of the Lombard effect.

6 ACKNOWLEDGMENTS
The authors would like to thank Marshall-Day Acoustics for supporting this research. We would like to thank
the owners of Kings Garden Café Henderson in Auckland for allowing us to take measurements in their space. We
would also like to thank Dr Justine Hui and Clara Zhang for their assistance with analysing the collected
data. Lastly, we thank Gian Schmid for his technical expertise needed to conduct the testing for this research.

7 REFERENCES
[1] Bottalico, P. (2018). Lombard effect, ambient noise, and willingness to spend time and money in a
restaurant. The Journal of the Acoustical Society of America, 144(3), EL209-EL214.
[2] Lombard, E. (1911). Le signe de l’elevation de la voix. Ann. Mal. Oreil. Larynx, 37, 101-199.
[3] Wakao, A., Takeda, K., & Itakura, F. (1996, October). Variability of Lombard effects under different noise
conditions. In Proceeding of Fourth International Conference on Spoken Language Processing. ICSLP'96 (Vol. 4,
pp. 2009-2012). IEEE.
[4] Whitlock, J. (2012). Understanding the Lombard Effect. New Zealand Acoustics, 25, 14-16.
[5] Stowe, L. M., & Golob, E. J. (2013). Evidence that the Lombard effect is frequency-specific in humans. The
Journal of the Acoustical Society of America, 134(1), 640-647.
[6] Lane, H., & Tranel, B. (1971). The Lombard sign and the role of hearing in speech. Journal of Speech and
Hearing Research, 14(4), 677-709.
[7] Fitzpatrick, M., Kim, J., & Davis, C. (2015). The effect of seeing the interlocutor on auditory and visual
speech production in noise. Speech Communication, 74, 37-51.
[8] Junqua, J. C., Fincke, S., & Field, K. (1999, March). The Lombard effect: A reflex to better communicate
with others in noise. In 1999 IEEE International Conference on Acoustics, Speech, and Signal Processing.
Proceedings. ICASSP99 (Cat. No. 99CH36258) (Vol. 4, pp. 2083-2086). IEEE.
[9] Lane, H. L., Catania, A. C., & Stevens, S. S. (1961). Voice level: Autophonic scale, perceived loudness,
and effects of sidetone. The Journal of the Acoustical Society of America, 33(2), 160-167.
[10] Whitlock, J., & Dodd, G. (2006). Classroom acoustics - controlling the cafe effect… is the Lombard effect
the key. Proceedings of ACOUSTICS, Christchurch, New Zealand, 20-22.
[11] Hindmarsh, L., Wilson, M., Hioka, Y., Whitlock, J., & Dunn, M. (2021, June). Using VR-Audio to Predict a
Talker’s Voice Level in Noisy Rooms. In Acoustical Society of New Zealand Conference.
[12] ISO 9921. Ergonomics – Assessment of speech communication. Geneva, 2003.
[13] Au, E., Xiao, S., Hui, C. J., Hioka, Y., Masuda, H., & Watson, C. I. (2021). Speech intelligibility in
noise with varying spatial acoustics under Ambisonics-based sound reproduction system. Applied Acoustics, 174,
107707.
[14] Bottalico, P., Passione, I. I., Graetzer, S., & Hunter, E. J. (2017). Evaluation of the starting point of
the Lombard effect. Acta Acustica United With Acustica, 103(1), 169-172.