TTS SRM Speech

Uploaded by

pratik665123

Innovations in Text to Speech Synthesis

Department of Electronics and Communication Engineering

SRM IST

1
Introduction
• Technology - converts written text into spoken language.
• Process - transforms text input, such as written words and
sentences, into audible speech.

• TTS synthesis - to provide accessibility, enhance user experiences, and improve communication.

2
TTS – 18th and 19th centuries
• First machines that could produce synthesized speech - late 18th century (for example, Wolfgang von Kempelen's speaking machine, described in 1791).
• Joseph Faber, a German-born inventor working in Vienna, later built the “Euphonia” (exhibited in the 1840s) - used bellows, reeds, and a keyboard to produce a range of sounds, including synthesized speech.
• Euphonia - could imitate the human voice to a certain extent
3
TTS architecture

4
TTS block diagram

5
Typical TTS system

6
TTS - working
• Front-end - two major tasks:
• Text normalization (also called pre-processing or tokenization) - converts raw text containing symbols such as numbers and abbreviations into the equivalent written-out words.
• Text-to-phoneme (grapheme-to-phoneme) conversion - assigns a phonetic transcription to each word; the front-end also divides and marks the text into prosodic units such as phrases, clauses, and sentences.
• The phonetic transcriptions and prosody information together make up the symbolic linguistic representation that the front-end outputs.
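
The two front-end tasks can be sketched in a few lines of Python. This is a toy illustration under stated assumptions, not a production front-end: the abbreviation table, the tiny ARPAbet-style lexicon, and the function names `normalize` and `to_phonemes` are all made up for the example, and numbers are only handled in the range 0 to 99.

```python
# Toy TTS front-end: text normalization followed by lexicon-based
# grapheme-to-phoneme lookup. All tables below are illustrative assumptions.

ABBREVIATIONS = {"dr.": "doctor", "st.": "street", "no.": "number"}
ONES = ["zero", "one", "two", "three", "four", "five", "six", "seven", "eight", "nine"]
TEENS = ["ten", "eleven", "twelve", "thirteen", "fourteen", "fifteen",
         "sixteen", "seventeen", "eighteen", "nineteen"]
TENS = ["", "", "twenty", "thirty", "forty", "fifty", "sixty", "seventy", "eighty", "ninety"]

# Hypothetical mini-lexicon (ARPAbet-style symbols, stress marks omitted).
LEXICON = {
    "doctor": ["D", "AA", "K", "T", "ER"],
    "forty": ["F", "AO", "R", "T", "IY"],
    "two": ["T", "UW"],
    "street": ["S", "T", "R", "IY", "T"],
}


def number_to_words(n):
    """Spell out an integer in the range 0..99."""
    if n < 10:
        return ONES[n]
    if n < 20:
        return TEENS[n - 10]
    tens, ones = divmod(n, 10)
    return TENS[tens] if ones == 0 else f"{TENS[tens]} {ONES[ones]}"


def normalize(text):
    """Expand abbreviations and small numbers into written-out words."""
    words = []
    for token in text.lower().split():
        if token in ABBREVIATIONS:
            words.append(ABBREVIATIONS[token])
            continue
        token = token.strip(".,!?;:")
        if token.isdecimal() and int(token) < 100:
            words.extend(number_to_words(int(token)).split())
        elif token:
            words.append(token)
    return words


def to_phonemes(words):
    """Look each word up in the lexicon; unknown words are marked <unk>."""
    return [LEXICON.get(word, ["<unk>"]) for word in words]


words = normalize("Dr. Smith lives at 42 Main St.")
phonemes = to_phonemes(words)
```

A real front-end also has to disambiguate by context (for instance, "St." can mean "street" or "saint") and usually falls back to a trained grapheme-to-phoneme model for words missing from the lexicon; neither is attempted here.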

7
TTS - Working
• Back-end
• Synthesizer - Converts the symbolic linguistic representation into
sound.
• In certain systems, this part includes the computation of the target
prosody (pitch contour, phoneme durations), which is then imposed on
the output speech.

8
TTS – Key aspects
• Input Text
• Can be in the form of documents, articles, messages, or any other
written content.

• Text Analysis
• The system first analyzes the input text to understand its linguistic structure,
including sentence boundaries, punctuation, and pronunciation.
• This process involves breaking down the text into smaller units,
such as words, phrases, and sentences.
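
As a rough sketch of this text-analysis step, the snippet below splits text into sentences and then into word, number, and punctuation tokens using only regular expressions. The function names and the splitting rules are assumptions chosen for illustration; production systems use more robust, often trained, segmenters.

```python
import re


def split_sentences(text):
    """Split after '.', '!' or '?' when followed by whitespace and a capital or digit."""
    parts = re.split(r"(?<=[.!?])\s+(?=[A-Z0-9])", text.strip())
    return [part for part in parts if part]


def tokenize(sentence):
    """Return words (with internal apostrophes), numbers, and punctuation marks."""
    return re.findall(r"[A-Za-z]+(?:'[A-Za-z]+)?|\d+(?:\.\d+)?|[^\w\s]", sentence)


sentences = split_sentences("TTS reads text aloud. Is it useful? Yes!")
tokens = [tokenize(s) for s in sentences]
```

This simple rule would wrongly split after abbreviations such as "Dr. Smith", which is one reason sentence-boundary detection is usually combined with the normalization step.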
9
TTS – Key aspects
• Linguistic Processing
• Linguistic and phonetic rules are used to process the text.
• This involves determining the appropriate pronunciation of words,
handling punctuation and numbers, and applying intonation
patterns

KPR Institute of Engineering and Technology, Coimbatore, Tamil Nadu, India
18 October 2023
10
TTS – Key aspects
• Voice Selection
• TTS systems typically offer a selection of voices, each with its
own unique characteristics and accents.
• Users can choose the voice that best suits their preferences and
the context in which the speech will be used.

11
TTS – Key aspects
• Prosody
• Prosody refers to the rhythm, intonation, and stress patterns in
spoken language.
• TTS systems use prosody to make the synthesized speech
sound natural and expressive.
• This includes variations in pitch, timing, and emphasis to mimic
the way humans speak.

12
TTS – Key aspects
• Speech Synthesis
• Once the linguistic and phonetic processing is complete, the
TTS system generates the audio signal that represents the
spoken text.
• This can be done using different methods, including
concatenative synthesis (reusing pre-recorded speech
fragments) or parametric synthesis (generating speech from
mathematical models).
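
To make the concatenative idea concrete, here is a minimal sketch of only the joining step: pre-recorded fragments (represented as plain lists of samples) are strung together with a short linear crossfade at each joint so the transition is less abrupt. The function name `crossfade_concat` and the linear fade are assumptions for illustration; real systems also have to choose which fragments to use and typically smooth joins in more sophisticated ways.

```python
def crossfade_concat(units, overlap):
    """Join lists of samples, linearly crossfading `overlap` samples at each joint."""
    out = list(units[0])
    for unit in units[1:]:
        n = min(overlap, len(out), len(unit))
        start = len(out) - n
        for i in range(n):
            w = (i + 1) / (n + 1)  # weight of the incoming unit rises across the joint
            out[start + i] = (1.0 - w) * out[start + i] + w * unit[i]
        out.extend(unit[n:])  # append the part of the unit after the overlap
    return out


# Two dummy "recordings": a constant 1.0 segment followed by a constant 0.0 segment.
joined = crossfade_concat([[1.0] * 4, [0.0] * 4], overlap=2)
```

Each joint shortens the output by `overlap` samples, because the end of one fragment and the start of the next are blended into the same positions.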

13
TTS – Key aspects
• Output
• The synthesized speech is the final output of the TTS system.
• It can be played through speakers, headphones, or integrated
into various applications and devices.

14
TTS – Types
• Concatenative synthesis
• Based on the concatenation (stringing together) of segments of
recorded speech; among classical (pre-neural) methods, it generally
produces the most natural-sounding synthesized speech

• Formant synthesis
• Does not use human speech samples at runtime; instead, the
synthesized speech output is created using additive
synthesis and an acoustic model
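
The sketch below illustrates sample-free synthesis. Note that it uses the source-filter formulation common in classic formant synthesizers (an impulse train passed through a cascade of second-order resonators), which is a swapped-in technique rather than the additive approach named above. The formant frequencies and bandwidths are rough illustrative values for an /a/-like vowel, and the function names are assumptions.

```python
import math


def resonator(signal, freq_hz, bandwidth_hz, sample_rate):
    """Second-order digital resonator: y[n] = a*x[n] + b*y[n-1] + c*y[n-2]."""
    t = 1.0 / sample_rate
    c = -math.exp(-2.0 * math.pi * bandwidth_hz * t)
    b = 2.0 * math.exp(-math.pi * bandwidth_hz * t) * math.cos(2.0 * math.pi * freq_hz * t)
    a = 1.0 - b - c  # normalizes the gain at 0 Hz to 1
    y1 = y2 = 0.0
    out = []
    for x in signal:
        y = a * x + b * y1 + c * y2
        out.append(y)
        y2, y1 = y1, y
    return out


def synthesize_vowel(f0_hz, formants, duration_s=0.3, sample_rate=16000):
    """Excite a cascade of formant resonators with an impulse train at f0_hz."""
    n = int(duration_s * sample_rate)
    period = int(round(sample_rate / f0_hz))
    signal = [1.0 if i % period == 0 else 0.0 for i in range(n)]
    for freq_hz, bandwidth_hz in formants:
        signal = resonator(signal, freq_hz, bandwidth_hz, sample_rate)
    peak = max(abs(s) for s in signal) or 1.0
    return [s / peak for s in signal]  # scale so the largest sample has magnitude 1


# Illustrative (frequency, bandwidth) pairs in Hz for an /a/-like vowel.
samples = synthesize_vowel(120.0, [(700, 80), (1200, 90), (2600, 120)])
```

Changing the formant list changes the vowel quality, and changing `f0_hz` changes the pitch, which is why this family of methods needs no recorded speech at runtime.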
15
TTS model – Basic components

16
Linguistic Features
• Phonetics - the study of speech sounds in their physical aspects
• Phonology - the study of speech sounds in their cognitive aspects
• Morphology - the study of the formation of words
• Syntax - the study of the formation of sentences
• Semantics - the study of meaning
• Pragmatics - the study of language use

17
Acoustic Features
Frequency:
Relates to the number of pulses produced by vocal cord vibration per unit of
time. The rate of vibration depends on the length, thickness, and tension of
the vocal cords, and thus differs among children, adult males, and adult
females.
A speech sound contains two types of frequencies: the fundamental frequency
(F0), which relates to vocal cord function and reflects the rate of vocal cord
vibration during phonation (perceived as pitch), and the formant frequencies,
which relate to vocal tract configuration.
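
One common way to measure F0 from a signal is to look for the lag at which the signal best matches a shifted copy of itself. The sketch below does this with a plain autocorrelation search; the function name, the 60 to 400 Hz search range, and the synthetic test tone are assumptions for illustration, and practical pitch trackers add windowing, normalization, and voicing decisions.

```python
import math


def estimate_f0(samples, sample_rate, f_min=60.0, f_max=400.0):
    """Return F0 in Hz from the strongest autocorrelation peak within [f_min, f_max]."""
    lag_min = int(sample_rate / f_max)
    lag_max = int(sample_rate / f_min)
    best_lag, best_score = lag_min, float("-inf")
    for lag in range(lag_min, lag_max + 1):
        # Correlate the signal with a copy of itself shifted by `lag` samples.
        score = sum(samples[i] * samples[i + lag] for i in range(len(samples) - lag))
        if score > best_score:
            best_lag, best_score = lag, score
    return sample_rate / best_lag


# Synthetic voiced-like tone: 200 Hz fundamental plus a weaker second harmonic.
sample_rate = 8000
tone = [
    math.sin(2 * math.pi * 200 * n / sample_rate)
    + 0.5 * math.sin(2 * math.pi * 400 * n / sample_rate)
    for n in range(800)
]
f0 = estimate_f0(tone, sample_rate)
```

The best lag corresponds to one period of vocal cord vibration, so dividing the sample rate by it gives the fundamental frequency.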
18
Acoustic Features
Time: Time as a property of speech sounds reflects the duration of a given
sound.
Amplitude: On a spectrogram, amplitude is shown by the darkness of the bands:
the greater the intensity of the sound energy present at a given time and
frequency, the darker the mark at the corresponding point on the display.
Formant: A formant is a concentration of acoustic energy around a particular
frequency in the speech wave. There are several formants, each at a different
frequency. For a typical adult male vocal tract they occur at roughly 1000 Hz
intervals, that is, about one in each 1000 Hz band.
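
The roughly 1000 Hz spacing can be checked with a standard simplification: treat the vocal tract as a uniform tube closed at the glottis and open at the lips, whose resonances are F_n = (2n - 1) * c / (4 * L). The tube length of 0.175 m and speed of sound of 350 m/s used below are assumed textbook-style values, and the function name is hypothetical.

```python
def tube_formants(length_m, count=3, speed_of_sound=350.0):
    """Resonances of a uniform tube closed at one end: F_n = (2n - 1) * c / (4 * L)."""
    return [(2 * n - 1) * speed_of_sound / (4.0 * length_m) for n in range(1, count + 1)]


# Approximately 500, 1500, and 2500 Hz for a 17.5 cm tube.
formants = tube_formants(0.175)
spacing = formants[1] - formants[0]
```

A shorter vocal tract (for example, a child's) gives higher and more widely spaced formants under the same model, so the 1000 Hz figure should be read as a rule of thumb rather than a constant.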
19
TTS models – Data flow

20
TTS – Applications
1. Accessibility Services
• For visually impaired individuals - allowing them to access digital content,
including websites, books, and documents.
• Screen readers use TTS to read aloud the content of computer screens
to users with visual impairments.

2. Voice Assistants and Virtual Agents

• Voice-activated virtual assistants like Siri, Alexa, and Google Assistant
use TTS to provide spoken responses and communicate with users.
• Virtual customer service agents and chatbots often employ TTS for
human-like interactions.
21
TTS – Applications
3. Audiobooks and E-Books

• TTS enables the conversion of written text into audio format, making
books and documents accessible to people who prefer to listen rather
than read.

4. Language Learning

• TTS is used in language learning applications to provide pronunciation
and fluency exercises for learners

22
TTS – Applications
5. GPS and Navigation
• Navigation systems use TTS to provide turn-by-turn directions and
location information to drivers and pedestrians.
6. Assistive Technology
• TTS aids individuals with learning disabilities, such as
dyslexia, by reading text aloud to help them understand and learn
content.

7. Multilingual Support
• Combined with machine translation, TTS can pronounce text in various
languages, making it valuable for travellers and
international business communication.
23
TTS – Applications
8. Audio Descriptions for Video Content
• TTS can be used to add audio descriptions for visually impaired
individuals in movies, TV shows, and online video content.
9. Read-Aloud Software
• TTS is employed in software that reads documents, emails,
and web content aloud, assisting people with reading
difficulties or those who prefer auditory input.

10. Assisting the Elderly

• TTS can help older adults with vision or cognitive impairments by
reading reminders, notifications, and messages.
24
TTS – Applications
11. Accessibility in Web and Mobile Apps
• Many websites and mobile applications integrate TTS features to ensure
accessibility for all users.

12. Entertainment and Gaming

• TTS is used in video games for character voices and narration,
enhancing the gaming experience.

13. Voiceovers for Videos and Presentations

• TTS can be used to create voiceovers for videos and presentations when
human narration is not available or cost-effective.

25
TTS – Applications
14. Communication Devices
• TTS is integrated into augmentative and alternative communication
(AAC) devices for individuals with speech disabilities.
15. Real-Time Language Translation
• TTS can assist in real-time translation apps by converting translated text
into speech for communication between speakers of different languages.

16. Interactive Educational Content

• TTS is used in educational software and e-learning platforms to provide
interactive and engaging content

26
TTS – Applications
17. Broadcasting and Podcasting
• TTS technology can generate synthetic voices for news, weather, and
other segments in broadcasting and podcasting.

18. Voice User Interfaces (VUI)

• VUIs for smart devices, home automation, and vehicles use TTS to
provide users with voice-guided interactions.

27
TTS – Limitations
• Artificial or Robotic-Sounding Speech
• Inaccurate Pronunciation
• Lack of Emotion or Expression
• Limited Language Support
• Technical Limitations
• Unnatural Pausing or Pacing
• Background Noise or Static
• Text to Speech API
28
TTS – Emerging trends
• Advances in Neural Text-to-Speech
• Voice Cloning
• Overdubbing
• Emotional TTS
• Multilingual TTS
• Singing TTS

29
TTS – Notable innovations
• Neural TTS (NTTS): The adoption of neural networks, particularly deep learning
techniques, has transformed TTS synthesis. NTTS models such as WaveNet (a neural
waveform generator) and Tacotron (a sequence-to-sequence model that predicts
spectrograms from text) have made voices sound significantly more natural and human-like.
• End-to-End Models: End-to-end TTS models combine text analysis and voice
synthesis into a single network. This simplifies the TTS process and often results
in more coherent and expressive speech.
• Expressive TTS: Offers greater control over the expressiveness of generated speech.
Users can modify parameters like pitch, speed, emotion, and accent, making TTS
voices highly customizable
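
One widely supported way to expose this kind of control is SSML (Speech Synthesis Markup Language), whose `prosody` element takes `pitch`, `rate`, and `volume` attributes. The sketch below only builds the markup as a string; the helper name `build_ssml` is an assumption, no speech engine is called, and a fully conformant SSML document carries additional attributes on the root element (such as `version` and `xml:lang`) that are omitted here for brevity.

```python
import xml.etree.ElementTree as ET


def build_ssml(text, pitch="high", rate="slow", volume="loud"):
    """Wrap text in an SSML <prosody> element with the given settings."""
    speak = ET.Element("speak")
    prosody = ET.SubElement(
        speak, "prosody", {"pitch": pitch, "rate": rate, "volume": volume}
    )
    prosody.text = text  # ElementTree escapes characters such as '&' on output
    return ET.tostring(speak, encoding="unicode")


ssml = build_ssml("Hello & welcome", pitch="+10%", rate="fast")
```

Pitch, rate, and volume are part of the standard; emotion and accent controls, where available, are usually vendor-specific extensions or depend on which voice is selected.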

30
TTS – Notable innovations
• Multilingual TTS: Models handle multiple languages and dialects more effectively.
This includes models that can switch between languages in the same sentence and
offer higher-quality output.
• Zero-shot Learning: Some models can now generate speech in languages they were not
explicitly trained on. This is achieved by training on multiple languages and
leveraging multilingual embeddings.
• Voice Cloning and Customization: Systems can clone a specific voice or allow users to
customize a voice based on a few recorded samples. This has opened up
possibilities for personalizing TTS experiences

31
TTS – Notable innovations
• Emotional TTS: Systems can convey different emotions in their speech, making them more
suitable for applications like virtual assistants, customer service bots, and
storytelling
• Real-Time TTS: Reduces the latency between text input and speech output. This
is critical for interactive applications like voice assistants and live speech translation.
• Low-Resource Languages: TTS models for languages with limited resources,
which can be especially beneficial for preserving and promoting linguistic diversity.
• Open-Source TTS Projects: Availability of open-source TTS projects and datasets
has democratized TTS development and research

32
TTS – Notable innovations
• Adaptive TTS: Systems can adapt to a user's voice, speech patterns, or
pronunciation, resulting in more personalized and natural-sounding output
• Reduced Data Requirements: Newer models achieve good results with smaller datasets,
reducing the amount of training data required.
• Eco-Friendly TTS: Models are made more environmentally friendly by optimizing their
energy consumption during training and usage.

33
TTS – Demonstration
• https://ttsmaker.com/
• https://www.ibm.com/demos/live/tts-demo/self-service/home
• https://speechify.com/voiceover/?landing_url=https%3A%2F%2Fspeechify.com%2Fblog%2Fexamples-of-text-to-speech%2F
• https://paperswithcode.com/task/text-to-speech-synthesis

34
TTS – Market Research
• The global text-to-speech (TTS) market was valued at $2.8 billion in 2021.
• It is projected to reach $12.5 billion by 2031, growing at a CAGR of 16.3%
from 2022 to 2031.
• The TTS market is segmented by industry vertical, offering, deployment
model, type, language, and enterprise size.
• Major players - Acapela Group, Amazon.com, CereProc, Google, Inc., IBM
Corporation, iFlytek, iSpeech, LumenVox LLC, Microsoft Corporation,
NextUp Technologies, Nuance Communications, ReadSpeaker, Sestek, among others.
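
As a quick sanity check on the growth figures, the compound annual growth rate (CAGR) between two values is (end / start)^(1 / years) - 1. The sketch below applies this to the 2021 and 2031 figures quoted above; the function name is an assumption, and the result is an endpoint-to-endpoint rate over ten years, so it need not match the quoted 16.3%, which is stated for 2022 to 2031 from a 2022 base that is not given here.

```python
def cagr(start_value, end_value, years):
    """Compound annual growth rate between two values over `years` years."""
    return (end_value / start_value) ** (1.0 / years) - 1.0


# Market size in billions of US dollars: 2.8 (2021) growing to 12.5 (2031).
implied_rate = cagr(2.8, 12.5, 2031 - 2021)
implied_percent = round(implied_rate * 100, 1)
```

The implied rate comes out at roughly 16% per year, which is consistent in magnitude with the quoted figure.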

35
TTS – 11 Best Text to Speech Tools in 2023
1. Murf
2. Descript
3. Speechify
4. Listnr
5. Synthesia
6. Speechelo
7. Notevibes
8. Fliki
9. FreeTTS
10. Synthesys
11. Lovo
36
TTS – Major companies & Products
• Amazon.com, Inc. – Amazon Polly
• Microsoft Corporation – Azure AI Speech (text to speech)
• Google LLC – Google Cloud Text-to-Speech API
• IBM Corporation – IBM Watson Text to Speech
• Nuance Communications, Inc. – Nuance Vocalizer (Siri is an Apple product)

37
38
