SRM INSTITUTE OF SCIENCE AND TECHNOLOGY
Ramapuram, Chennai – 600 089
SCHOOL OF COMPUTER SCIENCE AND ENGINEERING
DEPARTMENT OF COMPUTER SCIENCE AND ENGINEERING
18CSP109L - PROJECT
18CSP111L - PROJECT for INTERNSHIP STUDENTS
BATCH NUMBER : 10
Speech Emotion Detection Using Machine Learning
Team Members:
RA2111003020006 MOHNISH SINGH
RA2111003020018 SOURAV KARMAKAR
RA2111003020029 PRANAMYA PATRIKAR
IV Year, CSE
SRMIST, Ramapuram Campus

Supervisor:
Dr. Geetha T V
Assistant Professor
Department of Computer Science and Engineering
SRMIST, Ramapuram
Agenda
• Abstract
• Scope and Motivation
• Introduction
• Literature Survey
• Objectives
• Problem Statement
• Proposed Work
• Architecture Diagram/Flow Diagram/Block Diagram
• Novel idea
• Modules
• Module Description
• Software & Hardware Requirements
• References
• Way forward towards Outcome (Research Paper/Patent)
ABSTRACT
Emotion recognition from speech signals is an important but challenging part of
Human-Computer Interaction (HCI). Many techniques have been explored to analyze
speech and classify emotions accurately. In recent years, machine learning approaches
have gained significant attention for this task due to their ability to identify patterns in
speech features. This paper provides an overview of various machine learning methods
used for speech emotion recognition, highlighting commonly used datasets, types of
emotions detected, key contributions in the field, and existing challenges.
In our work, we will implement traditional machine learning techniques such as K-
Nearest Neighbors (KNN) and Support Vector Machine (SVM), which are well-known
for their effectiveness in classification tasks. Additionally, we will explore deep
learning-based methods to enhance recognition accuracy by capturing complex speech
features. By comparing these approaches, we aim to analyze their performance and
determine the most efficient method for speech emotion recognition.
Scope and Motivation
• Our project aims to develop a Speech Emotion Recognition (SER) system that can accurately classify emotions such as anger, happiness, sadness, fear, surprise, and disgust from speech using the RAVDESS dataset, with a target of at least 90% accuracy.
• The project sets the foundation for integrating speech-based emotion recognition with other modalities, such as facial expressions, gestures, and physiological signals, to enhance accuracy.
• Understanding emotions from speech can significantly enhance AI-human
interactions by making systems more empathetic and responsive. This technology
has applications in mental health diagnostics, stress detection, emotion-based
therapy, and user experience enhancement. By developing a high-accuracy SER
model, we can contribute to more natural human-computer communication,
improving AI assistants and supporting emotional well-being in various real-world
applications.
Introduction
The primary objective of Speech Emotion Recognition (SER) is to enhance human-to-
machine interaction by enabling systems to understand and respond to human emotions
effectively. While significant progress has been made in the field of Speech
Recognition (SR) over the years, there is still a need for improved emotion detection
capabilities to make interactions more natural and intuitive. Developing a reliable SER
system can lead to advancements in virtual assistants, customer support, healthcare,
and AI-driven applications, making them more empathetic and responsive. This project
focuses on classifying emotions from speech using the RAVDESS dataset, leveraging
feature extraction techniques like MFCC, Chroma, and Mel Spectrogram, and training
models such as CNN and LSTM to achieve at least 90% accuracy. With potential
applications in mental health monitoring, emotion-based AI systems, and real-time
human-computer interaction, this research aims to bridge the gap between machines
and human emotions, improving the overall user experience.
Literature Survey
S.No. | Title of the Paper | Year | Author | Journal/Conference | Inference
1 | A survey of affect recognition methods: Audio, visual, and spontaneous expressions | 2009 | Zeng, Z., Pantic, M., Roisman, G. I., & Huang, T. S. | IEEE Transactions on Pattern Analysis and Machine Intelligence | A comprehensive survey of methods for emotion recognition using audio, video, and multimodal approaches.
2 | Speech emotion recognition using hidden Markov models | 2011 | El Ayadi, M., Kamel, M. S., & Karray, F. | Speech Communication | A foundational paper that investigates the use of hidden Markov models (HMMs) for speech emotion recognition.
Literature Survey
S.No. | Title of the Paper | Year | Author | Journal/Conference | Inference
3 | Speech emotion recognition methods, datasets, and applications | 2020 | Hossain, S., & Jassim, H. | IEEE Symposium on Computational Intelligence and Data Mining (CIDM) | Reviews different methods for speech emotion recognition and their applications in real-world systems.
4 | The first detection of emotion in speech using a large database | 2003 | Batliner, A., et al. | Proceedings of the 15th International Conference on Artificial Intelligence (ICAI) | Discusses emotion detection in speech using a large, labelled speech database for training emotion classifiers.
Literature Survey
S.No. | Title of the Paper | Year | Author | Journal/Conference | Inference
5 | Speech Emotion Recognition and Classification Using Hybrid Deep CNN and LSTM Models | 2023 | Md. Imran Hossain, Md. Mojahidul Islam, Tania Nahrin, Md. Rashed, Md. Atiqur | Multimedia Tools and Applications | Addresses the challenges of accurately detecting emotions using a hybrid deep CNN and LSTM model.
6 | Speech Emotion Recognition Based on Graph-LSTM Neural Network | 2023 | Yan Li, Yapeng Wang, Xu Yang, and Sio-Kei Im | EURASIP Journal on Audio, Speech, and Music Processing | Proposes a Graph-LSTM neural network for speech emotion recognition.
Literature Survey
S.No. | Title of the Paper | Year | Author | Journal/Conference | Inference
7 | Evaluating Raw Waveforms with Deep Learning Frameworks for Speech Emotion Recognition | 2023 | Md. Imran Hossain, Md. Atiqur | arXiv preprint | Investigates the feasibility of feeding raw audio waveforms directly into deep learning frameworks for speech emotion recognition.
8 | Capturing Spectral and Long-Term Contextual Information for Speech Emotion Recognition Using Deep Learning Techniques | 2023 | Md. Maksudul Haque, Samiul Islam, and Abu Jobayer Md. Sadat | arXiv preprint | Proposes an ensemble model combining a Graph Convolutional Network (GCN) for textual data with the HuBERT transformer for audio signals to address limitations in traditional SER approaches.
Objectives
The main objective of this project is to develop a Speech Emotion Recognition (SER)
system that can accurately classify all the emotions present in the RAVDESS dataset
using machine learning and deep learning models. Our goal is to achieve at least 90%
overall accuracy or higher, ensuring high precision and reliability in emotion
classification. We will leverage advanced feature extraction techniques such as MFCC,
Chroma Features, and Mel Spectrogram to capture essential speech characteristics. By
using traditional ML models along with deep learning architectures like CNN and
LSTM, we aim to build a highly effective system that can accurately recognize
emotions, improving AI applications in customer service, mental health monitoring,
virtual assistants, and human-computer interaction. The final model will be evaluated
rigorously using accuracy, precision, recall, and F1-score to ensure its effectiveness in
real-world applications. Additionally, this project will explore potential deployment
options, making it accessible for various industries that rely on emotional analysis from
speech.
Problem Statement
• Speech Emotion Recognition (SER) systems often struggle to work effectively
in real-world situations. Factors like background noise, different recording
conditions, and variations in speaker accents and speaking styles make it
difficult for these systems to generalize well. Many existing models perform
well in controlled environments but fail when applied to real-life scenarios,
such as customer service calls, in-car voice assistants, or mental health
monitoring apps. This lack of reliability limits their practical use.
• To address this, our project aims to develop a more robust and accurate SER
model that can handle noisy environments and speaker variations while
maintaining high accuracy. By leveraging deep learning techniques and
extracting meaningful speech features, we aim to build a system that can
accurately detect emotions even in challenging acoustic conditions.
Proposed Work-Block Diagram
Proposed-Novel Idea
Our novel idea is to improve speech emotion recognition by:
• Combining Multiple Feature Extraction Techniques – Instead of relying solely on
MFCC, we will integrate Chroma Features and Mel Spectrogram to capture a more
comprehensive representation of speech emotions.
• Hybrid Deep Learning Model – We propose a CNN + LSTM architecture to
leverage CNN’s ability to extract spatial features from spectrograms and LSTM’s
strength in capturing temporal dependencies in speech signals. This hybrid model
aims to achieve higher accuracy (>90%) compared to traditional machine learning
models.
• Robust Noise-Resistant Training – We will introduce data augmentation techniques (such as adding background noise, pitch shifting, and speed variations) to make the model more resilient to real-world environments; a minimal augmentation sketch follows this list.
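The augmentation step can be illustrated with a short librosa-based sketch. This is only a minimal example of the techniques named above (additive noise, pitch shifting, and speed variation); the function name, noise level, and shift amounts are illustrative assumptions, not the final training pipeline.

```python
import numpy as np
import librosa

def augment(y, sr):
    """Return noisy, pitch-shifted, and time-stretched copies of one waveform,
    mirroring the augmentation strategy described above (illustrative values)."""
    noisy = y + 0.005 * np.random.randn(len(y))                  # additive background noise
    pitched = librosa.effects.pitch_shift(y, sr=sr, n_steps=2)   # shift pitch up 2 semitones
    stretched = librosa.effects.time_stretch(y, rate=1.1)        # speed up by 10%
    return [noisy, pitched, stretched]
```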
Proposed-Modules
The key modules are
1. Data Collection & Preprocessing Module
2. Feature Extraction Module
3. Model Training & Classification Module
4. Emotion Detection & Prediction Module
5. Deployment & Application Module
MODULES DESCRIPTION
• Data Collection & Preprocessing Module
The dataset is loaded and split into 90% training and 10% testing sets. Audio files are processed, and extracted
features are converted into NumPy arrays. StandardScaler is applied to normalize the feature values, ensuring
consistency in the model's learning process.
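A minimal sketch of the split-and-scale step described above, assuming the extracted features are already collected in a NumPy array X with integer emotion labels y (both names are placeholders for the outputs of the feature extraction module):

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

def split_and_scale(X, y, test_size=0.10, seed=42):
    # 90% training / 10% testing split, stratified so every emotion appears in both sets
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=test_size, random_state=seed, stratify=y)

    # Fit the scaler on the training data only, then apply it to both splits
    scaler = StandardScaler()
    X_train = scaler.fit_transform(X_train)
    X_test = scaler.transform(X_test)
    return X_train, X_test, y_train, y_test, scaler
```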
• Feature Extraction Module
Audio signals are transformed into meaningful features such as MFCC (Mel-Frequency Cepstral Coefficients), Spectrograms, and Chroma features. These
features capture frequency, pitch, and intensity variations, helping distinguish different emotions. The extracted features are structured for deep learning
model inputs.
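The extraction step might look like the following librosa sketch, which averages MFCC, chroma, and mel-spectrogram frames over time to build one fixed-length vector per file; the function name, sampling rate, and number of MFCCs are assumptions chosen for illustration:

```python
import numpy as np
import librosa

def extract_features(path, sr=16000, n_mfcc=40):
    """Load one audio file and return a fixed-length feature vector
    built from MFCC, chroma, and mel-spectrogram statistics."""
    y, sr = librosa.load(path, sr=sr)

    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc)   # cepstral features
    chroma = librosa.feature.chroma_stft(y=y, sr=sr)         # pitch-class energy
    mel = librosa.feature.melspectrogram(y=y, sr=sr)         # mel-scaled spectrogram

    # Average each feature over time to obtain one vector per file
    return np.hstack([mfcc.mean(axis=1), chroma.mean(axis=1), mel.mean(axis=1)])
```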
• Model Training & Classification Module
A CNN-based deep learning model is built to classify emotions using extracted features. It consists of convolutional layers, pooling layers, batch
normalization, fully connected layers, and a softmax activation function. The model is trained for 50 epochs using the Adam optimizer and evaluated using
accuracy, confusion matrix, and classification reports.
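A hedged Keras sketch of the kind of CNN described in this module is shown below; the exact layer sizes, dropout rate, and the use of integer-encoded labels (hence sparse categorical cross-entropy) are assumptions, not the final architecture:

```python
from tensorflow.keras import layers, models

def build_cnn(input_dim, n_classes):
    # 1-D CNN over the feature vector: conv -> batch norm -> pooling, then dense + softmax
    model = models.Sequential([
        layers.Reshape((input_dim, 1), input_shape=(input_dim,)),
        layers.Conv1D(64, kernel_size=5, activation='relu'),
        layers.BatchNormalization(),
        layers.MaxPooling1D(pool_size=2),
        layers.Conv1D(128, kernel_size=5, activation='relu'),
        layers.BatchNormalization(),
        layers.MaxPooling1D(pool_size=2),
        layers.Flatten(),
        layers.Dense(128, activation='relu'),
        layers.Dropout(0.3),
        layers.Dense(n_classes, activation='softmax'),
    ])
    model.compile(optimizer='adam',
                  loss='sparse_categorical_crossentropy',
                  metrics=['accuracy'])
    return model

# Example usage with the splits from the preprocessing sketch:
# model = build_cnn(input_dim=X_train.shape[1], n_classes=8)
# model.fit(X_train, y_train, epochs=50, batch_size=32, validation_data=(X_test, y_test))
```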
• Emotion Detection & Prediction Module
Emotion Detection in speech involves analyzing audio signals to classify emotions such as happiness, sadness, anger, or neutrality. This is achieved by
passing extracted features (MFCCs, spectrograms, or chroma features) through a deep learning model, typically a Convolutional Neural Network (CNN). The
model learns patterns in speech features and maps them to corresponding emotional states. After training, the model predicts emotions from new speech
inputs with a certain level of accuracy.
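Prediction on a new recording can be sketched by reusing the extraction and scaling steps from the earlier snippets; the emotion label ordering below is a hypothetical mapping and must match whatever encoding was used during training:

```python
import numpy as np

# Hypothetical label ordering; the actual mapping depends on how the RAVDESS labels were encoded.
EMOTIONS = ['neutral', 'calm', 'happy', 'sad', 'angry', 'fearful', 'disgust', 'surprised']

def predict_emotion(model, scaler, audio_path):
    # Reuse the feature extraction and scaling steps defined in the earlier sketches
    features = extract_features(audio_path).reshape(1, -1)
    features = scaler.transform(features)
    probs = model.predict(features)[0]
    return EMOTIONS[int(np.argmax(probs))], float(np.max(probs))
```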
• Deployment & Application Module
The trained model is used to make predictions on the test dataset, determining the emotions in speech samples based on learned patterns. To evaluate its
performance, various metrics such as accuracy, classification report, and confusion matrix are calculated, providing insights into how well the model
distinguishes between different emotions.
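Continuing the earlier sketches, the evaluation step could be computed with scikit-learn as follows (model, X_test, y_test, and EMOTIONS refer to the illustrative names introduced above):

```python
import numpy as np
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix

y_prob = model.predict(X_test)        # class probabilities from the trained CNN
y_pred = np.argmax(y_prob, axis=1)    # predicted emotion indices

print("Accuracy:", accuracy_score(y_test, y_pred))
print(classification_report(y_test, y_pred, target_names=EMOTIONS))
print(confusion_matrix(y_test, y_pred))
```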
Software & Hardware Requirements
Software Requirements
• Programming Language: Python (with libraries such as TensorFlow and Keras)
• Audio Processing Library: librosa
• Other Libraries: Scikit-learn, Matplotlib, Seaborn, Pandas, NumPy
• Dataset: RAVDESS
Hardware Requirements
• Processor: Intel Core i5 (11th Gen)
• RAM: 16GB
Module 1-Outcome
• Data Collection & pre-processing is responsible for gathering audio data, cleaning
it, and preparing it for feature extraction.
• The audio files from the RAVDESS dataset have been successfully loaded.
• Converted all files to a standard sampling rate of 16 kHz.
• Stored the dataset in a structured format with labels.
• Removed background noise and adjusted volume levels to maintain consistency.
• Removed silent segments from the audio and applied normalization (a minimal sketch of these steps follows this list).
• Represented the pre-processed audio as waveform plots.
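A minimal sketch of the resampling, silence-trimming, and normalization steps listed above, assuming librosa for loading and soundfile for saving; the function name and thresholds are illustrative, and the noise-removal and volume-adjustment steps are not shown:

```python
import librosa
import soundfile as sf

def preprocess(path, out_path, target_sr=16000, top_db=30):
    """Resample to 16 kHz, trim leading/trailing silence, and peak-normalize one file."""
    y, sr = librosa.load(path, sr=target_sr)        # load and resample to 16 kHz
    y, _ = librosa.effects.trim(y, top_db=top_db)   # drop silent segments at the edges
    y = y / max(abs(y).max(), 1e-9)                 # peak normalization
    sf.write(out_path, y, target_sr)                # save in a standard format
    return y, target_sr
```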
References
• El Ayadi, M., Kamel, M. S., & Karray, F. (2011). "Speech emotion recognition
using hidden Markov models." Speech Communication, 53(5), 720-737.
• Schuller, B., et al. (2011). "Speech emotion recognition: Two-level classification
approach." Speech Communication, 53(9), 1062-1070.
• Zeng, Z., Pantic, M., Roisman, G. I., & Huang, T. S. (2009). "A survey of affect
recognition methods: Audio, visual, and spontaneous expressions." IEEE
Transactions on Pattern Analysis and Machine Intelligence, 31(1), 39-58.
• Batliner, A., et al. (2003). "The first detection of emotion in speech using a large
database." Proceedings of the 15th International Conference on Artificial
Intelligence (ICAI), 402-407.
• Hughes, D. L., & Gish, H. (1992). "Speech recognition using speech emotion
features." Proceedings of the IEEE International Conference on Acoustics, Speech,
and Signal Processing (ICASSP), 537-540.
• Haque, M. M., Islam, S., & Sadat, A. J. M. (2023). "Capturing Spectral and Long-Term Contextual Information for Speech Emotion Recognition Using Deep Learning Techniques." arXiv preprint.
References
• Li, Y., Wang, Y., Yang, X., & Im, S.-K. (2023). "Speech Emotion Recognition Based on Graph-LSTM Neural Network." EURASIP Journal on Audio, Speech, and Music Processing, 2023(1), Article 18.
• Hossain, M. I., Islam, M. M., Nahrin, T., Rashed, M., & Rahman, M. A. (2024). "Speech Emotion Recognition and Classification Using Hybrid Deep CNN and LSTM Models." International Journal of Research Publication and Reviews, 5(2), 105-113.
• "Graph Neural Network-Based Speech Emotion Recognition: A Fusion of Skip Graph Convolutional and Graph Attention Networks." Electronics, 13(3), 456-468.