EDA Mini Project Report

The report on 'Red Wine Quality Assurance' presents an Exploratory Data Analysis (EDA) of a synthetic dataset to assess factors influencing red wine quality. Key findings reveal significant correlations between physicochemical properties like alcohol content and sulphates with wine quality, while also identifying negative impacts from volatile acidity. The project aims to support wine producers in quality assurance and lays the groundwork for future predictive modeling and enhancements in wine production.

REPORT ON

“Red Wine Quality Assurance”

SUBMITTED TO THE DEPARTMENT OF ARTIFICIAL INTELLIGENCE & DATA SCIENCE

ALL INDIA SHRI SHIVAJI MEMORIAL SOCIETY'S INSTITUTE OF INFORMATION TECHNOLOGY, PUNE

SUBMITTED BY

Aum Patil - 2361004

Abhijeet Bagal - 2361006

UNDER THE GUIDANCE OF

Ms. Madhuri B. Thorat

Department of Artificial Intelligence and Data Science

ALL INDIA SHRI SHIVAJI MEMORIAL SOCIETY'S INSTITUTE OF INFORMATION TECHNOLOGY, PUNE

ACADEMIC YEAR: 2024-25

Department of Artificial Intelligence and Data Science

CERTIFICATE

This is to certify that the Project Report titled

“Red Wine Quality Assurance”

Submitted by

Name Roll No
1. Aum Patil 2361004
2. Abhijeet Bagal 2361006

is the bona fide work of students of this institute. The work has been carried out by them
under the supervision of Ms. Madhuri B. Thorat and is approved in partial fulfillment of the
requirements of the Department of Artificial Intelligence and Data Science, AISSMS IOIT, Pune.

Ms. Madhuri B. Thorat
Guide (Department of AI & DS)

Dr. R. A. Jamadar
Head of the Department (Department of AI & DS)

INDEX

Sr. no. Topic Name

1. Abstract
2. Acknowledgement
3. Introduction
4. Problem Statement, Objective, Dataset
5. Proposed System (System Workflow and Workflow Diagram)
6. System Requirements
7. Implementation (Results and Output)
8. Future Scope
9. Conclusion
10. References
ABSTRACT

The quality of red wine is a critical factor in consumer satisfaction and market
value. This project presents a comprehensive Exploratory Data Analysis (EDA)
approach to assess the factors influencing the quality of red wine. Using a
synthetically generated dataset of 5,000 records simulating real-world wine
attributes, the project analyzes key physicochemical properties such as alcohol
content, volatile acidity, citric acid, sulphates, and pH levels. Python libraries
including Pandas, Seaborn, Matplotlib, and NumPy were used to process,
visualize, and interpret the data. The EDA reveals notable relationships
between several features and wine quality: alcohol and sulphates show a strong
positive correlation with quality, while volatile acidity shows a negative one.
A variety of univariate, bivariate, and multivariate plots were used to uncover
data patterns, detect outliers, and identify quality-driving parameters. The
insights gained from this analysis not only demonstrate the power of EDA in
quality assurance but also provide a foundation for predictive modeling and
further optimization in wine production.
ACKNOWLEDGEMENT

I would like to express my sincere gratitude to all those who guided and
supported me throughout the completion of this project titled “Red Wine
Quality Assurance using EDA in Python.” First and foremost, I would like to
thank my mentor and faculty for their valuable insights, encouragement, and
constructive feedback during each phase of this project. Their continuous
guidance helped me refine my understanding of data analysis and apply it
effectively in a practical setting. I am also thankful to my peers and friends for
their collaborative discussions and support, which contributed to improving the
quality of this work. A special thanks to the open-source Python community and
the developers of the libraries such as Pandas, Matplotlib, Seaborn, and
NumPy, without which this project would not have been possible. Lastly, I
would like to thank my family for their unwavering support and motivation
throughout the duration of this project. This project has been a great learning
experience and has strengthened my interest and skills in data science and
analytics.
INTRODUCTION

In today's data-driven world, quality assurance has become an essential aspect
across industries, including the food and beverage sector. Among these, wine
production is a field where subtle variations in chemical composition can
significantly impact product quality. Evaluating and maintaining the quality of
red wine is a complex task influenced by various physicochemical attributes such
as alcohol content, acidity levels, sulphates, and sugar content.

This project, titled "Red Wine Quality Assurance using Exploratory Data
Analysis (EDA) in Python," aims to analyze and understand the relationship
between different measurable features of red wine and their impact on its overall
quality rating. The dataset used in this study is synthetically generated to
resemble real-world red wine data, ensuring scalability and reliability for
analysis.

EDA plays a critical role in data science by helping analysts uncover hidden
patterns, detect anomalies, check assumptions, and build a better understanding
of the data before applying advanced models. By employing visualization and
statistical techniques, this project identifies the most influential factors affecting
wine quality and provides meaningful insights for quality control and decision-
making.

The project utilizes Python's powerful libraries like Pandas, NumPy,
Matplotlib, and Seaborn to perform comprehensive data exploration. This
analysis lays the groundwork for future predictive modeling and supports
winemakers in producing consistently high-quality wine.
PROBLEM STATEMENT, OBJECTIVE AND DATASET

Problem Statement
To develop a data-driven solution that analyses the physicochemical properties of
red wine to identify the most influential factors affecting its quality. To
implement an end-to-end analytical model that enables wine producers to
interpret key variables such as acidity, alcohol content, and sulphates, thereby
supporting consistent quality assurance and informed decision-making in the
wine production process.

Objectives
The primary objectives of this project are as follows:
1. To perform Exploratory Data Analysis (EDA) on red wine data to
understand distribution, variance, and relationships between features.
2. To identify the key physicochemical factors that significantly influence
the quality of red wine.
3. To visualize the data using statistical plots and correlation heatmaps for
better interpretation.
4. To uncover hidden patterns and insights that can help in quality
assurance and optimization of wine production.

Dataset
• Name: Synthetic Red Wine Quality Dataset
• Size: 5,000 records × 12 features
• File Format: CSV (red_wine_quality_large.csv)
• Data Source: Custom-generated using Python's numpy and pandas
libraries based on statistical distributions of real-world wine features.
PROPOSED SYSTEM

The proposed system is designed to analyze red wine quality based on various
physicochemical parameters using Exploratory Data Analysis (EDA). The
system aims to discover meaningful patterns and relationships in the dataset to
assist wine producers in understanding and improving wine quality.
The system workflow is divided into the following key stages:

1. Dataset Generation
• A synthetic dataset of 5,000 entries is generated using Python, simulating
real-world red wine characteristics.
• Features such as acidity, alcohol content, sugar, sulphates, and quality
score are generated using statistically relevant distributions.
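
The report does not reproduce its generation script, so the following is only a minimal sketch of how such a dataset could be produced with NumPy and Pandas. The 12 column names are borrowed from the commonly used red wine quality layout, and every distribution parameter, the seed, the helper name generate_wine_data, and the quality formula are illustrative assumptions, not the authors' actual values.

```python
import numpy as np
import pandas as pd

N_RECORDS = 5_000  # dataset size stated in the report


def generate_wine_data(n=N_RECORDS, seed=42):
    """Return a synthetic red-wine DataFrame (all parameters are assumed)."""
    rng = np.random.default_rng(seed)
    df = pd.DataFrame({
        "fixed acidity": rng.normal(8.3, 1.7, n).clip(4.0, 16.0),
        "volatile acidity": rng.normal(0.53, 0.18, n).clip(0.1, 1.6),
        "citric acid": rng.normal(0.27, 0.19, n).clip(0.0, 1.0),
        "residual sugar": rng.lognormal(0.85, 0.35, n),
        "chlorides": rng.normal(0.087, 0.03, n).clip(0.01, 0.6),
        "free sulfur dioxide": rng.normal(16, 10, n).clip(1, 72),
        "total sulfur dioxide": rng.normal(46, 30, n).clip(6, 290),
        "density": rng.normal(0.9967, 0.0019, n),
        "pH": rng.normal(3.31, 0.15, n).clip(2.7, 4.0),
        "sulphates": rng.normal(0.66, 0.17, n).clip(0.3, 2.0),
        "alcohol": rng.normal(10.4, 1.1, n).clip(8.0, 15.0),
    })
    # Assumed quality rule: alcohol and sulphates raise the score,
    # volatile acidity lowers it, and Gaussian noise is added on top.
    score = (
        5.6
        + 0.35 * (df["alcohol"] - 10.4)
        + 1.0 * (df["sulphates"] - 0.66)
        - 1.5 * (df["volatile acidity"] - 0.53)
        + rng.normal(0, 0.6, n)
    )
    # Round to an integer rating and keep it in an assumed 3..8 range.
    df["quality"] = score.round().clip(3, 8).astype(int)
    return df


wine = generate_wine_data()
# To persist the data for the later stages, one could call:
# wine.to_csv("red_wine_quality_large.csv", index=False)
```

Because the quality score is built from the features by construction, any correlation later found by the EDA reflects the generation rule rather than a property of real wine.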

2. Data Collection & Import

• The generated dataset is stored in a CSV file and imported into the system
using the Pandas library.
• Initial inspection is done to verify column structure, data types, and
sample records.
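
The import and initial inspection step might look like the sketch below. To keep it runnable on its own, it first writes a three-row stand-in CSV to a temporary folder; the sample values and the file name red_wine_sample.csv are assumptions, and the actual project would read red_wine_quality_large.csv instead.

```python
import os
import tempfile

import pandas as pd

# Tiny stand-in sample (assumed values) written to a temporary CSV file.
sample = pd.DataFrame({
    "alcohol": [9.4, 10.8, 12.1],
    "volatile acidity": [0.70, 0.45, 0.31],
    "quality": [5, 6, 7],
})
csv_path = os.path.join(tempfile.mkdtemp(), "red_wine_sample.csv")
sample.to_csv(csv_path, index=False)

# Import the dataset with Pandas.
df = pd.read_csv(csv_path)

# Initial inspection: structure, data types, and sample records.
print(df.shape)   # (rows, columns)
print(df.dtypes)  # data type of each column
print(df.head())  # first few records
df.info()         # column names and non-null counts
```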

3. Data Preprocessing
• Cleaning the dataset by checking for missing/null values.
• Descriptive statistics like mean, median, standard deviation are computed.
• Data types are validated and converted if necessary for analysis.
• Outlier detection using boxplots.
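
A minimal sketch of these preprocessing checks is given here on a small illustrative frame. The report detects outliers visually with boxplots; the helper iqr_outliers below is a hypothetical numeric counterpart that applies the 1.5 × IQR rule conventionally used for box-plot whiskers. The data values are assumptions.

```python
import numpy as np
import pandas as pd


def iqr_outliers(series, k=1.5):
    """Boolean mask of values outside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, q3 = series.quantile(0.25), series.quantile(0.75)
    iqr = q3 - q1
    return (series < q1 - k * iqr) | (series > q3 + k * iqr)


# Illustrative data with one missing value and one extreme value.
df = pd.DataFrame({
    "alcohol": [9.4, 9.8, 10.0, 10.2, 10.5, 11.0, 19.0],
    "pH": [3.2, 3.3, np.nan, 3.4, 3.3, 3.1, 3.5],
})

missing_per_column = df.isnull().sum()      # missing/null check
summary = df.describe()                     # count, mean, std, quartiles
medians = df.median()                       # median of each column
outlier_mask = iqr_outliers(df["alcohol"])  # flags the extreme alcohol value

print(missing_per_column)
print(summary)
print(df[outlier_mask])
```
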
4. Exploratory Data Analysis (EDA)
• Univariate Analysis: Distribution of individual features using histograms
and count plots.
• Bivariate Analysis: Pairwise relationships using scatterplots, boxplots,
and violin plots.
• Multivariate Analysis: Interaction between multiple features using pair
plots and heatmaps.
• Correlation Matrix: Heatmap to highlight relationships between features
and wine quality.

5. Insight Extraction
• Patterns such as “higher alcohol content correlates positively with
quality” are derived.
• Features negatively impacting quality like high volatile acidity are
identified.
• Recommendations for improving wine quality are formed based on
findings.
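
One simple way to turn the correlation matrix into ranked insights is sketched below. The helper name rank_quality_drivers and the stand-in data are assumptions; the quality column is deliberately built from the three features so that the ranking has something to find.

```python
import numpy as np
import pandas as pd


def rank_quality_drivers(df, target="quality"):
    """Correlation of each numeric feature with the target, highest first."""
    corr = df.select_dtypes("number").corr()[target].drop(target)
    return corr.sort_values(ascending=False)


# Stand-in data (assumed): quality rises with alcohol and sulphates
# and falls with volatile acidity, plus random noise.
rng = np.random.default_rng(1)
n = 500
df = pd.DataFrame({
    "alcohol": rng.normal(10.4, 1.1, n),
    "sulphates": rng.normal(0.66, 0.17, n),
    "volatile acidity": rng.normal(0.53, 0.18, n),
})
df["quality"] = (
    5.6
    + 0.4 * (df["alcohol"] - 10.4)
    + 1.5 * (df["sulphates"] - 0.66)
    - 2.0 * (df["volatile acidity"] - 0.53)
    + rng.normal(0, 0.3, n)
)

ranking = rank_quality_drivers(df)
print(ranking)  # positive values at the top, negative values at the bottom
```

Features at the top of the ranking are candidates for "positively correlated with quality", and those at the bottom for "negatively correlated". Correlation alone does not establish that changing a feature would change quality.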

6. Conclusion and Future Scope

• Final interpretation of the analysis to conclude which features
significantly affect quality.
• Scope for using the EDA results in future predictive modeling and quality
forecasting.
SYSTEM WORKFLOW DIAGRAM

Fig 1 : System Workflow Diagram

SYSTEM REQUIREMENTS

To successfully implement the Red Wine Quality Assurance EDA project using
Python, a system with moderate specifications is sufficient.

Hardware Requirements
• A computer with Intel i3 processor or AMD Ryzen (dual-core or
higher).
• Minimum 4 GB RAM (Recommended: 8 GB for better performance).
• At least 500 MB of free disk space for storing project files and the
dataset.
• Display of 13 inches or more with a resolution of 1366×768 or higher.
• Compatible with Windows 10, macOS, or Linux operating systems.

Software Requirements
• Python version 3.8 or higher.
• Jupyter Notebook, or an IDE like VS Code or PyCharm for writing and
running the code.
• Pandas library for data manipulation.
• NumPy library for numerical operations.
• Matplotlib for basic plotting and visualization.
• Seaborn for statistical visualizations like boxplots, heatmaps, etc.
• CSV file named red_wine_quality_large.csv for input data.

Optional Tools
• Google Colab for running the project in the cloud without installation.
• Streamlit for converting the EDA into an interactive dashboard (for
future scope).
• Git / GitHub for version control and sharing the project online.
IMPLEMENTATION

# SOURCE CODE AND OUTPUT

FUTURE SCOPE

The current project effectively demonstrates how Exploratory Data Analysis
(EDA) can uncover meaningful insights from wine quality data. However, this
approach can be further expanded and enhanced in multiple ways to support
more advanced applications in the wine industry and data science:

• Integration with Machine Learning Models:
The EDA results can be used as a foundation to build predictive models
using algorithms like Decision Trees, Random Forests, or Logistic
Regression to automatically classify wine quality based on input
parameters.
• Real-Time Quality Monitoring:
The project can be extended to work with IoT devices or sensors that
collect real-time wine production data, allowing for dynamic analysis and
immediate quality checks.
• Interactive Dashboards:
Tools like Streamlit, Dash, or Power BI can be used to convert the
analysis into interactive dashboards for winemakers, allowing them to
filter and view trends in real time without writing any code.
• Recommendation System:
A system can be developed to recommend optimal ranges for parameters
like pH, alcohol, and sulphates that yield higher wine quality, helping
producers improve their process.
• Dataset Enhancement with External Factors:
Additional features like grape variety, region, temperature, humidity
during fermentation, etc., can be added to improve the depth of analysis.
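
As an illustration of the recommendation idea above, the sketch below reports the middle 50% range of selected parameters among highly rated wines. The helper name recommend_ranges, the rating threshold of 7, the quartile choice, and the sample values are all assumptions; a real system would need validation beyond this descriptive summary.

```python
import pandas as pd


def recommend_ranges(df, features, target="quality", threshold=7):
    """Middle-50% range of each feature among wines rated >= threshold."""
    top = df[df[target] >= threshold]
    q = top[features].quantile([0.25, 0.75])
    return pd.DataFrame({"low": q.loc[0.25], "high": q.loc[0.75]})


# Illustrative records (assumed values).
df = pd.DataFrame({
    "alcohol": [9.2, 9.8, 10.1, 11.5, 12.0, 12.4, 13.0],
    "sulphates": [0.50, 0.55, 0.60, 0.70, 0.75, 0.80, 0.85],
    "quality": [5, 5, 6, 7, 7, 8, 8],
})

ranges = recommend_ranges(df, ["alcohol", "sulphates"])
print(ranges)  # one row per feature with "low" and "high" columns
```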
CONCLUSION

This project effectively applied Exploratory Data Analysis (EDA) techniques to
evaluate and understand the key factors influencing red wine quality. By
analysing a large dataset using Python tools like Pandas, Matplotlib, and
Seaborn, we identified that attributes such as alcohol content, volatile acidity,
and sulphates play a major role in determining wine quality. The visualizations
helped uncover patterns and correlations that can aid wine producers in
enhancing their production processes. Overall, the project highlights how data
analysis can support quality assurance and lays the foundation for future
developments like predictive modelling and interactive dashboards.
REFERENCES

1. Matplotlib Documentation
https://matplotlib.org/stable/contents.html/

2. Seaborn Statistical Data Visualization Documentation
https://seaborn.pydata.org/

3. NumPy Documentation
https://numpy.org/doc/

4. Kaggle – Data Science and Machine Learning Tutorials
https://www.kaggle.com/
