EDA Mini Project Report

The report on 'Red Wine Quality Assurance' presents an Exploratory Data Analysis (EDA) of a synthetic dataset to assess factors influencing red wine quality. Key findings reveal significant correlations between physicochemical properties like alcohol content and sulphates with wine quality, while also identifying negative impacts from volatile acidity. The project aims to support wine producers in quality assurance and lays the groundwork for future predictive modeling and enhancements in wine production.

Uploaded by

Aum

REPORT ON

“Red Wine Quality Assurance”

SUBMITTED TO THE DEPARTMENT OF ARTIFICIAL INTELLIGENCE & DATA SCIENCE

ALL INDIA SHRI SHIVAJI MEMORIAL SOCIETY'S INSTITUTE OF INFORMATION TECHNOLOGY, PUNE

SUBMITTED BY

Aum Patil - 2361004


Abhijeet Bagal -2361006

UNDER THE GUIDANCE OF

Ms. Madhuri B. Thorat

Department of Artificial Intelligence and Data Science

BY

ALL INDIA SHRI SHIVAJI MEMORIAL SOCIETY'S INSTITUTE OF INFORMATION TECHNOLOGY, PUNE

ACADEMIC YEAR: 2024-25


Department of Artificial Intelligence and Data Science
CERTIFICATE

This is to certify that the Project Report titled

“Red Wine Quality Assurance”

Submitted by

    Name                 Roll No
1.  Aum Patil            2361004
2.  Abhijeet Bagal       2361006

is a bona fide record of the work carried out by them under the supervision of
Ms. Madhuri B. Thorat and is approved in partial fulfillment of the requirements of the
Department of Artificial Intelligence and Data Science, AISSMS IOIT, Pune.

Ms. Madhuri B. Thorat          Dr. R. A. Jamadar
Guide                          Head of the Department
(Department of AI & DS)        (Department of AI & DS)


INDEX

Sr. No.  Topic Name

1. Abstract
2. Acknowledgement
3. Introduction
4. Problem Statement, Objective, Dataset
5. Proposed System (System Workflow and Workflow Diagram)
6. System Requirements
7. Implementation (Results and Output)
8. Future Scope
9. Conclusion
10. References
ABSTRACT

The quality of red wine is a critical factor in consumer satisfaction and market
value. This project presents a comprehensive Exploratory Data Analysis (EDA)
approach to assess the factors influencing the quality of red wine. Using a
synthetically generated dataset of 5,000 records simulating real-world wine
attributes, the project analyzes key physicochemical properties such as alcohol
content, volatile acidity, citric acid, sulphates, and pH levels. Python libraries
including Pandas, Seaborn, Matplotlib, and NumPy were used to process,
visualize, and interpret the data. The EDA reveals significant correlations
between certain features and wine quality: most notably, alcohol and sulphates
show a strong positive correlation with quality, while volatile acidity shows a
negative one.
A variety of univariate, bivariate, and multivariate plots were used to uncover
data patterns, detect outliers, and identify quality-driving parameters. The
insights gained from this analysis not only demonstrate the power of EDA in
quality assurance but also provide a foundation for predictive modeling and
further optimization in wine production.
ACKNOWLEDGEMENT

We would like to express our sincere gratitude to all those who guided and
supported us throughout the completion of this project titled “Red Wine
Quality Assurance using EDA in Python.” First and foremost, we would like to
thank our mentor and faculty for their valuable insights, encouragement, and
constructive feedback during each phase of this project. Their continuous
guidance helped us refine our understanding of data analysis and apply it
effectively in a practical setting. We are also thankful to our peers and friends
for their collaborative discussions and support, which contributed to improving
the quality of this work. Special thanks go to the open-source Python community
and the developers of libraries such as Pandas, Matplotlib, Seaborn, and
NumPy, without which this project would not have been possible. Lastly, we
would like to thank our families for their unwavering support and motivation
throughout the duration of this project. This project has been a great learning
experience and has strengthened our interest and skills in data science and
analytics.
INTRODUCTION

In today's data-driven world, quality assurance has become an essential aspect
across industries, including the food and beverage sector. Among these, wine
production is a field where subtle variations in chemical composition can
significantly impact product quality. Evaluating and maintaining the quality of
red wine is a complex task influenced by various physicochemical attributes such
as alcohol content, acidity levels, sulphates, and sugar content.

This project, titled "Red Wine Quality Assurance using Exploratory Data
Analysis (EDA) in Python," aims to analyze and understand the relationship
between different measurable features of red wine and their impact on its overall
quality rating. The dataset used in this study is synthetically generated to
resemble real-world red wine data, which keeps the number of records easy to
scale and gives a controlled basis for analysis.

EDA plays a critical role in data science by helping analysts uncover hidden
patterns, detect anomalies, check assumptions, and build a better understanding
of the data before applying advanced models. By employing visualization and
statistical techniques, this project identifies the most influential factors affecting
wine quality and provides meaningful insights for quality control and decision-
making.

The project utilizes Python's powerful libraries like Pandas, NumPy,
Matplotlib, and Seaborn to perform comprehensive data exploration. This
analysis lays the groundwork for future predictive modeling and supports
winemakers in producing consistently high-quality wine.
PROBLEM STATEMENT, OBJECTIVE AND DATASET

Problem Statement
The project aims to develop a data-driven solution that analyses the
physicochemical properties of red wine and identifies the factors that most
influence its quality. It implements an end-to-end analytical workflow that
enables wine producers to interpret key variables such as acidity, alcohol
content, and sulphates, thereby supporting consistent quality assurance and
informed decision-making in the wine production process.

Objectives
The primary objectives of this project are as follows:
1. To perform Exploratory Data Analysis (EDA) on red wine data to
understand distribution, variance, and relationships between features.
2. To identify the key physicochemical factors that significantly influence
the quality of red wine.
3. To visualize the data using statistical plots and correlation heatmaps for
better interpretation.
4. To uncover hidden patterns and insights that can help in quality
assurance and optimization of wine production.

Dataset
• Name: Synthetic Red Wine Quality Dataset
• Size: 5,000 records × 12 features
• File Format: CSV (red_wine_quality_large.csv)
• Data Source: Custom-generated using Python's numpy and pandas
libraries based on statistical distributions of real-world wine features.
PROPOSED SYSTEM

The proposed system is designed to analyze red wine quality based on various
physicochemical parameters using Exploratory Data Analysis (EDA). The
system aims to discover meaningful patterns and relationships in the dataset to
assist wine producers in understanding and improving wine quality.
The system workflow is divided into the following key stages:

1. Dataset Generation
• A synthetic dataset of 5,000 entries is generated using Python, simulating
real-world red wine characteristics.
• Features such as acidity, alcohol content, sugar, sulphates, and quality
score are generated using statistically relevant distributions.
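The generation step could be sketched roughly as follows. This is only an illustration: the report does not list its column names or distribution parameters, so the twelve columns (borrowed from the layout of the well-known public red wine quality data), the means and spreads, and the rule linking quality to alcohol, sulphates, and volatile acidity are all assumptions, as is the helper name make_synthetic_wine.

```python
import numpy as np
import pandas as pd


def make_synthetic_wine(n_rows: int = 5000, seed: int = 42) -> pd.DataFrame:
    """Build an illustrative synthetic red wine table with 12 columns.

    All distribution parameters below are assumptions for demonstration.
    """
    rng = np.random.default_rng(seed)
    df = pd.DataFrame({
        "fixed acidity": rng.normal(8.3, 1.7, n_rows).clip(4.0, 16.0),
        "volatile acidity": rng.normal(0.53, 0.18, n_rows).clip(0.1, 1.6),
        "citric acid": rng.normal(0.27, 0.19, n_rows).clip(0.0, 1.0),
        "residual sugar": rng.lognormal(0.8, 0.4, n_rows).clip(0.9, 15.5),
        "chlorides": rng.normal(0.087, 0.03, n_rows).clip(0.01, 0.6),
        "free sulfur dioxide": rng.normal(16, 10, n_rows).clip(1, 72),
        "total sulfur dioxide": rng.normal(46, 30, n_rows).clip(6, 290),
        "density": rng.normal(0.9967, 0.0019, n_rows).clip(0.990, 1.004),
        "pH": rng.normal(3.31, 0.15, n_rows).clip(2.7, 4.0),
        "sulphates": rng.normal(0.66, 0.17, n_rows).clip(0.3, 2.0),
        "alcohol": rng.normal(10.4, 1.1, n_rows).clip(8.0, 15.0),
    })
    # Assumed rule: quality rises with alcohol and sulphates, falls with
    # volatile acidity, plus random noise; rounded to an integer score.
    score = (
        5.6
        + 0.35 * (df["alcohol"] - 10.4)
        + 1.0 * (df["sulphates"] - 0.66)
        - 1.5 * (df["volatile acidity"] - 0.53)
        + rng.normal(0, 0.6, n_rows)
    )
    df["quality"] = score.round().clip(3, 8).astype(int)
    return df


wine = make_synthetic_wine()
print(wine.shape)
```

Calling wine.to_csv("red_wine_quality_large.csv", index=False) afterwards would store the table under the file name used in this report. Fixing the seed keeps the generated data reproducible between runs.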

2. Data Collection & Import
• The generated dataset is stored in a CSV file and imported into the system
using the Pandas library.
• Initial inspection is done to verify column structure, data types, and
sample records.
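A minimal sketch of the import and first inspection is shown here. To keep it self-contained it reads a tiny in-memory stand-in instead of the real file; the four rows and the reduced set of columns are made-up values for illustration only.

```python
import io

import pandas as pd

# Tiny stand-in for red_wine_quality_large.csv (values are made up).
csv_text = """alcohol,volatile acidity,sulphates,pH,quality
9.4,0.70,0.56,3.51,5
9.8,0.88,0.68,3.20,5
11.2,0.28,0.58,3.16,6
12.5,0.35,0.82,3.30,7
"""

# With the real file this would be pd.read_csv("red_wine_quality_large.csv").
wine = pd.read_csv(io.StringIO(csv_text))

print(wine.shape)    # number of rows and columns
print(wine.dtypes)   # data type of each column
print(wine.head())   # first sample records
```

Checking shape, dtypes, and head() right after loading is a quick way to confirm that every column was parsed as a number rather than as text.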

3. Data Preprocessing
• Cleaning the dataset by checking for missing/null values.
• Descriptive statistics like mean, median, standard deviation are computed.
• Data types are validated and converted if necessary for analysis.
• Outlier detection using boxplots.
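The preprocessing checks above might look like the sketch below. The small random table and the injected extreme value are assumptions made so the example runs on its own, and the helper iqr_outliers is a hypothetical name. It applies the 1.5 × IQR rule, which is the default whisker rule that Matplotlib and Seaborn boxplots use to mark outliers.

```python
import numpy as np
import pandas as pd

# Illustrative stand-in data (assumed parameters, not the report's dataset).
rng = np.random.default_rng(0)
wine = pd.DataFrame({
    "alcohol": rng.normal(10.4, 1.1, 500),
    "volatile acidity": rng.normal(0.53, 0.18, 500),
})
wine.loc[0, "alcohol"] = 20.0  # inject one extreme value to detect

# Missing values per column.
missing = wine.isnull().sum()

# Make sure every column is numeric; unparseable entries become NaN.
wine = wine.apply(pd.to_numeric, errors="coerce")

# Mean, median (the 50% row of describe) and standard deviation.
summary = wine.describe().T[["mean", "50%", "std"]]


def iqr_outliers(values: pd.Series) -> pd.Series:
    """Return the entries lying beyond 1.5 * IQR from the quartiles."""
    q1, q3 = values.quantile(0.25), values.quantile(0.75)
    iqr = q3 - q1
    mask = (values < q1 - 1.5 * iqr) | (values > q3 + 1.5 * iqr)
    return values[mask]


outliers = iqr_outliers(wine["alcohol"])
print(missing)
print(summary)
print(len(outliers), "alcohol outliers")
```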
4. Exploratory Data Analysis (EDA)
• Univariate Analysis: Distribution of individual features using histograms
and count plots.
• Bivariate Analysis: Pairwise relationships using scatterplots, boxplots,
and violin plots.
• Multivariate Analysis: Interaction between multiple features using pair
plots and heatmaps.
• Correlation Matrix: Heatmap to highlight relationships between features
and wine quality.

5. Insight Extraction
• Patterns such as “higher alcohol content correlates positively with
quality” are derived.
• Features negatively impacting quality like high volatile acidity are
identified.
• Recommendations for improving wine quality are formed based on
findings.
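Insight extraction of this kind can be approximated by ranking features on their correlation with quality. The sketch below uses assumed stand-in data in which quality depends on alcohol, sulphates, and volatile acidity but not on pH, so the numbers it prints describe only that assumption and not the report's results.

```python
import numpy as np
import pandas as pd

# Illustrative stand-in data (assumed parameters and quality rule).
rng = np.random.default_rng(2)
n = 2000
wine = pd.DataFrame({
    "alcohol": rng.normal(10.4, 1.1, n),
    "sulphates": rng.normal(0.66, 0.17, n),
    "volatile acidity": rng.normal(0.53, 0.18, n),
    "pH": rng.normal(3.31, 0.15, n),
})
score = (
    5.6
    + 0.35 * (wine["alcohol"] - 10.4)
    + 1.0 * (wine["sulphates"] - 0.66)
    - 1.5 * (wine["volatile acidity"] - 0.53)
    + rng.normal(0, 0.6, n)
)
wine["quality"] = score.round().clip(3, 8).astype(int)

# Rank features from most positive to most negative correlation with quality.
ranking = wine.corr()["quality"].drop("quality").sort_values(ascending=False)

# Average feature values per quality score make the trend easy to read.
by_quality = wine.groupby("quality")[["alcohol", "volatile acidity"]].mean()

print(ranking)
print(by_quality)
```

Features at the top of the ranking are candidates for "raise this to improve quality", and those at the bottom for "keep this low". Correlation alone does not establish cause, so such findings are best read as starting points for recommendations.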

6. Conclusion and Future Scope
• Final interpretation of the analysis to conclude which features
significantly affect quality.
• Scope for using the EDA results in future predictive modeling and quality
forecasting.
SYSTEM WORKFLOW DIAGRAM

Fig 1 : System Workflow Diagram


SYSTEM REQUIREMENTS

To successfully implement the Red Wine Quality Assurance EDA project using
Python, a system with moderate specifications is sufficient.

Hardware Requirements
• A computer with Intel i3 processor or AMD Ryzen (dual-core or
higher).
• Minimum 4 GB RAM (Recommended: 8 GB for better performance).
• At least 500 MB of free disk space for storing project files and the
dataset.
• Display of 13 inches or more with a resolution of 1366×768 or higher.
• Compatible with Windows 10, macOS, or Linux operating systems.

Software Requirements
• Python version 3.8 or higher.
• Jupyter Notebook, or an IDE like VS Code or PyCharm for writing and
running the code.
• Pandas library for data manipulation.
• NumPy library for numerical operations.
• Matplotlib for basic plotting and visualization.
• Seaborn for statistical visualizations like boxplots, heatmaps, etc.
• CSV file named red_wine_quality_large.csv for input data.
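A quick way to confirm these software requirements on a given machine is sketched below. It uses only the standard library, so it runs even when a package is missing; the function name check_environment is hypothetical.

```python
import importlib.util
import sys

# Libraries listed in the software requirements above.
REQUIRED = ["pandas", "numpy", "matplotlib", "seaborn"]


def check_environment(min_python=(3, 8)):
    """Return whether Python is new enough and which libraries are missing."""
    python_ok = sys.version_info >= min_python
    missing = [name for name in REQUIRED
               if importlib.util.find_spec(name) is None]
    return python_ok, missing


python_ok, missing = check_environment()
print("Python version OK:", python_ok)
print("Missing libraries:", missing if missing else "none")
```

Any library reported as missing can be installed with pip, for example pip install seaborn.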

Optional Tools
• Google Colab for running the project in the cloud without installation.
• Streamlit for converting the EDA into an interactive dashboard (for
future scope).
• Git / GitHub for version control and sharing the project online.
IMPLEMENTATION

# SOURCE CODE AND OUTPUT


FUTURE SCOPE

The current project effectively demonstrates how Exploratory Data Analysis
(EDA) can uncover meaningful insights from wine quality data. However, this
approach can be further expanded and enhanced in multiple ways to support
more advanced applications in the wine industry and data science:

• Integration with Machine Learning Models:
The EDA results can be used as a foundation to build predictive models
using algorithms like Decision Trees, Random Forests, or Logistic
Regression to automatically classify wine quality based on input
parameters.
• Real-Time Quality Monitoring:
The project can be extended to work with IoT devices or sensors that
collect real-time wine production data, allowing for dynamic analysis and
immediate quality checks.
• Interactive Dashboards:
Tools like Streamlit, Dash, or Power BI can be used to convert the
analysis into interactive dashboards for winemakers, allowing them to
filter and view trends in real time without writing any code.
• Recommendation System:
A system can be developed to recommend optimal ranges for parameters
like pH, alcohol, and sulphates that yield higher wine quality, helping
producers improve their process.
• Dataset Enhancement with External Factors:
Additional features like grape variety, region, temperature, humidity
during fermentation, etc., can be added to improve the depth of analysis.
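As one illustration of the recommendation idea listed above, suggested parameter ranges could be read off the wines that score well. In the sketch below the stand-in data, the quality threshold of 7, and the use of the 25th to 75th percentile as the "optimal" band are all assumptions chosen for demonstration.

```python
import numpy as np
import pandas as pd

# Illustrative stand-in data (assumed parameters and quality rule).
rng = np.random.default_rng(3)
n = 2000
wine = pd.DataFrame({
    "alcohol": rng.normal(10.4, 1.1, n),
    "sulphates": rng.normal(0.66, 0.17, n),
    "pH": rng.normal(3.31, 0.15, n),
})
score = (
    5.6
    + 0.35 * (wine["alcohol"] - 10.4)
    + 1.0 * (wine["sulphates"] - 0.66)
    + rng.normal(0, 0.6, n)
)
wine["quality"] = score.round().clip(3, 8).astype(int)

features = ["alcohol", "sulphates", "pH"]

# Keep only high-scoring wines (threshold of 7 is an assumption).
high_quality = wine[wine["quality"] >= 7]

# Middle 50% of each parameter among those wines as a suggested band.
ranges = high_quality[features].quantile([0.25, 0.75]).T
ranges.columns = ["low", "high"]

print(ranges.round(2))
```

A band that sits clearly above or below the overall distribution of a parameter hints that the parameter matters for quality, while a band that matches the overall distribution suggests it does not.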
CONCLUSION

This project effectively applied Exploratory Data Analysis (EDA) techniques to
evaluate and understand the key factors influencing red wine quality. By
analysing a synthetic dataset of 5,000 records with Python tools like Pandas,
Matplotlib, and
Seaborn, we identified that attributes such as alcohol content, volatile acidity,
and sulphates play a major role in determining wine quality. The visualizations
helped uncover patterns and correlations that can aid wine producers in
enhancing their production processes. Overall, the project highlights how data
analysis can support quality assurance and lays the foundation for future
developments like predictive modelling and interactive dashboards.
REFERENCES

1. Matplotlib Documentation
https://matplotlib.org/stable/contents.html

2. Seaborn Statistical Data Visualization Documentation
https://seaborn.pydata.org/

3. NumPy Documentation
https://numpy.org/doc/

4. Kaggle – Data Science and Machine Learning Tutorials
https://www.kaggle.com/
