MCEN3030 Project1 Wine-Chemistry HZ4jcSg

This document outlines a project for MCEN 3030 at the University of Colorado, focusing on using linear regression to analyze a dataset of 1599 red wines to predict wine quality based on 11 quantitative variables. The project involves performing a Variance Inflation Factor Analysis to address multicollinearity, executing linear regression to determine coefficients, and evaluating the model's predictive accuracy through residual analysis. Deliverables include tables of VIF values, regression coefficients, residual plots, and code used in the analysis.

Linear Regression to Predict Wine Quality

MCEN 3030 Summer 2025
© 2025 University of Colorado

In an engineering design process we often have quantitative information about scientific/engineering parameters. Examples: the modulus of elasticity of the foam used on a steering wheel and the grip size of that wheel. But what modulus do humans prefer? What size? Conservation of mass/momentum/energy/etc. can’t tell us! In this project we will use modeling tools to help us connect qualitative perception and quantitative measurements.[1]

[1] Our department now offers a “Design of Coffee”, a “Design of Chocolate”, and a “Design of Beer” course – I think this project is really relevant!

I found an interesting data set, not about steering wheels but about wine “quality”. Wine-making is an ancient technology, but we have modern tools to help us understand it. Why has the wine from a particular region been so well-regarded for hundreds of years? Maybe it has something to do with the acidity, or the amount of sugar, or the alcohol level – all of these variables are included in the data set we will use in this project.

Variables/column labels:
1. Fixed Acidity
2. Volatile Acidity
3. Citric Acid
4. Residual Sugar
5. Chlorides
6. Free Sulfur Dioxide
7. Total Sulfur Dioxide
8. Density
9. pH
10. Sulphates
11. Alcohol
12. Quality

Included are 11 quantitative measurements as well as an assessment of “quality” for 1599 red wines. We will focus on this “quality”, as an output value,[2] and it presumably is a function of the other 11 variables. Wine is nice if it is at least a little acidic and obviously most folks say the alcohol is a positive attribute. Sulfur Dioxide helps with preservation and with activating the yeast, but some people insist it negatively impacts the flavor.[3] Can we have too much acidity, too much alcohol, too much sulfur? Too little? For sure. So what is the recipe for the best wine? We shall see!

[2] The min value of quality in this data set is 3, the max is 8.
[3] Many people react negatively to sulphates too, e.g. they get headaches.

The central aspect of this project is performing a linear regression on the data set. We will assume each of the 11 “inputs” (1-11 in the list of variables/column labels) has a linear effect on the quality (Q, variable 12) such that we can write

$$Q = a_0 + a_1 x_1 + a_2 x_2 + a_3 x_3 + \dots \tag{1}$$

At least that is going to be the starting point of our discussion... it might be the case that not all of these variables are independent, so let’s think about that first.
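For reference, Eq. (1) written out for every wine at once becomes a matrix problem. The block below is a sketch of that standard least-squares form; the symbols $Z$ (design matrix), $M$ (number of wines, 1599 here), $N$ (number of inputs kept), and $x_{i,n}$ (input $n$ of wine $i$) are notation assumed for illustration and may differ from the course reading.

```latex
\[
\underbrace{\begin{bmatrix}
1 & x_{1,1} & x_{1,2} & \cdots & x_{1,N} \\
1 & x_{2,1} & x_{2,2} & \cdots & x_{2,N} \\
\vdots & \vdots & \vdots & & \vdots \\
1 & x_{M,1} & x_{M,2} & \cdots & x_{M,N}
\end{bmatrix}}_{Z}
\underbrace{\begin{bmatrix} a_0 \\ a_1 \\ \vdots \\ a_N \end{bmatrix}}_{A}
\approx
\underbrace{\begin{bmatrix} Q_1 \\ Q_2 \\ \vdots \\ Q_M \end{bmatrix}}_{\mathbf{Q}},
\qquad
A = \left(Z^{T} Z\right)^{-1} Z^{T} \mathbf{Q}.
\]
```

The column of ones in $Z$ is what carries the offset $a_0$. With $M$ much larger than $N + 1$ the system is overdetermined, so $A$ is the choice that minimizes the sum of squared residuals rather than an exact solution.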

Step 1

(Step 0 is to finish the homework problems! You should be able to carry over your linear regression code here, and add to it.)

A concern: In this data set, citric acid and pH are two separate variables, but are we really able to adjust them independently? What about free sulfur dioxide and total sulfur dioxide, is there a correlation? We want to make sure we are correctly characterizing what we have control over, and if we can’t actually control these levers independently, our model is not going to be meaningful.

Start by performing a Variance Inflation Factor Analysis on the input variables.[4] We will say our threshold for concern is 5: If any $\mathrm{VIF}_n > 5$, we will simply toss out the variable $x_n$ with the highest VIF, and then re-run the analysis. You do not have to write code that automatically does this in one click... you can calculate, interpret the results, recalculate, ..., and go from there. Report the VIF scores in the initial analysis and the follow-up analysis/analyses in a table, and comment on which variables are removed.

[4] See the reading on Canvas.
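One pass of the VIF calculation could be sketched as follows. This assumes the common definition $\mathrm{VIF}_n = 1/(1 - R_n^2)$, where $R_n^2$ comes from regressing input $n$ on all the other inputs plus an offset. The reading on Canvas may define it differently, and the function name `vif_scores` is hypothetical.

```matlab
function vif = vif_scores(X)
% VIF_SCORES  Variance inflation factor for each column of X (a sketch).
%   X is an M-by-N matrix: one row per wine, one column per input variable.
%   Assumed definition: vif(n) = 1 / (1 - R2_n), where R2_n is the R^2 of a
%   linear fit of column n against all the other columns plus an offset.
    [M, N] = size(X);
    vif = zeros(1, N);
    for n = 1:N
        y = X(:, n);                      % treat variable n as the "output"
        others = X(:, [1:n-1, n+1:N]);    % every input except variable n
        Z = [ones(M, 1), others];         % design matrix with offset column
        a = Z \ y;                        % least-squares coefficients
        res = y - Z * a;                  % residual of that auxiliary fit
        R2 = 1 - sum(res.^2) / sum((y - mean(y)).^2);
        vif(n) = 1 / (1 - R2);
    end
end
```

With the 11 input columns in a matrix `X`, `vif = vif_scores(X)` gives the first row of the table. `[worst, n] = max(vif)` then identifies the candidate for removal, and if `worst > 5`, `X(:, n) = []` drops that column before the function is called again. Keeping a separate list of the surviving column labels helps with the indexing later on.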

Step 2

Now that we have eliminated “multicollinearity” from the data set, let’s go ahead and perform the linear regression. Determine the coefficients $A = [a_0, a_1, ..., a_N]^T$ in Eq. (1) and report them in a table. Note that our model includes some baseline offset $a_0$ that is not associated with any variable.[5] If a variable was removed, include “–” in the table (remove the variable before running the analysis). Also report the $R^2$-value for the fit in the text.

[5] Make sure to be careful about your variable indexing! We include an $a_0$ here, but by MATLAB’s convention, you will likely have A(1) = $a_0$. And if we have used VIF to eliminate a variable, that is going to throw off the indexing as well!

Step 3

Is it a good predictor? What if a wine was predicted to have a quality of 7.4, yet it actually was 4? Someone is going to be mad![6] We can plot the residual $e_i$ for each wine, labeling each based on their order in the list (first row is called wine “1”, second is “2”, etc.) to get an idea of how the model has done. Comment on whether the data looks appropriately noisy around $e_i = 0$, or if it seems we have missed a dependence.[7] Discuss: What is the worst overprediction (e.g. maybe wine 97 was predicted to be 7.3, and is actually a 3)? What is the worst underprediction? How many are overpredicted and underpredicted by a quality of 1.5 or more?

[6] And someone is going to try to charge a lot of money for mediocre wine “created by scientists to have the optimum chemistry”.
[7] The evidence: patterns in $e_i$.
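The fit from Step 2 and the residuals needed for Step 3 could be computed together, roughly as sketched below. The function name `fit_quality` is hypothetical, and the sign convention $e_i = Q_i - Q_{\mathrm{fit},i}$ is an assumption; check it against the definition used in the course reading, since it decides whether an overprediction shows up as a negative or a positive residual.

```matlab
function [A, R2, e] = fit_quality(X, Q)
% FIT_QUALITY  Least-squares fit of Q = a0 + a1*x1 + ... + aN*xN (a sketch).
%   X  : M-by-N matrix of the input variables kept after the VIF step
%   Q  : M-by-1 vector of reported quality
%   A  : (N+1)-by-1 coefficients, with A(1) = a0 and A(n+1) = an
%   R2 : coefficient of determination of the fit
%   e  : M-by-1 residuals, e = Q - Qfit (ASSUMED sign convention)
    M = size(X, 1);
    Z = [ones(M, 1), X];     % the column of ones carries the offset a0
    A = Z \ Q;               % least-squares solution, like inv(Z'*Z)*(Z'*Q)
    Qfit = Z * A;            % model prediction for every wine
    e = Q - Qfit;            % negative e means the model predicted too high
    R2 = 1 - sum(e.^2) / sum((Q - mean(Q)).^2);
end
```

After `[A, R2, e] = fit_quality(X, Q)`, a call such as `plot(1:numel(e), e, '.')` with wine number on the horizontal axis gives the residual plot. Note that `A(n+1)` multiplies the n-th column of the reduced `X`, which is no longer variable n from the original list once a column has been removed.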

Deliverables

• A table that includes the VIF values for a first, and possibly a second, third, fourth, ..., VIF analysis. In your report, comment on which variable(s) you remove, if any, and speculate based on your chemistry knowledge if that removal is reasonable.

• A table that reports the fits for $A = [a_0, ...]^T$ and a comment in the report about the $R^2$-value.

• A plot of the residual for each wine, and comments on the distribution around $e_i = 0$.

• A plot of the histogram[8] of $e_i$. Additionally, write a small piece of code that determines the worst overprediction, worst underprediction, and the number that are overpredicted by 1.5 or more, and the number that are underpredicted by 1.5 or more. Program it, not a manual search!

• All code used in this problem should be included as an appendix. Your linear regression code may use the built-in inverse inv or under-divide \.

[8] histogram(e,edges) will do the job, where edges is a vector of the left-side of each “bin”: edges = -3 : 0.5 : 3 should be good.
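The programmed search asked for in the histogram deliverable might look like the sketch below. It assumes the residual is defined as $e_i = Q_i - Q_{\mathrm{fit},i}$, so that an overprediction is a negative residual; flip the comparisons if the course uses the opposite sign. The function name `residual_summary` and the field names are hypothetical.

```matlab
function s = residual_summary(e, tol)
% RESIDUAL_SUMMARY  Worst over/underpredictions and counts (a sketch).
%   e   : vector of residuals, ASSUMED to be e = Q - Qfit
%   tol : threshold on |e| for counting, e.g. 1.5
%   Wine numbers are row indices into e (first row is wine 1).
    s = struct();
    [lo, iLo] = min(e);           % most negative residual: predicted too high
    [hi, iHi] = max(e);           % most positive residual: predicted too low
    s.worstOver      = lo;
    s.worstOverWine  = iLo;
    s.worstUnder     = hi;
    s.worstUnderWine = iHi;
    s.nOver  = sum(e <= -tol);    % overpredicted by tol or more
    s.nUnder = sum(e >=  tol);    % underpredicted by tol or more
end
```

Calling `s = residual_summary(e, 1.5)` returns everything in one struct, and `histogram(e, -3:0.5:3)` produces the plot. Residuals outside the range of the edges are not counted by `histogram`, so it is worth comparing `min(e)` and `max(e)` against the edge vector first.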
