
Use ML for Wine Quality Prediction
This tutorial uses a wine quality dataset from an online source such as Kaggle. The preferred dataset is the "Wine Quality Dataset," available at "https://www.kaggle.com/datasets/yasserh/wine-quality-dataset."
The dataset is a .csv file whose columns describe properties of each wine, such as 'fixed acidity,' 'volatile acidity,' 'pH,' 'density,' and more. The 'quality' column is the target: it is first dropped from the feature set and kept as a separate variable, and the model is then trained to predict it from the remaining columns.
Here is the Python code to predict the wine quality.
Importing the necessary libraries.
import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
import matplotlib.pyplot as plt
Import the wine quality dataset
wine = pd.read_csv('/Users/someswarpal/Downloads/WineQT.csv')
Separate the features (X) from the target (y) by dropping the column named quality from the features.
X = wine.drop(columns=['quality'])
y = wine['quality']
Split the data into testing and training sets.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
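To illustrate what the split step does, here is a minimal sketch on a small made-up table rather than the real WineQT.csv file. The frame `demo` and its values are hypothetical; only `train_test_split` and its `test_size` and `random_state` parameters come from the tutorial code.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

# Hypothetical stand-in for the wine table: 100 rows, two features, one target.
# The values are random and are NOT taken from the wine dataset.
rng = np.random.default_rng(0)
demo = pd.DataFrame({
    "alcohol": rng.uniform(8.0, 14.0, size=100),
    "pH": rng.uniform(2.8, 3.8, size=100),
    "quality": rng.integers(3, 9, size=100),
})

X_demo = demo.drop(columns=["quality"])
y_demo = demo["quality"]

# test_size=0.2 holds out 20% of the rows; random_state fixes the shuffle.
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=42
)
print(X_tr.shape, X_te.shape)  # (80, 2) (20, 2)

# Repeating the call with the same random_state reproduces the same split.
X_tr_again, _, _, _ = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=42
)
same_split = X_tr.index.equals(X_tr_again.index)
print("Same split with same seed:", same_split)
```

Fixing `random_state` matters when comparing models, because otherwise each run would be scored on a different set of held-out rows.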
Create a linear regression model
model = LinearRegression()
Train the model
model.fit(X_train, y_train)
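After fitting, a linear regression model exposes one coefficient per feature plus an intercept. The sketch below shows this on synthetic data with a known linear relationship, so the recovered values can be checked. The names `features`, `target`, and `demo_model` are hypothetical and the data has nothing to do with the wine dataset.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression

# Synthetic data (an assumption for illustration): target = 2*x1 - 3*x2 + 5, no noise.
rng = np.random.default_rng(1)
features = pd.DataFrame({
    "x1": rng.normal(size=50),
    "x2": rng.normal(size=50),
})
target = 2.0 * features["x1"] - 3.0 * features["x2"] + 5.0

demo_model = LinearRegression()
demo_model.fit(features, target)

# coef_ holds one weight per input column, in column order.
coefficients = pd.Series(demo_model.coef_, index=features.columns)
print(coefficients)
print("Intercept:", demo_model.intercept_)
```

On the wine data, the same `coef_` attribute of `model` could be paired with `X_train.columns` to see which features push the predicted quality up or down, keeping in mind that coefficient sizes depend on each feature's scale.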
Make predictions on the test set.
y_pred = model.predict(X_test)
Evaluate the model
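mse = mean_squared_error(y_test, y_pred)
print("Mean Squared Error:", mse)
Output
Mean Squared Error: 0.38242835212919696
Calculate the mean quality for each quality category. Because the rows are grouped by the same column that is averaged, each group's mean equals its own quality score.
mean_quality = wine.groupby('quality')['quality'].mean()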
Find the category with the highest mean quality
best_quality = mean_quality.idxmax()
best_mean_quality = mean_quality.max()
Print the summary for the best wine.
print("Summary of Wine Quality:")
print("----------------------------")
print("Best Wine Quality Category:", best_quality)
print("Mean Quality Score:", best_mean_quality)
Output
Summary of Wine Quality:
----------------------------
Best Wine Quality Category: 8
Mean Quality Score: 8.0
Find the category with the lowest mean quality
worst_quality = mean_quality.idxmin()
worst_mean_quality = mean_quality.min()
Print the summary for the worst wine.
print("Summary of Wine Quality:")
print("----------------------------")
print("Worst Wine Quality Category:", worst_quality)
print("Mean Quality Score:", worst_mean_quality)
Output
Summary of Wine Quality:
----------------------------
Worst Wine Quality Category: 3
Mean Quality Score: 3.0
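The best and worst categories above come out as 8 and 3 with mean scores of 8.0 and 3.0, which is simply the highest and lowest quality score present in the data. Below is a small sketch of how this grouping behaves, along with a variant that averages a feature column per quality score instead. The frame `toy` and its numbers are made up for illustration and are not from WineQT.csv.

```python
import pandas as pd

# Made-up rows (an assumption for illustration), two wines per quality score.
toy = pd.DataFrame({
    "alcohol": [9.0, 10.0, 11.0, 12.0, 13.0, 12.0],
    "quality": [3, 3, 5, 5, 8, 8],
})

# Same pattern as the tutorial: group by quality, average quality.
# Every row in a group shares one score, so the mean is that score.
trivial = toy.groupby("quality")["quality"].mean()
print(trivial.idxmax(), trivial.max())  # 8 8.0

# Variant: average a feature within each quality group.
alcohol_by_quality = toy.groupby("quality")["alcohol"].mean()
print(alcohol_by_quality)

# Variant: count how many wines fall into each quality score.
counts = toy["quality"].value_counts().sort_index()
print(counts)
```

Applied to the real data, the second and third variants would show how a feature changes across quality levels and how unevenly the scores are distributed.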
Conclusion
In conclusion, the code analyzes a wine quality dataset in several steps. It starts by reading the dataset and separating it into input features (X) and the target variable (y). The data is split into training and test sets, and a linear regression model is trained on the training set. Predictions are then made on the test set, and the mean squared error is used to measure how well the model works.
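As a rough guide to reading a mean squared error, the sketch below computes it by hand, checks it against `mean_squared_error`, and compares it with a baseline that always predicts the mean. The arrays `y_true` and `y_hat` are made-up numbers, not predictions from the wine model. For reference, the reported MSE of about 0.38 corresponds to a root mean squared error of about 0.62 quality points.

```python
import numpy as np
from sklearn.metrics import mean_squared_error

# Made-up true scores and predictions (an assumption for illustration).
y_true = np.array([5, 6, 5, 7, 4])
y_hat = np.array([5.5, 5.5, 5.0, 6.0, 5.0])

# MSE is the average of the squared differences.
manual_mse = np.mean((y_true - y_hat) ** 2)
library_mse = mean_squared_error(y_true, y_hat)
print(manual_mse, library_mse)  # 0.5 0.5

# RMSE is in the same units as the target, which is easier to interpret.
rmse = np.sqrt(library_mse)
print("RMSE:", rmse)

# Baseline: always predict the mean of the true scores.
baseline_pred = np.full(y_true.shape, y_true.mean())
baseline_mse = mean_squared_error(y_true, baseline_pred)
print("Baseline MSE:", baseline_mse)
```

A model is only useful if its error is clearly below that of such a baseline, so computing the baseline on the wine test set would put the reported figure in context.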
The code also computes the average quality for each quality category and identifies the categories with the highest and lowest averages. Scatter plots, histograms, box plots, bar charts, line plots, correlation heatmaps, and pie charts are some of the visualizations that could be added to show how different features relate to wine quality; the code above does not produce them.
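As one possible starting point for the plots mentioned above, here is a minimal matplotlib sketch that draws a bar chart of feature correlations with quality and a scatter plot of one feature against quality. The frame `toy_wine` holds made-up values, not rows from WineQT.csv, and the choice of these two plots is an assumption rather than part of the original code.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: no display available; remove for interactive use
import matplotlib.pyplot as plt
import pandas as pd

# Made-up rows for illustration only.
toy_wine = pd.DataFrame({
    "alcohol": [9.0, 9.5, 10.0, 11.0, 12.0, 12.5],
    "volatile acidity": [0.9, 0.8, 0.7, 0.5, 0.4, 0.3],
    "quality": [3, 4, 5, 6, 7, 8],
})

# Pearson correlation of each feature with the quality column.
correlations = toy_wine.corr()["quality"].drop("quality")

fig, (ax_bar, ax_scatter) = plt.subplots(1, 2, figsize=(9, 4))

# Bar chart: one bar per feature.
ax_bar.bar(correlations.index, correlations.values)
ax_bar.set_title("Correlation with quality")
ax_bar.set_ylabel("Pearson r")

# Scatter plot: a single feature against the target.
ax_scatter.scatter(toy_wine["alcohol"], toy_wine["quality"])
ax_scatter.set_xlabel("alcohol")
ax_scatter.set_ylabel("quality")
ax_scatter.set_title("Alcohol vs. quality")

fig.tight_layout()
```

To view the result, call `fig.savefig(...)` with a file name of your choice, or drop the `matplotlib.use("Agg")` line and call `plt.show()`.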
Overall, the code walks through a basic workflow on the wine quality dataset, from modeling and evaluation to a simple summary of the quality scores. It relies on pandas and scikit-learn; NumPy and matplotlib are imported but not used in the steps shown, and a plotting library such as matplotlib or Seaborn would be needed to produce the visualizations listed above.