PROJECT PROPOSAL
Group 1
1. Title:
“Industrial Machine Failure Prediction Using Probability and Statistical Methods”
2. Dataset Description:
• Dataset: Predictive Maintenance Dataset
• Source: Kaggle
• Size: The dataset contains 10,000 rows and 10 columns
• Type: The dataset contains numerical, categorical and binary data:
o Numerical: Air temperature, Process temperature, Rotational speed, Torque,
Tool wear
o Categorical: Product ID, Type, Failure Type
o Binary: Target (0 or 1)
3. General Exploration Plan:
➢ Data Cleaning & Preprocessing
• Check each column for missing or null values.
• Convert categorical variables into numerical format using one-hot encoding.
• Normalize or standardize numerical features if necessary to improve model
performance.
• Remove columns that are not useful for the analysis.
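The preprocessing steps above can be sketched in base R as follows. This is a minimal illustration on a made-up data frame, not the actual dataset: the column names (product_id, type, air_temp, torque, target) and all values are assumptions, and dropping incomplete rows is only one possible way to handle missing values.

```r
# Toy stand-in for the dataset; column names and values are illustrative assumptions.
machines <- data.frame(
  product_id = c("M1", "L2", "H3", "L4", "M5", "L6"),
  type       = factor(c("M", "L", "H", "L", "M", "L")),
  air_temp   = c(298.1, 298.2, NA, 298.5, 299.0, 298.8),
  torque     = c(42.8, 46.3, 49.4, 39.5, 40.0, 41.9),
  target     = c(0, 0, 1, 0, 0, 1)
)

# 1. Count missing values per column.
missing_counts <- colSums(is.na(machines))

# 2. Drop a column that is not needed for the analysis (here, an identifier).
machines$product_id <- NULL

# 3. Remove rows with missing values (imputation would be an alternative).
machines <- machines[complete.cases(machines), ]

# 4. One-hot encode the categorical column; "- 1" removes the intercept so
#    every level of `type` gets its own 0/1 indicator column.
type_dummies <- model.matrix(~ type - 1, data = machines)

# 5. Standardize numerical features to mean 0 and standard deviation 1.
numeric_cols <- c("air_temp", "torque")
scaled <- scale(machines[, numeric_cols])

# Combine the standardized features, the indicators and the binary target.
prepared <- cbind(as.data.frame(scaled), as.data.frame(type_dummies),
                  target = machines$target)
print(prepared)
```

Standardizing matters mainly for scale-sensitive methods; for a first exploratory pass the raw values may be easier to interpret.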
➢ Summary Statistics:
• Use summary statistics (mean, median, mode, standard deviation) to understand
the distribution of machine data.
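As a small sketch of these summary statistics in base R: the readings below are made-up values, and stat_mode is a hypothetical helper name, needed because base R's mode() returns an object's storage type rather than its most frequent value.

```r
# Illustrative tool-wear readings in minutes (made-up values).
tool_wear <- c(0, 3, 5, 5, 7, 9, 11, 5, 14, 21)

# Hypothetical helper: the most frequent value in a numeric vector.
# If several values tie, the smallest of them is returned.
stat_mode <- function(x) {
  counts <- table(x)
  as.numeric(names(counts)[which.max(counts)])
}

summary_stats <- c(
  mean   = mean(tool_wear),
  median = median(tool_wear),
  mode   = stat_mode(tool_wear),
  sd     = sd(tool_wear)
)
print(round(summary_stats, 2))
```

A mean noticeably above the median, as in this toy sample, hints at a right-skewed distribution, which is worth confirming with a histogram.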
➢ Visualizations:
• Create histograms and boxplots to show the distribution of each numerical
variable, using ggplot2 and base R.
• Use scatter plots to explore relationships between different variables.
• Produce bar charts for categorical variables, such as machine types and failure
types.
• Create a heat map to check the correlation between every pair of columns.
• Use pie charts to understand the proportion of different failure types.
• Create density plots to analyse the distribution of the process temperature
and the other numerical variables, split by failure status.
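Two of the plots listed above, the correlation heat map and the density plot by failure status, might look like this in base R. The data is simulated, and the column names and distribution parameters are assumptions chosen only to make the example run; a ggplot2 version would follow the same steps.

```r
set.seed(1)
n <- 200

# Simulated sensor readings; names and parameters are illustrative assumptions.
machines <- data.frame(
  process_temp = rnorm(n, mean = 310, sd = 1.5),
  torque       = rnorm(n, mean = 40, sd = 10),
  tool_wear    = runif(n, min = 0, max = 250),
  target       = rbinom(n, size = 1, prob = 0.2)
)

# Pairwise Pearson correlations between all columns.
corr <- cor(machines)

# Heat map of the correlation matrix. Rowv/Colv = NA keeps the original
# column order (no clustering); scale = "none" plots the raw correlations.
heatmap(corr, Rowv = NA, Colv = NA, scale = "none",
        main = "Correlation heat map")

# Kernel density estimates of process temperature, split by failure status.
dens_ok   <- density(machines$process_temp[machines$target == 0])
dens_fail <- density(machines$process_temp[machines$target == 1])

plot(dens_ok, main = "Process temperature by failure status",
     xlab = "Process temperature", ylim = range(dens_ok$y, dens_fail$y))
lines(dens_fail, lty = 2)
legend("topright", legend = c("No failure", "Failure"), lty = c(1, 2))
```

Because the simulated columns are independent, the off-diagonal correlations here are close to zero; on real sensor data the heat map is where related variables would stand out.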
➢ Challenges:
• The dataset contains multiple failure types, which may require separate modelling
approaches.
• If failure cases are far fewer than non-failures (class imbalance), model
performance could suffer.
• Outliers in sensor readings could impact analysis.
• Some features may have a significant number of missing values, which could affect
analysis.
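Two of these challenges, class imbalance and outliers, can be checked quickly before any modelling. The sketch below uses made-up values and the common 1.5 x IQR rule, which is one convention among several and not necessarily the method the analysis will use.

```r
# Made-up target column: 95 non-failures and 5 failures.
target <- c(rep(0, 95), rep(1, 5))

# Share of each class; a very small failure share signals class imbalance.
class_share <- prop.table(table(target))
print(class_share)

# Made-up torque readings with one suspicious value.
torque <- c(38, 40, 41, 39, 42, 40, 37, 43, 41, 95)

# 1.5 * IQR rule: flag values far outside the middle 50% of the data.
q <- unname(quantile(torque, c(0.25, 0.75)))
fence <- 1.5 * (q[2] - q[1])
is_outlier <- torque < q[1] - fence | torque > q[2] + fence

print(torque[is_outlier])
```

Flagged readings are candidates for inspection rather than automatic removal, since an extreme sensor value may be exactly what precedes a failure.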
4. Questions to Answer:
i. Which variables contribute most to machine failures?
- Understanding which key variables (such as temperature, torque and tool wear) lead
to failures can help optimize maintenance schedules.
ii. Are there patterns in the failure types?
- Analysing various failure types can help in finding root causes and enhancing machine
reliability.
iii. Can we predict machine failures based on the sensor readings?
- We can use different types of machine learning algorithms and train them on machine
data to predict machine failures.
iv. How does tool wear affect machine failure?
- This checks whether excessive tool wear is a main contributing factor to breakdowns.
v. Are there any major differences in failure rates between different machine types?
- This question is important because it can help identify if certain machine types are
more prone to failures, allowing for targeted improvements.
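As a rough sketch of how questions iii and v could be approached in base R: the example compares failure rates across machine types and fits a logistic regression on sensor readings. Everything here is simulated. The column names, the type labels and the assumed link between tool wear and failure are illustrative only, and logistic regression is just one candidate model.

```r
set.seed(42)
n <- 1000

# Simulated data; names, type labels and parameters are illustrative assumptions.
machines <- data.frame(
  type      = factor(sample(c("L", "M", "H"), n, replace = TRUE)),
  torque    = rnorm(n, mean = 40, sd = 10),
  tool_wear = runif(n, min = 0, max = 250)
)

# Assumption for the demo: failure probability rises with tool wear.
p_fail <- plogis(-5 + 0.02 * machines$tool_wear)
machines$target <- rbinom(n, size = 1, prob = p_fail)

# Question v: failure rate per machine type (mean of a 0/1 column).
failure_rate <- tapply(machines$target, machines$type, mean)
print(round(failure_rate, 3))

# Chi-squared test of independence between machine type and failure.
type_test <- chisq.test(table(machines$type, machines$target))
print(type_test$p.value)

# Question iii: logistic regression of failure on sensor readings.
fit <- glm(target ~ torque + tool_wear, data = machines, family = binomial)
print(summary(fit)$coefficients)

# Predicted failure probabilities and a confusion table at a 0.5 threshold.
predicted_prob  <- predict(fit, type = "response")
predicted_class <- as.integer(predicted_prob > 0.5)
confusion <- table(actual = machines$target, predicted = predicted_class)
print(confusion)
```

With few failures, a 0.5 threshold tends to predict almost no failures at all, so accuracy alone would be misleading; the confusion table makes that visible.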