Major Project
VIII SEM
INTERNSHIP ( 18EEI85 )
Learning the basics of Machine Learning and Artificial Intelligence, and applying this knowledge to
automate processes, helps us save time and develop data science applications, for example recognizing
images displayed on a screen using image processing or solving other data science problems. We also
learned to work on platforms such as Kaggle, which allows users to find and publish data sets and to
explore and build models in a web-based data science environment. This helps us write programs based
on customer requirements for different applications.
INTRODUCTION
The campus placements are held before the final examination. These placements not only benefit the
students, but they also benefit the companies and colleges. However, as a student, you need to prepare
well for the campus placements. Getting the best companies for the placement drive is the main job of
your college.
The main objective of the campus placement activity is to get students a job right after they have finished
their education. Since the campus placement is conducted before the final exam, students have an added
incentive to do well in their final exam. Students are also under less pressure. Good placements also allow
colleges to claim 100 percent placement which is an excellent form of advertising for them. Companies
get to snatch up talented people who will do some great work for their business.
ABOUT COMPANY
Industrial technologies is a renowned name in the automation industry in the southern and central parts
of India. It has a presence in Bangalore, Chennai, Nagpur and Bhopal and has MOUs with various
engineering colleges and institutes. Industrial technologies is engaged in providing engineering
consultancy and services to all automation industries, especially heavy industries such as steel,
cement, power plants and process industries.
MACHINE LEARNING AND ITS WORKING
Machine learning (ML), a subset of artificial intelligence (AI), is the area of computational science
that focuses on analysing and interpreting patterns and structures in data to enable learning,
reasoning, and decision making outside of human interaction. Simply put, machine learning allows
the user to feed a computer algorithm an immense amount of data and have the computer analyse
and make data-driven recommendations and decisions based on only the input data.
The Machine Learning process starts with inputting training data into the selected algorithm.
The training data, which may be known (labelled) or unknown (unlabelled), is used to develop the
final Machine Learning algorithm, and the type of training data does affect the algorithm.
New input data is then fed into the machine learning algorithm to test whether it works correctly,
and the predictions are checked against the known results.
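The train, test and check cycle described above can be sketched as follows. This is a minimal illustration, not the project's actual code: it assumes scikit-learn is installed, uses synthetic data from `make_classification` in place of a real dataset, and picks a Random Forest classifier only as an example algorithm.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in data (assumption): 500 samples, 7 numeric features, binary label.
X, y = make_classification(n_samples=500, n_features=7, random_state=0)

# Hold back 20% of the rows as "new" input that the model has never seen.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

# Training: the algorithm learns from inputs whose outputs are known.
model = RandomForestClassifier(n_estimators=100, random_state=0)
model.fit(X_train, y_train)

# Testing: predict on the held-back rows and compare with the true labels.
y_pred = model.predict(X_test)
accuracy = accuracy_score(y_test, y_pred)
print(f"Test accuracy: {accuracy:.2f}")
```

The accuracy printed here only describes the synthetic data; it says nothing about results on the placement dataset.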
OBJECTIVE AND DETAILS OF DATA
To analyze the factors that lead to engineering placements and to predict placement outcomes.
The primary source of data for this project was a dataset published on Kaggle by user SONALI SINGH. The dataset
comprises 2966 records with 8 attributes.
Some important metadata that is of particular interest is:
a. Age
b. Internships
c. CGPA
d. Gender
e. Stream
f. Hostel
g. HistoryOfBacklogs
h. PlacedOrNot
Software Requirements
Anaconda Software : Anaconda is a free and open-source distribution of the programming languages
Python and R. The distribution comes with the Python interpreter and various packages related to
machine learning and data science.
Modules required
Numpy : This library is used for processing large multi-dimensional arrays and matrices, using a
large collection of high-level mathematical functions.
Pandas : Pandas is a Python library that is mainly used for data analysis.
Seaborn : Seaborn is a Python visualization library built on top of matplotlib. It gives us the
capability to create enhanced data visuals.
Sklearn : It provides a selection of efficient tools for machine learning and statistical modeling,
including classification, regression, clustering and dimensionality reduction, via a consistent
interface in Python.
Matplotlib : Matplotlib is a Python library that is used for data visualization.
Data Collection
The first step involves gathering data from various sources such as databases and files. Before starting the data
collection process, it is important to articulate the problem to solve with an ML model.
Data Pre-Processing :
Pre-processing refers to the transformations applied to our data before feeding it to the algorithm. Data preprocessing is
a technique used to convert raw data into a clean data set. It includes :
Finding Missing Data : If the dataset contains missing data, it may create a serious problem for the machine
learning model. There are two common ways to handle it:
By deleting the particular row : Delete the specific row or column that contains null values.
By calculating the mean : Calculate the mean of the column or row that contains a missing value and put it
in place of the missing value.
Splitting the Dataset into the Training set and Test set : This is one of the crucial steps of data preprocessing.
The dataset is divided into a training set and a test set.
Training Set : A subset of the dataset used to train the machine learning model, for which we already know the output.
Test set : A subset of the dataset used to test the machine learning model; the model predicts the output for the test set.
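The preprocessing steps above can be sketched with pandas and scikit-learn. This is a minimal sketch under stated assumptions: the rows below are illustrative values made up for the example (not records from the real dataset), only four of the eight attributes are used, and the 80/20 split ratio and `random_state` are arbitrary choices.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

# Illustrative rows only (assumption), not taken from the real dataset.
df = pd.DataFrame({
    "Age": [21, 22, 22, 23, 21, 22, 24, 21, 22, 23],
    "Internships": [1, 0, 2, 1, 0, 1, 0, 2, 1, 0],
    "CGPA": [8.0, 7.0, np.nan, 6.0, 9.0, 7.0, np.nan, 8.0, 6.0, 7.0],
    "PlacedOrNot": [1, 0, 1, 0, 1, 0, 0, 1, 0, 1],
})

# Finding missing data: count the null values in each column.
missing_per_column = df.isnull().sum()

# Option 1: delete every row that contains a null value.
df_dropped = df.dropna()

# Option 2: replace each missing CGPA with the mean of the CGPA column.
df_filled = df.copy()
df_filled["CGPA"] = df_filled["CGPA"].fillna(df_filled["CGPA"].mean())

# Split the features (X) and the target (y) into a training set and a test set.
X = df_filled.drop(columns="PlacedOrNot")
y = df_filled["PlacedOrNot"]
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)
```

Deleting rows shrinks the data, while filling with the mean keeps every row at the cost of a less precise value; which one is better depends on how much data is missing.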
EDA
# Print a concise summary of the dataset
df.info()
Preprocessing of this dataset includes analysing the independent variables, for example
checking for null values in each column and then replacing or filling them with values of
the appropriate data type, so that analysis and model fitting are not hindered and accuracy
is not reduced.
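A per-column null check with a type-aware fill might look like the sketch below. The rows and the category values are invented for illustration, and the choice of mean for numeric columns and most frequent value for text columns is an assumption, not necessarily what was done in the project.

```python
import numpy as np
import pandas as pd

# Illustrative rows only (assumption), not taken from the real dataset.
df = pd.DataFrame({
    "Gender": ["Male", "Female", None, "Male"],
    "Stream": ["Civil", "Mechanical", "Civil", None],
    "CGPA": [8.0, np.nan, 7.0, 6.0],
})

# One row per column: its data type and how many of its values are null.
summary = pd.DataFrame({
    "dtype": df.dtypes.astype(str),
    "null_count": df.isnull().sum(),
})
print(summary)

# Fill by type: mean for numeric columns, most frequent value for text columns.
for column in df.columns:
    if pd.api.types.is_numeric_dtype(df[column]):
        df[column] = df[column].fillna(df[column].mean())
    else:
        df[column] = df[column].fillna(df[column].mode().iloc[0])
```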
Data Exploration
• Understanding Machine Learning, its models and how to apply them in the real world
• Knowledge of various models such as Linear Regression and the Random Forest Algorithm, and their visualization using
feature scaling and plots.
• A prediction project was given and successfully completed by predicting the outcome.
SOURCE OF DATA AND REFERENCES