DATASCIENCE_INTERNSHIP[1]

Data science is an interdisciplinary field that utilizes scientific methods to extract insights from data, combining statistics, computer science, and domain expertise. Key components include data collection, cleaning, and exploratory analysis, with tools like Python, Pandas, and Matplotlib being essential. The field is crucial for enhancing decision-making, predictive analytics, and operational efficiency, and it continues to evolve with advancements in AI and big data.

*DATA SCIENCE

* What is DATA SCIENCE

* Data science is an interdisciplinary field that uses scientific
methods, processes, algorithms, and systems to extract
knowledge and insights from structured and unstructured
data.
* It combines aspects of statistics, computer science, and
domain expertise to analyze and interpret complex data.
*Key Components and Processes Involved in Data Science

*Data Collection
*Data Cleaning and Preprocessing
*Exploratory Data Analysis (EDA)
*Why is Data Science Important?
* Enhances decision-making
* Predictive analytics for future trends
* Personalization and improved customer experience
* Operational efficiency
* Image/Graphic: Infographic with data science applications in
different industries (e.g., healthcare, finance, retail)
*Key Tools and Technologies

* Programming Languages: Python, R

* Libraries: Pandas, NumPy, Scikit-learn, TensorFlow
* Tools: Jupyter Notebook, Tableau, PowerBI
* Image/Graphic: Logos or icons of the mentioned tools
*NumPy Array Indexing
* import numpy as np
* a = np.array([1, 2, 3, 4])
* print(a[0] + a[2])

* Output : 4

* import numpy as np
* a = np.array([[1,2,3,4,5], [6,7,8,9,10]])
* print( a[1, 4] + a[0,2])

* Output : 13
* import numpy as np
* a = np.array([[[1, 2, 3], [4, 5, 6]], [[7, 8, 9], [10, 11, 12]]])
* print(a[0, 1, 2])

* Output : 6
*Array Slicing
* import numpy as np
* a= np.array([1, 2, 3, 4, 5, 6, 7])
* print(a[4:])

* Output : [5 6 7]

* import numpy as np
* a= np.array([[1, 2, 3, 4, 5], [6, 7, 8, 9, 10]])
* print(a[1, 1:4])

* Output : [7 8 9]

* import numpy as np
* arr = np.array([1, 2, 3, 4, 5, 6, 7]) #step slicing
* print(arr[::2])

* Output : [1 3 5 7]
* import numpy as np
* a = np.array([[1, 2, 3, 4, 5], [6, 7, 8, 9, 10]])
* print(a[0:2, 1:4]) # first slice selects rows 0 and 1; second slice selects columns 1 to 3

* Output : [[2 3 4]
* [7 8 9]]
*Copy and view:
* import numpy as np
* a = np.array([1, 2, 3, 4, 5])
* x = a.copy()
* a[0] =0
* print(a)
* print(x)
* Output : [0 2 3 4 5]
* [1 2 3 4 5]
* ------------------------------------------------------------------------------------------------------------
* import numpy as np
* a = np.array([1, 2, 3, 4, 5])
* x = a.view()
* a[0] = 0
* print(a)
* print(x)
* Output : [0 2 3 4 5]
* [0 2 3 4 5]
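One way to tell a copy from a view is the `base` attribute of a NumPy array: it is `None` when the array owns its data and refers to the original array when it is a view. The sketch below reuses the array from the examples above; the names `copied` and `viewed` are chosen here for illustration.

```python
import numpy as np

a = np.array([1, 2, 3, 4, 5])
copied = a.copy()   # owns its own data
viewed = a.view()   # shares data with a

# base is None for an array that owns its data
print(copied.base is None)   # True
# base points back to the original array for a view
print(viewed.base is a)      # True

# Writing through the view also changes the original, but not the copy
viewed[1] = 99
print(a[1], copied[1])       # 99 2
```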
*Array Shape:
* import numpy as np
* a = np.array([[1, 2, 3, 4], [5, 6, 7, 8]])
* print(a.shape)

* Output : (2, 4)
*Array Reshape: 1D to 2D and 1D to 3D
* import numpy as np
* a= np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12])
* b= a.reshape(4, 3)
* print(b)

* Output : [[ 1  2  3]
* [ 4  5  6]
* [ 7  8  9]
* [10 11 12]]
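The 1D to 3D case works the same way, as long as the product of the new dimensions equals the number of elements (12 here). A minimal sketch, with the shape (2, 3, 2) picked only as an example:

```python
import numpy as np

a = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12])

# 2 blocks, each with 3 rows and 2 columns: 2 * 3 * 2 = 12 elements
c = a.reshape(2, 3, 2)
print(c.shape)      # (2, 3, 2)

# Second block, first row, second column
print(c[1, 0, 1])   # 8
```

A shape whose product is not 12, such as (5, 3), would raise a ValueError.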
*For Loop and Array Iterating:
* import numpy as np
* a = np.array([[1, 2, 3], [4, 5, 6]])
* for x in a:
*     for y in x:
*         print(y)
* Output : 1
* 2
* 3
* 4
* 5
* 6
* ------------------------------------------------------------------------------------------------------------
* import numpy as np
* a= np.array([[1,2,3],[4,5,6]])
* for x in np.nditer(a):
*     print(x)
* Output : 1
* 2
* 3
* 4
* 5
* 6
*Array Concatenation
* import numpy as np
* a1 = np.array([[1, 2], [3, 4]])
* a2 = np.array([[5, 6], [7, 8]])
* a = np.concatenate((a1, a2), axis=1) # axis=1 joins the arrays side by side (along the columns)
* print(a)

* Output : [[1 2 5 6]
* [3 4 7 8]]
* -----------------------------------------------------------------------------------------------------------
* import numpy as np
* a1 = np.array([[1, 2], [3, 4]])
* a2 = np.array([[5, 6], [7, 8]])
* a = np.concatenate((a1, a2))
* print(a)

* Output : [[1 2]
* [3 4]
* [5 6]
* [7 8]]
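The two concatenation directions can be compared side by side. This sketch uses the same arrays as above and checks them against `np.vstack` and `np.hstack`, which give the same results for 2D inputs; the names `rows` and `cols` are illustrative.

```python
import numpy as np

a1 = np.array([[1, 2], [3, 4]])
a2 = np.array([[5, 6], [7, 8]])

rows = np.concatenate((a1, a2), axis=0)   # stacked vertically, shape (4, 2)
cols = np.concatenate((a1, a2), axis=1)   # joined side by side, shape (2, 4)

print(rows.shape, cols.shape)                      # (4, 2) (2, 4)
print(np.array_equal(rows, np.vstack((a1, a2))))   # True
print(np.array_equal(cols, np.hstack((a1, a2))))   # True
```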
*Array Sort:

* import numpy as np
* a = np.array([[13, 9, 4], [15, 12, 1]])
* print(np.sort(a)) # by default each row (the last axis) is sorted separately

* Output : [[ 4  9 13]
* [ 1 12 15]]
*Pandas
➢ Pandas is a Python library used for working with data sets.
➢ It has functions for analyzing, cleaning, exploring, and manipulating data.
➢ The name "Pandas" refers to both "Panel Data" and "Python Data Analysis".
➢ Pandas allows us to analyze big data and make conclusions based on statistical theories.
➢ Pandas can clean messy data sets, and make them readable and relevant

*Pandas installation:
* In command prompt type pip install pandas

* import pandas

* import pandas as pd # the as keyword imports pandas under the alias pd

*Pandas DataFrames:
* import pandas as pd
* data = {
*     "name": ["Arun", "praveen", "Raja"],
*     "Marks": [100, 70, 82]
* }
* new = pd.DataFrame(data)
* print(new)

* Output : name Marks
* 0 Arun 100
* 1 praveen 70
* 2 Raja 82
*Index
* import pandas as pd
* data = {
* "name": ["Arun", "praveen", "Raja"],
* 'Marks': [100, 70, 82]
*}
* new = pd.DataFrame(data, index=["one", "two", "three"])
* print(new)

* Output : name Marks
* one Arun 100
* two praveen 70
* three Raja 82
*Duplicate values
* import pandas as pd

* data = {'A': [1, 2, 3, 5],

* 'B': ['apple', 'banana', 'apple','apple']}

* df = pd.DataFrame(data)

* print(df)

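* datas = df[df.duplicated(subset='B', keep=False)] # keep=False marks every row whose B value occurs more than once
* counts = datas.value_counts(subset='B') # number of rows for each repeated value
* print(counts)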
*Remove Duplicates

* import pandas as pd

* # Sample DataFrame
* data = {'A': [1, 2, 3, 5],
* 'B': ['apple', 'banana', 'apple','apple']}

* data = pd.DataFrame(data)

* df = data.drop_duplicates(subset='B') # keeps the first row for each value of B
* print(df)
*Test Case
* Question:
* +----+-------------------+
* | id | email             |
* +----+-------------------+
* | 1  | [email protected] |
* | 2  | [email protected] |
* | 3  | [email protected] |
* +----+-------------------+
* id is the primary key (column with unique values) for this
table.
* Each row of this table contains an email. The emails will not
contain uppercase letters.
* Output: [email protected] is repeated two times.
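One possible way to approach this test case with pandas is sketched below. The addresses in the table are redacted in the source, so the emails here are hypothetical placeholders, and the names `person`, `counts` and `repeated` are chosen only for illustration.

```python
import pandas as pd

# Hypothetical rows: the real addresses are not available in the source
person = pd.DataFrame({
    "id": [1, 2, 3],
    "email": ["a@example.com", "b@example.com", "a@example.com"],
})

# Count how often each email occurs, then keep those seen more than once
counts = person["email"].value_counts()
repeated = counts[counts > 1]

for email, n in repeated.items():
    print(f"{email} is repeated {n} times.")
```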
*Pandas Series:
* import pandas as pd
* a = [1, 2, 8]
* new = pd.Series(a, index = ["x", "y", "z"])
* print(new)

* Output : x 1
* y 2
* z 8
* dtype: int64
* ------------------------------------------------------------------------------------------------------------
* import pandas as pd
* names = {"name1": "Arun", "name2": "Raja", "name3": "Praveen"}
* new = pd.Series(names)
* print(new)

* Output : name1 Arun

* name2 Raja
* name3 Praveen
* dtype: object
* import pandas as pd
data = {
"Duration":{
"0":60,
"1":60,
"2":60,
"3":45,
"4":45,
"5":60
},
"Pulse":{
"0":110,
"1":117,
"2":103,
"3":109,
"4":117,
"5":102
},
"Maxpulse":{
"0":130,
"1":145,
"2":135,
"3":175,
"4":148,
"5":127
},
"Calories":{
"0":409,
"1":479,
"2":340,
"3":282,
"4":406,
"5":300
}
}
df = pd.DataFrame(data)
print(df)
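*Read CSV file
* import pandas as pd
df = pd.read_csv('data.csv')

print(df)
df.head()          # first five rows
df.info()          # column types and non-null counts
df.isnull()        # True wherever a value is missing
df.isnull().sum()  # number of missing values per column
df = df.drop(['column name'], axis=1)  # drop a column by name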
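Since data.csv is not included with these slides, the same steps can be tried on a small in-memory CSV. The column names and values below are hypothetical, and `io.StringIO` stands in for the file on disk.

```python
import io
import pandas as pd

# Hypothetical CSV text standing in for data.csv; one Calories value is missing
csv_text = "Duration,Pulse,Calories\n60,110,409.1\n60,117,\n45,109,282.4\n"
df = pd.read_csv(io.StringIO(csv_text))

# Number of missing values in each column
missing = df.isnull().sum()
print(missing["Calories"])    # 1

# Drop a column by name; drop returns a new DataFrame
df = df.drop(["Pulse"], axis=1)
print(df.columns.tolist())    # ['Duration', 'Calories']
```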
*Matplotlib:
* Matplotlib is a plotting library for the Python programming language and its numerical mathematics extension, NumPy.
* Matplotlib is a low-level graph plotting library in Python that serves as a visualization utility.
* In a command prompt, type: pip install matplotlib

* import matplotlib
*Pyplot

* import matplotlib.pyplot as plt

import numpy as np

xpoints = np.array([0, 6])

ypoints = np.array([0, 250])

plt.plot(xpoints, ypoints)
plt.show()
*Plotting x and y points

* import matplotlib.pyplot as plt

import numpy as np

xpoints = np.array([1, 8])

ypoints = np.array([3, 10])

plt.plot(xpoints, ypoints)
plt.show()
*Markers

* import matplotlib.pyplot as plt

import numpy as np

ypoints = np.array([3, 8, 1, 10])

plt.plot(ypoints, marker = 'o')

plt.show()
*Display Multiple Plots
* import matplotlib.pyplot as plt
import numpy as np
#plot 1:
x = np.array([0, 1, 2, 3])
y = np.array([3, 8, 1, 10])
plt.subplot(1, 2, 1)
plt.plot(x,y)
#plot 2:
x = np.array([0, 1, 2, 3])
y = np.array([10, 20, 30, 40])
plt.subplot(1, 2, 2)
plt.plot(x,y)
plt.show()
*Machine Learning
* Machine learning is a branch of AI that focuses
on the development of algorithms that can
learn from and make predictions or decisions
based on data.
* ML algorithms learn patterns and relationships
from labeled or unlabeled data to perform
specific tasks without being explicitly
programmed.
*Types of Algorithms:

*Supervised Learning
*Unsupervised Learning
*Reinforcement Learning
*Supervised Learning
Supervised learning is a type of machine learning algorithm where
the system is trained on labeled data to make predictions or decisions based
on the input data. In supervised learning, the algorithm learns from a set of
input data (often called "training data") that is already labeled with the
correct output or target variable. The algorithm tries to learn a mapping
function that can predict the correct output for any new input data it
encounters.
Supervised learning can be divided into two main categories:
1. Regression
2. Classification
* REGRESSION
* In regression, the output variable is continuous, and the algorithm tries to find a
relationship between the input variables and the output variable. For example,
given data on the size of a house, the number of bedrooms, and the location, a
regression model can predict the price of the house.
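A minimal regression sketch follows, simplified to a single input (house size) rather than the three inputs mentioned above. The data is hypothetical, and `np.polyfit` is used here as one simple least-squares fit, not as the only way to build a regression model.

```python
import numpy as np

# Hypothetical training data: house size (square metres) and price (thousands)
size = np.array([50.0, 80.0, 100.0, 120.0])
price = np.array([150.0, 240.0, 300.0, 360.0])   # exactly 3 * size

# Fit price = slope * size + intercept by least squares
slope, intercept = np.polyfit(size, price, deg=1)

# Predict the price of a house that was not in the training data
predicted = slope * 90.0 + intercept
print(round(predicted, 2))   # 270.0
```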
* CLASSIFICATION
* In classification, the output variable is categorical, and the algorithm tries to
classify the input data into different categories. For example, given a dataset of
images of cats and dogs, a classification model can predict whether a new
image is of a cat or a dog.
*Unsupervised Learning
* Unsupervised learning is a type of machine learning algorithm
where the system learns to identify patterns and relationships in the
input data without any explicit supervision or labels. In unsupervised
learning, the input data is not labeled, and the algorithm tries to find
a hidden structure or clustering in the data.
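Clustering can be illustrated with a small k-means loop written in NumPy. This is a sketch under stated assumptions: the points are hypothetical, the number of clusters is fixed at two, the starting centroids are simply two of the points, and ten iterations are assumed to be enough for this tiny data set.

```python
import numpy as np

# Hypothetical unlabeled 2D points that form two visible groups
points = np.array([[1.0, 1.0], [1.5, 2.0], [1.0, 1.5],
                   [8.0, 8.0], [9.0, 8.5], [8.5, 9.0]])

# Start from two of the points as initial centroids (an arbitrary choice)
centroids = points[[0, 3]].copy()

for _ in range(10):
    # Distance from every point to every centroid, shape (6, 2)
    dists = np.linalg.norm(points[:, None, :] - centroids[None, :, :], axis=2)
    # Assign each point to its nearest centroid
    labels = dists.argmin(axis=1)
    # Move each centroid to the mean of the points assigned to it
    centroids = np.array([points[labels == k].mean(axis=0) for k in range(2)])

print(labels)   # [0 0 0 1 1 1]
```

No labels were given to the algorithm; the two groups come only from the distances between points.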
*Reinforcement learning
*Model Evaluation

*Accuracy, Precision, Recall, F1 Score

*Confusion Matrix
*Cross-validation
*Accuracy:
* Accuracy is the ratio of correctly predicted instances to the total number of instances.
* $\text{Accuracy} = \dfrac{\text{Number of correct predictions}}{\text{Total number of predictions}}$
* Accuracy can be misleading in imbalanced
datasets.
* High accuracy doesn't always mean good
performance (e.g., in cases where one class
dominates).
* Does not account for the costs of different types
of errors (false positives vs. false negatives).
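The imbalance problem can be seen with a few lines of plain Python. The labels below are hypothetical: 95 negatives and 5 positives, scored against a model that always predicts the majority class.

```python
# Hypothetical imbalanced labels: 95 negatives (0) and 5 positives (1)
y_true = [0] * 95 + [1] * 5
# A model that always predicts the majority class
y_pred = [0] * 100

# Accuracy = correct predictions / total predictions
correct = sum(t == p for t, p in zip(y_true, y_pred))
accuracy = correct / len(y_true)
print(accuracy)   # 0.95

# Yet none of the 5 positive instances was detected
found = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
print(found)      # 0
```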
*Precision
* Precision is the ratio of correctly predicted positive observations to
the total predicted positives.
* $\text{Precision} = \dfrac{TP}{TP + FP}$, where TP is the number of true positives and FP is the number of false positives.

• Indicates how many of the predicted positive instances are actually positive.
• Crucial in situations where the cost of false positives is high (e.g., medical
diagnosis, spam detection).
• Helps in assessing the relevance of positive predictions in information retrieval
systems.
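Precision can be computed directly from counts of true and false positives. The sketch below also computes recall and the F1 score, which are listed among the evaluation metrics but not defined in these slides; their standard definitions are assumed here: recall = TP / (TP + FN), and F1 is the harmonic mean of precision and recall. The labels are hypothetical.

```python
# Hypothetical binary labels: 1 = positive, 0 = negative
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_pred = [1, 0, 1, 1, 0, 0, 1, 0]

pairs = list(zip(y_true, y_pred))
tp = sum(t == 1 and p == 1 for t, p in pairs)   # true positives
fp = sum(t == 0 and p == 1 for t, p in pairs)   # false positives
fn = sum(t == 1 and p == 0 for t, p in pairs)   # false negatives

precision = tp / (tp + fp)
recall = tp / (tp + fn)
f1 = 2 * precision * recall / (precision + recall)

print(tp, fp, fn)              # 3 1 1
print(precision, recall, f1)   # 0.75 0.75 0.75
```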
*Confusion Matrix
* A Confusion matrix is an N x N matrix used for evaluating the
performance of a classification model, where N is the total number
of target classes. The matrix compares the actual target values with
those predicted by the machine learning model.
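A confusion matrix for N = 3 classes can be built with a NumPy counter array. The class names and labels are hypothetical, and the layout (rows for actual classes, columns for predicted classes) is one common convention assumed here; some references use the transpose.

```python
import numpy as np

# Hypothetical classes, encoded as 0, 1, 2
classes = ["cat", "dog", "bird"]
actual    = [0, 0, 1, 1, 2, 2, 2]
predicted = [0, 1, 1, 1, 2, 0, 2]

n = len(classes)
matrix = np.zeros((n, n), dtype=int)

# Row = actual class, column = predicted class
for a, p in zip(actual, predicted):
    matrix[a, p] += 1

print(matrix)
# [[1 1 0]
#  [0 2 0]
#  [1 0 2]]

# Correct predictions lie on the diagonal
print(int(np.trace(matrix)), "of", len(actual))   # 5 of 7
```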
*Conclusion
* Data science is a powerful tool that transforms data into
actionable insights, driving better decision-making and efficiency.
* The data science process involves several key steps from problem
definition to deployment.
* A variety of tools and technologies support data science activities,
making it a versatile and dynamic field.
* The future of data science looks promising with advancements in
AI, big data, and automation, along with an increased focus on
ethics and interdisciplinary collaboration.
