Python Programming for Data Science
Module-4 (continued)
1. Overview of Series and DataFrame
Series: A one-dimensional labeled array capable of
holding any data type. It has an index that provides labels
for each element.
import pandas as pd
# Creating a Series
s = pd.Series([10, 20, 30], index=['a', 'b', 'c'])
print(s)
DataFrame: A two-dimensional labeled data structure
with columns that can hold different data types. It is
similar to a SQL table or a spreadsheet.
# Creating a DataFrame
df = pd.DataFrame({
    'A': [1, 2, 3],
    'B': [4, 5, 6]
}, index=['row1', 'row2', 'row3'])
print(df)
2. Functionalities on Indexes: Hierarchical Indexing
Hierarchical indexing allows you to have multiple levels of
indexing on a DataFrame or Series, providing a way to work
with higher-dimensional data in lower-dimensional structures.
# Creating a DataFrame with hierarchical indexing
arrays = [['A', 'A', 'B', 'B'], [1, 2, 1, 2]]
index = pd.MultiIndex.from_arrays(arrays, names=('letter', 'number'))
df_hierarchical = pd.DataFrame({'value': [1, 2, 3, 4]}, index=index)
print(df_hierarchical)
3. Operations Between Data Structures: Merging, Joining,
and Concatenating
Merging: Similar to SQL joins, you can merge
DataFrames based on a common key.
df1 = pd.DataFrame({'key': ['A', 'B', 'C'], 'value1': [1, 2, 3]})
df2 = pd.DataFrame({'key': ['B', 'A', 'D'], 'value2': [4, 5, 6]})
merged_df = pd.merge(df1, df2, on='key', how='inner')
print(merged_df)
Joining: Used to combine two DataFrames based on their
indexes.
df1 = pd.DataFrame({'value1': [1, 2]}, index=['A', 'B'])
df2 = pd.DataFrame({'value2': [3, 4]}, index=['B', 'C'])
joined_df = df1.join(df2, how='outer')
print(joined_df)
Concatenating: Stacks DataFrames along a particular axis (rows or columns).
df1 = pd.DataFrame({'A': [1, 2]})
df2 = pd.DataFrame({'A': [3, 4]})
concatenated_df = pd.concat([df1, df2], axis=0)
print(concatenated_df)
Pandas Data Structures: Series and DataFrame
1. Series:
A Series is a one-dimensional labeled array that can hold any
data type (integer, float, string, Python objects, etc.). It is
similar to a single column in a spreadsheet or a database table.
Key Features:
- Has a label-based index to access data.
- Has a single dtype; an object-dtype Series can still hold mixed Python objects.
- Can be created from lists, dictionaries, numpy arrays, or scalar values.
Creation Examples:
import pandas as pd
# From a list
s = pd.Series([1, 2, 3, 4], index=['a', 'b', 'c', 'd'])
# From a dictionary
s = pd.Series({'a': 10, 'b': 20, 'c': 30})
# From a scalar
s = pd.Series(5, index=['a', 'b', 'c'])
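NumPy arrays (listed above) work the same way; a small sketch:
import numpy as np
# From a numpy array; the Series adopts the array's dtype (float64 here)
s = pd.Series(np.array([1.5, 2.5, 3.5]), index=['a', 'b', 'c'])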
2. DataFrame:
A DataFrame is a two-dimensional labeled data structure with
rows and columns, similar to a spreadsheet or SQL table.
Key Features:
- Heterogeneous data: columns can contain different data types.
- Row and column indexes for easy data manipulation.
- Supports various file formats for input and output, e.g., CSV, Excel, JSON.
Creation Examples:
# From a dictionary of lists
data = {'Name': ['Alice', 'Bob', 'Charlie'], 'Age': [25, 30, 35], 'City': ['NY', 'LA', 'SF']}
df = pd.DataFrame(data)
# From a list of dictionaries
data = [{'a': 1, 'b': 2}, {'a': 3, 'b': 4}]
df = pd.DataFrame(data)
Functionalities on Indexes
Pandas provides extensive functionalities for handling and
manipulating indexes in both Series and DataFrame:
1. Indexing Basics
Explicit Indexing: Using labels.
s['a']     # Access a Series value by its label
df.loc[0]  # Access a row by label (0 is a label in the default RangeIndex)
Implicit Indexing: Using integer positions.
s.iloc[0]   # Access a Series value by position (plain s[0] is deprecated on a labeled index)
df.iloc[0]  # Access a row by position
2. Setting and Resetting Index
Set a new column as the index:
df.set_index('Name', inplace=True)
Reset the index to default:
df.reset_index(inplace=True)
3. Multi-Indexing (Hierarchical Indexing)
Create a MultiIndex:
arrays = [['A', 'A', 'B', 'B'], [1, 2, 1, 2]]
index = pd.MultiIndex.from_arrays(arrays, names=('Group', 'Subgroup'))
df = pd.DataFrame({'Data': [10, 20, 30, 40]}, index=index)
Access specific levels:
df.loc['A'] # Access all subgroups under 'A'
4. Index Alignment
Index alignment occurs automatically in operations:
s1 = pd.Series([1, 2], index=['a', 'b'])
s2 = pd.Series([3, 4], index=['b', 'c'])
result = s1 + s2 # Automatic alignment
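print(result)  # 'a' and 'c' exist in only one Series, so they become NaN; 'b' is 2 + 3 = 5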
5. Modifying Indexes
Rename index labels, or conform the data to a new index:
df.rename(index={0: 'zero', 1: 'one'}, inplace=True)
df.reindex([0, 2, 4], fill_value=0)  # returns a new object; missing labels are filled with 0
6. Sorting Index
Sort by index values:
df.sort_index(axis=0, ascending=True)
7. Handling Duplicate Indexes
Check for duplicates:
df.index.is_unique
Drop duplicates:
df = df.loc[~df.index.duplicated(keep='first')]
4. Function Application and Mapping: Applying Functions
for Data Transformation
You can apply functions to DataFrames and Series easily using
apply() and map().
Using apply(): To apply a function along an axis of the
DataFrame.
df = pd.DataFrame({'A': [1, 2, 3]})
df['B'] = df['A'].apply(lambda x: x ** 2)
print(df)
Using map(): For transforming values of a Series.
s = pd.Series([1, 2, 3])
s_mapped = s.map(lambda x: x * 2)
print(s_mapped)
Exploring Hierarchical Indexing
Hierarchical indexing (also called multi-level indexing) in
pandas allows you to work with data that has multiple
levels of row or column labels. This is especially useful
when dealing with higher-dimensional data in a tabular
format.
Key Features:
1. Creation: Use MultiIndex objects to create hierarchical
indexing.
2. Selection: Access subsets of data using .loc[], .iloc[], or
slicing.
3. Reorganization: Change the structure with methods like stack() and unstack() (demonstrated after the example below).
Example:
import pandas as pd
# Creating a DataFrame with a MultiIndex
index = pd.MultiIndex.from_tuples(
    [('A', 1), ('A', 2), ('B', 1), ('B', 2)],
    names=['Letter', 'Number']
)
data = pd.DataFrame({'Value': [10, 20, 30, 40]}, index=index)
print(data)
# Accessing specific data
print(data.loc['A']) # All rows with 'A' as Letter
print(data.loc[('A', 1)]) # Specific entry
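The reorganization methods noted above reshape a hierarchical index; a brief sketch using the DataFrame just created:
# unstack() pivots the 'Number' level into columns; stack() reverses it
wide = data.unstack('Number')
print(wide)
print(wide.stack())  # back to the original long form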
Operations Between Data Structures
Pandas provides versatile tools for merging, joining, and
concatenating data structures. These operations are
essential for combining datasets.
1. Merging: Combines DataFrames based on keys or indices.
- pd.merge(): Performs SQL-like joins.
2. Joining: Merges datasets on their indices.
- DataFrame.join(): Convenient for joining on the index.
3. Concatenation: Stacks datasets along a specific axis.
- pd.concat(): Combines datasets along rows or columns.
4. Combining: Aligns datasets and performs element-wise operations.
- combine_first(): Fills missing data with another dataset (see the sketch after the example below).
Example:
# Merging
df1 = pd.DataFrame({'key': ['A', 'B', 'C'], 'value1': [1, 2, 3]})
df2 = pd.DataFrame({'key': ['A', 'B', 'D'], 'value2': [4, 5, 6]})
merged = pd.merge(df1, df2, on='key', how='outer')
# Concatenation (non-matching columns are filled with NaN)
concat_df = pd.concat([df1, df2], axis=0, ignore_index=True)
print("Merged DataFrame:")
print(merged)
print("\nConcatenated DataFrame:")
print(concat_df)
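combine_first(), listed above, patches missing values in one DataFrame with values from another; a minimal sketch with illustrative frames:
a = pd.DataFrame({'x': [1, None], 'y': [3, 4]})
b = pd.DataFrame({'x': [10, 20], 'y': [30, 40]})
print(a.combine_first(b))  # the NaN in 'a' is filled from 'b'; existing values in 'a' win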
Function Application and Mapping
Functions can be applied across pandas objects to transform
or manipulate data efficiently. These include element-wise
transformations, row/column-wise operations, and
user-defined functions.
1. Element-wise:
- applymap(): Applies a function element-wise to a DataFrame (renamed DataFrame.map() in pandas 2.1+).
- map(): Applies a function to Series elements.
2. Row/Column-wise:
- apply(): Applies a function along a specified axis.
Example:
# Applying functions
df = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6]})
# Element-wise
squared = df.applymap(lambda x: x ** 2)  # DataFrame.map() in pandas 2.1+
# Row-wise
row_sum = df.apply(lambda row: row.sum(), axis=1)
print("Squared DataFrame:")
print(squared)
print("\nRow Sums:")
print(row_sum)
Module-5
Introduction to Pandas I/O Tools
Pandas provides powerful tools for data input and output
(I/O) to read from and write to various file formats. These
tools allow seamless interaction with diverse data storage
formats, facilitating robust data analysis.
Reading CSV and Textual Files
Key Functions:
pd.read_csv(): Reads CSV files.
pd.read_table(): Reads tabular data from delimited text
files.
DataFrame.to_csv(): Writes a DataFrame to a CSV file.
Example:
import pandas as pd
# Reading a CSV file
df = pd.read_csv('data.csv')
# Writing to a CSV file
df.to_csv('output.csv', index=False)
To run this, first create data.csv in the working directory.
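pd.read_table(), listed above, reads delimiter-separated text; a short sketch, assuming a hypothetical tab-separated file data.txt:
# read_table defaults to tab as the separator
df_txt = pd.read_table('data.txt')
# Equivalent call with read_csv and an explicit separator
df_txt = pd.read_csv('data.txt', sep='\t')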
Reading/Writing HTML Files
Key Functions:
pd.read_html(): Parses tables from HTML documents.
DataFrame.to_html(): Exports a DataFrame to an HTML
table.
Example:
# Reading tables from an HTML file
html_tables = pd.read_html('https://example.com/table-page.html')
# Writing a DataFrame to an HTML file
df.to_html('output.html')
Reading Data from XML Files
Key Functions:
pd.read_xml(): Reads data from XML files (pandas 1.3+; needs an XML parser such as lxml).
DataFrame.to_xml(): Exports a DataFrame to an XML file.
Example:
# Reading data from an XML file
xml_data = pd.read_xml('data.xml')
# Writing a DataFrame to an XML file
df.to_xml('output.xml')
Reading Data from Excel Files
Key Functions:
pd.read_excel(): Reads Excel files (.xls, .xlsx).
DataFrame.to_excel(): Writes a DataFrame to an Excel
file.
Example:
# Reading an Excel file
df = pd.read_excel('data.xlsx', sheet_name='Sheet1')
# Writing to an Excel file
df.to_excel('output.xlsx', index=False)
Reading JSON Data
Key Functions:
pd.read_json(): Reads JSON data.
DataFrame.to_json(): Exports a DataFrame to JSON
format.
Example:
# Reading JSON data
df = pd.read_json('data.json')
# Writing a DataFrame to a JSON file
df.to_json('output.json', orient='records')
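For newline-delimited JSON (one object per line), read_json accepts lines=True; a sketch assuming a hypothetical data.jsonl file:
# Each line of data.jsonl is one JSON record
df_lines = pd.read_json('data.jsonl', lines=True)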
Pickle Serialization
Pickle is a binary format for serializing and de-serializing
Python objects, allowing you to save and load Pandas
DataFrames efficiently. Only load pickle files from trusted
sources, since unpickling can execute arbitrary code.
Key Functions:
DataFrame.to_pickle(): Saves a DataFrame to a pickle file.
pd.read_pickle(): Loads a DataFrame from a pickle file.
Example:
# Writing to a pickle file
df.to_pickle('data.pkl')
# Reading from a pickle file
df = pd.read_pickle('data.pkl')
Summary Table of I/O Functions
File Format   Reading Function    Writing Function
CSV/Text      pd.read_csv()       DataFrame.to_csv()
HTML          pd.read_html()      DataFrame.to_html()
XML           pd.read_xml()       DataFrame.to_xml()
Excel         pd.read_excel()     DataFrame.to_excel()
JSON          pd.read_json()      DataFrame.to_json()
Pickle        pd.read_pickle()    DataFrame.to_pickle()
Pandas Data Manipulation: Data Preparation
Data preparation is a critical step in the data analysis
workflow. It involves cleaning, transforming, and
preprocessing the data to ensure it is ready for analysis.
Pandas offers a comprehensive set of tools to handle
common data preparation tasks.
1. Handling Missing Data
Techniques:
Detect missing data with isnull() and notnull().
Remove missing data with dropna().
Fill missing values with fillna() or interpolate().
Examples:
import pandas as pd
import numpy as np
# Example DataFrame with missing values
df = pd.DataFrame({
    'A': [1, 2, np.nan, 4],
    'B': [np.nan, 2, 3, 4],
    'C': ['a', 'b', 'c', np.nan]
})
# Detect missing values
print(df.isnull())
# Drop rows with missing values
cleaned_df = df.dropna()
# Fill missing values
filled_df = df.fillna({'A': 0, 'B': df['B'].mean(), 'C': 'unknown'})
print(filled_df)
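interpolate(), mentioned above, estimates missing numeric values from their neighbors; a brief sketch on column 'A' of the DataFrame above:
# Linear interpolation: the NaN in 'A' becomes 3.0 (midway between 2 and 4)
print(df['A'].interpolate())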
2. Removing Duplicates
Duplicates can distort analyses and should be removed or
handled appropriately.
Key Functions:
duplicated(): Identifies duplicate rows.
drop_duplicates(): Removes duplicate rows.
Example:
# Example DataFrame with duplicates
df = pd.DataFrame({'A': [1, 1, 2, 2], 'B': [3, 3, 4, 5]})
# Detect duplicates
print(df.duplicated())
# Drop duplicate rows
unique_df = df.drop_duplicates()
print(unique_df)
3. Data Transformation
Scaling and Normalization:
Normalize values to a range [0, 1] or standardize them to z-scores (a standardization sketch follows the example below).
String Operations:
Clean or preprocess textual data using .str methods.
Example:
# Scaling numeric data
df = pd.DataFrame({'A': [10, 20, 30], 'B': [100, 200, 300]})
df['A_scaled'] = (df['A'] - df['A'].min()) / (df['A'].max() - df['A'].min())
# String manipulation
text_df = pd.DataFrame({'C': [' hello ', 'WORLD!', 'pandas ']})
text_df['C_cleaned'] = text_df['C'].str.strip().str.lower()
print(text_df)
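Standardization rescales values to mean 0 and unit standard deviation; a minimal sketch on column 'A' of the numeric DataFrame above:
# Z-score standardization: subtract the mean, divide by the standard deviation
df['A_standardized'] = (df['A'] - df['A'].mean()) / df['A'].std()
print(df[['A', 'A_standardized']])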
4. Handling Outliers
Outliers can skew results and may need removal or
transformation.
Techniques:
Identify outliers using statistical methods (e.g., the IQR rule or z-scores; a z-score sketch follows the example below).
Replace or cap extreme values.
Example:
# Example DataFrame with outliers
df = pd.DataFrame({'A': [1, 2, 3, 1000]})
# Replace outliers with median
q1 = df['A'].quantile(0.25)
q3 = df['A'].quantile(0.75)
iqr = q3 - q1
outlier_threshold = 1.5 * iqr
lower = q1 - outlier_threshold
upper = q3 + outlier_threshold
df['A'] = df['A'].mask((df['A'] < lower) | (df['A'] > upper), df['A'].median())
print(df)
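As an alternative to the IQR rule, z-scores measure distance from the mean in standard deviations; a minimal sketch (3 is the conventional cutoff; a looser one is used here because the sample is tiny):
# Recreate the sample with an extreme value
df = pd.DataFrame({'A': [1, 2, 3, 1000]})
z = (df['A'] - df['A'].mean()) / df['A'].std()
print(df['A'][z.abs() > 1.4])  # flags 1000; with 4 points the sample z-score cannot exceed 1.5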
5. Feature Encoding
Key Methods:
Convert categorical variables using one-hot encoding
(pd.get_dummies()).
Map or replace values (map(), replace()).
Example:
# One-hot encoding
df = pd.DataFrame({'Category': ['A', 'B', 'C']})
encoded_df = pd.get_dummies(df, columns=['Category'])
# Mapping values
df['Category_mapped'] = df['Category'].map({'A': 1, 'B': 2, 'C': 3})
print(df)
6. Data Integration
Combine or reshape datasets for analysis.
Techniques:
Merge datasets with pd.merge().
Reshape data using melt() or pivot() (a pivot() sketch follows the example below).
Example:
# Reshaping data with melt
df = pd.DataFrame({'ID': [1, 2], 'Math': [90, 80], 'Science': [85, 95]})
melted = pd.melt(df, id_vars='ID', var_name='Subject', value_name='Score')
print(melted)
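pivot(), mentioned above, is roughly the inverse of melt(); a sketch that spreads the melted frame back into one column per subject:
# Rows keyed by ID, one column per Subject
wide = melted.pivot(index='ID', columns='Subject', values='Score')
print(wide)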
7. Data Filtering and Subsetting
Extract meaningful subsets of data.
Key Methods:
Use boolean indexing for conditional filtering.
Use .query() for concise filtering.
Example:
# Conditional filtering
df = pd.DataFrame({'A': [1, 2, 3, 4], 'B': [10, 20, 30, 40]})
filtered_df = df[df['A'] > 2]
# Using query
filtered_df_query = df.query('A > 2')
print(filtered_df_query)
Concatenating Data: Combining Datasets
Concatenation is the process of combining datasets along
rows or columns. Pandas provides the pd.concat()
function for this purpose.
1. Concatenating Along Rows
Combines datasets vertically, stacking rows.
Example:
import pandas as pd
# Example DataFrames
df1 = pd.DataFrame({'A': [1, 2], 'B': [3, 4]})
df2 = pd.DataFrame({'A': [5, 6], 'B': [7, 8]})
# Concatenating along rows
result = pd.concat([df1, df2], axis=0, ignore_index=True)
print(result)
2. Concatenating Along Columns
Combines datasets horizontally, aligning by index.
Example:
# Example DataFrames
df1 = pd.DataFrame({'A': [1, 2]})
df2 = pd.DataFrame({'B': [3, 4]})
# Concatenating along columns
result = pd.concat([df1, df2], axis=1)
print(result)
3. Concatenation with Keys
Adds hierarchical indexing to differentiate datasets.
Example:
result = pd.concat([df1, df2], keys=['First', 'Second'])
print(result)
4. Concatenation with Different Indices
By default (join='outer'), index labels missing from one input produce NaN; join='inner' keeps only the labels shared by all inputs (see the sketch after the example below).
Example:
df1 = pd.DataFrame({'A': [1, 2]}, index=[0, 1])
df2 = pd.DataFrame({'B': [3, 4]}, index=[1, 2])
# Concatenate with outer join
result = pd.concat([df1, df2], axis=1, join='outer')
print(result)
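A sketch of the inner variant, using the same frames:
# Only index label 1 appears in both df1 and df2, so only that row is kept
inner_result = pd.concat([df1, df2], axis=1, join='inner')
print(inner_result)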
Data Transformation: Sorting, Filtering, and
Replacing Values
1. Sorting Data
Sorting allows you to reorder rows or columns based on
values or indices.
Key Functions:
sort_values(): Sorts by column values.
sort_index(): Sorts by index.
Example:
# Sorting by column values
df = pd.DataFrame({'A': [3, 1, 2], 'B': [6, 4, 5]})
sorted_df = df.sort_values(by='A')
# Sorting by index
sorted_index_df = df.sort_index()
print(sorted_df)
print(sorted_index_df)
2. Filtering Data
Filters rows based on conditions.
Techniques:
Boolean indexing.
.query() for concise filtering.
Example:
# Filtering rows
filtered_df = df[df['A'] > 1]
# Using query
filtered_query_df = df.query('A > 1')
print(filtered_query_df)
3. Replacing Values
Replace values to standardize or clean data.
Key Functions:
replace(): Replace specific values.
fillna(): Replace missing values.
Example:
# Replacing specific values
df = pd.DataFrame({'A': [1, 2, -999], 'B': [3, -999, 5]})
cleaned_df = df.replace(-999, np.nan)
# Replacing multiple values
multi_replaced_df = df.replace({-999: np.nan, 1: 10})
print(multi_replaced_df)
4. Applying Custom Transformations
Apply custom transformations row-wise, column-wise, or
element-wise using .apply() or .applymap().
Example:
# Applying a function to a column
df['A_transformed'] = df['A'].apply(lambda x: x * 2)
# Applying a function to all elements
df_transformed = df.applymap(lambda x: x * 2 if pd.notnull(x) else x)  # DataFrame.map() in pandas 2.1+
print(df_transformed)
Summary Table
Operation               Function            Description
Concatenation           pd.concat()         Combines datasets along rows or columns.
Sorting                 sort_values()       Sorts rows based on column values.
Sorting by Index        sort_index()        Sorts rows or columns by index.
Filtering               Boolean indexing    Filters rows based on conditions.
Replacing Values        replace()           Replaces specific values with new ones.
Filling Missing Values  fillna()            Replaces missing data with specified values.
Discretization and Binning: Grouping Continuous
Data
Discretization involves converting continuous data into
discrete intervals, or "bins." This is useful for simplifying
data or preparing it for certain types of analysis (e.g.,
histograms, categorical modeling).
1. Creating Bins
Use pd.cut() or pd.qcut() to create bins:
pd.cut(): Bins data into equal-width intervals.
pd.qcut(): Bins data into equal-frequency intervals.
Example:
import pandas as pd
import numpy as np
# Example DataFrame
data = pd.DataFrame({'Value': [0.5, 1.5, 2.5, 3.5, 4.5]})
# Equal-width binning
data['Equal_Width_Bin'] = pd.cut(data['Value'], bins=3, labels=['Low', 'Medium', 'High'])
# Equal-frequency binning
data['Equal_Frequency_Bin'] = pd.qcut(data['Value'], q=3, labels=['Low', 'Medium', 'High'])
print(data)
2. Custom Bins
You can define custom bin edges and labels.
Example:
# Custom binning
bin_edges = [0, 2, 4, 5]
bin_labels = ['Low', 'Medium', 'High']
data['Custom_Bin'] = pd.cut(data['Value'], bins=bin_edges, labels=bin_labels)
print(data)
Permutation: Reordering Data
Permutation involves shuffling or reordering data. This is
particularly useful for testing and sampling.
1. Randomly Permuting Rows
Use sample() to randomly shuffle rows.
Example:
# Example DataFrame
df = pd.DataFrame({'A': [1, 2, 3, 4], 'B': [5, 6, 7, 8]})
# Shuffle rows
shuffled_df = df.sample(frac=1, random_state=42).reset_index(drop=True)
print(shuffled_df)
2. Permuting Columns
You can reorder columns using np.random.permutation().
Example:
# Shuffle columns
shuffled_columns = df[np.random.permutation(df.columns)]
print(shuffled_columns)
String Manipulation: Text Data Operations
Pandas provides a variety of tools to manipulate text data
efficiently, accessible through the .str accessor.
1. Cleaning Strings
.strip(): Removes leading/trailing whitespace.
.lower(), .upper(): Changes case.
.replace(): Replaces substrings.
Example:
# Example DataFrame
text_data = pd.DataFrame({'Text': [' Hello ', 'World!', ' pandas ']})
# Clean and standardize text
text_data['Cleaned_Text'] = text_data['Text'].str.strip().str.lower()
print(text_data)
2. Splitting and Joining
.split(): Splits strings into lists.
.join(): Joins lists into strings.
Example:
# Splitting strings
text_data['Split_Text'] = text_data['Cleaned_Text'].str.split()
# Joining strings (equivalently: text_data['Split_Text'].str.join('-'))
text_data['Joined_Text'] = text_data['Split_Text'].apply(lambda x: '-'.join(x))
print(text_data)
3. Finding and Extracting Patterns
.contains(): Checks if a pattern exists.
.extract(): Extracts matching substrings.
Example:
# Finding patterns
text_data['Has_World'] = text_data['Cleaned_Text'].str.contains('world')
# Extracting patterns
text_data['First_Word'] = text_data['Cleaned_Text'].str.extract(r'(\w+)')
print(text_data)
4. Replacing Patterns
.replace(): Replaces substrings using regex.
Example:
# Replace patterns
text_data['Replaced_Text'] = text_data['Cleaned_Text'].str.replace('world', 'Earth', regex=True)
print(text_data)
Summary Table
Operation               Function                          Description
Discretization          pd.cut(), pd.qcut()               Group continuous data into bins.
Random Row Permutation  sample(frac=1)                    Shuffle rows of a DataFrame.
Column Permutation      np.random.permutation(columns)    Shuffle column order.
Cleaning Strings        .str.strip(), .str.lower()        Clean and standardize text data.
Splitting/Joining Text  .str.split(), .str.join()         Split strings or join lists into strings.
Pattern Matching        .str.contains(), .str.extract()   Find or extract substrings matching a pattern.
Replacing Patterns      .str.replace()                    Replace substrings or patterns in text data.
Data Aggregation: Aggregating Data
Data aggregation involves computing summary statistics
or applying functions to grouped data. Pandas provides
robust methods to aggregate data efficiently.
1. Aggregation with groupby
The groupby method allows grouping data by one or more
columns and applying aggregation functions.
Example:
import pandas as pd
# Example DataFrame
data = pd.DataFrame({
    'Category': ['A', 'A', 'B', 'B', 'C'],
    'Values': [10, 20, 30, 40, 50]
})
# Group by 'Category' and calculate sum
grouped = data.groupby('Category').agg({'Values': 'sum'})
print(grouped)
2. Built-in Aggregation Functions
Common aggregation functions include:
sum(), mean(), max(), min(), count(), std()
Example:
# Multiple aggregations
grouped = data.groupby('Category').agg({'Values': ['sum', 'mean']})
print(grouped)
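Named aggregation (available since pandas 0.25) labels the output columns directly instead of producing a column MultiIndex; a brief sketch:
# Each keyword names an output column: (source column, aggregation)
named = data.groupby('Category').agg(total=('Values', 'sum'), average=('Values', 'mean'))
print(named)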
3. Custom Aggregations
You can use custom functions with agg().
Example:
# Custom aggregation: Calculate range (max - min)
custom_agg = data.groupby('Category').agg({'Values': lambda x: x.max() - x.min()})
print(custom_agg)
4. Aggregation with Multiple Columns
You can aggregate different columns with different
functions.
Example:
data = pd.DataFrame({
    'Category': ['A', 'A', 'B', 'B', 'C'],
    'Values1': [10, 20, 30, 40, 50],
    'Values2': [5, 10, 15, 20, 25]
})
# Apply different aggregations
grouped = data.groupby('Category').agg({
    'Values1': 'sum',
    'Values2': 'mean'
})
print(grouped)
Group Iteration: Iterating Over Grouped Data
Iterating over grouped data allows you to process or
analyze each group independently.
1. Iterating Through Groups
Use groupby to create groups and iterate using a loop.
Example:
# Example DataFrame
data = pd.DataFrame({
    'Category': ['A', 'A', 'B', 'B', 'C'],
    'Values': [10, 20, 30, 40, 50]
})
# Iterating through groups
for name, group in data.groupby('Category'):
    print(f"Group: {name}")
    print(group)
2. Applying Functions to Groups
Apply functions directly to groups using .apply().
Example:
# Function to find range within groups
def range_func(group):
    return group['Values'].max() - group['Values'].min()
group_ranges = data.groupby('Category').apply(range_func)
print(group_ranges)
3. Accessing Specific Groups
Retrieve a specific group using get_group().
Example:
# Get specific group
group_b = data.groupby('Category').get_group('B')
print(group_b)
Summary Table
Operation                     Function                         Description
Grouping Data                 groupby()                        Groups data by one or more keys.
Built-in Aggregations         sum(), mean(), max(), count()    Aggregates data using common functions.
Custom Aggregations           agg() with a custom function     Apply custom operations to grouped data.
Iterating Through Groups      for name, group in groupby()     Iterates over each group in a groupby object.
Accessing Specific Groups     get_group()                      Access a specific group by name.
Applying Functions to Groups  apply()                          Applies a function to each group.