Analyzing the Data - Scrutinizing the Data
CODE:
import pandas as pd
import numpy as np
# Sample expenditure records, one row per person.
# Note: every key from 'EQUIPMENT_EXPENDITURE ' onward ends with a trailing space.
a = {
    'PERSON_ID': ['CDS2024_001', 'VCOAS2024_002', 'VCNS2024_003',
                  'VCAS2024_004', 'CIDS2024_005', 'DGMO2024_006',
                  'DG_DIA2024_007', 'CNS2024_008', 'CAS2024_009', 'CDS2024_010'],
    'PERSON_NAME': ['General Arjun Singh', 'Lieutenant General Sanjay Sharma',
                    'Vice Admiral Ananya Kapoor', 'Air Marshal Vikram Singh',
                    'Lieutenant General Rajesh Kumar', 'Major General Siddharth Verma',
                    'Lieutenant General Priya Khanna', 'Vice Admiral Rahul Sharma',
                    'Air Vice Marshal Priya Patel', 'General Vikrant Kapoor'],
    'PERSONNEL_EXPENDITURE': [2000000, 1800000, 1500000, 1700000, 1600000,
                              1400000, 1750000, 1550000, 1650000, 1950000],
    'EQUIPMENT_EXPENDITURE ': [1500000, 1300000, 1000000, 1200000, 1100000,
                               950000, 1250000, 1100000, 1150000, 1400000],
    'OPERATIONS_EXPENDITURE ': [1000000, 900000, 800000, 950000, 850000,
                                700000, 920000, 850000, 900000, 1100000],
    'MAINTENANCE_EXPENDITURE ': [300000, 280000, 250000, 320000, 290000,
                                 270000, 310000, 270000, 290000, 330000],
    'TRAINING_EXPENDITURE ': [200000, 180000, 160000, 190000, 170000,
                              150000, 200000, 175000, 185000, 220000],
    'HOME_EXPENDITURE ': [500000, 450000, 400000, 480000, 420000,
                          380000, 500000, 420000, 440000, 500000],
    'TOTAL_EXPENDITURE ': [5500000, 5230000, 5020000, 5430000, 5050000,
                           4500000, 5330000, 4815000, 4615000, 5500000],
}
b = pd.DataFrame(a)
print("--------------ORIGINAL DATAFRAME---------------")
print(b)
print("\n-----------HEAD & TAIL-------\n")
print("\n-----------Head()-------\n")
print(b.head(2))
print("\n-----------Tail()-------\n")
print(b.tail(2))
print("\n-----------DATA RANKING-------\n")
print("\n-----------Rank()-------\n")
print(b.rank())
print("\n-----------DATA MUNGING-------\n")
print("\n-----------isnull()-------\n")
print(b.isnull())
print("\n-----------notnull()-------\n")
print(b.notnull())
print("\n-----------DATA CLEANING-------\n")
print("\n-----------fillna(bfill)-------\n")
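# bfill() and ffill() do the same job as fillna(method='bfill') and
# fillna(method='pad'), which recent pandas releases deprecate
print(b.bfill())
print("\n-----------fillna(pad)-------\n")
print(b.ffill())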
print("\n-----------DATA FILTERING-------\n")
print("\n-----------filter()-------\n")
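# These names lack the trailing space found in the DataFrame's column
# labels, so no column matches and the result is empty
x = b.filter(['MAINTENANCE_EXPENDITURE', 'HOME_EXPENDITURE'])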
print(x)
print("\n-----------DATA AGGREGATION-------\n")
print("\n-----------aggregate()-------\n")
x = b.aggregate(['sum', 'min', 'max', 'count'])
print(x)
print("\n-----------DATA GROUPING-------\n")
print("\n-----------groupby()-------\n")
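x = b.groupby(['PERSON_ID'])
# The reducer name did not survive in the source; first() is assumed here.
# Every PERSON_ID is unique, so each group holds exactly one row.
print(x.first())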
OUTPUT:
--------------ORIGINAL DATAFRAME---------------
PERSON_ID ... TOTAL_EXPENDITURE
0 CDS2024_001 ... 5500000
1 VCOAS2024_002 ... 5230000
2 VCNS2024_003 ... 5020000
3 VCAS2024_004 ... 5430000
4 CIDS2024_005 ... 5050000
5 DGMO2024_006 ... 4500000
6 DG_DIA2024_007 ... 5330000
7 CNS2024_008 ... 4815000
8 CAS2024_009 ... 4615000
9 CDS2024_010 ... 5500000
[10 rows x 9 columns]
-----------HEAD & TAIL-------
-----------Head()-------
PERSON_ID ... TOTAL_EXPENDITURE
0 CDS2024_001 ... 5500000
1 VCOAS2024_002 ... 5230000
[2 rows x 9 columns]
-----------Tail()-------
PERSON_ID ... TOTAL_EXPENDITURE
8 CAS2024_009 ... 4615000
9 CDS2024_010 ... 5500000
[2 rows x 9 columns]
-----------DATA RANKING-------
-----------Rank()-------
PERSON_ID PERSON_NAME ... HOME_EXPENDITURE TOTAL_EXPENDITURE
0 2.0 3.0 ... 9.0 9.5
1 10.0 7.0 ... 6.0 6.0
2 9.0 9.0 ... 2.0 4.0
3 8.0 1.0 ... 7.0 8.0
4 4.0 6.0 ... 3.5 5.0
5 6.0 8.0 ... 1.0 1.0
6 7.0 5.0 ... 9.0 7.0
7 5.0 10.0 ... 3.5 3.0
8 1.0 2.0 ... 5.0 2.0
9 3.0 4.0 ... 9.0 9.5
[10 rows x 9 columns]
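The fractional ranks above (9.5 under TOTAL_EXPENDITURE, 3.5 under HOME_EXPENDITURE) come from ties. rank() defaults to method='average', so tied values share the mean of the positions they occupy. A minimal sketch using the TOTAL_EXPENDITURE values from the code above; the variable names are illustrative only:

```python
import pandas as pd

# TOTAL_EXPENDITURE values copied from the DataFrame above
total = pd.Series([5500000, 5230000, 5020000, 5430000, 5050000,
                   4500000, 5330000, 4815000, 4615000, 5500000])

avg_rank = total.rank()               # default method='average'
min_rank = total.rank(method='min')   # tied values share the lowest position

print(avg_rank.tolist())  # the two 5500000 entries both get 9.5
print(min_rank.tolist())  # the two 5500000 entries both get 9.0
```

Which tie rule fits depends on how the ranking will be read; method='min' gives the familiar "joint 9th" style.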
-----------DATA MUNGING-------
-----------isnull()-------
PERSON_ID PERSON_NAME ... HOME_EXPENDITURE TOTAL_EXPENDITURE
0 False False ... False False
1 False False ... False False
2 False False ... False False
3 False False ... False False
4 False False ... False False
5 False False ... False False
6 False False ... False False
7 False False ... False False
8 False False ... False False
9 False False ... False False
[10 rows x 9 columns]
-----------notnull()-------
PERSON_ID PERSON_NAME ... HOME_EXPENDITURE TOTAL_EXPENDITURE
0 True True ... True True
1 True True ... True True
2 True True ... True True
3 True True ... True True
4 True True ... True True
5 True True ... True True
6 True True ... True True
7 True True ... True True
8 True True ... True True
9 True True ... True True
[10 rows x 9 columns]
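The full Boolean tables above are hard to scan once a frame grows. One common way to condense them is to count missing values per column with isnull().sum(). A minimal sketch on a small hypothetical frame; the NaN is inserted purely for illustration, since the data above has no missing values:

```python
import numpy as np
import pandas as pd

# Hypothetical three-row frame with one deliberately missing value
demo = pd.DataFrame({
    'PERSON_ID': ['CDS2024_001', 'VCOAS2024_002', 'VCNS2024_003'],
    'HOME_EXPENDITURE': [500000, np.nan, 400000],
})

missing_per_column = demo.isnull().sum()   # count of NaN per column
any_missing = bool(demo.isnull().values.any())

print(missing_per_column)
print(any_missing)
```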
-----------DATA CLEANING-------
-----------fillna(bfill)-------
PERSON_ID ... TOTAL_EXPENDITURE
0 CDS2024_001 ... 5500000
1 VCOAS2024_002 ... 5230000
2 VCNS2024_003 ... 5020000
3 VCAS2024_004 ... 5430000
4 CIDS2024_005 ... 5050000
5 DGMO2024_006 ... 4500000
6 DG_DIA2024_007 ... 5330000
7 CNS2024_008 ... 4815000
8 CAS2024_009 ... 4615000
9 CDS2024_010 ... 5500000
[10 rows x 9 columns]
-----------fillna(pad)-------
PERSON_ID ... TOTAL_EXPENDITURE
0 CDS2024_001 ... 5500000
1 VCOAS2024_002 ... 5230000
2 VCNS2024_003 ... 5020000
3 VCAS2024_004 ... 5430000
4 CIDS2024_005 ... 5050000
5 DGMO2024_006 ... 4500000
6 DG_DIA2024_007 ... 5330000
7 CNS2024_008 ... 4815000
8 CAS2024_009 ... 4615000
9 CDS2024_010 ... 5500000
[10 rows x 9 columns]
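Both outputs above are identical to the original frame because it contains no missing values, so there is nothing to fill. A minimal sketch of what backward and forward filling do when gaps exist, using a hypothetical series with NaN entries added for illustration:

```python
import numpy as np
import pandas as pd

# Hypothetical series with two gaps
demo = pd.Series([200000, np.nan, 160000, np.nan])

back = demo.bfill()   # each gap takes the next valid value below it
fwd = demo.ffill()    # each gap takes the last valid value above it

print(back.tolist())  # the final NaN stays: nothing follows it
print(fwd.tolist())
```

Backward fill cannot repair a trailing gap and forward fill cannot repair a leading one, which is worth checking before relying on either.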
-----------DATA FILTERING-------
-----------filter()-------
Empty DataFrame
Columns: []
Index: [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
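The empty result above follows from the column labels: in the dictionary, 'MAINTENANCE_EXPENDITURE ' and 'HOME_EXPENDITURE ' end with a trailing space, while the names passed to filter() do not, so nothing matches. A minimal sketch of one possible fix, stripping whitespace from the labels before filtering; the three-row frame is a cut-down copy of the data above:

```python
import pandas as pd

# Two columns copied from the code above; the keys keep their trailing space
demo = pd.DataFrame({
    'MAINTENANCE_EXPENDITURE ': [300000, 280000, 250000],
    'HOME_EXPENDITURE ': [500000, 450000, 400000],
})
wanted = ['MAINTENANCE_EXPENDITURE', 'HOME_EXPENDITURE']

before = demo.filter(wanted)               # no label matches: zero columns
demo.columns = demo.columns.str.strip()    # remove stray whitespace from labels
after = demo.filter(wanted)                # both columns are now selected

print(before.shape)
print(after.shape)
```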
-----------DATA AGGREGATION-------
-----------aggregate()-------
PERSON_ID ... TOTAL_EXPENDITURE
sum CDS2024_001VCOAS2024_002VCNS2024_003VCAS2024_0... ... 50990000
min CAS2024_009 ... 4500000
max VCOAS2024_002 ... 5500000
count 10 ... 10
[4 rows x 9 columns]
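In the output above, the 'sum' row for PERSON_ID is a concatenation of all ten IDs, because summing strings joins them. If only numeric totals are of interest, one option is to restrict the frame to numeric columns first. A minimal sketch on a cut-down copy of the data, with column names written without the trailing space:

```python
import pandas as pd

# First three rows of selected columns from the data above
demo = pd.DataFrame({
    'PERSON_ID': ['CDS2024_001', 'VCOAS2024_002', 'VCNS2024_003'],
    'TRAINING_EXPENDITURE': [200000, 180000, 160000],
    'HOME_EXPENDITURE': [500000, 450000, 400000],
})

numeric = demo.select_dtypes(include='number')   # drops the string column
summary = numeric.aggregate(['sum', 'min', 'max', 'count'])

print(summary)
```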
-----------DATA GROUPING-------
-----------groupby()-------
PERSON_NAME ... TOTAL_EXPENDITURE
PERSON_ID ...
CAS2024_009 Air Vice Marshal Priya Patel ... 4615000
CDS2024_001 General Arjun Singh ... 5500000
CDS2024_010 General Vikrant Kapoor ... 5500000
CIDS2024_005 Lieutenant General Rajesh Kumar ... 5050000
CNS2024_008 Vice Admiral Rahul Sharma ... 4815000
DGMO2024_006 Major General Siddharth Verma ... 4500000
DG_DIA2024_007 Lieutenant General Priya Khanna ... 5330000
VCAS2024_004 Air Marshal Vikram Singh ... 5430000
VCNS2024_003 Vice Admiral Ananya Kapoor ... 5020000
VCOAS2024_002 Lieutenant General Sanjay Sharma ... 5230000
[10 rows x 8 columns]
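Because every PERSON_ID is unique, the grouping above yields ten groups of one row each, which amounts to sorting the frame by ID. Grouping becomes informative when a key repeats. A minimal sketch, assuming the part of the ID before '2024' can serve as a grouping key; the ROLE column is hypothetical and not part of the original data:

```python
import pandas as pd

# Three rows copied from the data above; two of them share the 'CDS' prefix
demo = pd.DataFrame({
    'PERSON_ID': ['CDS2024_001', 'VCOAS2024_002', 'CDS2024_010'],
    'TOTAL_EXPENDITURE': [5500000, 5230000, 5500000],
})

# Hypothetical key: the text before '2024' in each ID
demo['ROLE'] = demo['PERSON_ID'].str.split('2024').str[0]
totals = demo.groupby('ROLE')['TOTAL_EXPENDITURE'].sum()

print(totals)
```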