Rajendra Reddy Task-1
______________________________________________________________________ _______________________________
Question1:
import pandas as pd
df1 = pd.read_csv('Programmer.csv')
df2 = pd.read_csv('Software.csv')
df3 = pd.read_csv('Transaction.csv')
print('Dataframe1:')
df1.head()
Output:
Dataframe1:
Output:
Dataframe3:
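The three read_csv calls above follow one pattern, so they can also be written as a loop. This is a small sketch that uses io.StringIO with made-up contents in place of the real CSV files; with the actual files, the dictionary values would simply be the filenames.

```python
import io

import pandas as pd

# Made-up stand-ins for the real files (assumption: contents are invented).
# With real data these values would be filenames such as "Programmer.csv".
sources = {
    "Programmer": io.StringIO("PNAME,SALARY\nA,5000\nB,7000\n"),
    "Software": io.StringIO("TITLE,SCOST\nX,100\n"),
}

# read_csv accepts either a path or a file-like object.
frames = {name: pd.read_csv(src) for name, src in sources.items()}

# Print a labelled preview of each table.
for name, frame in frames.items():
    print(f"{name}:")
    print(frame.head())
```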
~~~~~~ ~~~ ~~~~ ~~~~ ~~~ ~~~~ ~~~ ~~~~ ~~~~ ~~~ ~~~~ ~~~ ~~~~ ~~~~ ~~~ ~~~~ ~~~ ~~~~~ ~
Question3:
import pandas as pd
cust_df=pd.read_csv('Customers.csv')
trans_df=pd.read_csv('Transaction.csv')
# Inner join: keep only rows whose customer_id appears in both tables
merged_df=pd.merge(cust_df,trans_df,on='customer_id',how='inner')
# Parse dd-mm-yy strings; anything that does not match becomes NaT
merged_df['StartDate']=pd.to_datetime(merged_df['start_date'],format='%d-%m-%y',errors='coerce')
merged_df['EndDate']=pd.to_datetime(merged_df['end_date'],format='%d-%m-%y',errors='coerce')
# Length of each period in whole days (NaN where either date is NaT)
merged_df['Duration']=(merged_df['EndDate']-merged_df['StartDate']).dt.days
print('Merged Dataframe With Duration:')
print(merged_df.head())
Output:
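A small sketch of what the date parsing and subtraction above do, run on made-up rows (the column names follow the code above; the values are invented). It shows that a string not matching '%d-%m-%y' becomes NaT under errors='coerce', and that the Duration for that row is NaN rather than an error.

```python
import pandas as pd

# Invented sample rows; column names mirror the ones used above.
sample = pd.DataFrame({
    "customer_id": [1, 1, 2],
    "start_date": ["01-01-20", "15-03-20", "not a date"],
    "end_date": ["11-01-20", "20-03-20", "05-04-20"],
})

# '%d-%m-%y' means day-month-two-digit-year; errors='coerce' turns
# anything that does not match into NaT instead of raising.
sample["StartDate"] = pd.to_datetime(sample["start_date"], format="%d-%m-%y", errors="coerce")
sample["EndDate"] = pd.to_datetime(sample["end_date"], format="%d-%m-%y", errors="coerce")

# Subtracting datetimes gives timedeltas; .dt.days extracts whole days.
# A NaT on either side propagates, so the last row's Duration is NaN.
sample["Duration"] = (sample["EndDate"] - sample["StartDate"]).dt.days
print(sample[["customer_id", "Duration"]])
```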
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Question4:
import pandas as pd
cust_df=pd.read_csv('Customers.csv')
trans_df=pd.read_csv('Transaction.csv')
merged_df=pd.merge(cust_df,trans_df,on='customer_id',how='inner')
merged_df['StartDate']=pd.to_datetime(merged_df['start_date'],format='%d-%m-%y',errors='coerce')
merged_df['EndDate']=pd.to_datetime(merged_df['end_date'],format='%d-%m-%y',errors='coerce')
merged_df['Duration']=(merged_df['EndDate']-merged_df['StartDate']).dt.days
# Remove rows that are identical in every column
merged_df=merged_df.drop_duplicates()
print('Merged Dataframe With Duration:')
print(merged_df.head())
Output:
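drop_duplicates() without arguments only removes rows that are identical in every column. The sketch below uses invented rows (txn_amount is a hypothetical column name) to contrast that with the subset parameter, which deduplicates on chosen key columns instead.

```python
import pandas as pd

# Invented merged rows; txn_amount is a hypothetical column.
# Rows 1 and 2 are exact copies; row 3 shares only the key columns.
merged = pd.DataFrame({
    "customer_id": [1, 2, 2, 2],
    "txn_type": ["deposit", "purchase", "purchase", "purchase"],
    "txn_amount": [100, 50, 50, 80],
})

# No arguments: drop rows identical in every column, keeping the first.
full_dedup = merged.drop_duplicates()

# subset=...: rows only need to match on the listed columns.
per_type = merged.drop_duplicates(subset=["customer_id", "txn_type"])

print(len(merged), len(full_dedup), len(per_type))
```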
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Question5:
import pandas as pd
cust_df=pd.read_csv('Customers.csv')
trans_df=pd.read_csv('Transaction.csv')
merged_df=pd.merge(cust_df,trans_df,on='customer_id',how='inner')
merged_df['StartDate']=pd.to_datetime(merged_df['start_date'],format='%d-%m-%y',errors='coerce')
merged_df['EndDate']=pd.to_datetime(merged_df['end_date'],format='%d-%m-%y',errors='coerce')
merged_df['Duration']=(merged_df['EndDate']-merged_df['StartDate']).dt.days
merged_df=merged_df.drop_duplicates()
# Drop any row with a missing value, including dates that failed to parse
merged_df=merged_df.dropna()
print('Merged Dataframe With Duration:')
print(merged_df.head())
Output:
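Calling dropna() with no arguments removes a row if any column is missing, which can discard more rows than intended. This sketch on invented data (the note column is hypothetical) compares it with dropna(subset=['Duration']), which only drops rows where the duration could not be computed.

```python
import numpy as np
import pandas as pd

# Invented rows; "note" is a hypothetical optional column.
frame = pd.DataFrame({
    "customer_id": [1, 2, 3],
    "Duration": [10.0, np.nan, 4.0],
    "note": [None, "vip", "vip"],
})

# No arguments: a row goes if ANY column is missing, so customer 1
# is dropped even though its Duration is valid.
any_missing = frame.dropna()

# Restricting the check to Duration keeps rows that only lack optional data.
duration_only = frame.dropna(subset=["Duration"])
```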
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Question6:
import pandas as pd
cust_df=pd.read_csv('Customers.csv')
trans_df=pd.read_csv('Transaction.csv')
merged_df=pd.merge(cust_df,trans_df,on='customer_id',how='inner')
merged_df['StartDate']=pd.to_datetime(merged_df['start_date'],format='%d-%m-%y',errors='coerce')
merged_df['EndDate']=pd.to_datetime(merged_df['end_date'],format='%d-%m-%y',errors='coerce')
merged_df['Duration']=(merged_df['EndDate']-merged_df['StartDate']).dt.days
merged_df=merged_df.drop_duplicates()
merged_df=merged_df.dropna()
avg_duration_per_cust=merged_df.groupby('customer_id')['Duration'].mean().reset_index()
avg_duration_per_cust.rename(columns={'Duration': 'Average Duration'}, inplace=True)
print('Average Duration per Customer:')
print(avg_duration_per_cust.head())
Output:
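The groupby-then-rename steps above can also be done in one call with named aggregation. A minimal sketch on invented durations; the dictionary unpacking is needed because 'Average Duration' contains a space and is not a valid keyword name.

```python
import pandas as pd

# Invented per-transaction durations.
durations = pd.DataFrame({
    "customer_id": [1, 1, 2, 3, 3, 3],
    "Duration": [10, 20, 7, 1, 2, 3],
})

# Named aggregation: output column name -> (input column, function).
avg = (
    durations.groupby("customer_id")
    .agg(**{"Average Duration": ("Duration", "mean")})
    .reset_index()
)
print(avg)
```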
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Question7:
unique_transaction_types=merged_df['txn_type'].unique()
print('Unique Transaction Types:')
for transaction_type in unique_transaction_types:
    print(transaction_type)
Output:
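unique() lists the distinct values; value_counts() additionally reports how often each one occurs. Sketch with invented transaction types (the values 'deposit', 'purchase' and 'withdrawal' are assumptions, not taken from Transaction.csv).

```python
import pandas as pd

# Invented transaction types.
txns = pd.DataFrame({
    "txn_type": ["deposit", "purchase", "deposit", "withdrawal", "deposit"],
})

# unique() returns values in order of first appearance.
types = txns["txn_type"].unique()

# value_counts() gives the frequency of each type, most frequent first.
counts = txns["txn_type"].value_counts()
print(counts)
```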
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Question8:
continent=pd.read_csv('Continent.csv')
print('Continent Dataframe:')
print(continent.head())
Output:
import pandas as pd
cont_df=pd.read_csv('Continent.csv')
trans=merged_df.groupby(['region_id','txn_type']).size().reset_index(name='count')
print('Transaction Count per Region and Type:')
print(trans)
Output:
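The Continent table is read above but not joined in. Assuming it shares the region_id key with merged_df and carries a name column (called region_name here purely for illustration), the counts could be labelled and pivoted into a region-by-type table like this, on invented data:

```python
import pandas as pd

# Invented transactions keyed by region.
merged = pd.DataFrame({
    "region_id": [1, 1, 2, 2, 2],
    "txn_type": ["deposit", "purchase", "deposit", "deposit", "withdrawal"],
})
# Hypothetical lookup table; "region_name" is an assumed column name.
continent = pd.DataFrame({"region_id": [1, 2], "region_name": ["Asia", "Europe"]})

# Same long-format count as the code above.
counts = merged.groupby(["region_id", "txn_type"]).size().reset_index(name="count")

# Attach readable names, count, then pivot types into columns (0 where absent).
table = (
    merged.merge(continent, on="region_id", how="left")
    .groupby(["region_name", "txn_type"])
    .size()
    .unstack(fill_value=0)
)
print(table)
```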
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Question9:
savg=df2.groupby('DEVELOPIN')['SCOST'].mean()
pas=df2[df2['DEVELOPIN']=='PASCAL']
pasavg=pas['SCOST'].mean()
print('The Average Selling Cost for packages developed in PASCAL:')
print(pasavg)
Output:
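The savg series computed above already holds the mean selling cost for every language, so the PASCAL figure can be read from it directly instead of filtering again. A sketch on invented rows:

```python
import pandas as pd

# Invented software rows.
software = pd.DataFrame({
    "DEVELOPIN": ["PASCAL", "C", "PASCAL", "BASIC"],
    "SCOST": [100.0, 300.0, 200.0, 50.0],
})

# One groupby gives the mean for every language; .loc picks PASCAL.
avg_by_lang = software.groupby("DEVELOPIN")["SCOST"].mean()
pascal_avg = avg_by_lang.loc["PASCAL"]
print(pascal_avg)
```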
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Question10:
df4=pd.read_csv('Studies.csv')
df4['COURSE'].unique()
Output:
df4[df4['COURSE']=='DAP']
Output:
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Question11:
lcf=df2['SCOST'].min()
print('Lowest Course Fee:',lcf)
Output:
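min() returns only the value. If the matching row is also wanted, idxmin() gives its index label. Sketch on invented data; the TITLE column here is hypothetical.

```python
import pandas as pd

# Invented rows; TITLE is a hypothetical column.
software = pd.DataFrame({
    "TITLE": ["Payroll", "Inventory", "Billing"],
    "SCOST": [400.0, 150.0, 900.0],
})

lowest = software["SCOST"].min()

# idxmin() returns the index label of the first minimum,
# so the whole row can be fetched with .loc.
cheapest_row = software.loc[software["SCOST"].idxmin()]
print(cheapest_row)
```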
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Question12:
rp=df2[df2['SCOST']>=df2['DCOST']]
print('Details of Packages for which Developmental Costs Have Been Recovered:')
print(rp)
Output:
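Comparing SCOST with DCOST checks one copy's selling price against the whole development cost. If the Software table also records the number of copies sold (assumed here to be a column named SOLD, which may not exist in Software.csv), recovery can instead be judged on total revenue. A sketch on invented rows contrasting the two:

```python
import pandas as pd

# Invented rows; SOLD is a hypothetical units-sold column.
software = pd.DataFrame({
    "SCOST": [500.0, 2000.0, 300.0],
    "DCOST": [1000.0, 1500.0, 5000.0],
    "SOLD": [3, 1, 10],
})

# Per-copy comparison, as in the code above.
per_copy = software[software["SCOST"] >= software["DCOST"]]

# Revenue-based comparison: total sales income versus development cost.
by_revenue = software[software["SCOST"] * software["SOLD"] >= software["DCOST"]]
print(by_revenue)
```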
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Question13:
bc=df2[df2['DEVELOPIN']=='BASIC']['DCOST'].max()
print('Cost of the costliest software developed in BASIC:')
print(bc)
Output:
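Grouping by DEVELOPIN gives the highest development cost for every language in one pass; the BASIC value is then a lookup. Sketch on invented rows:

```python
import pandas as pd

# Invented software rows.
software = pd.DataFrame({
    "DEVELOPIN": ["BASIC", "BASIC", "C", "PASCAL"],
    "DCOST": [1200.0, 3400.0, 5000.0, 800.0],
})

# Maximum development cost per language, then pick BASIC.
max_by_lang = software.groupby("DEVELOPIN")["DCOST"].max()
basic_max = max_by_lang.loc["BASIC"]
print(basic_max)
```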
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Question14:
df1.head()
Output:
pr=df1[((df1['PROF1']=='Programmer')|(df1['PROF2']=='Programmer'))
       &(df1['SALARY']>=5000)&(df1['SALARY']<=10000)]
PC=pr.shape[0]
PC
Output:
0
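The two salary comparisons can also be written with Series.between, which includes both endpoints by default. Sketch on invented salaries:

```python
import pandas as pd

# Invented salaries.
programmers = pd.DataFrame({"SALARY": [4500, 5000, 7200, 10000, 12000]})

# between() is inclusive on both ends by default,
# matching >= 5000 and <= 10000.
in_band = programmers["SALARY"].between(5000, 10000)
count = int(in_band.sum())
print(count)
```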
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Question15:
cp=df1[
(df1['PROF1']=='COBOL')|(df1['PROF2']=='COBOL')|
(df1['PROF1']=='PASCAL')|(df1['PROF2']=='PASCAL')]
cp
c=len(cp)
print("The number of programmers who know either COBOL or PASCAL is:")
print(c)
Output:
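The four equality tests above can be shortened with isin(). A sketch on invented rows; as in the original, a programmer who knows both languages is still counted once, because the condition is evaluated per row.

```python
import pandas as pd

# Invented proficiency rows.
programmers = pd.DataFrame({
    "PROF1": ["COBOL", "C", "BASIC", "PASCAL"],
    "PROF2": ["PASCAL", "BASIC", "C", "C"],
})

langs = ["COBOL", "PASCAL"]
# isin() tests membership in the list; OR-ing the two columns keeps a
# programmer who knows either language in either proficiency slot.
knows_either = programmers["PROF1"].isin(langs) | programmers["PROF2"].isin(langs)
count = int(knows_either.sum())
print(count)
```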
______________________________________________________________________ _______________________________