Neel
USE CASE - 1 (NUMPY)
Tasks To Perform
import numpy as np
import random
movie_id=np.arange(1301,2301) #generating 1000 movie ids
print(movie_id.shape)
print(movie_id[0:5])
print(movie_id[500:506])
3. We have 10 movie experts; let us take their reviews too. Also, 50 new movies have to be added to the
matrix along with their reviews.
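The cells that build and extend `movie_matrix` are not shown here, so the following is only a minimal sketch of how the expansion in task 3 might look. The viewer count, the 1-5 rating scale, and the use of -1 for "not rated" are assumptions made for illustration; only the 1000 movies, 10 experts, and 50 new movies come from the task text.

```python
import numpy as np

rng = np.random.default_rng(0)  # seeded only so the sketch is repeatable

n_viewers, n_movies = 20, 1000      # viewer count is hypothetical
n_experts, n_new_movies = 10, 50    # taken from the task statement

# Assumed encoding: ratings 1-5, with -1 meaning "not rated".
movie_matrix = rng.integers(1, 6, size=(n_viewers, n_movies))
movie_matrix[rng.random(movie_matrix.shape) < 0.3] = -1

# Append the expert reviews as extra rows (one row per expert).
expert_reviews = rng.integers(1, 6, size=(n_experts, n_movies))
movie_matrix = np.vstack([movie_matrix, expert_reviews])

# Append the new movies as extra columns (one rating per existing row).
new_movie_reviews = rng.integers(1, 6, size=(movie_matrix.shape[0], n_new_movies))
movie_matrix = np.hstack([movie_matrix, new_movie_reviews])

print(movie_matrix.shape)  # (30, 1050)
```

`np.vstack` requires the new rows to have as many columns as the matrix, and `np.hstack` requires the new columns to have as many rows, which is why the experts are added before the new movies are sized.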
print(movie_matrix.shape)
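col=5 #columns of the matrix correspond to movie IDs
m=movie_matrix[:,col] #all entries for the movie in column 5
print(m)
m=m[m>=0] #keep only non-negative entries (negative values are assumed to mark missing ratings)
print(m)
print(len(m)) #number of ratings
print(m.shape[0]) #same count as the previous line
print(round(m.mean(),2)) #mean
print(round(m.std(),2)) #standard deviation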
4. Create a final movie rating matrix with 4 columns, i.e. ‘Movie ID’, ‘Avg-Rating’, ‘Number Of
Ratings’, ‘Standard Deviation Of Ratings’.
movie_id=np.arange(1301,2351)
movie_stats=[]
for col in range(1050):
    m=movie_matrix[:,col]
    m=m[m>=0]
    movie_stats.append([movie_id[col], round(m.mean(),2), m.size ,round(m.std(),2)])
movie_stats=np.array(movie_stats)
print(movie_stats.shape)
print(movie_stats[:5,:])
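The per-column loop above can also be written without an explicit loop. The sketch below is one possible vectorized variant, not the notebook's own method; the tiny `ratings` array and the -1 "not rated" marker are hypothetical stand-ins for `movie_matrix`.

```python
import numpy as np

# Tiny hypothetical matrix: rows are reviewers, columns are movies, -1 = not rated.
ratings = np.array([[5, -1,  3],
                    [4,  2, -1],
                    [3,  4,  1]], dtype=float)
ids = np.arange(1301, 1301 + ratings.shape[1])

# Replace missing ratings with NaN so the nan-aware reductions skip them.
masked = np.where(ratings >= 0, ratings, np.nan)

stats = np.column_stack([
    ids,                                       # Movie ID
    np.round(np.nanmean(masked, axis=0), 2),   # Avg-Rating
    np.sum(~np.isnan(masked), axis=0),         # Number Of Ratings
    np.round(np.nanstd(masked, axis=0), 2),    # Standard Deviation Of Ratings
])
print(stats)
```

A column with no ratings at all would produce NaN (and a runtime warning) from `np.nanmean`, so such columns would need separate handling.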
5. Also convert the final movie ratings to a range from 0 to 10, such that the minimum rating
becomes 0, the maximum becomes 10, and all other values fall in between.
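x=movie_stats[:,1] #assumed definition (missing in the source): the Avg-Rating column
startOfNewRange=0
originalRatingRange=10-startOfNewRange #assumed definition: width of the target 0-10 range
avgRatingRange=x.max()-x.min() #assumed definition: spread of the observed average ratings
for i in range(1050):
    movie_rating=movie_stats[i,1]
    print('Movie',i,'old rating',movie_rating)
    print('Distance from minimum',movie_rating-x.min())
    print('ratio of range',originalRatingRange/avgRatingRange)
    newRating=(movie_rating-x.min())*(originalRatingRange/avgRatingRange)+startOfNewRange
    print('Movie',i,'new rating :',newRating)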
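The rescaling in task 5 is ordinary min-max scaling: new = (old - old_min) * (new_max - new_min) / (old_max - old_min) + new_min. A minimal vectorized sketch follows, using a small hypothetical array of average ratings rather than the real `movie_stats` column.

```python
import numpy as np

avg_ratings = np.array([2.5, 3.0, 4.5, 3.5])  # hypothetical average ratings

old_min, old_max = avg_ratings.min(), avg_ratings.max()
new_min, new_max = 0, 10

# Shift so the minimum is 0, stretch to the new width, then shift to new_min.
scaled = (avg_ratings - old_min) * (new_max - new_min) / (old_max - old_min) + new_min
print(scaled)  # [ 0.   2.5 10.   5. ]
```

If every movie had the same average rating, `old_max - old_min` would be zero and the division would fail, so that case needs a guard in real data.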
6. Display the films by rating, highest to lowest.
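movie_ratings=np.array(movie_stats)
order=np.argsort(movie_ratings[:,1])[::-1] #row order by Avg-Rating (column 1), highest first
movie_ratings=movie_ratings[order] #reorder whole rows so each movie keeps its own stats
print(movie_ratings)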
Tasks to Perform.
!wget https://www.dropbox.com/s/onl5ac2ea3v11aw/names.txt
import pandas as pd
import random as r
import numpy as np
with open('names.txt') as f: #the file is closed automatically when the block ends
    allnames = f.read()
print(allnames)
names = allnames.split('\n')
print(type(names))
print(len(names))
print(names[100:110])
removedName=names.pop()
print(len(names))
1. We have hired 1000 new employees. Their names are in a text file, one name per line.
Can you generate the following:
● Employee ID: starting from 2929999 (example: 2029001, 2039002, and so on)
● Email ID: firstname.lastname@jainuniversity.ac.in
● Password: an alphanumeric value that must contain capital letters, small letters, and
special symbols
def emailgen(name):
    namesp=name.split()
    emailid = '.'.join(namesp) + '@jainuniversity.ac.in'
    return emailid.lower()
emailgen(names[100])
def pwd_gen():
    caps_alpha = r.sample('ABCDEFGHIJKLMNOPQRSTUVWXYZ', r.randint(1,3)) #1-3 capital letters
    small_alpha = r.sample('abcdefghijklmnopqrstuvwxyz', r.randint(2,5)) #2-5 small letters
    num = r.sample('0123456789', r.randint(1,3)) #1-3 digits
    sp_chr = r.sample('!@#$%^&*_.', 1) #exactly one special symbol
    pwdlist=caps_alpha+small_alpha+num+sp_chr
    r.shuffle(pwdlist) #in-place shuffle from the standard random module (imported as r)
    pwd=''.join(pwdlist)
    return pwd
pwd_gen()
# List Comprehension
print(names)
print(emailIDs)
print(passwords)
print(empIDs)
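The list comprehensions that build `emailIDs`, `passwords`, and `empIDs` are not shown above. The following is a minimal sketch of how they might look, not the notebook's actual cell. The three sample names, the helpers `make_email` and `make_password`, and the starting ID (taken from the first example in the task, which conflicts with the stated starting value) are all assumptions.

```python
import random
import string

names = ['Nancy Zediker', 'John Brown', 'Michael Combes']  # stand-in for the 1000 names
START_ID = 2029001  # assumption: first example ID given in the task statement

def make_email(name):
    # Hypothetical helper mirroring emailgen: "First Last" -> first.last@domain
    return '.'.join(name.split()).lower() + '@jainuniversity.ac.in'

def make_password():
    # Hypothetical helper: draws from every required character class, then shuffles.
    chars = (random.sample(string.ascii_uppercase, 2)
             + random.sample(string.ascii_lowercase, 3)
             + random.sample(string.digits, 2)
             + random.sample('!@#$%^&*_.', 1))
    random.shuffle(chars)
    return ''.join(chars)

# One list comprehension per generated column.
empIDs = [START_ID + i for i in range(len(names))]
emailIDs = [make_email(n) for n in names]
passwords = [make_password() for _ in names]

print(empIDs)
print(emailIDs)
```

The `random` module is fine for an exercise, but it is not designed for security; the standard `secrets` module is the usual choice for real credentials.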
2. I need you to perform some operations on the data, so can we load the data into a pandas DataFrame?
df = pd.DataFrame({'Name': names, 'Employee ID': empIDs, 'Email ID': emailIDs, 'Password' : passwords})
3. Just to cross-check, can you show me the first 2 and the last 3 rows? Also, let's check the shape of
the DataFrame and print the data types of each column.
df.head(2)
df.tail(3)
df.shape
df.columns
type(df.Name)
df.dtypes
df[df.Name=='Nancy Zediker']
Or
df[df['Name']=='Nancy Zediker']
5. Can we check that the Email ID column does not contain any duplicate emails? Also check its size and
print the first 10 values.
serEmail = df['Email ID']
type(serEmail)
print(serEmail.shape)
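emailDub = serEmail.duplicated() #assumed definition (missing in the source): True for repeated emails
df[emailDub]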
6. If a duplicate email is found, add a number to the duplicated email. We have planned to invite them
to lunch in batches.
df[df['Name']=='Lois Thompson']
df.loc[850,'Email ID']='[email protected]'
df[df['Name']=='Lois Thompson']
df['Email ID'].duplicated().any()
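The fix above edits one known row by hand. When several addresses may repeat, the numbering can be applied to all of them at once. The sketch below is one possible approach under stated assumptions: `df_demo` and its three rows are hypothetical, and the suffix scheme (append 1, 2, ... to the second and later occurrences) is an illustration rather than the notebook's rule.

```python
import pandas as pd

df_demo = pd.DataFrame({
    'Name': ['Lois Thompson', 'Ann Lee', 'Lois Thompson'],
    'Email ID': ['lois.thompson@jainuniversity.ac.in',
                 'ann.lee@jainuniversity.ac.in',
                 'lois.thompson@jainuniversity.ac.in'],
})

# 0 for the first occurrence of an address, 1 for the second, and so on.
dup_count = df_demo.groupby('Email ID').cumcount()

# Split each address into the part before and after the '@'.
parts = df_demo['Email ID'].str.split('@', expand=True)

# Empty suffix for first occurrences, the running count for repeats.
suffix = dup_count.astype(str).where(dup_count > 0, '')
df_demo['Email ID'] = parts[0] + suffix + '@' + parts[1]

print(df_demo['Email ID'].duplicated().any())  # False
```

A suffixed address could in principle collide with one that already exists, so re-running the `duplicated().any()` check afterwards, as done here, is a sensible safeguard.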
7. Let's create the first batch of all people whose names start with A. Also give me their count.
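A = df[df['Name'].str.startswith('A')] #assumed definition (missing in the source): rows whose name starts with A
print(len(A)) #count of people in the first batch
fb = pd.DataFrame(A)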
fb
8. I just got to know that the people at index 10, 130, and 560 will not be joining, so please remove
their records.
print(df.iloc[10,0])
print(df.iloc[130,0])
print(df.iloc[560,0])
df.head(12)
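The cell that actually removes the three records is not shown above. A minimal sketch of the removal, assuming the default integer index and using a small hypothetical `df_demo` in place of the employee DataFrame:

```python
import pandas as pd

df_demo = pd.DataFrame({'Name': ['A', 'B', 'C', 'D', 'E']})
not_joining = [1, 3]  # hypothetical labels; the task uses 10, 130, and 560

# drop() removes rows by index label and returns a new DataFrame.
df_demo = df_demo.drop(index=not_joining)
print(df_demo)

# Optional: renumber the remaining rows 0..n-1.
df_demo = df_demo.reset_index(drop=True)
print(df_demo)
```

Whether to reset the index is a design choice: keeping the original labels preserves each person's earlier position, while resetting makes later positional and label lookups agree again.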
9. We also need to share the data with the finance department, so can you create a new DataFrame
without the password column and save it in an Excel file for sharing?
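df2 = df.drop(columns=['Password']) #assumed step (missing in the source): copy of df without the password column
file_name = 'Data.xlsx'
# saving the excel
df2.to_excel(file_name)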
print('DataFrame is written to Excel File successfully.')
10. Can you tell me how I will access the data from the Excel file using names? For example, show me the
data for ‘John Brown’ and ‘Michael Combes’.
df3=pd.read_excel('Data.xlsx')
print(df3.shape)
print(df3.columns)
df3.head()
print(df3.loc[[1,3],['Name', 'Employee ID', 'Email ID']])
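The lookup above selects rows by index label, which only works if the positions of the two people are already known. Selecting by the names themselves could be sketched as follows; `df_demo` and its Employee ID values are hypothetical stand-ins for the DataFrame read back from the Excel file.

```python
import pandas as pd

df_demo = pd.DataFrame({
    'Name': ['Nancy Zediker', 'John Brown', 'Lois Thompson', 'Michael Combes'],
    'Employee ID': [2029001, 2029002, 2029003, 2029004],  # hypothetical values
})
wanted = ['John Brown', 'Michael Combes']

# Option 1: boolean mask, keeps the original integer index.
by_name = df_demo[df_demo['Name'].isin(wanted)]
print(by_name)

# Option 2: make Name the index so .loc can take the names directly.
indexed = df_demo.set_index('Name')
print(indexed.loc[wanted])
```

The `isin` mask returns every matching row, so it also copes with two employees sharing a name, whereas the name-indexed form reads more naturally when names are unique.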