Neel

The document outlines a series of tasks related to data manipulation and analysis using NumPy and Pandas in Python. It includes generating movie IDs, creating user rating matrices, handling employee data with email and password generation, and performing operations on a DataFrame. Additionally, it covers saving data to Excel and accessing specific records from the dataset.

Bachelor of Technology in

Computer Science and Engineering


Scripting Language Laboratory
18CS58L
NAME: G. VENKATA NEELESH
USN NO: 20BTRCS078
SECTION: B
YEAR / SEM: 5th / 3rd
BRANCH: CSE GENERAL

USE CASE - 1 (NUMPY)

Tasks To Perform

1. Generate 1000 movie IDs starting from 1301.

import numpy as np
import random
movie_id=np.arange(1301,2301) #generating 1000 movie ids
print(movie_id.shape)
print(movie_id[0:5])
print(movie_id[500:506])
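
Because the IDs are consecutive, a movie ID can be converted to its position in the array (and later to its column in the rating matrix) by subtracting the first ID. The following is a minimal sketch of that idea; `FIRST_MOVIE_ID` and `id_to_index` are hypothetical names introduced only for illustration.

```python
import numpy as np

FIRST_MOVIE_ID = 1301  # first ID, taken from the task statement

# 1000 consecutive IDs: 1301 .. 2300
movie_id = np.arange(FIRST_MOVIE_ID, FIRST_MOVIE_ID + 1000)

def id_to_index(mid):
    """Hypothetical helper: position of a movie ID inside movie_id."""
    return mid - FIRST_MOVIE_ID

# Round trip: looking up the computed position returns the same ID
assert movie_id[id_to_index(1801)] == 1801
print(movie_id.size, movie_id[0], movie_id[-1])
```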

2. Create a movie matrix to store user ratings such that:


a. There are 100 users.
b. Each user can review as many movies as they wish.
c. Each review should be between 0 and 10 (inclusive).

def createMovieMatrix(numUser, numMovies):  # creating a numUser x numMovies matrix
    movie_matrix = []
    for user in range(numUser):
        movies_rated_by_a_user = np.full(numMovies, -1)  # default -1 means "not rated"
        num_movies_rated = random.randint(0, numMovies - 1)  # number of movies rated by this user
        # Not every user rates every movie: pick a random set of distinct
        # movie indices (values in the range 0 to numMovies-1) for this user
        movies_that_user_rates = random.sample(range(numMovies), num_movies_rated)
        for i in movies_that_user_rates:
            movies_rated_by_a_user[i] = random.randint(0, 10)  # rating from 0 to 10 inclusive
        movie_matrix.append(movies_rated_by_a_user)
    movie_matrix = np.array(movie_matrix)
    return movie_matrix

#displaying the matrix


numMovies=1000
numUser=100
movie_matrix=createMovieMatrix(numUser,numMovies)
print('Movie Matrix Details : ')
print('Shape : ', movie_matrix.shape)
print('10 x 10 slice of movie matrix : ')
print(movie_matrix[30:40,500:510])
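
The same kind of matrix can also be produced without Python loops. The sketch below is an alternative, not the method used above: it assumes NumPy's `Generator` API, and the name `create_movie_matrix_vectorized` is hypothetical. It also differs in one detail: each user gets a random probability of rating any given movie, so the number of rated movies per user is not drawn the same way as in the loop version.

```python
import numpy as np

def create_movie_matrix_vectorized(num_users, num_movies, seed=None):
    """Sketch: ratings 0..10, with -1 marking 'not rated'."""
    rng = np.random.default_rng(seed)
    # Candidate rating for every (user, movie) pair; upper bound 11 is exclusive
    ratings = rng.integers(0, 11, size=(num_users, num_movies))
    # Assumption: one rating probability per user, broadcast across the row
    rate_prob = rng.random((num_users, 1))
    rated = rng.random((num_users, num_movies)) < rate_prob
    # Keep the rating where the user rated the movie, otherwise -1
    return np.where(rated, ratings, -1)

demo_matrix = create_movie_matrix_vectorized(100, 1000, seed=1)
print(demo_matrix.shape)
```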

3. We have 10 movie experts, so let us take their reviews too. Also, 50 new movies have to be added to the
matrix along with their reviews.

movie_matrix=createMovieMatrix(100,1000) # base matrix: 100 users x 1000 movies

expert_matrix=createMovieMatrix(10,1000) # adding 10 new users (the experts) as rows
movie_matrix=np.vstack([movie_matrix,expert_matrix])
print(movie_matrix.shape)

newMovie_matrix=createMovieMatrix(110,50) # adding 50 new movies as columns, rated by all 110 users
movie_matrix=np.hstack([movie_matrix,newMovie_matrix])
print(movie_matrix.shape)
print(movie_matrix)

print(movie_matrix.shape)
col=5 #cols are movie id in matrix
m=movie_matrix[:,col]
print(m)
m=m[m>=0]
print(m)
print(len(m))
print(m.shape[0]) # same as previous line #num of rating
print(round(m.mean(),2)) #mean
print(round(m.std(),2)) # standard deviation

4. Create a final movie rating matrix with 4 columns, i.e. ‘Movie ID’, ‘Avg-Rating’, ‘Number Of
Ratings’, ‘Standard Deviation Of Ratings’.

movie_id=np.arange(1301,2351) # 1050 movie ids after adding 50 new movies
movie_stats=[]
for col in range(1050):
    m=movie_matrix[:,col]
    m=m[m>=0] # keep only actual ratings, drop the -1 placeholders
    movie_stats.append([movie_id[col], round(m.mean(),2), m.size ,round(m.std(),2)])
movie_stats=np.array(movie_stats)
print(movie_stats.shape)
print(movie_stats[:5,:])
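
The per-column statistics can also be computed for all movies at once by turning the -1 placeholders into NaN and using NumPy's NaN-aware reductions. This is a sketch on a tiny made-up matrix (the `demo` values are illustrative, not data from the exercise); a column with no ratings at all would give NaN and a runtime warning.

```python
import numpy as np

# Illustrative 3 users x 3 movies matrix; -1 means "not rated"
demo = np.array([[ 5, -1, 10],
                 [ 7,  4, -1],
                 [-1,  8,  0]])
first_id = 1301

# Replace the placeholders with NaN so they are ignored by nanmean / nanstd
rated = np.where(demo >= 0, demo.astype(float), np.nan)

stats = np.column_stack([
    np.arange(first_id, first_id + demo.shape[1]),  # Movie ID
    np.round(np.nanmean(rated, axis=0), 2),         # Avg-Rating
    np.sum(demo >= 0, axis=0),                      # Number Of Ratings
    np.round(np.nanstd(rated, axis=0), 2),          # Standard Deviation Of Ratings
])
print(stats)
```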

5. Also convert the final movie ratings to the range 0 to 10, such that the minimum rating
maps to 0, the maximum to 10, and the other values fall in between.

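# Assumed definitions, inferred from the task statement (min -> 0, max -> 10)
x=movie_stats[:,1] # average ratings of all movies
startOfNewRange=0
originalRatingRange=10-startOfNewRange # width of the target range 0 to 10
avgRatingRange=x.max()-x.min() # width of the observed average ratings
for i in range(1050):
    movie_rating=movie_stats[i,1]
    print('Movie',i,'old rating',movie_rating)
    print('Distance from minimum',movie_rating-x.min())
    print('ratio of range',originalRatingRange/avgRatingRange)
    newRating=(movie_rating-x.min())*(originalRatingRange/avgRatingRange)+startOfNewRange
    print('Movie',i,'new rating :',newRating)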
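
The conversion in this task is a min-max rescaling: new = (old - min) * 10 / (max - min). A minimal vectorized sketch follows; the function name `rescale` and the sample averages are assumptions made for illustration.

```python
import numpy as np

def rescale(values, new_min=0.0, new_max=10.0):
    """Sketch of min-max rescaling onto [new_min, new_max]."""
    values = np.asarray(values, dtype=float)
    old_min, old_max = values.min(), values.max()
    if old_max == old_min:
        # All values identical: avoid dividing by zero, map everything to new_min
        return np.full_like(values, new_min)
    return (values - old_min) * (new_max - new_min) / (old_max - old_min) + new_min

avg_ratings = np.array([4.2, 5.0, 5.8])  # illustrative averages
scaled = rescale(avg_ratings)
print(scaled)  # smallest value maps to 0, largest to 10
```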

6. Display the films rating-wise, highest to lowest.

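movie_ratings=np.array(movie_stats)
order=np.argsort(movie_ratings[:,1]) # row order by average rating (column 1), ascending
movie_ratings=movie_ratings[order][::-1] # reorder whole rows, then reverse for highest first
print(movie_ratings)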
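
When ranking rows by one column, the rows have to move as a unit. `np.sort(..., axis=0)` sorts every column independently, which separates a movie ID from its own statistics, whereas `np.argsort` on the rating column gives a row order that can be applied to the whole matrix. A small sketch with made-up numbers:

```python
import numpy as np

# Illustrative rows: Movie ID, Avg-Rating, Number Of Ratings, Std Dev
stats = np.array([[1301, 4.9, 50, 3.1],
                  [1302, 5.6, 47, 3.0],
                  [1303, 5.1, 52, 3.2]])

# Row order by average rating, highest first
order = np.argsort(stats[:, 1])[::-1]
ranked = stats[order]
print(ranked)

# For comparison: column-wise sorting mixes values from different movies
mixed = np.sort(stats, axis=0)[::-1]
print(mixed[0])  # not a row of the original matrix
```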

USE CASE - 2 (PANDAS)

Tasks to Perform.

!wget https://www.dropbox.com/s/onl5ac2ea3v11aw/names.txt

import pandas as pd
import random as r
import numpy as np

f=open('names.txt')
allnames= f.read()
f.close()
print(allnames)

names = allnames.split('\n')
print(type(names))
print(len(names))
print(names[100:110])

removedName=names.pop()

print(len(names))

1. We have hired 1000 new employees. Their names are in a text file, one name per line.
Can you generate the following:
● Employee ID: starting from 2929999, for example 2029001, 2039002 and so on
● Email ID: [email protected]
● Password: It should be an alphanumeric value and must contain capital letters, small letters, and
special symbols

def emailgen(name):
    namesp=name.split() # split the full name into its parts
    emailid = '.'.join(namesp) + '@jainuniversity.ac.in'
    return emailid.lower()

emailgen(names[100])

def pwd_gen():
    caps_alpha = r.sample('ABCDEFGHIJKLMNOPQRSTUVWXYZ', r.randint(1,3))
    small_alpha = r.sample('abcdefghijklmnopqrstuvwxyz', r.randint(2,5))
    num = r.sample('0123456789', r.randint(1,3))
    sp_chr = r.sample('!@#$%^&*_.', r.randint(1,1))
    pwdlist=caps_alpha+small_alpha+num+sp_chr
    r.shuffle(pwdlist) # shuffles in place; r is Python's random module, not NumPy
    pwd=''.join(pwdlist)
    return pwd

pwd_gen()
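
A quick way to confirm that generated passwords meet the stated rules (capital letters, small letters, digits and special symbols) is a small checker. The sketch below is self-contained: `is_valid_password` and `sample_password` are hypothetical names, and `sample_password` is a simplified stand-in for the generator above. For real credentials, Python's `secrets` module is generally preferred over `random`.

```python
import random
import string

SPECIALS = '!@#$%^&*_.'  # same symbol set as in the listing above

def is_valid_password(pwd):
    """True if pwd has an uppercase letter, a lowercase letter, a digit and a special symbol."""
    return (any(c.isupper() for c in pwd)
            and any(c.islower() for c in pwd)
            and any(c.isdigit() for c in pwd)
            and any(c in SPECIALS for c in pwd))

def sample_password():
    """Simplified stand-in generator: fixed counts per character class."""
    chars = (random.sample(string.ascii_uppercase, 2)
             + random.sample(string.ascii_lowercase, 3)
             + random.sample(string.digits, 2)
             + random.sample(SPECIALS, 1))
    random.shuffle(chars)  # mix the character classes
    return ''.join(chars)

print(is_valid_password(sample_password()))
```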

# List Comprehension

emailIDs = [emailgen(name) for name in names]


passwords = [pwd_gen() for i in range(1000)]
empIDs = list(range(2020001, 2020001+1000))

print(names)
print(emailIDs)
print(passwords)
print(empIDs)

2. I need you to perform some operations on the data, so can we load the data into a Pandas DataFrame?

df = pd.DataFrame({'Name': names, 'Employee ID': empIDs, 'Email ID': emailIDs, 'Password' : passwords})

3. Just to cross-check, can you show me the first 2 and the last 3 rows? Also, let's check the shape of
the DataFrame and print the data types of each column.

df.head(2)

df.tail(3)

df.shape
df.columns
type(df.Name)
df.dtypes

4. Print the employee ID, email ID and password of ‘Nancy Zediker’.

df[df.Name=='Nancy Zediker']
Or
df[df['Name']=='Nancy Zediker']

5. Can we check that the Email ID column does not contain any duplicate emails? Also check its size and
print the first 10 values.
serEmail = df['Email ID']
type(serEmail)
print(serEmail.shape)
print(serEmail.head(10)) # first 10 values

emailDub = df['Email ID'].duplicated()


df['Email ID'].duplicated().any()

df[emailDub]

6. If a duplicate email is found, add a number to the duplicated email. We have planned to invite them
to lunch in batches.

df[df['Name']=='Lois Thompson']

df.loc[850,'Email ID']='[email protected]'

df[df['Name']=='Lois Thompson']

df['Email ID'].duplicated().any()
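
Instead of patching one row by hand, the numbering can be applied to every repeated address at once. The sketch below numbers the second and later occurrences of each email with `groupby(...).cumcount()`; the addresses use the placeholder domain example.com and are made up for illustration. It does not guard against a numbered address colliding with one that already exists.

```python
import pandas as pd

# Illustrative data: the first and last address are duplicates
emails = pd.Series(['lois.thompson@example.com',
                    'john.brown@example.com',
                    'lois.thompson@example.com'])

# 0 for the first occurrence of an address, 1, 2, ... for later repeats
dup_rank = emails.groupby(emails).cumcount()

parts = emails.str.split('@')
local, domain = parts.str[0], parts.str[1]

# Empty suffix for first occurrences, the repeat number otherwise
suffix = dup_rank.astype(str).where(dup_rank > 0, '')
unique_emails = local + suffix + '@' + domain

print(unique_emails.tolist())
print(unique_emails.duplicated().any())
```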

7. Let's create the first batch of all people whose names start with A. Also give me their count.

# Create a batch of employees Starting with 'A'


A = df[df.Name.str.startswith("A")]
A.shape

fb = pd.DataFrame(A)
fb
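
If batches are needed for every starting letter, not only A, the first character of each name can serve as a grouping key. A minimal sketch on made-up names (the `people` frame is illustrative, not the employee data):

```python
import pandas as pd

people = pd.DataFrame({'Name': ['Alice Ray', 'Amit Shah', 'Bina Das', 'Carl Poe']})

first_letter = people['Name'].str[0]

# Number of people per starting letter, in alphabetical order
batch_sizes = first_letter.value_counts().sort_index()
print(batch_sizes)

# One DataFrame per letter, keyed by that letter
batches = {letter: group for letter, group in people.groupby(first_letter)}
print(batches['A'])
```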

8. I just got to know, that the people at index 10, 130, and 560 will not be joining so please remove
their records.

print(df.iloc[10,0])
print(df.iloc[130,0])
print(df.iloc[560,0])

df.drop([10,130,560], axis=0, inplace=True)

df.head(12)
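
After rows are dropped, the remaining index labels keep their old values, so labels such as 10 no longer exist while positions shift up. Whether to renumber is a design choice; the sketch below shows both views on a tiny made-up frame (`staff` is illustrative).

```python
import pandas as pd

staff = pd.DataFrame({'Name': ['a', 'b', 'c', 'd', 'e']})

# Drop the rows labelled 1 and 3; the surviving labels are 0, 2, 4
kept = staff.drop(index=[1, 3])
print(kept.index.tolist())

# Optional: renumber so labels and positions agree again
renumbered = kept.reset_index(drop=True)
print(renumbered.index.tolist())
```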

9. We also need to share the data with the finance department, so can you create a new DataFrame
without the password column and save it in an Excel file for sharing?

# Use DataFrame.copy() to create a new DataFrame.


df2 = df[['Name', 'Employee ID','Email ID']].copy()
print(df2)

file_name = 'Data.xlsx'
# saving the excel
df2.to_excel(file_name)
print('DataFrame is written to Excel File successfully.')

10. Can you tell me how I can access the data from the Excel file using names? For example, show me the
data for ‘John Brown’ and ‘Michael Combes’.

df3=pd.read_excel('Data.xlsx')
print(df3.shape)
print(df3.columns)
df3.head()

print(df3.loc[df3['Name'].isin(['John Brown', 'Michael Combes']),['Name', 'Employee ID', 'Email ID']])
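
Looking records up by name can be done either with a boolean mask or by making the name the index. The sketch below shows both on a small in-memory frame, so it does not depend on the Excel file; the rows and the example.com addresses are made up for illustration.

```python
import pandas as pd

records = pd.DataFrame({
    'Name': ['John Brown', 'Mary Major', 'Michael Combes'],
    'Employee ID': [2020001, 2020002, 2020003],
    'Email ID': ['john.brown@example.com',
                 'mary.major@example.com',
                 'michael.combes@example.com'],
})
wanted = ['John Brown', 'Michael Combes']

# Option 1: boolean mask, keeps the original row labels
by_mask = records[records['Name'].isin(wanted)]
print(by_mask)

# Option 2: use the name as the index, then select by label
by_label = records.set_index('Name').loc[wanted]
print(by_label)
```

The label-based form raises a `KeyError` if a requested name is missing, while the mask simply returns fewer rows.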
