GWA - Lab Workbook

The document is a lab workbook for a course on graph and web analytics. It provides instructions and exercises for 10 lab sessions covering topics like introducing graphs using NetworkX, exploring various graph types, importing and exporting graph data, graph analysis measures and traversals, node and group analysis, community detection, web scraping, customer segmentation, and applying Google Analytics. Each lab session involves a pre-lab assignment, in-lab exercises, and a post-lab homework question to reinforce the concepts learned. Student progress and evaluations are tracked in a table at the end.

Uploaded by

Mohan Yvk
Copyright
© All Rights Reserved

LAB WORKBOOK

17CS3260 INTRODUCTION TO GRAPH & WEB ANALYTICS

III B.TECH 2019-20 EVEN SEMESTER


K L UNIVERSITY | INTRODUCTION TO GRAPH & WEB ANALYTICS – 17CS3260

LABORATORY WORKBOOK

STUDENT NAME
REG. NO
YEAR
SEMESTER
SECTION
FACULTY


Table of Contents

ORGANIZATION OF THE STUDENT LAB WORKBOOK

#01 INTRODUCTION TO GRAPHS USING NETWORKX

#02 EXPLORING VARIOUS GRAPHS

#03 IMPORTING & EXPORTING GRAPH DATA

#04 GRAPH DISTANCE MEASURES & TRAVERSALS

#05 NODE & GROUP LEVEL ANALYSIS

#06 COMMUNITY DETECTION

#07 WEB SCRAPING

#08 CUSTOMER SEGMENTATION

#09 GOOGLE ANALYTICS ON GOOGLE MERCHANDISE STORE - 1

#10 GOOGLE ANALYTICS ON GOOGLE MERCHANDISE STORE - 2


Organization of the Student Lab Workbook

The laboratory framework includes a creative element but shifts the time-intensive aspects
outside of the two-hour closed laboratory period. Within this structure, each laboratory
includes three parts: Prelab, In-lab, and Post-lab.
a. Pre-Lab

The Prelab exercise is a homework assignment that links the lecture with the laboratory
period and typically takes 2 hours to complete. The goal is for students to synthesize the
information they learn in lectures with material from their textbook to produce a working
piece of software. Students attending a two-hour closed laboratory are expected to make a
good-faith effort to complete the Prelab exercise before coming to the lab. Their work need
not be perfect, but their effort must be real (roughly 80 percent correct).

b. In-Lab

The In-lab section takes place during the actual laboratory period. The first hour of the
laboratory period can be used to resolve any problems the students might have experienced
in completing the Prelab exercises. The intent is to give constructive feedback so that
students leave the lab with working Prelab software, a significant accomplishment on their
part. During the second hour, students complete the In-lab exercise to reinforce the
concepts learned in the Prelab. Students leave the lab having received feedback on their
Prelab and In-lab work.

c. Post-Lab

The last phase of each laboratory is a homework assignment that is done following the
laboratory period. In the Post-lab, students analyze the efficiency or utility of a given system
call. Each Post-lab exercise should take roughly 120 minutes to complete.


2019-20 EVEN SEMESTER LAB CONTINUOUS EVALUATION

| S. No. | Date | Experiment Name | Pre-Lab (5M) | In-Lab: Logic (10M) | In-Lab: Execution (10M) | In-Lab: Result (10M) | In-Lab: Analysis (5M) | Post-Lab (5M) | Viva Voce (5M) | Total (50M) | Faculty Signature |
| 1. | | | | | | | | | | | |
| 2. | | | | | | | | | | | |
| 3. | | | | | | | | | | | |
| 4. | | | | | | | | | | | |
| 5. | | | | | | | | | | | |
| 6. | | | | | | | | | | | |
| 7. | | | | | | | | | | | |
| 8. | | | | | | | | | | | |
| 9. | | | | | | | | | | | |
| 10. | | | | | | | | | | | |
LAB SESSION 01: Introduction to Graphs using NetworkX

Date of the Session: ___/___/___ Time of the Session: _____to______

Prerequisite:
• Basic programming knowledge of Python
• Graph and its properties
• Jupyter Notebook Installation
Pre-Lab:
Answer the following questions:
1. Define the graph, vertices, and edges.
A:

2. Mention and explain the types of Graphs.


A:

3. Mention the Spark API that is used for graph and graph parallel computation.
A:

4. What is the difference between Graph and Network?
A:

5. Write the command to install NetworkX using pip in Python.


A:

In-Lab:
1. Install the NetworkX package in python and import it.
a) Create a new undirected Graph object and add node ‘1’ to it.
b) Draw the network.
c) Add (2,3,4,5,6) nodes to the graph and draw the network.
d) Create edges between (1-2,1-3,1-4,1-5,2-3,2-5,2-6,3-5,4-5,6-4) and draw the network.
e) Print the details of the graph i.e., Number of nodes, number of edges, nodes and edges.
f) Remove the node 2 from the graph network and draw the network.
g) Print the number of nodes and edges.
h) Print the neighbors of node 1 and clear all the nodes from the graph.
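
One possible outline of steps a) to h) is sketched below. It is only a starting point, not a model answer: the variable names are illustrative, and the drawing calls are left as a comment because they need matplotlib.

```python
import networkx as nx

# Steps a-c: an empty undirected graph, node 1, then nodes 2..6.
G = nx.Graph()
G.add_node(1)
G.add_nodes_from([2, 3, 4, 5, 6])

# Step d: the ten edges listed in the exercise.
edges = [(1, 2), (1, 3), (1, 4), (1, 5), (2, 3),
         (2, 5), (2, 6), (3, 5), (4, 5), (6, 4)]
G.add_edges_from(edges)

# Step e: basic details of the graph.
print(G.number_of_nodes(), G.number_of_edges())
print(list(G.nodes()), list(G.edges()))

# Steps f-g: removing node 2 also removes every edge incident to it.
G.remove_node(2)
nodes_after = G.number_of_nodes()
edges_after = G.number_of_edges()
print(nodes_after, edges_after)

# Step h: neighbors of node 1, then empty the graph.
neighbors_of_1 = sorted(G.neighbors(1))
print(neighbors_of_1)
G.clear()

# Drawing (steps b, c, d, f): with matplotlib installed, call
# nx.draw(G, with_labels=True) after each change to the graph.
```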
  
Writing space for the In-Lab:(For Student’s use only)

2. Create a graph of 10 nodes in which every node is connected to every other node, and
represent the nodes with unique colors.

Writing space for the In-Lab:(For Student’s use only)


  

Post-Lab:
1. Mike wants to draw a picture of his friends along with their favorites, so he approached you, his
cousin, to get the work done. Mike said that his friends are Justin, Alley, Sebastian, Aicel,
William, Grace, Rose and Hazel. According to Mike, their favorite colors and fruits are: Justin –
Purple & Grapes, Alley – White & Apple, Sebastian – Orange & Apple, Aicel – Green &
Watermelon, William – Blue & Kiwi, Grace – White & Watermelon, Rose – Red & Strawberry,
and Hazel – Blue & Plums. Mike said that Hazel, Grace & Rose; William & Aicel; Alley, Sebastian
& Aicel; Hazel & Sebastian; and Aicel & Rose were already friends before he met them through
Justin, who was a friend of Rose & Aicel. Being fond of graphs, you decide to represent the
information as a graph rather than a drawing. Mike’s favorites are similar to yours. As his
cousin, draw the graph of his friends with the provided attributes and color the nodes
according to their favorite colors.

Writing space for the Post-Lab:(For Student’s use only)

(Writing space for Post-Lab)

(For Evaluator’s use only)

Comment of the Evaluator (if Any) Evaluator’s Observation

Marks Secured: _______ out of ________

Full Name of the Evaluator:

Signature of the Evaluator Date of Evaluation:

LAB SESSION 02: Exploring Various Graphs

Date of the Session: ___/___/___ Time of the Session: _____to______

Prerequisite:
• Graphing using NetworkX
• Properties of various graphs
Pre-Lab:
Answer the following:
1. What are the different types of graphs in a network?
A:

2. Which kind of graphs are used for Facebook and Twitter network data analysis? Justify your
answer.
A:

3. Which kind of graphs can have a self-loop and multi-edges?


A:

4. What kind of graphs are used to show asymmetric relationships and symmetric relationships?
A:

5. Can we find the largest connected component of a directed graph? Justify your answer.
A:

6. Describe the function and its parameters for the following.


Function Description

random.randint()

random.uniform()

disjoint_union()

In-Lab:
1. Create an undirected graph using erdos_renyi_graph with 8 nodes and an edge probability
value of 0.25. Implement the following network layouts to the graph.
a) Spring Layout
b) Circular Layout
c) Random Layout
d) Spectral Layout
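
The four layouts can be computed as node-position dictionaries before any drawing is done. The sketch below assumes NumPy is installed (NetworkX's layout functions rely on it); the seed value is arbitrary and only makes the random graph reproducible.

```python
import networkx as nx

# Seeded so that the same random graph is produced on every run.
G = nx.erdos_renyi_graph(n=8, p=0.25, seed=42)

# Each layout function returns a dict: node -> (x, y) position.
layouts = {
    "spring": nx.spring_layout(G, seed=42),
    "circular": nx.circular_layout(G),
    "random": nx.random_layout(G, seed=42),
    "spectral": nx.spectral_layout(G),
}

for name, pos in layouts.items():
    rounded = {n: tuple(round(float(c), 2) for c in xy) for n, xy in pos.items()}
    print(name, rounded)
    # With matplotlib installed: nx.draw(G, pos=pos, with_labels=True)
```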

Writing space for the In-Lab:(For Student’s use only)

(Writing space for In-Lab)

2. Define a directed graph ‘G’.
a) Take a list named ‘city_set’ consisting of the following cities ‘Delhi’, ‘Bangalore’,
‘Hyderabad’, ‘Ahmedabad’, ‘Chennai’, ‘Kolkata’, ‘Surat’, ‘Pune’, ‘Jaipur’. Thereafter, assign
each city as a particular node.
b) Create a list named ‘costs’ which contains the costs of traveling from one city to
another(weights of edges).
c) By randomly allotting the weights to the edges, generate 16 random edges between the
nodes.
d) Display the complete graph consisting of nodes, edges, and weights with a circular layout.
e) Find out which two cities have paths between them using the ‘has_path’ function.
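
A minimal sketch of steps a) to e) follows. The cost range (100 to 2000) and the seed are assumptions made for illustration; the exercise does not specify them.

```python
import random
import networkx as nx

random.seed(7)  # arbitrary seed, used only for reproducibility

city_set = ['Delhi', 'Bangalore', 'Hyderabad', 'Ahmedabad', 'Chennai',
            'Kolkata', 'Surat', 'Pune', 'Jaipur']
# Hypothetical travel costs, one per edge to be generated.
costs = [random.randint(100, 2000) for _ in range(16)]

G = nx.DiGraph()
G.add_nodes_from(city_set)

# Draw random ordered pairs until 16 distinct directed edges exist.
while G.number_of_edges() < 16:
    src, dst = random.sample(city_set, 2)
    if not G.has_edge(src, dst):
        G.add_edge(src, dst, weight=costs[G.number_of_edges()])

# Step e: ordered pairs of cities joined by a directed path.
reachable = [(u, v) for u in city_set for v in city_set
             if u != v and nx.has_path(G, u, v)]
print(len(reachable), "ordered pairs are connected by a path")

# Step d (needs matplotlib):
# pos = nx.circular_layout(G)
# nx.draw(G, pos, with_labels=True)
# nx.draw_networkx_edge_labels(G, pos, edge_labels=nx.get_edge_attributes(G, 'weight'))
```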

Writing space for the In-Lab:(For Student’s use only)

Post-Lab:
1. Create a directed graph with nodes from A to H having connections between the nodes as {
('A', 'B'), ('A', 'C'), ('D', 'B'), ('E', 'C'), ('E', 'F'),('B', 'H'), ('B', 'G'), ('B', 'F'), ('C', 'G') } where the
connections { ('A', 'C'), ('E', 'C'),('F','E'),('H','B') } are directed and the rest are undirected.
Differentiate the directed edges and undirected edges with colors where directed are of red
color and the undirected are of black color.

Writing space for the Post-Lab:(For Student’s use only)

2. Create the following graph with the NetworkX library.
a) Generate a directed graph G with four nodes ‘A’, ‘B’, ‘C’, and ‘D’.
b) Also, generate another directed graph G1 with two nodes ‘I’ and ‘J’.
c) Convert the above directed graphs into undirected graphs.
d) Find the largest connected component of the graph produced by the cartesian product of
the two undirected graphs.

Writing space for the Post-Lab:(For Student’s use only)

(For Evaluator’s use only)

Comment of the Evaluator (if Any) Evaluator’s Observation

Marks Secured: _______ out of ________

Full Name of the Evaluator:

Signature of the Evaluator Date of Evaluation:

LAB SESSION 03: Importing & Exporting Graph Data

Date of the Session: ___/___/___ Time of the Session: _____to______

Prerequisite:
• Graphing using NetworkX
• Formats of GML, PAJEK, GraphML, and GEXF.
• Matrices and their properties
Pre-Lab:
Answer the following:
1. What are the different formats for the network datasets?
A:

2. Write the structure of the following graph in the PAJEK format.

A:

3. Write the structure of the following graph in the GEXF format.

A:

4. Can the below-mentioned adjacency matrix form an undirected graph? Justify your answer by
representing it in the respective graph.
[ [ 0, 1, 0, 0, 0, 0]
[ 1, 0, 1, 1, 0, 0]
[ 0, 0, 1, 0, 0, 0]
[ 1, 0, 1, 0, 0, 0]
[ 0, 0, 0, 0, 0, 1]
[ 0, 0, 0, 0, 0, 0]]

A:

5. Represent the adjacency list of the below given directed graph.

A:

6. Is the below-given matrix a sparse matrix? Justify your answer.


[ [ 1, 0, 2, 0 ]
[ 0, 3, 0, 4 ]
[ 0, 0, 5, 0 ]
[ 6, 0, 0, 7 ] ]
A:

In-Lab:
1. Import the appropriate package for GML formats.
a) Read the Dolphin Social Network dataset (dolphins.gml) in GML format and plot the graph
with an appropriate layout.
b) Create a Lollipop graph with the clique of 5 and a path of 2 nodes and convert it into a
GML format.
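
For part b), the conversion can be tried without any dataset file. The sketch below builds the lollipop graph, produces its GML text in memory, and parses it back as a check; writing to disk with nx.write_gml is shown only as a comment, and the file name there is hypothetical.

```python
import networkx as nx

# lollipop_graph(m, n): a clique of m nodes joined to a path of n nodes.
L = nx.lollipop_graph(5, 2)

# generate_gml yields the GML representation line by line.
gml_text = "\n".join(nx.generate_gml(L))
print(gml_text)

# Round trip: parse the text back into a graph (node labels become strings).
parsed = nx.parse_gml(gml_text)
print(parsed.number_of_nodes(), parsed.number_of_edges())

# To save to a file instead: nx.write_gml(L, "lollipop.gml")
# Part a) would use nx.read_gml("dolphins.gml") once the dataset is available.
```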

Writing space for the In-Lab:(For Student’s use only)

2. Import the appropriate packages for PAJEK format.
a) Read the Football dataset (football.net) in PAJEK format and plot the graph with an
appropriate layout.
b) Create a Petersen graph and convert it into a PAJEK format.

Writing space for the In-Lab:(For Student’s use only)

3. Create a graph as shown below and convert it into an adjacency matrix and print the matrix.
Check its sparsity and print the matrix in an appropriate representation.

  

Writing space for the In-Lab:(For Student’s use only)

Post-Lab:
1. Read the Zachary Karate Club dataset (karate.gml), which has the details of the network of
friends in the University Karate Club, in GML Format.
a) Find the number of students in the Karate Club.
b) Find the number of relations among all the students in the club.
c) Find the number of friends each student has.
d) Plot the friends’ network in the Karate club with an appropriate layout.
e) Plot a histogram displaying the number of friends each student is associated with.

Writing space for the Post-Lab:(For Student’s use only)

2. Create and draw a graph network of 10 cities, i.e., New York, Jersey City, Newark,
Boston, Washington, Baltimore, Rochester, Norfolk, Philadelphia, and Guttenberg, with the
distances between them as given below. Export the edge list of the graph to a file named
‘cities.edgelist’ and understand the format of the data in that file.

| | New York | Jersey City | Newark | Boston | Washington | Baltimore | Rochester | Norfolk | Philadelphia | Guttenberg |
| New York | 0 | 4.2 | 13.9 | 215.1 | 227.0 | 187.7 | 333.3 | 363.3 | 94.5 | 8.3 |
| Jersey City | 4.2 | 0 | 12.0 | 218.1 | 223.6 | 184.3 | 331.3 | 359.9 | 91.1 | 6.5 |
| Newark | 13.9 | 12.0 | 0 | 224.0 | 218.0 | 178.6 | 322.8 | 354.3 | 85.5 | 13.4 |
| Boston | 215.1 | 218.1 | 224.0 | 0 | 441.2 | 401.8 | 392.3 | 557.5 | 308.7 | 212.9 |
| Washington | 227.0 | 223.6 | 218.0 | 441.2 | 0 | 38.5 | 379.1 | 194.6 | 139.0 | 226.5 |
| Baltimore | 187.7 | 184.3 | 178.6 | 401.8 | 38.5 | 0 | 336.9 | 240.0 | 105.7 | 193.2 |
| Rochester | 333.3 | 331.3 | 322.8 | 392.3 | 379.1 | 336.9 | 0 | 571.4 | 339.8 | 334.8 |
| Norfolk | 363.3 | 359.9 | 354.3 | 557.5 | 194.6 | 240.0 | 571.4 | 0 | 276.8 | 364.3 |
| Philadelphia | 94.5 | 91.1 | 85.5 | 308.7 | 139.0 | 105.7 | 339.8 | 276.8 | 0 | 95.7 |
| Guttenberg | 8.3 | 6.5 | 13.4 | 212.9 | 226.5 | 193.2 | 334.8 | 364.3 | 95.7 | 0 |
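
To preview what the exported edge list looks like, the sketch below uses only three of the ten cities, with distances taken from the table above. A comma delimiter is chosen here as an assumption, because city names such as ‘New York’ contain spaces and the default space delimiter would make the file ambiguous.

```python
import networkx as nx

# Three of the ten cities, with distances copied from the table.
distances = [
    ("New York", "Jersey City", 4.2),
    ("New York", "Newark", 13.9),
    ("Jersey City", "Newark", 12.0),
]

G = nx.Graph()
G.add_weighted_edges_from(distances)

# generate_edgelist yields the same lines that write_edgelist would save.
lines = list(nx.generate_edgelist(G, delimiter=",", data=["weight"]))
for line in lines:
    print(line)

# To write the file itself:
# nx.write_edgelist(G, "cities.edgelist", delimiter=",", data=["weight"])
```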

Writing space for the Post-Lab:(For Student’s use only)

(Writing space for Post-Lab)

(For Evaluator’s use only)

Comment of the Evaluator (if Any) Evaluator’s Observation

Marks Secured: _______ out of ________

Full Name of the Evaluator:

Signature of the Evaluator Date of Evaluation:

LAB SESSION 04: Graph Distance Measures & Traversals

Date of the Session: ___/___/___ Time of the Session: _____to______

Prerequisite:
• Graphing using NetworkX
• Concepts of BFS, DFS & Dijkstra’s algorithm
Pre-Lab:
Answer the following:
1. How do you calculate the distance between two nodes in a graph?
A:

2. What is the eccentricity of a disconnected graph?


A:

3. Is the eccentricity of the weighted graph the same as that of the unweighted graph? Justify
your answer.
A:

4. Manually traverse the given graph’s path based on the Depth First Search(DFS) algorithm.

A:

5. Manually traverse the given graph’s path based on the Breadth-First Search(BFS) algorithm.

A:

In-Lab:
1. Create a Krackhardt kite graph using the NetworkX library function.
a) Find the number of nodes and edges in the graph.
b) Find the degree of each node in the graph.
c) Calculate the eccentricity of every node in the graph.
d) Plot the graph with node size based on eccentricity values.
e) Find the radius and diameter of the graph.
f) Find the center and periphery of the graph.
g) Plot the graph and show the center with a different color.
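
The distance measures in steps a) to g) map directly onto NetworkX functions. The sketch below computes them and prepares the size and color lists for plotting; the scaling factor 300 and the color names are arbitrary choices.

```python
import networkx as nx

G = nx.krackhardt_kite_graph()

# Steps a-b: size of the graph and degree of each node.
print(G.number_of_nodes(), G.number_of_edges())
degrees = dict(G.degree())

# Step c: eccentricity = greatest shortest-path distance from a node.
ecc = nx.eccentricity(G)

# Steps e-f: radius/diameter are the min/max eccentricity;
# center/periphery are the nodes that attain them.
radius, diameter = nx.radius(G), nx.diameter(G)
center, periphery = nx.center(G), nx.periphery(G)
print(radius, diameter, center, periphery)

# Steps d and g: per-node drawing attributes (plotting needs matplotlib).
node_sizes = [300 * ecc[n] for n in G]
node_colors = ["red" if n in center else "lightblue" for n in G]
# nx.draw(G, node_size=node_sizes, node_color=node_colors, with_labels=True)
```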

Writing space for the In-Lab:(For Student’s use only)

2. Using NetworkX,
a) Visualize the path of the graph from node ‘1’ based on the Breadth-First Search(BFS)
algorithm.
b) Visualize the step-by-step traversal of the graph from the node ‘0’ based on the Depth First
Search(DFS) algorithm.
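
The graph for this exercise is not reproduced here, so the sketch below uses a small hypothetical tree to show how the BFS and DFS visiting orders can be obtained before visualizing them.

```python
import networkx as nx

# Hypothetical graph standing in for the one given in the exercise.
G = nx.Graph([(0, 1), (0, 2), (1, 3), (1, 4), (2, 5)])

# BFS from node 1: the source followed by each newly discovered node.
bfs_order = [1] + [v for _, v in nx.bfs_edges(G, source=1)]
print("BFS order:", bfs_order)

# DFS from node 0: nodes in the order they are first visited.
dfs_order = list(nx.dfs_preorder_nodes(G, source=0))
print("DFS order:", dfs_order)

# Step-by-step DFS traversal, one tree edge per step.
for step, (u, v) in enumerate(nx.dfs_edges(G, source=0), start=1):
    print(f"step {step}: {u} -> {v}")
```

To visualize a traversal, the tree edges returned by bfs_edges or dfs_edges can be drawn in a different color from the remaining edges.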

Writing space for the In-Lab:(For Student’s use only)

Post-Lab:
1. Generate a graph with random edges and weights by considering the cities {Delhi, Kolkata,
Hyderabad, Bangalore, Chennai, Vijayawada, Vizag, Mumbai, Jaipur, Bhopal} as the nodes.
a) Find the shortest path between Bhopal and Vijayawada using Dijkstra’s algorithm.
b) Find the shortest path between Bhopal and Chennai using Dijkstra’s algorithm.
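
A minimal sketch follows, under two assumptions that the exercise leaves open: the weights are drawn from 50 to 500, and a random spanning path is added first so that every city is reachable (otherwise Dijkstra's algorithm may find no path at all).

```python
import random
import networkx as nx

random.seed(3)  # arbitrary seed for reproducibility

cities = ['Delhi', 'Kolkata', 'Hyderabad', 'Bangalore', 'Chennai',
          'Vijayawada', 'Vizag', 'Mumbai', 'Jaipur', 'Bhopal']

G = nx.Graph()
G.add_nodes_from(cities)

# A random path through all cities guarantees a connected graph.
order = cities[:]
random.shuffle(order)
for u, v in zip(order, order[1:]):
    G.add_edge(u, v, weight=random.randint(50, 500))

# A few extra random edges on top of the spanning path.
for _ in range(8):
    u, v = random.sample(cities, 2)
    if not G.has_edge(u, v):
        G.add_edge(u, v, weight=random.randint(50, 500))

# Parts a) and b): weighted shortest paths from Bhopal.
results = {}
for target in ('Vijayawada', 'Chennai'):
    path = nx.dijkstra_path(G, 'Bhopal', target, weight='weight')
    cost = nx.dijkstra_path_length(G, 'Bhopal', target, weight='weight')
    results[target] = (path, cost)
    print(target, path, cost)
```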

Writing space for the Post-Lab:(For Student’s use only)

(Writing space for Post-Lab)

(For Evaluator’s use only)

Comment of the Evaluator (if Any) Evaluator’s Observation

Marks Secured: _______ out of ________

Full Name of the Evaluator:

Signature of the Evaluator Date of Evaluation:

LAB SESSION 05: Node & Group Level Analysis

Date of the Session: ___/___/___ Time of the Session: _____to______

Prerequisite:
• Properties of Graphs
• Concept of Centrality
Pre-Lab:
Answer the following:
1. What are the most important nodes in a network?
A:

2. Calculate the betweenness centrality for each node in the following graph without using the
NetworkX library.

A:

3. What does a higher clustering coefficient of a graph represent?


A:

4. Differentiate between triangle clustering and square clustering.
A:

5. What is the k-core of a graph? What is the highest possible value of k-core if the given graph is
a complete graph with 9 nodes?
A:

In-Lab:
1. Create the below-given graph using the NetworkX library and then,
a) Find the most important node, i.e., the one with the most connections in the network.
b) Calculate how close each node is to all the other nodes.
c) Find the node that plays a significant role in the information flow of the graph.
d) One way to judge the importance of a particular node is by the number of high-priority
nodes that it is connected to. Identify the centrality measure suited to this scenario
and apply it to the graph.
e) Plot the graph by considering various node sizes and node colors based on their
importance in the network.

Writing space for the In-Lab:(For Student’s use only)

2. Import the NetworkX package and then,
a) Construct a graph ‘G’ having the following edges {('A', 'B'), ('A', 'K'), ('B', 'K'), ('A', 'C'), ('B',
'C'), ('C', 'F'), ('F', 'G'), ('C', 'E'), ('E', 'F'), ('E', 'D'), ('E', 'H'), ('I', 'J')}
b) Display the clustering coefficients for all the nodes of the graph.
c) Find the average clustering coefficient of the graph.
d) Calculate the transitivity of the graph ‘G’.
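
Steps a) to d) can be sketched as follows, using the edge set given in the exercise.

```python
import networkx as nx

edges = [('A', 'B'), ('A', 'K'), ('B', 'K'), ('A', 'C'), ('B', 'C'),
         ('C', 'F'), ('F', 'G'), ('C', 'E'), ('E', 'F'), ('E', 'D'),
         ('E', 'H'), ('I', 'J')]
G = nx.Graph(edges)

# Step b: local clustering coefficient of every node.
clustering = nx.clustering(G)
for node, value in clustering.items():
    print(node, round(value, 3))

# Step c: mean of the local coefficients over all nodes.
avg_clustering = nx.average_clustering(G)

# Step d: transitivity = 3 * (number of triangles) / (number of connected triples).
transitivity = nx.transitivity(G)
print(round(avg_clustering, 4), round(transitivity, 4))
```

The two summary values differ because the average clustering coefficient weights every node equally, while transitivity weights high-degree nodes more heavily.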

Writing space for the In-Lab:(For Student’s use only)

3. Construct a graph ‘G’ connected by the edges {(a,b), (a,f), (a,c), (b,c), (b,f), (f,c), (c,d), (d,e),
(e,f)}
a) Find out the edges of the subgraph which has the maximum clique number of the graph.
b) What is the maximum number of vertices of a clique in a bipartite graph?
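
One way to approach part a) is to enumerate the maximal cliques and keep the largest. For part b), the sketch also checks a sample bipartite graph (a complete bipartite graph chosen here only as an illustration).

```python
import networkx as nx

edges = [('a', 'b'), ('a', 'f'), ('a', 'c'), ('b', 'c'), ('b', 'f'),
         ('f', 'c'), ('c', 'd'), ('d', 'e'), ('e', 'f')]
G = nx.Graph(edges)

# Part a: find_cliques enumerates maximal cliques; keep the largest one.
largest = max(nx.find_cliques(G), key=len)
sub = G.subgraph(largest)
print(sorted(largest), sorted(tuple(sorted(e)) for e in sub.edges()))

# Part b (illustration): a bipartite graph has no odd cycles, hence no
# triangles, so its cliques cannot contain more than two vertices.
B = nx.complete_bipartite_graph(3, 3)
bipartite_clique_size = max(len(c) for c in nx.find_cliques(B))
print(bipartite_clique_size)
```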

Writing space for the In-Lab:(For Student’s use only)

Post-Lab:
1. Perform all the centrality measures on given Facebook data (facebook.txt) and analyze the
importance of nodes.

Writing space for the Post-Lab:(For Student’s use only)

2. Construct a graph ‘H’ comprising the edges {(1,2), (1,3), (2,3), (1,4), (2,4), (3,4), (10,20),
(10,30), (20,30)}
a) Display all the 2-core and 3-core connected component subgraphs of H.
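
A short sketch of how the k-cores and their connected components might be extracted is given below; the helper name core_components is illustrative.

```python
import networkx as nx

H = nx.Graph([(1, 2), (1, 3), (2, 3), (1, 4), (2, 4), (3, 4),
              (10, 20), (10, 30), (20, 30)])

def core_components(graph, k):
    """Return the connected components (as sorted node lists) of the k-core."""
    core = nx.k_core(graph, k=k)
    return sorted(sorted(c) for c in nx.connected_components(core))

two_core = core_components(H, 2)
three_core = core_components(H, 3)
print("2-core components:", two_core)
print("3-core components:", three_core)
```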

Writing space for the Post-Lab:(For Student’s use only)

(For Evaluator’s use only)

Comment of the Evaluator (if Any) Evaluator’s Observation

Marks Secured: _______ out of ________

Full Name of the Evaluator:

Signature of the Evaluator Date of Evaluation:

LAB SESSION 06: Community Detection

Date of the Session: ___/___/___ Time of the Session: _____to______

Prerequisite:
• Graphing using NetworkX
• Concepts of Centrality
Pre-Lab:
Answer the following:
1. Define community detection. What constitutes a community within a graph?
A:

2. What are the various kinds of community detection techniques?


A:

3. Mention the differences between overlapping and non-overlapping communities. Also, specify
some algorithms that are used for both of them.
A:

4. How are the nodes partitioned in the Louvain algorithm and the Girvan Newman algorithm?
A:

In-Lab:
1. Construct a random graph consisting of 30 nodes with a probability of 0.05 for edge creation.
a) Plot the graph with the appropriate layout and detect the communities using the
best_partition() function.
b) Generate a dendrogram for the created graph and print the partitions at each level.

Writing space for the In-Lab:(For Student’s use only)

2. Apply the Girvan Newman algorithm to a path graph with 10 nodes by implementing two
functions, namely ‘Girvan’ and ‘edge_to_remove()’.
a) Construct the functions so that they return the count of connected component
subgraphs and the edge to be removed, respectively.
b) Display the various communities as the end result.
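
The sketch below is one interpretation of the two functions, not a prescribed solution: edge_to_remove() picks the edge with the highest edge betweenness, and girvan() (lower-cased here) removes such edges until the number of connected components increases, returning both the new count and the communities.

```python
import networkx as nx

def edge_to_remove(G):
    """Return the edge with the highest edge betweenness centrality."""
    betweenness = nx.edge_betweenness_centrality(G)
    return max(betweenness, key=betweenness.get)

def girvan(G):
    """Remove edges until the graph splits once more.

    Returns (number of components, list of communities as sorted node lists).
    """
    G = G.copy()  # leave the caller's graph untouched
    start = nx.number_connected_components(G)
    while nx.number_connected_components(G) == start:
        G.remove_edge(*edge_to_remove(G))
    communities = [sorted(c) for c in nx.connected_components(G)]
    return len(communities), communities

count, communities = girvan(nx.path_graph(10))
print(count, communities)
```

On a path graph the middle edge carries the most shortest paths, so it is the first edge removed and the path splits into two halves.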
Writing space for the In-Lab:(For Student’s use only)

(Writing space for In-Lab)

Post-Lab:
1. Import the facebook_combined.txt file and then,
a) Detect the communities from the data using the best_partition() function.
b) Plot the detected communities with an appropriate layout.
c) Generate a dendrogram for the created graph and print the partitions at each level.
Writing space for the Post-Lab:(For Student’s use only)

2. Using Girvan Newman function,
a) Do the community partitioning on a barbell graph that has 10 nodes such that you obtain
k number of communities in sorted order.
b) Perform community partitioning with various values of k.
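
NetworkX ships a girvan_newman generator that yields successively finer partitions. The sketch below assumes the 10-node barbell graph is barbell_graph(5, 0), i.e., two 5-node cliques joined by a single edge; the helper partition_into is illustrative.

```python
import networkx as nx
from networkx.algorithms.community import girvan_newman

# Assumption: two cliques of 5 nodes joined directly (10 nodes in total).
G = nx.barbell_graph(5, 0)

def partition_into(graph, k):
    """Return the first Girvan-Newman partition with at least k communities (k >= 2)."""
    for communities in girvan_newman(graph):
        if len(communities) >= k:
            return sorted(sorted(c) for c in communities)
    return None  # k exceeds the number of nodes

# Part a: k = 2; part b: other values of k.
two = partition_into(G, 2)
print(two)
for k in (3, 4, 5):
    print(k, partition_into(G, k))
```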

Writing space for the Post-Lab:(For Student’s use only)

(Writing space for Post-Lab)

(For Evaluator’s use only)

Comment of the Evaluator (if Any) Evaluator’s Observation

Marks Secured: _______ out of ________

Full Name of the Evaluator:

Signature of the Evaluator Date of Evaluation:

LAB SESSION 07: Web Scraping

Date of the Session: ___/___/___ Time of the Session: _____to______

Prerequisite:
• Basic knowledge of Web Scraping
Pre-Lab:
Answer the following:
1. Define Web Scraping.
A:

2. List the Python libraries required for Web Scraping.


A:

3. Is Web Scraping data mining?


A:

4. What is the difference between Web Crawling and Web Scraping?
A:

5. Is Google a Web Crawler or Web Scraper?


A:

In-Lab:
1. Perform Web Scraping on any Twitter profile page using the BeautifulSoup4 package.
a) Import the libraries BeautifulSoup and urllib2.
b) Store the URL in a variable ‘theurl’.
c) Create a BeautifulSoup object on the URL with HTML parser.
d) Print the following:
i. Title of the Twitter account without the HTML tag.
ii. ProfileHeaderCard from the div tag.
e) Print all the links in the Twitter profile page using the findAll() function.
Writing space for the In-Lab:(For Student’s use only)

Post-Lab:
1. Perform Web Scraping on a Wikipedia page using the BeautifulSoup package.
Wikipedia page: https://en.wikipedia.org/wiki/Artificial_intelligence
a) Import the libraries BeautifulSoup, urllib, re.
b) Connect to the website using urllib.
c) Create a BeautifulSoup object with HTML parser on the URL information.
d) Find all the list items with class ‘tocsection-’ using re.compile() and find_all() functions.
e) Create an array and scrape all the text in the class, i.e., ‘tocsection-’, using the getText()
method.
f) Save the scraped text content into a .txt file.
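
Steps c) to f) can be rehearsed offline before connecting to the live page. In the sketch below the HTML string is a made-up stand-in for a table of contents (the live page's markup may differ), and the output file name and location are assumptions.

```python
import re
import tempfile
from pathlib import Path

from bs4 import BeautifulSoup

# Made-up HTML imitating table-of-contents list items.
html = """
<ul>
  <li class="toclevel-1 tocsection-1"><a href="#History">History</a></li>
  <li class="toclevel-1 tocsection-2"><a href="#Goals">Goals</a></li>
  <li class="other">Not a section</li>
</ul>
"""

# Step c: parse with Python's built-in HTML parser.
soup = BeautifulSoup(html, "html.parser")

# Step d: list items having a class that matches 'tocsection-'.
items = soup.find_all("li", class_=re.compile("tocsection-"))

# Step e: collect the text of each matching item.
texts = [li.getText().strip() for li in items]
print(texts)

# Step f: save the scraped text, one entry per line.
out_path = Path(tempfile.gettempdir()) / "toc.txt"
out_path.write_text("\n".join(texts), encoding="utf-8")

# For the live page (steps a-b), the HTML would come from
# urllib.request.urlopen(url).read() instead of the string above.
```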
Writing space for the Post-Lab:(For Student’s use only)

(Writing space for Post-Lab)

(For Evaluator’s use only)

Comment of the Evaluator (if Any) Evaluator’s Observation

Marks Secured: _______ out of ________

Full Name of the Evaluator:

Signature of the Evaluator Date of Evaluation:

LAB SESSION 08: Customer Segmentation

Date of the Session: ___/___/___ Time of the Session: _____to______

Prerequisite:
• RFM Analysis
Pre-Lab:
Answer the following:
1. What is web analytics?
A:

2. How does web analytics work?


A:

3. What all can you analyze through web analytics?


A:

4. What do you mean by Behavioural analytics?
A:

In-Lab:
1. An online retail company sells all-occasion gifts. Many customers of the company are
wholesalers. The owner of the company wants to identify potential customers to improve the
marketing campaign, which would ultimately increase sales. The given data (Online Retail.csv)
consists of transactional information such as price, quantity, product description and stock
code, along with the customer IDs of customers from different countries who made purchases
from the online retail company, situated in the United Kingdom (UK), during an eight-month
period. Help the owner of the online retail company identify the potential customers using
RFM analysis by following the steps below.
a) Pre-process the given data by removing the duplicates and dealing with missing values in
the variables ‘customerid’ and ‘country’.
b) Plot a bar graph and observe which country has the highest number of transactions.
c) Restrict the data to the country with the highest number of transactions.
d) Remove all the negative values from the variable ‘Quantity’.
e) Multiply ‘Quantity’ and ‘UnitPrice’ and add the result to a new column ‘TotalPrice’.
f) Calculate the recency relative to the last invoice date.
g) Create an RFM table with recency, frequency, and monetary values.
h) Split the metrics into segments using quantiles and convert them into a dictionary.
i) Customers with the lowest recency and the highest frequency and monetary values are the
best customers, so write two functions to create bins for these values.
j) Add the segments or bin values to the RFM table using the apply function.
k) Add a new column “RFMScore” by adding recency, frequency, and monetary values.
l) Finally, select the customers with the maximum RFMScore; this gives the potential
customers.
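
Steps d) to l) can be sketched on a tiny made-up sample before running them on the real file. Everything about the data below is an assumption for illustration: the column names, the values, and the choice to score each metric from 1 to 4 and sum the three bin values into RFMScore.

```python
import pandas as pd

# Made-up sample standing in for 'Online Retail.csv' (column names assumed).
df = pd.DataFrame({
    "CustomerID": [1, 1, 2, 3, 3, 3, 4],
    "InvoiceNo": ["A1", "A2", "B1", "C1", "C2", "C3", "D1"],
    "InvoiceDate": pd.to_datetime(["2011-01-05", "2011-03-01", "2011-01-20",
                                   "2011-02-10", "2011-02-25", "2011-03-10",
                                   "2011-01-02"]),
    "Quantity": [10, 5, 2, 20, 15, 30, -1],
    "UnitPrice": [2.0, 3.0, 10.0, 1.5, 2.0, 1.0, 4.0],
})

df = df[df["Quantity"] > 0].copy()                    # step d
df["TotalPrice"] = df["Quantity"] * df["UnitPrice"]   # step e

snapshot = df["InvoiceDate"].max()                    # step f: last invoice date
rfm = df.groupby("CustomerID").agg(                   # step g
    recency=("InvoiceDate", lambda d: (snapshot - d.max()).days),
    frequency=("InvoiceNo", "nunique"),
    monetary=("TotalPrice", "sum"),
)

# Step h: quartile cut points as {column: {0.25: ..., 0.5: ..., 0.75: ...}}.
quantiles = rfm.quantile(q=[0.25, 0.5, 0.75]).to_dict()

def r_score(x, col):
    """Step i: lower recency is better, so it earns a higher score."""
    if x <= quantiles[col][0.25]:
        return 4
    if x <= quantiles[col][0.5]:
        return 3
    if x <= quantiles[col][0.75]:
        return 2
    return 1

def fm_score(x, col):
    """Step i: higher frequency or monetary value earns a higher score."""
    if x <= quantiles[col][0.25]:
        return 1
    if x <= quantiles[col][0.5]:
        return 2
    if x <= quantiles[col][0.75]:
        return 3
    return 4

# Step j: attach the bin values with apply.
rfm["R"] = rfm["recency"].apply(r_score, args=("recency",))
rfm["F"] = rfm["frequency"].apply(fm_score, args=("frequency",))
rfm["M"] = rfm["monetary"].apply(fm_score, args=("monetary",))

# Steps k-l: combined score and the customers that attain its maximum.
rfm["RFMScore"] = rfm[["R", "F", "M"]].sum(axis=1)
best = rfm[rfm["RFMScore"] == rfm["RFMScore"].max()].index.tolist()
print(rfm)
print("Best customers:", best)
```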

Writing space for the In-Lab:(For Student’s use only)

(Writing space for In-Lab)

Post-Lab:
1. What are the different segmentation strategies other than behavioral analysis?
A:

2. How can segmentation benefit the business of the companies?


A:

3. Can customer segmentation really improve customer satisfaction and retention? Justify your
answer.
A:

(For Evaluator’s use only)

Comment of the Evaluator (if Any) Evaluator’s Observation

Marks Secured: _______ out of ________

Full Name of the Evaluator:

Signature of the Evaluator Date of Evaluation:

LAB SESSION 09: Google Analytics on Google Merchandise Store - 1

Date of the Session: ___/___/___ Time of the Session: _____to______

Prerequisite:
• A Google Account
Pre-Lab:
Answer the following:
1. To collect data using Google Analytics, which steps must be completed?
A:

2. When will Google Analytics end a session by default?


A:

3. Can data that has been processed by Google Analytics and stored in its database be modified?
A:

4. To use Analytics to collect website data, what must be added to the website page HTML?
A:

5. Where should the Analytics tracking code be placed in the HTML of a webpage to collect data?
A:

6. A deleted view can be recovered by the account administrators within how many days?
A:

In-Lab:
1. Create a Google Analytics account by following the given instructions.
a) Sign in to your Google account.
b) Go to https://analytics.google.com/analytics/web/provision/#/provision
c) Click on the ‘Set up for free’ button.
d) For the account details,
i) Give the account name as ‘Demo’ and click Next, leaving all the default selections.
ii) Select ‘Web’ for ‘What do you want to measure?’ and proceed to the next step.
iii) In the property setup, provide the Website Name as ‘Demo’, the Website URL as
www.kluniversity.in, and the Industry Category as Jobs and Education, and set the
reporting time zone to India.
iv) Accept the Terms of Service Agreement.
e) Go to Home, check the analytics of the provided website, and conclude whether any
statistics are being generated from that website.

Writing space for the In-Lab:(For Student’s use only)

Post-Lab:
1. Get the demo Google Merchandise Store website from Access Demo Account and analyze the
reports.
a) What was your Account ID?
b) How many Views were created? Mention them with their IDs, i.e., View IDs.
c) What was the purpose of those views?
d) How many users are currently visiting the page?
e) How many users have visited the page in the past 7 days w.r.t. the number of sessions?
f) Which pages are these active users viewing?
g) Which country and city have the most active users?
h) What percentage of visitors are existing visitors, and what percentage are new visitors?
i) Which language is preferred by most of the users of the website?
j) Which Operating System (on System & Mobile) is used by most of the users?
k) Which browser (on System & Mobile) is used by most of the users?
l) Which browser takes the least average time to load the page?
m) What is the Event Action taken by most of the active users?
n) What was the average length of a session in a week?
o) Can you find any Indian Service Providers used by the users? Mention them, if any.
p) Check which medium under Traffic Sources the active users belong to.

Writing space for the Post-Lab:(For Student’s use only)

(Writing space for Post-Lab)

(For Evaluator’s use only)

Comment of the Evaluator (if Any) Evaluator’s Observation

Marks Secured: _______ out of ________

Full Name of the Evaluator:

Signature of the Evaluator Date of Evaluation:

LAB SESSION 10: Google Analytics on Google Merchandise Store - 2

Date of the Session: ___/___/___ Time of the Session: _____to______

Prerequisite:
• Google Analytics account with Google Merchandise Store website data
Pre-Lab:
Answer the following:
1. What are the 8 critical metrics?
A:

2. Define Bounce Rate.


A:

3. What is the difference between Visits and Unique Visitors?


A:

4. How is the Conversion Rate calculated?
A:

5. What is the difference between Organic and Direct Traffic?


A:

6. Define Search Engine Optimization(SEO).


A:

In-Lab:
1. Analyze the reports with the justifying statistics.
a) See which conversions your active users hit, along with total conversions.
b) What was the Bounce Rate for the past 7 days? Explain whether that is a good sign for user
engagement.
c) Users of what age and gender tend to be less interested w.r.t. the Bounce Rate?
d) If new clothing is designed and has to be put on the store, check whether the quantities for
female and male users should be equal or different, and explain the reason.
e) Does it seem that users are most likely to spend 3-30 mins, considering the page views
during that session duration?
f) Is Paid Search helping the store to get more new users than Organic Search?
g) Are the users visiting through Organic Search generating more revenue than those visiting
directly?
h) Do users prefer desktop over mobile to visit the website, and is the same true for
the revenue generated?
i) Check if there is stability in the Ecommerce Conversion Rate by comparing it with the
Engaged User (Goal 2 Conversion Rate).
j) Is investing more money in Campaigns resulting in a greater number of clicks and more
revenue?
k) Which day of the week would be best to announce a decrease in prices or discounts w.r.t.
the users visited and the revenue generated in the past month?
l) Are returning visitors or new users making more successful transactions w.r.t. their
shopping behavior?
m) Which product’s stock should be increased beyond what is currently available, considering
the Product Revenue?
Writing space for the In-Lab:(For Student’s use only)

(Writing space for In-Lab)

Post-Lab:
1. Analyze the reports of the Google Merchandise Store Website.
a) What is the average Bounce Rate for the entire website?
b) How many people are on the website right now? And what source drove the most active
visitors?
c) How many Non-branded visits from SEO from Organic search traffic occurred on 1st
January 2019?
d) How many page views are driven from Facebook to Google Merchandise Store?
e) Provide the Social Users flow from Youtube until the 3rd interaction.
f) Provide the Social Users flow from India till the 5th interaction.
g) Provide the Treemap for Users vs Pages/Sessions for all the traffic.
Writing space for the Post-Lab:(For Student’s use only)

(Writing space for Post-Lab)

(For Evaluator’s use only)

Comment of the Evaluator (if Any) Evaluator’s Observation

Marks Secured: _______ out of ________

Full Name of the Evaluator:

Signature of the Evaluator Date of Evaluation:
