Web Structure Mining Overview FSS2024

The document discusses Web Structure Mining, which involves discovering patterns in the hyperlink structure of the web and social ties among actors. It outlines key concepts such as hyperlink graphs, social networks, and knowledge graphs, and provides a chapter outline covering graph terminology, metrics, community detection, and machine learning applications. Additionally, it includes examples of graph types and attributes, as well as degree distribution in networks.

Web Structure Mining and Social Network Analysis


Universität Mannheim – Bizer: Web Structure Mining – FSS2024 (Version: 22.03.2024) – Slide 1
Web Structure Mining

■ Definition: Discovery and interpretation of patterns in
  1. the hyperlink structure of the Web
  2. the social ties among actors that interact on the Web

■ Typical sources of web graphs
  1. web crawls including HTML pages and hyperlinks
  2. social networks representing relations between actors
  3. knowledge graphs that have been extracted from the Web
  4. other types of community data (discussion forums, email conversations, navigation paths …)

■ Web structure mining focuses on the structure, but is also often combined with content or usage mining techniques

Hyperlink Graph

A hyperlink graph is a collection of hyperlinks between web pages which belong to web sites.

Social Network

A social network is a set of relations (e.g. friendship, interest, data exchange) between social entities, i.e. members of a social system (actors).

Knowledge Graph

A knowledge graph is a set of relations having different types (e.g. located in, painted, is interested in, is a) between entities (e.g. Mona Lisa, Louvre, Da Vinci) belonging to classes (e.g. persons, paintings, museums, places, dates).
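
A knowledge graph of this kind is often stored as a list of (subject, relation, object) triples. The following is a minimal sketch; the concrete triples and the helper name `objects` are assumptions made for illustration, built from the entities and relation types named above.

```python
# Hypothetical triples (subject, relation, object); not taken from a real dataset.
triples = [
    ("Da Vinci", "painted", "Mona Lisa"),
    ("Mona Lisa", "located in", "Louvre"),
    ("Mona Lisa", "is a", "painting"),
    ("Da Vinci", "is a", "person"),
    ("Louvre", "is a", "museum"),
]


def objects(subject, relation):
    """Return all objects linked to `subject` by a relation of the given type."""
    return [o for s, r, o in triples if s == subject and r == relation]


print(objects("Da Vinci", "painted"))
print(objects("Mona Lisa", "is a"))
```

Typed relations are what distinguish this structure from a plain hyperlink graph or social network, where all edges have the same meaning.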

Chapter Outline

1. Describing Graphs
   1. Terminology and Metrics
2. Prominence
   1. Centrality
   2. Prestige
3. Community Detection
   1. Connected Components and K-Cores
   2. Clustering-based Techniques
4. Machine Learning with Graphs
   1. Link Prediction and Node Classification
   2. Node Embeddings
   3. Graph Neural Networks

1. Describing Graphs: Terminology and Metrics

A graph is a collection of vertices that are connected by edges.

Network: often refers to real systems
Graph: mathematical representation of a network
But often: “Network” ≡ “Graph”

Community          Points     Lines
Math               vertices   edges, arcs
Computer Science   nodes      links
Physics            sites      bonds
Sociology          actors     ties, relations

Graphs

A graph is an ordered pair G = (V, E), where V is a set of vertices and E is a set of edges.

Two vertices a and b are called adjacent if (a, b) ∈ E.

directed edge/arc: a → b

undirected edge: a - b
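
The definition of a graph as a pair of a vertex set and an edge set maps directly onto two Python sets. This is a sketch only; the vertex names, the edges and both function names are hypothetical and chosen for illustration.

```python
# Hypothetical example graph: a set of vertices and a set of (tail, head) pairs.
V = {"a", "b", "c", "d"}
E = {("a", "b"), ("b", "c"), ("c", "d")}


def adjacent_directed(u, v, edges):
    """In a directed graph only the arc u -> v counts."""
    return (u, v) in edges


def adjacent_undirected(u, v, edges):
    """In an undirected graph the pair is unordered, so check both orders."""
    return (u, v) in edges or (v, u) in edges


print(adjacent_directed("b", "a", E))    # the arc a -> b does not imply b -> a
print(adjacent_undirected("b", "a", E))  # the undirected edge a - b does
```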

Examples: Directed and Undirected Graphs

Undirected graph: undirected (symmetrical) edges, called edges.
Directed graph (digraph): directed edges, called arcs.

[Figure: an example undirected graph (left) and an example directed graph (right)]

Undirected edges: Directed arcs:


• co-authorship links • hyperlinks on the WWW
• roads (mostly) • following on Twitter
• phone calls
Graph Terminology

Definition: When (u, v) is an edge of the graph G with directed edges, u is said to be adjacent to v, and v is said to be adjacent from u.
The vertex u is called the initial vertex of (u, v), and v is called the terminal vertex of (u, v).
The initial vertex and terminal vertex of a loop are the same.
Representing Graphs
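[Figure: an undirected graph (left) and a directed graph (right), both on the vertices a, b, c, d]

Undirected graph:

Vertex   Adjacent Vertices
a        b, c, d
b        a, d
c        a, d
d        a, b, c

Directed graph:

Initial Vertex   Terminal Vertices
a                c
b                a
c
d                a, b, c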
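
The two tables above are adjacency lists, which translate naturally into dictionaries. The sketch below uses the entries as read from the tables; the variable names and the helper `is_undirected` are assumptions introduced for illustration.

```python
# Adjacency list of the undirected graph: vertex -> adjacent vertices.
undirected = {
    "a": ["b", "c", "d"],
    "b": ["a", "d"],
    "c": ["a", "d"],
    "d": ["a", "b", "c"],
}

# Adjacency list of the directed graph: initial vertex -> terminal vertices.
directed = {
    "a": ["c"],
    "b": ["a"],
    "c": [],
    "d": ["a", "b", "c"],
}


def is_undirected(adj):
    """True if every listed neighbour lists the vertex back."""
    return all(u in adj[v] for u in adj for v in adj[u])


# Each undirected edge appears twice in the lists, each arc exactly once.
num_edges = sum(len(nbrs) for nbrs in undirected.values()) // 2
num_arcs = sum(len(nbrs) for nbrs in directed.values())
print(num_edges, num_arcs)
```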
Adjacency Matrix

A graph can be represented as an adjacency matrix.

[Figure: example graph with five vertices and its adjacency matrix with row index i and column index j]

Adjacency Matrices for Directed and Undirected Graphs

[Figure: a four-vertex undirected graph (left) and a four-vertex directed graph (right), each shown with its 4 × 4 adjacency matrix A_ij]

A_ij = 1 if there is a link between vertices i and j
A_ij = 0 if vertices i and j are not connected to each other.

Note that for an undirected graph (left) the matrix is symmetric.
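
The symmetry property can be checked with a few lines of Python. This is a sketch under stated assumptions: the edge list is a hypothetical four-vertex example with 0-based indices, and for the directed case A[i][j] = 1 is taken to mean an arc from i to j, which is one of two conventions in use.

```python
def adjacency_matrix(n, edges, directed=False):
    """Build an n x n 0/1 matrix from a list of (i, j) index pairs."""
    A = [[0] * n for _ in range(n)]
    for i, j in edges:
        A[i][j] = 1          # assumed convention: row = start, column = end
        if not directed:
            A[j][i] = 1      # an undirected edge is stored in both directions
    return A


def is_symmetric(A):
    n = len(A)
    return all(A[i][j] == A[j][i] for i in range(n) for j in range(n))


edges = [(0, 1), (0, 3), (1, 3), (2, 3)]  # hypothetical example
A_undirected = adjacency_matrix(4, edges)
A_directed = adjacency_matrix(4, edges, directed=True)

for row in A_undirected:
    print(row)
print(is_symmetric(A_undirected), is_symmetric(A_directed))
```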

Weighted and Unweighted Graphs

[Figure: a four-vertex unweighted undirected graph with its 0/1 adjacency matrix A_ij (left) and a four-vertex weighted undirected graph whose adjacency matrix A_ij holds the edge weights, e.g. 2, 0.5 and 1 (right)]

Example: Road networks (distance in miles)
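
For a weighted graph, the matrix entry stores the edge weight instead of 1. The sketch below assumes that 0 encodes "no edge" and uses hypothetical road distances; the function name is also an assumption.

```python
def weighted_adjacency_matrix(n, weighted_edges):
    """Build a symmetric n x n matrix from (i, j, weight) triples."""
    W = [[0.0] * n for _ in range(n)]   # assumption: 0.0 means "no edge"
    for i, j, w in weighted_edges:
        W[i][j] = w
        W[j][i] = w                     # undirected: same weight both ways
    return W


# Hypothetical road segments with distances in miles.
roads = [(0, 1, 2.0), (0, 2, 0.5), (1, 2, 1.0)]
W = weighted_adjacency_matrix(4, roads)

for row in W:
    print(row)
```

Vertex 3 has no roads in this example, so its row and column contain only zeros.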
Bipartite Graphs

A bipartite graph (or bigraph) is a graph whose vertices can be divided into two disjoint sets U and V such that every edge connects a vertex in U to one in V; that is, U and V are independent sets.

Examples:
• movie/actor network
• disease/symptom network
• photo/tag network on Flickr
• customer/product recommendations
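
Whether a graph is bipartite can be tested by trying to colour it with two colours using breadth-first search. This is a standard technique rather than something prescribed by the slides; the function name `bipartition` and the small movie/actor data are hypothetical.

```python
from collections import deque


def bipartition(adj):
    """Return the two vertex sets (U, V) if the graph is bipartite, else None.

    `adj` maps every vertex to a list of its neighbours (undirected graph).
    """
    color = {}
    for start in adj:
        if start in color:
            continue
        color[start] = 0
        queue = deque([start])
        while queue:
            u = queue.popleft()
            for v in adj.get(u, ()):
                if v not in color:
                    color[v] = 1 - color[u]   # neighbours get the other colour
                    queue.append(v)
                elif color[v] == color[u]:
                    return None               # edge inside one set: not bipartite
    U = {x for x, c in color.items() if c == 0}
    V = {x for x, c in color.items() if c == 1}
    return U, V


# Hypothetical movie/actor network.
movies_actors = {
    "Movie1": ["ActorA", "ActorB"],
    "Movie2": ["ActorB"],
    "ActorA": ["Movie1"],
    "ActorB": ["Movie1", "Movie2"],
}
triangle = {"x": ["y", "z"], "y": ["x", "z"], "z": ["x", "y"]}

print(bipartition(movies_actors))
print(bipartition(triangle))
```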

Vertex, Arc and Edge Attributes

Vertices, arcs and edges can have attributes.

Example of a network with vertex and arc attributes:
■ girls’ school dormitory dining-table partners (Moreno, The sociometry reader, 1960)
■ first and second choices shown

[Figure: sociogram of the dining-table choices; each vertex is labeled with a girl's first name]
Graph Terminology

Definition: The degree of a vertex in an undirected graph is the number of edges incident with it, except that a loop at a vertex contributes twice to the degree of that vertex.
In other words, you can determine the degree of a vertex in a displayed graph by counting the lines that touch it.
The degree of the vertex v is denoted by deg(v).
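
The rule that a loop counts twice falls out naturally when each edge adds one to both of its endpoints. The following sketch uses a hypothetical edge list; the function name `degrees` is an assumption.

```python
from collections import Counter


def degrees(edges):
    """Degree of every vertex in an undirected graph given as (u, v) pairs."""
    deg = Counter()
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1   # for a loop u == v, so that vertex gains 2 in total
    return deg


# Hypothetical graph with a loop at c.
edges = [("a", "b"), ("b", "c"), ("c", "c")]
deg = degrees(edges)
print(dict(deg))
```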
Example: Degrees of Undirected and Directed Graphs

Undirected
Degree: the number of edges connected to the vertex.
[Figure: undirected example graph with k_A = 1 and k_B = 4]

Directed
In directed graphs we can define an in-degree and out-degree. The (total) degree is the sum of in- and out-degree.
[Figure: directed example graph with k_C^in = 2, k_C^out = 1, k_C = 3]

Source: a vertex with k_in = 0 and k_out > 0
Sink: a vertex with k_out = 0 and k_in > 0
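
In-degree, out-degree, sources and sinks can be computed in one pass over the arcs. This is a minimal sketch; the arc list is hypothetical and is not the graph from the figure.

```python
from collections import Counter


def in_out_degrees(arcs):
    """Count incoming and outgoing arcs per vertex for (tail, head) pairs."""
    k_in, k_out = Counter(), Counter()
    for tail, head in arcs:
        k_out[tail] += 1
        k_in[head] += 1
    return k_in, k_out


# Hypothetical directed graph.
arcs = [("A", "C"), ("B", "C"), ("C", "D")]
k_in, k_out = in_out_degrees(arcs)

vertices = {v for arc in arcs for v in arc}
sources = {v for v in vertices if k_in[v] == 0 and k_out[v] > 0}
sinks = {v for v in vertices if k_out[v] == 0 and k_in[v] > 0}

total_C = k_in["C"] + k_out["C"]   # total degree = in-degree + out-degree
print(sorted(sources), sorted(sinks), total_C)
```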
Graph Terminology

A vertex of degree 0 is called isolated, since it is not adjacent to any vertex.
Note: A vertex with a loop at it has at least degree 2 and, by definition, is not isolated, even if it is not adjacent to any other vertex.
A vertex of degree 1 is called pendant. It is adjacent to exactly one other vertex.
Graph Terminology
Example: Which vertices in the following graph are isolated, which are pendant, and what is the maximum degree?

[Figure: example graph]

Solution: Vertex f is isolated, and vertices a, d and j are pendant. The maximum degree is deg(g) = 5.
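
The same kind of question can be answered programmatically. The sketch below uses a small hypothetical graph, not the one from the exercise, and the function name `classify` is an assumption; vertex f carries only a loop to illustrate that such a vertex is not isolated.

```python
def classify(vertices, edges):
    """Return (isolated vertices, pendant vertices, maximum degree)."""
    deg = {v: 0 for v in vertices}
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1          # a loop (u == v) adds 2 to the same vertex
    isolated = {v for v, d in deg.items() if d == 0}
    pendant = {v for v, d in deg.items() if d == 1}
    return isolated, pendant, max(deg.values())


# Hypothetical graph: e has no edges, f has only a loop.
vertices = ["a", "b", "c", "d", "e", "f"]
edges = [("a", "b"), ("b", "c"), ("b", "d"), ("c", "d"), ("f", "f")]

isolated, pendant, max_degree = classify(vertices, edges)
print(isolated, pendant, max_degree)
```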
Degree

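Degree: number of edges adjacent to a vertex

In-degree: number of incoming arcs of a vertex

Out-degree: number of outgoing arcs of a vertex

[Figure: example graph with four vertices and its adjacency matrix with row index i and column index j]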
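
In terms of the adjacency matrix, degrees can be written as row and column sums. The following is a sketch that assumes the convention A_ij = 1 for an arc from vertex i to vertex j; under the opposite convention the two directed sums swap roles. The undirected formula assumes a graph without loops.

```latex
% Undirected graph without loops: row sum of the symmetric adjacency matrix
k_i = \sum_{j} A_{ij}

% Directed graph, assuming A_{ij} = 1 encodes an arc i \to j
k_i^{\mathrm{out}} = \sum_{j} A_{ij}, \qquad
k_i^{\mathrm{in}}  = \sum_{j} A_{ji}, \qquad
k_i = k_i^{\mathrm{in}} + k_i^{\mathrm{out}}
```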
Degree Distribution

Summarizes the degrees of all vertices.

Alternative representations:
1. A frequency count of the vertices of each degree
2. P(k): probability that a randomly chosen vertex has degree k, with P(k) = N_k / N, where N_k is the number of vertices with degree k and N is the total number of vertices

[Figure: bar chart of frequency by in-degree (left) and plot of P(k) against k (right)]
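
Both representations can be computed from a list of vertex degrees. This is a minimal sketch with a hypothetical degree sequence; the function name is an assumption.

```python
from collections import Counter


def degree_distribution(degrees):
    """Return (frequency count N_k, probability P(k) = N_k / N) per degree k."""
    n = len(degrees)
    counts = Counter(degrees)
    probabilities = {k: counts[k] / n for k in sorted(counts)}
    return counts, probabilities


# Hypothetical degrees of five vertices.
degrees = [1, 2, 2, 3, 2]
counts, P = degree_distribution(degrees)
print(dict(counts))
print(P)
```

The probabilities sum to 1 because every vertex has exactly one degree.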

Degree Distribution: Friendship on Facebook

Displayed on log-log scale.

[Figure: degree distribution of Facebook friendships, annotated with "New or lonely user?" and "Human or robot?"]

Source: Zafarani, et al: Social Media Mining. Cambridge University Press, 2014.

In-Degree Distribution of the WDC Hyperlink Graph

Covers 3.5 billion web pages and 128 billion hyperlinks, extracted from Common Crawl 2012

Displayed on log-log scale, meaning that the left third covers over 99% of the mass.

Meusel, Vigna, Lehmberg, Bizer: Graph Structure in the Web - Revisited. 23rd Conference on World Wide Web (WWW2014). Website: [Link]

Explore Common Crawl: Top In-Degree Websites

[Link]

Literature

1. Zafarani, et al: Social Media Mining. Cambridge University Press, 2014. Free online version: [Link]
2. Wasserman and Faust: Social Network Analysis. Cambridge University Press, 1994.
3. David Easley, Jon Kleinberg: Networks, Crowds, and Markets: Reasoning About a Highly Connected World. Cambridge University Press, 2010. Free online version: [Link]
4. Bing Liu: Web Data Mining. 2nd Edition, Springer, 2011.
