Rec Sys Network
Recommender
Systems
Structural Recommendations in Networks
Example of network based
recommendation
• Networks have become ubiquitous as a modeling tool in many
applications, such as social and information networks. Therefore, it
is particularly useful to discuss various structural elements of a
network that can be recommended in different scenarios.
• Network: a collection of entities that are interconnected with links.
• people that are friends
• computers that are interconnected
• web pages that point to each other
• proteins that interact
• Networks are also called graphs, the entities are nodes, and the
links are edges
Types of Structural
Recommendations
• Recommending nodes by authority and context
• Quality and authority of a node is judged by the incoming links.
• The PageRank algorithm is adopted by search engines for this purpose, but it is not personalized.
• Personalized PageRank can be used for this purpose.
• Recommendation from reputed nodes
• Recommending nodes by example
• Nodes are tagged with similar properties
• Closely related to neighbourhood based approaches
• Target marketing
• Recommending nodes by influence and content
• Nodes with the potential to disseminate information based on their connectivity.
• Social influencer and viral marketing
• Recommending links
• Recommending links to increase the size of a network
• Friend suggestions
Recommending nodes
by authority and context
• A health drink producer wishes to find a brand ambassador for a
newly introduced item. An appropriate person would be someone whom
everyone respects in this domain. How can such a person be found in a
social network?
• Page rank algorithm is the basis
• Initially proposed for ranking Web pages
• achieved using the citation structure of the Web
• A citation can be logically viewed as a vote for the Web page
• PageRank is used to provide a more holistic citation-based vote. The PageRank algorithm
generalizes the notion of citation-based ranking in a recursive way
• Page rank algorithm is not personalized
• Personalized Page Rank
Foundations of page rank algorithm
• In-links of page i: the hyperlinks that point to page i from other pages.
• Out-links of page i: the hyperlinks that point out to other pages from
page i.
• A hyperlink from a page pointing to another page is an implicit
conveyance of authority to the target page. Thus, the more in-links
that a page receives, the more prestige the page has.
• Pages that point to page i also have their own prestige scores. A page
with a higher prestige score pointing to i is more important than a
page with a lower prestige score pointing to i. In other words, a page is
important if it is pointed to by other important pages.
Foundations of page rank algorithm: Markov Model
• Treat the Web as a directed graph G = (V, E), where V is the set of vertices or
nodes, i.e., the set of all pages, and E is the set of directed edges in the
graph, i.e., hyperlinks.
• Let n be the total number of pages and O_i be the number of out-links of page i.
• Let A be the state transition probability matrix, where A_ij = 1/O_i if (i, j) ∈ E, and A_ij = 0 otherwise.
A state i is periodic with period k > 1 if k is the smallest number such that all paths leading
from state i back to state i have a length that is a multiple of k. If a state is not periodic (i.e.,
k = 1), it is aperiodic. A Markov chain is aperiodic if all states are aperiodic.
Assumption to deal with the problem:
The random surfer has two options:
1. With probability d, he randomly chooses an out-link to follow.
2. With probability 1 − d, he jumps to a random page without following a link.
The resulting transition matrix is (1 − d) E/n + d A, where E = ee^T (e is a column vector of all 1's), so E is an n × n square matrix of all 1's.
1/n is the probability of jumping to a particular page, and n is the total number of nodes in the
graph. It is assumed that A has already been made a stochastic matrix.
Using d = 0.9
This matrix can converge to a steady state in the long run. Thus, after simplification
using steady state equations
Or equivalently
The power iteration method for PageRank
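The random-surfer model can be sketched as a short power iteration. This is a minimal illustration, not a reference implementation: the function name `pagerank`, the toy edge list, the stopping tolerance, and the treatment of pages without out-links (their score is spread uniformly) are all assumptions made for this example. Scores are normalized to sum to 1.

```python
def pagerank(edges, n, d=0.9, tol=1e-10, max_iter=1000):
    """Power iteration on nodes 0..n-1 (hypothetical helper for illustration)."""
    out_links = [[] for _ in range(n)]
    for i, j in edges:
        out_links[i].append(j)

    p = [1.0 / n] * n  # start from the uniform distribution
    for _ in range(max_iter):
        # With probability 1 - d the surfer jumps to a page chosen uniformly.
        p_next = [(1.0 - d) / n] * n
        for i in range(n):
            if out_links[i]:
                # With probability d the surfer follows a random out-link of i.
                share = d * p[i] / len(out_links[i])
                for j in out_links[i]:
                    p_next[j] += share
            else:
                # Assumption: a page without out-links spreads its score uniformly.
                for j in range(n):
                    p_next[j] += d * p[i] / n
        if sum(abs(a - b) for a, b in zip(p_next, p)) < tol:
            return p_next
        p = p_next
    return p


# Toy graph (assumed): page 2 receives links from pages 0, 1 and 3.
edges = [(0, 1), (0, 2), (1, 2), (2, 0), (3, 2)]
scores = pagerank(edges, n=4)
ranking = sorted(range(4), key=lambda i: scores[i], reverse=True)
```

In this toy graph, page 2 ends up ranked first, and page 3, which has no in-links, keeps only the random-jump share (1 − d)/n.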
Modifications to page rank for
recommending nodes in a
personalized setting
• In the personalized PageRank algorithm, a personalization vector is
combined with the transition probability matrix, so that random jumps are
biased toward the nodes of interest.
• The personalization vector has one entry per node. If the node is of
interest, the corresponding entry in the vector takes the value 1;
otherwise it is 0.
• This enables discovering topic-sensitive authoritative nodes.
• The personalized PageRank approach can also be used to discover the
neighborhoods in user-item graphs or user-user graphs in traditional
collaborative filtering applications.
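A minimal sketch of the personalized variant follows, assuming the personalization vector is normalized and used as the jump distribution. The function name `personalized_pagerank`, the two-community toy graph, and the fixed iteration count are assumptions made for illustration.

```python
def personalized_pagerank(edges, n, interest, d=0.9, iters=200):
    """Power iteration with jumps restricted to the nodes in `interest`."""
    out_links = [[] for _ in range(n)]
    for i, j in edges:
        out_links[i].append(j)

    # Personalization vector: 1 for nodes of interest, 0 otherwise, then normalized.
    v = [1.0 if i in interest else 0.0 for i in range(n)]
    total = sum(v)
    v = [x / total for x in v]

    p = v[:]
    for _ in range(iters):
        # Jumps land only on nodes of interest.
        p_next = [(1.0 - d) * v[j] for j in range(n)]
        for i in range(n):
            if out_links[i]:
                share = d * p[i] / len(out_links[i])
                for j in out_links[i]:
                    p_next[j] += share
            else:
                # Assumption: a node without out-links returns its score to the jump set.
                for j in range(n):
                    p_next[j] += d * p[i] * v[j]
        p = p_next
    return p


# Toy graph (assumed): two triangles {0, 1, 2} and {3, 4, 5} joined by the edge 2-3.
undirected = [(0, 1), (1, 2), (2, 0), (3, 4), (4, 5), (5, 3), (2, 3)]
edges = undirected + [(j, i) for i, j in undirected]

scores_near_0 = personalized_pagerank(edges, n=6, interest={0})
scores_near_5 = personalized_pagerank(edges, n=6, interest={5})
```

Although the graph is symmetric, personalizing on node 0 gives the triangle {0, 1, 2} more total score than {3, 4, 5}, and personalizing on node 5 mirrors the result.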
Recommending nodes by example
Finding the nodes with similar
interests
• A manufacturer of golf equipment wishes to target a few nodes for
marketing. The manufacturer must select those nodes that are interested in golf. Their
interests can be inferred from their own posts, their likes on others'
posts, their tagging of related items or news, etc.
• Finding such users relies on the concept of homophily in a social
network, which says that nodes with similar properties are usually
connected.
• Therefore, the profiles, properties, and ratings of neighbouring
nodes can be leveraged to make recommendations.
Recommendation by collective
classification
• The actors with a specific interest in the network can be
identified with the use of labels. Therefore, a subset of the
nodes is associated with labels.
• It is desired to use these labels as training data to determine
the labels of the other nodes, where they are unspecified. It
is assumed that for labeled nodes, the index of the label is
drawn from {1 . . . r}.
• Like the collaborative filtering problem, this is also an
incomplete data estimation problem, except that it is done
in the context of network structures.
Label propagation for classification
Source: velog.io
Challenges
• Convergence is not guaranteed.
• Node feature information is not used, because only node labels and
network information are used.
Recommendation by collective
classification
• Because nodes with similar properties are usually connected, it is reasonable to
assume that this is also true of node labels. A solution to this problem is to
examine the k labeled nodes in the proximity of a given node and report the
majority label.
• This approach is the network analog of a nearest neighbor classifier. However,
such an approach is generally not possible in collective classification because of
the sparsity of node labels.
• In order to handle sparsity, one must not only use the direct connections to
labeled nodes, but also use the indirect connections through unlabeled nodes.
• Two widely discussed algorithms in this regard are:
• Iterative classification algorithm
• Random walk-based method
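The idea of reaching labeled nodes through unlabeled ones can be illustrated with a simple majority-vote propagation. This is only a sketch of the intuition, not the iterative classification algorithm or the random walk-based method: the function name `propagate_labels`, the tie-breaking rule, and the path-shaped toy graph are assumptions.

```python
from collections import Counter


def propagate_labels(adjacency, labels, max_rounds=10):
    """Give each unlabeled node the majority label of its already labeled neighbors.

    adjacency: dict mapping node -> list of neighboring nodes
    labels:    dict mapping labeled node -> label drawn from {1 . . . r}
    """
    current = dict(labels)
    for _ in range(max_rounds):
        updates = {}
        for node, neighbors in adjacency.items():
            if node in current:
                continue  # known labels are never overwritten
            votes = Counter(current[m] for m in neighbors if m in current)
            if votes:
                best = max(votes.values())
                # Assumption: ties are broken in favor of the smallest label index.
                updates[node] = min(l for l, c in votes.items() if c == best)
        if not updates:
            break
        current.update(updates)
    return current


# Toy path graph 0-1-2-3-4-5 (assumed); only the two end nodes are labeled.
adjacency = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3, 5], 5: [4]}
result = propagate_labels(adjacency, {0: 1, 5: 2})
```

Nodes 2 and 3 have no labeled neighbor at first; they receive a label only in the second round, through nodes 1 and 4.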
Iterative classification algorithm
• Network G = (N,A)
• Class labels: drawn from {1 . . .r}.
• The total number of nodes is denoted by n, from which nt nodes are
unlabeled test nodes.
• Each edge (i, j) ∈ A is associated with the weight wij .
• Node i has two types of features: content features and link features.
• The content Xi is available at the node i in the form of a
multidimensional feature vector.
• The ICA algorithm derives a set of link features in addition to the available content features.
• A link feature is generated for each class, containing the fraction of the node's incident nodes
belonging to that class. For each node i, its adjacent node j is weighted by wij when
computing its credit to the relevant class.
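The link features described above can be sketched as follows. The function name `link_features`, the dictionary-based data layout, and the choice to normalize over labeled neighbors only are assumptions made for this example.

```python
def link_features(node, adjacency, weights, labels, r):
    """Return r link features for `node`: the weighted fraction of its
    labeled neighbors that belong to each class 1..r."""
    credit = [0.0] * r
    for j in adjacency[node]:
        if j in labels:
            # Neighbor j contributes its edge weight to the class it belongs to.
            credit[labels[j] - 1] += weights[(node, j)]
    total = sum(credit)
    # Assumption: normalize over labeled neighbors; all zeros if none are labeled.
    return [c / total if total > 0 else 0.0 for c in credit]


# Toy data (assumed): node 0 has three labeled neighbors and one unlabeled neighbor.
adjacency = {0: [1, 2, 3, 4]}
weights = {(0, 1): 1.0, (0, 2): 1.0, (0, 3): 2.0, (0, 4): 1.0}
labels = {1: 1, 2: 2, 3: 2}  # node 4 is unlabeled

features = link_features(0, adjacency, weights, labels, r=2)
```

In the iterative classification setting, such link features would be concatenated with the content features Xi and recomputed as the predicted labels of unlabeled neighbors change.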
Recommending nodes by
influence and content
• You want to choose a few nodes (because you may be budget constrained)
for viral marketing of your product. How should these few nodes be chosen to ensure
maximum coverage in the network?
• (Influence Maximization) Given a social network G = (N,A), determine a
set S of k seed nodes such that influencing them maximizes the overall spread
of influence in the network.
• Each model or heuristic can quantify the influence level of a node with
the use of a function of S that is denoted by f(·). This function maps
subsets of nodes to real numbers representing influence values.
Therefore, after a model has been chosen for quantifying the influence
f(S) of a given set S, the optimization problem is that of determining the
set S that maximizes f(S).
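One common heuristic for this optimization is greedy selection: repeatedly add the node with the largest marginal gain in f. The sketch below assumes a hypothetical influence function `coverage` (the number of nodes that are seeds or direct neighbors of seeds) and a toy graph; neither is one of the influence models themselves.

```python
def greedy_seed_selection(nodes, f, k):
    """Pick k seeds by repeatedly adding the node with the largest gain in f."""
    seeds = set()
    for _ in range(k):
        best_node, best_gain = None, float("-inf")
        for v in sorted(nodes):  # sorted so that ties are broken deterministically
            if v in seeds:
                continue
            gain = f(seeds | {v}) - f(seeds)
            if gain > best_gain:
                best_node, best_gain = v, gain
        if best_node is None:
            break
        seeds.add(best_node)
    return seeds


# Toy undirected graph (assumed), stored as neighbor sets.
adjacency = {
    "a": {"b", "c", "d"},
    "b": {"a"},
    "c": {"a"},
    "d": {"a", "e"},
    "e": {"d", "f", "g"},
    "f": {"e"},
    "g": {"e"},
}


def coverage(S):
    """Hypothetical f(S): number of nodes that are seeds or neighbors of seeds."""
    reached = set(S)
    for s in S:
        reached |= adjacency[s]
    return len(reached)


seeds = greedy_seed_selection(adjacency.keys(), coverage, k=2)
```

With k = 2, the greedy choice here is {"a", "e"}, which reaches all seven nodes under this toy f.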
Recommending nodes by influence
and content
• An interesting property of a very large number of influence analysis models is
that the optimized function f(S) is submodular.
• Submodularity is a mathematical way of representing the natural law of diminishing
returns, as applied to sets. In other words, if S ⊆ T , then the additional
influence obtained by adding an individual to set T cannot be larger than the
additional influence of adding the same individual to set S.
• Thus, the incremental influence of the same individual diminishes, as larger
supersets of cohorts are available as seeds.
• Two common approaches for defining the influence function f(S) of a set of
nodes S are the Linear Threshold Model and the Independent Cascade Model.
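The diminishing-returns property can be checked numerically on a small example. The sketch below uses a hypothetical neighbor-coverage function as f; it is a stand-in chosen because coverage functions are submodular, not the Linear Threshold or Independent Cascade model.

```python
# Toy undirected graph (assumed), stored as neighbor sets.
neighbors = {
    "a": {"b", "c", "d"},
    "b": {"a"},
    "c": {"a"},
    "d": {"a", "e"},
    "e": {"d", "f", "g"},
    "f": {"e"},
    "g": {"e"},
}


def f(S):
    """Hypothetical influence: number of nodes that are in S or adjacent to S."""
    reached = set(S)
    for s in S:
        reached |= neighbors[s]
    return len(reached)


def marginal_gain(S, v):
    """Additional influence obtained by adding individual v to the set S."""
    return f(S | {v}) - f(S)


S = {"a"}        # smaller seed set
T = {"a", "d"}   # superset of S
gain_S = marginal_gain(S, "f")  # "f" newly reaches itself and "e"
gain_T = marginal_gain(T, "f")  # "e" is already reached through "d"
```

Here adding "f" to S yields a gain of 2, while adding it to the superset T yields a gain of only 1.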
Linear Threshold Model
The algorithm starts with an active set of seed nodes S and iteratively
increases the number of active nodes based on the influence of neighboring
active nodes. Active nodes are allowed to influence their neighbors over
multiple iterations throughout the execution of the algorithm until no more
nodes can be activated. The influence of neighboring nodes is quantified with
the use of a linear function of the edge-specific weights bij. For each node i in
the network G = (N, A), the following is assumed to be true: