Learning In Networks:
From Spiking Neural Nets To Graphs
Victor Miagkikh
Machine Learning Specialist,
Cisco Systems Inc.
Ironport business unit
Roadmap
• Introduction to Hebbian learning
• Bee navigation problem
• Introduction to Spiking Neural Networks (SNN)
• rHebb algorithm: Hebbian learning augmented with the "reward controls plasticity" learning principle
• Reinforcement learning applications for movie and stock purchase recommendations
• Methodology for introducing reinforcement learning in networks
• Q&A
Networks
• There are many different kinds of networks around us
Learning in Networks
• Suppose we would like to add a connection between San Francisco and Orlando. Is it a good thing to do?
Neural Networks
• There is no external analyst in the brain who changes connections between neurons. What principles make learning possible?
Hebbian Learning
[Photo: Donald Olding Hebb, 1904-1985]
• "Let us assume that the persistence or repetition of a reverberatory activity (or 'trace') tends to induce lasting cellular changes that add to its stability. … When an axon of cell A is near enough to excite a cell B and repeatedly or persistently takes part in firing it, some growth process or metabolic change takes place in one or both cells such that A's efficiency, as one of the cells firing B, is increased." (Hebb, The Organization of Behavior, 1949)
Hebbian Learning (Cont.)
• In other words: if two neurons fire "close in time", then the strength of the synaptic connection between them increases.

∆w_ij(t) = η · v_i · v_j · g(t_{v_i}, t_{v_j})

[Figure: spike trains of neurons v1, v2, v3 on a time axis; the firing times t_{v_i} and t_{v_j} are "close"]

• Weights reflect correlation between firing events.
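As a concrete illustration, here is a minimal Python sketch of this update rule (not from the slides); the exponential closeness kernel g and the parameter values are assumptions, chosen to match the kernel used later in the rHebb slides.

```python
import numpy as np

def hebbian_update(w, v_i, v_j, t_i, t_j, eta=0.01, k=1.0):
    """Hebbian update: the closer in time the two firings, the larger the increase."""
    g = np.exp(-k * abs(t_i - t_j))      # closeness-in-time kernel g(t_i, t_j)
    return w + eta * v_i * v_j * g

# Two strong, nearly simultaneous spikes strengthen the synapse noticeably;
# widely separated spikes leave it almost unchanged.
w_close = hebbian_update(w=0.5, v_i=1.0, v_j=1.0, t_i=10.0, t_j=10.2)
w_far   = hebbian_update(w=0.5, v_i=1.0, v_j=1.0, t_i=10.0, t_j=18.0)
```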
Bee Navigation Problem
• A bee has to correlate signs in order to find the
feeder (S. W. Zhang et al., Honeybee Memory:
Navigation by Associative Grouping and Recall
of Visual Stimuli, 1998).
Spiking Neural Networks
• Short term memory is needed in order to solve
this problem.
• Spiking Neural Networks (SNN) have “built in”
short term memory.
Perceptron
[Figure: inputs x1, x2, x3 with weights w1, w2, w3 and a bias weight w0 feed a summation unit ∑ that produces the output y]

y = Activation_Function(∑ w_i · x_i)

Spiking Neuron
[Figure: inputs x1, x2, x3 with weights w1, w2, w3 feed a leaky integrator ∫ that produces the output y]

E(t) = E(t−1) · decay + ∫_{t−1}^{t} ∑ w_i · x_i

If E(t) > threshold -> emit spike; E(t+1) = 0
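A minimal Python sketch of this leaky integrate-and-fire behaviour (not from the slides; the decay and threshold values are arbitrary assumptions):

```python
def spiking_neuron_step(E_prev, inputs, weights, decay=0.9, threshold=1.0):
    """One time step: leaky accumulation of weighted input; spike and reset on threshold."""
    E = E_prev * decay + sum(w * x for w, x in zip(weights, inputs))
    if E > threshold:
        return 0.0, 1          # E(t+1) = 0, emit spike
    return E, 0                # keep accumulating, no spike

# The accumulated energy E(t) acts as the "built in" short-term memory:
# three weak inputs, none sufficient alone, eventually trigger a spike.
E, spikes = 0.0, []
for x in ([1, 0, 0], [0, 1, 0], [0, 0, 1]):
    E, y = spiking_neuron_step(E, x, weights=[0.4, 0.4, 0.4])
    spikes.append(y)           # -> [0, 0, 1]
```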
SNN Solution
[Figure: two-stage spiking network for the bee navigation task. Inputs i1-i6 (with connections i1,v* ... i6,v*) feed spiking neurons N1_1, N1_2, N2_1, N2_2, N3_1, N3_2, each shown as an integrator ∫ with output v1_1 ... v3_2; the outputs drive the L/R (left/right) decision. Stage 1 and Stage 2 show the network at the two decision points.]
Evolution of Weights
[Figure: the same SNN as on the previous slide, shown at Iteration 25]

Weights after iteration 25:

        N1_1     N1_2     N2_1     N2_2     N3_1     N3_2
N1_1    N/A      0.0872   0.5188   0.1881   1.0356   0.2256
N1_2    0.0872   N/A      0.17953  0.4338   0.0831   0.4935
N2_1    0.5188   0.17953  N/A      0.0931   2.4867   0.2107
N2_2    0.1881   0.4338   0.0931   N/A      0.1325   0.31944
N3_1    1.0356   0.0831   2.4867   0.1325   N/A      0.1296
N3_2    0.2256   0.4935   0.2107   0.31944  0.1296   N/A
Evolution of Weights (Cont.)
[Figure: the same SNN, shown at Iteration 50]

Weights after iteration 50:

        N1_1     N1_2     N2_1     N2_2     N3_1     N3_2
N1_1    N/A      0.0743   0.9007   0.1819   2.6617   0.2256
N1_2    0.0743   N/A      0.0973   1.7223   -0.6328  0.6212
N2_1    0.9007   0.0973   N/A      0.0652   8.579    0.2107
N2_2    0.2107   1.7223   0.0652   N/A      0.0453   0.4476
N3_1    0.6212   -0.6328  8.579    0.0453   N/A      0.0978
N3_2    0.2256   0.6212   0.2107   0.4476   0.0978   N/A
rHebb Algorithm
Assumption #1 - Hebbian learning:
The strength of the connections between neurons that fire "close in time" increases (subject to assumption #2).

∆w_ij = η · f(v_i, v_j, ∆t_ij) = η · v_i · v_j · g(∆t_ij) = η · v_i · v_j · e^(−k_1 · ∆t_ij)

where v_i, v_j are the outputs of neurons i and j, and ∆t_ij is the difference in time between the firings of neurons i and j.

Assumption #2 - Reward controls plasticity (the ability to change weights):
The changes in weights defined by Hebbian learning do not occur all the time; they are controlled in sign and intensity by the reward.

∆w_ij = η · reward · v_i · v_j · g(∆t_ij) = η · reward · v_i · v_j · e^(−k_1 · ∆t_ij)
rHebb Algorithm (Cont.)
Assumption #3 - Temporal credit assignment:
[Figure: timeline showing neuron i firing with output v_i, neuron j firing with output v_j, and the reward r arriving later]

∆w_ij = η · reward · v_i · v_j · g(∆t_ij) · h(∆t_ir, ∆t_jr)
      = η · reward · v_i · v_j · e^(−k_1 · ∆t_ij) · e^(−k_2 · (∆t_ir + ∆t_jr))
      = η · reward · v_i · v_j · e^(−k_1 · ∆t_ij − k_2 · ∆t_ir − k_2 · ∆t_jr)
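Putting the three assumptions together, here is a minimal Python sketch of a single rHebb weight update (not from the slides; the learning rate and the decay constants k_1, k_2 are illustrative assumptions):

```python
import numpy as np

def rhebb_update(w, v_i, v_j, dt_ij, dt_ir, dt_jr, reward, eta=0.01, k1=1.0, k2=0.5):
    """rHebb: Hebbian coincidence (k1 term), reward-controlled plasticity (reward factor),
    and temporal credit assignment (k2 terms for the delays until the reward)."""
    return w + eta * reward * v_i * v_j * np.exp(-k1 * dt_ij - k2 * dt_ir - k2 * dt_jr)

# A spike pair followed quickly by a positive reward strengthens the synapse;
# the same pair followed by a negative reward weakens it.
w_pos = rhebb_update(w=0.2, v_i=1.0, v_j=1.0, dt_ij=0.1, dt_ir=0.5, dt_jr=0.4, reward=+1.0)
w_neg = rhebb_update(w=0.2, v_i=1.0, v_j=1.0, dt_ij=0.1, dt_ir=0.5, dt_jr=0.4, reward=-1.0)
```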
rHebb Algorithm (Cont.)
• Weights in rHebb reflect not correlations, but
utilities.
• Related work: C.M. Pennartz, Reinforcement Learning by Hebbian Synapses with Adaptive Thresholds, Neuroscience, 1997 (Caltech).
• rHebb is biologically plausible.
• Could the "reward controls plasticity" learning axiom be used for training not only spiking neural networks, but other kinds of networks as well?
Movie Recommendation Problem
Problem: given a list of favorite movies, a wish list, and demographic information (age, gender, zip code), try to come up with a movie recommendation list.

[Figure: movie graph with nodes Matrix, Matrix Reloaded, 13th Floor, 12 Monkeys, Johnny Mnemonic, 5th Element, Constantine]
- Seen or in-queue movies: set HQ (history + queue).
- Possible recommendations: set A.
Reinforcement Learning in Movie Network
• Let's assume that for some movies in the H set we have personal ratings. For the combined HQ set we can introduce a reward function G(HQi) that returns 0.5 for unseen and neutral movies, +1 for the best movie ever made, and 0 for the worst movie according to personal taste.
• For example: H = <13th Floor, 5th Element>, Q = <Matrix>. G(HQi) = <0.7, 0.5, 0.5>.

[Figure: movie graph with nodes Matrix, Matrix Reloaded, 13th Floor, 12 Monkeys, Johnny Mnemonic, 5th Element, Constantine]

• Let's introduce the set R composed of the movies that our system recommended. For the example above, let R = Q = <Matrix>.
• At some point the user watches a movie in R or some other movie not in R. If the user watches a movie in R, the system gets a reward G(HQi); if not, the system gets reward = -small_penalty. Thus, we get a sequence of <m, reward> pairs for each user. Let's call this set T.
Reinforcement Learning in Movie Network (Cont.)
• The set T could be used to adjust the connections C_ij in reinforcement learning mode. Consider a <m, reward> pair. Given m, we know the average cumulative flow through each edge and vertex to all vertices in the H set. Let's denote them fn(i, m, H) and fe(i, j, m, H).
• Then we can update edges as follows:

∆C_ij = η_e · reward · fe(i, j, m, H)

• And vertices:

∆C_ii = η_n · reward · fn(i, m, H)

• Thus, we get an updated C_ij, and we can do this for the entire population, a segment, or an individual customer.
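A minimal sketch of these two update rules in Python (not from the slides): flow_e and flow_n are hypothetical placeholders for the fe(i,j,m,H) and fn(i,m,H) quantities, since how the flow is computed is not specified here.

```python
def update_movie_network(C, m, reward, H, flow_e, flow_n, eta_e=0.1, eta_n=0.1):
    """Apply the edge and vertex updates above to the connection matrix C (dict of dicts).
    flow_e(i, j, m, H) / flow_n(i, m, H) return the average cumulative flow from movie m
    to the H set through edge (i, j) / vertex i."""
    for i in C:
        for j in C[i]:
            if i == j:
                C[i][i] += eta_n * reward * flow_n(i, m, H)      # vertex update ∆C_ii
            else:
                C[i][j] += eta_e * reward * flow_e(i, j, m, H)   # edge update ∆C_ij
    return C
```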
Reinforcement Learning in Networks
• Generally, flow in a network and Hebbian learning are both examples of an eligibility trace e(t): the degree of participation of each functional unit in the decision making that caused the action.
• The e(t) * reward pattern is an example of structural credit assignment in a learner. It's very intuitive: distribute reward according to participation.
• Structural credit assignment is different in each kind of learner, but the pattern e(t) * reward is a good rule of thumb for introducing RL.
• In general, RL has its place when the only feedback we get from the environment is a reinforcement signal, which doesn't allow the use of supervised learning.
An Approach for Solving the NetFlix Problem
• Bipartite graph representation (users u1-u3 connected to movies m1-m4), known ratings:

        m1      m2      m3      m4
u1      0.3     0       0       0
u2      -0.1    0       0       0.9
u3      0.1     0.4     -0.2    0

• After finding network equilibrium:

        m1      m2      m3      m4
u1      0.3     0.1     -0.1    0.2
u2      -0.1    0.1     0.15    0.9
u3      0.1     0.4     -0.2    0.05
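The slide shows the ratings before and after equilibrium but not the procedure for finding it. Purely as an illustration, here is one hypothetical way to reach such a fixed point in Python: repeatedly propagate the known ratings through a user-user similarity derived from the current rating matrix, keeping observed entries fixed.

```python
import numpy as np

R = np.array([[ 0.3, 0.0,  0.0, 0.0 ],     # users u1..u3 x movies m1..m4
              [-0.1, 0.0,  0.0, 0.9 ],
              [ 0.1, 0.4, -0.2, 0.0 ]])
known = R != 0.0                            # observed ratings stay fixed

for _ in range(100):
    sim = R @ R.T                           # user-user similarity from current ratings
    np.fill_diagonal(sim, 0.0)
    prop = sim @ R / (np.abs(sim).sum(axis=1, keepdims=True) + 1e-9)
    R_new = np.where(known, R, prop)        # fill only the unknown entries
    if np.allclose(R_new, R, atol=1e-6):    # stop at the fixed point ("equilibrium")
        break
    R = R_new
```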
Example: Influence Networks for Economy
• Consider the following network, where nodes represent companies and connections represent mutual influences. Stock price could be considered as the output of a company node, similar to the output of a neuron.
• We can apply Hebbian learning: the strength of the connection between two neurons that fire at the same time increases.

[Figure: influence network with company nodes Intel, IBM, Xerox, AMD]

Input: stock price s(t) for any company c_i
Output: strength of the influence connections w(c_i, c_j)

Learning rule (Hebbian learning):
∆w_{c_i c_j}(∆t) = η · (∆s_i(∆t) · ∆s_j(∆t))
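A minimal Python sketch of this learning rule (not from the slides): the prices and the learning rate are made-up values, and the price changes over the interval ∆t play the role of the neurons' outputs.

```python
import numpy as np

def influence_update(W, prices_prev, prices_now, eta=0.01):
    """One Hebbian step: co-movement of the stock prices of companies i and j
    over the interval strengthens the influence weight w(c_i, c_j)."""
    ds = prices_now - prices_prev            # ∆s_i(∆t) for every company
    return W + eta * np.outer(ds, ds)        # ∆w_{c_i c_j} = η · ∆s_i · ∆s_j

# Four companies, e.g. Intel, IBM, Xerox, AMD (prices are illustrative only).
W = np.zeros((4, 4))
W = influence_update(W, np.array([30.0, 120.0, 10.0, 25.0]),
                        np.array([31.0, 121.5,  9.8, 26.0]))
```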
Example: Influence Networks for Economy (Cont.)
• We can easily extend simple Hebbian learning to "Fuzzy" Hebbian learning:

∆w = η · ∫_0^∆t f_{c_i}(t) · f_{c_j}(t) dt / ∆t

[Figure: the two fuzzy activity functions are multiplied (×) and the product is defuzzified (=) by the center-of-gravity method]

∆w = η · ∫ f̂(t) dt / ∆t

• We can train the network for various ∆t. The meaning of a weight becomes the timed correlation of stock prices.
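As an illustration, here is a discrete Python approximation of the windowed rule above (not from the slides); the fuzzification/defuzzification step is omitted, and the activity signals are assumed to be already-normalized price changes.

```python
import numpy as np

def windowed_hebbian(f_i, f_j, eta=0.01):
    """Approximate ∆w = η · ∫ f_ci(t) · f_cj(t) dt / ∆t with a discrete average
    of the product of the two activity signals over the window."""
    f_i, f_j = np.asarray(f_i), np.asarray(f_j)
    return eta * np.sum(f_i * f_j) / len(f_i)

# Example: normalized daily price changes of two companies over a 5-day window.
dw = windowed_hebbian([0.2, 0.5, 0.1, -0.3, 0.4],
                      [0.1, 0.4, 0.2, -0.2, 0.5])
```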
Portfolio Analysis
[Figure: influence network with nodes Intel, HP, IBM, Apple, Xerox, AMD, P&G; filled nodes - some stock in portfolio, empty nodes - no stock in portfolio]

In general: a portfolio is represented by weights on vertices.

• Characteristics we can compute: diversity, coverage, competition within the portfolio, company influence.
• Maximum flow between any two vertices could be used to calculate cumulative influence.
• Key idea for stock purchase recommendations: if a fluctuation in the stock price of one company is observed, the network can tell which companies' stock prices will start to fluctuate.
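A small sketch of the maximum-flow idea, assuming the networkx library: the learned influence weights are treated as edge capacities, and both the graph and its values are hypothetical.

```python
import networkx as nx

# Influence network with learned weights used as capacities (illustrative values).
G = nx.DiGraph()
G.add_edge("Intel", "HP", capacity=0.6)
G.add_edge("Intel", "AMD", capacity=0.9)
G.add_edge("AMD", "HP", capacity=0.3)

# Cumulative influence of Intel on HP along all paths = maximum flow between them.
flow_value, flow_per_edge = nx.maximum_flow(G, "Intel", "HP")
print(flow_value)   # 0.9 for the capacities above
```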
Multi Aspect Influence Model
[Figure: three planes L1, L2, L3; each plane contains its own copies of the company nodes (Intel, IBM, Xerox, AMD) with an event input "evt" and the stock price signal s(t)]

Stock price could be seen as a form of reinforcement signal reflecting the future expected reward of a company, allowing the use of reinforcement learning.

Each "plane" Li models a separate dimension of influences. It can represent the involvement of a company in different sectors of the economy. In general, each plane could be the result of a decomposition into some other, more abstract basis.

Each company node c_j exists on each Li, forming vertical subnetworks that could be used to model transformations within a company.
Reward Controls Plasticity Learning Principle
• The -e(t) * error pattern has been used in supervised learning for a long time.
• e(t) * reward is a generic principle for introducing reinforcement learning.
• e(t) could be any causality-inducing principle, not necessarily Hebbian learning.
• There is a huge area of applicability. Depending on the application, the reward could be defined as the number of users clicking on an online ad or the number of messages caught by an anti-spam system.
• Q&A