Reinforced, Incremental and Cross-lingual Event Detection From Social Messages (2022)
GitHub address: RingBDStack/FinEvent (Code for "Reinforced, Incremental and Cross-lingual Event Detection From Social Messages")
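FinEvent is a GNN-based event detection method for social streaming messages. It uses reinforcement learning (RL) to select neighbors from a weighted heterogeneous graph, then uses GAT as the aggregator to combine the neighbor node embeddings into a single final node embedding, which is fed into DBSCAN for event detection.
The cross-lingual part is based on transfer learning, where the target domain has only a small amount of labeled data.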
Abstract
- background: detecting hot social events
streaming nature of social messages -> incremental models
problem: ambiguous event features, dispersive text contents, and multiple languages => low accuracy and poor generalization ability.
solution: FinEvent
1) model social messages into heterogeneous graphs: rich meta-semantics & diverse meta-relations.
-> convert them into weighted multi-relational message graphs
2) solution: a new reinforced weighted multi-relational GNN framework
multi-agent reinforcement learning to select optimal aggregation thresholds
-> Note: in essence, reinforcement learning is used here to learn the optimal parameters/thresholds.
-> obtain social message embeddings
problem: the long-tail problem in social event detection.
solution: balanced sampling strategy + contrastive learning mechanism
-> incremental social message representation learning
3) Deep Reinforcement Learning + DBSCAN clustering, to learn
- the optimal minimum number of samples -> to form a cluster
- the optimal minimum distance between two clusters -> in social event detection tasks
4) incremental social message representation learning
- knowledge preservation
- GNN
-> enables cross-lingual social event detection.
1. Introduction
1) meaning
2) social event: the combination of social messages
1.1 Problems with existing methods
problem-1: event-related heterogeneous elements
solution-1: HIN (heterogeneous information networks), proposed in 2017
--> Side note: then using HAN, proposed in 2019, should not be a stretch either.
problem-2: how to learn more discriminative embeddings of social messages
because message contents overlap, are redundant and discrete, and the message stream is noisy <- semantically rich event detection task
=> challenge-1: how to model social messages and design a more discriminative and explanatory social message embedding framework
problem-3: the number of messages (samples) contained in each event is relatively imbalanced
=> challenge-2: long-tail problem
degrades detection performance and leads to poor generalization
solution: streaming clustering technology
problem-4: incremental detection on streaming messages and cross-lingual detection
because of their time attributes, the number of social events in social message streams also keeps increasing
solution: semantic incremental event detection framework
problem-5: cross-lingual messages lead to inconsistencies in the semantic embedding space of the underlying words or entities!
=> challenge-3: how to implement cross-lingual social event detection, and even generalize to low-resource language message data.
Fin-Event Method
1.2 Contributions
1. weighted multi-relational graphs -> preserve richer structural and statistical feature
2. MarGNN framework
- RL -> learn optimal preserving thresholds to select top-p neighbors
- reasonably retains and integrates the top-p most valuable semantic and structural information from each relation
3. DRL-DBSCAN
-> to realize social event clustering detection tasks without manual parameters
4. Crlme
-> cross-lingual social event detection
2. problem formulation and notations
1) social stream
2) social event
a set of correlated social messages that discuss the same real-world happening event
3) HIN
4) weighted multi-relational message graph
5) social event detection algorithm
6) incremental social event detection algorithm
7) cross-lingual social event detection algorithm
3. Incremental Model life-cycle
incremental life-cycle mechanism -> streaming nature
3.1 life-cycle mechanism
message graph construction <-> model training & detecting
- pre-training stage: a small set of social messages available in advance -> initial weighted multi-relational message graph
- detection stage: update the weighted multi-relational message graph with each new message block
- maintenance stage: remove obsolete messages -> re-train the model with the updated graph
3.2 Incremental Learning Framework
problem: generalization challenges in incremental social event detection
solution: incremental learning + life-cycle mechanism
our architecture -> processes various elements in social streams
message embedding
- HIN -> to extract different relations according to meta-path instances
- update its embedding space
- GNNs-> tune parameter + preserve helpful knowledge
- MarGNN
3.3 cross-lingual transferring mechanism
problem: cross-lingual problem
solution:
- preserve the MarGNN parameters in the detection stage
- extend MarGNN by resuming the training process with incoming data in the maintenance stage
4. FinEvent Method
proposal: FinEvent (reinForced, incremental, and cross-lingual social Event detection architecture from streaming social messages)
The FinEvent method can be summarized in the following 5 parts:
- preprocessing
- message embedding
- training
- detection
- transferring
4.1 heterogeneous: HINs(heterogeneous information networks)
organize event-related elements and relations of various types into one unified graph structure (each relation corresponds to one element type of the heterogeneous graph).
problem: previous methods convert the heterogeneous graph into a homogeneous graph using meta-path instances, which easily loses semantic and structural information.
solution: a weighted multi-relational graph
model the associations between social messages, preserving the number of meta-path instances as the weights of the edges under each relation.
input: original social streaming messages -> HIN model ->
mid: heterogeneous social message graph (HIN), to prevent the loss of heterogeneous information -> mapping ->
output: weighted multi-relational graph, to save richer connection information
nodes: the message set M, each message with d-dimensional features.
edges: edges belonging to different relations are established separately
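A minimal sketch of this graph construction, assuming each message is given as a dict of the users and entities it links to (a hypothetical toy format, not the repository's data structures). For a relation such as M-E-M, the weight of edge (m_i, m_j) is taken to be the number of meta-path instances between the two messages, i.e. the number of entities they share, instead of collapsing the meta-paths into one unweighted homogeneous edge.

```python
from collections import defaultdict
from itertools import combinations


def build_weighted_multirelational_graph(messages, relations):
    """Return {relation: {(i, j): weight}}, where weight = number of meta-path instances.

    messages: list of dicts mapping a node type (e.g. "user") to a set of elements.
    relations: node types used as the middle node of an M-X-M meta-path.
    """
    graph = {}
    for rel in relations:
        # Inverted index: element (e.g. an entity) -> messages linked to it.
        index = defaultdict(list)
        for msg_id, msg in enumerate(messages):
            for element in msg.get(rel, ()):
                index[element].append(msg_id)
        weights = defaultdict(int)
        # Each pair of messages sharing one element is one M-X-M meta-path instance.
        for msg_ids in index.values():
            for i, j in combinations(sorted(set(msg_ids)), 2):
                weights[(i, j)] += 1
        graph[rel] = dict(weights)
    return graph


# Hypothetical toy messages (not the dataset's real format).
messages = [
    {"user": {"u1"}, "entity": {"Paris", "Olympics"}},
    {"user": {"u2"}, "entity": {"Paris", "Olympics"}},
    {"user": {"u1"}, "entity": {"Tokyo"}},
]
graph = build_weighted_multirelational_graph(messages, ["user", "entity"])
print(graph["entity"])  # {(0, 1): 2}
print(graph["user"])    # {(0, 2): 1}
```

Messages 0 and 1 share two entities, so their M-E-M edge gets weight 2, information that a homogeneous 0/1 conversion would discard.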
4.2 multi-agent reinforced weighted multi-relational Graph Neural Network framework(MarGNN)
Essence: the aggregator merges neighbor embeddings into a node embedding.
In short: multi-agent reinforcement learning learns optimal weights -> selects neighbor nodes; then the weighted multi-relational graph neural network -> generates social message embedding vectors (not event embedding vectors)
- GNN: learns representations from the semantic and structural information of social messages
- multi-agent Actor-Critic (AC) algorithm -> learns the optimal numbers/thresholds for each relation
-> guides intra-relation and inter-relation message aggregation.
input: weighted multi-relational graph
mid: multi-agent reinforcement learning selects neighbors for the different relations and obtains the weights for aggregating all messages
output: GNN-generated social message embeddings <- containing semantic and structural information
4.2.1 Reinforced Neighbor Selection
problem: some meaningless links (relations) between social messages
solution: sample each relation before aggregation (i.e., select neighbors under each relation) to retain neighbors with strong semantic and structural connections
problem: different relations in the multi-relational graph have different degrees of impurity and jointly affect the final embedding
solution: a collaborative learning method to find the balance between different relations
problem: previous neighbor selection methods (the Bernoulli multi-armed bandit process or the attention mechanism) are no longer applicable to incremental detection
solution: multi-agent reinforcement learning performs top-p neighbor sampling before aggregation
- sort the neighbors of each node under the relation r
- establish an agent for each relation as the selector
How does RL select neighbors? The agent of each relation learns, in a game with the other agents, how to find the balance between relations in streaming social event detection.
four elements: (N_agg, A_agg, S_agg, R_agg)
- state: preserving thresholds of different relations jointly affect the final aggregation effect
- the preserving thresholds of all relations serve as weights -> aggregate the neighbor node representations -> compute the average weighted distance under each relation
- Action: the preserving threshold p under relation r in epoch k
- Reward: find the best aggregation scheme, i.e., the best aggregation weights over all relations, to obtain the best message clustering performance <- measured by NMI (normalized mutual information)
- Optimization: Actor-Critic algorithm -> the actor network selects actions according to the state, and the agents receive the same reward, which is used to update the loss function.
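A minimal sketch of top-p neighbor selection for one relation, assuming neighbors are ranked by the Euclidean distance between message embeddings and that the agent's action is the preserving threshold p; the function names and the distance choice are assumptions. The actor-critic agent that actually picks p from the state and the NMI reward is not shown, so p is fixed here.

```python
import math


def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def select_top_p_neighbors(center, neighbors, embeddings, p):
    """Keep the ceil(p * n) neighbors closest to `center` (p in (0, 1])."""
    if not neighbors:
        return []
    ranked = sorted(neighbors, key=lambda n: euclidean(embeddings[center], embeddings[n]))
    k = max(1, math.ceil(p * len(ranked)))
    return ranked[:k]


def average_distance(center, kept, embeddings):
    # A simple state signal: mean distance between the center and its kept neighbors.
    return sum(euclidean(embeddings[center], embeddings[n]) for n in kept) / len(kept)


embeddings = {0: (0.0, 0.0), 1: (0.1, 0.0), 2: (3.0, 4.0), 3: (0.0, 0.2), 4: (1.0, 1.0)}
kept = select_top_p_neighbors(0, [1, 2, 3, 4], embeddings, p=0.5)
print(kept)  # [1, 3]
print(round(average_distance(0, kept, embeddings), 3))  # 0.15
```

With p = 0.5 the two nearest of the four neighbors survive; an agent that lowers p for a noisier relation would prune that relation more aggressively.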
4.2.2 Weighted Relation-aware Neighbor Aggregation
To better guide the weighted multi-relational Graph Neural Network (MarGNN) in learning message embeddings:
intra-relation aggregation
- the participating neighbor messages are controlled by the preserving threshold
- the process is expressed as the aggregation of message m_i under relation r at the l-th layer
input: the embedding vector h_j of each neighbor message m_j of message m_i
mid: a summation aggregation operator over all neighbor message embeddings
Notice: when passing from layer l-1 to layer l, m_j is a neighbor of m_i, but it can also be the center node of other nodes!
model: GAT (multi-head attention mechanism of the Graph Attention Network)
multi-head: several queries each perform an attention operation over the same input, each attending to a different part of it; this is equivalent to repeating single-layer attention several times.
output: after concatenating/averaging the multi-head outputs, a comprehensive neighbor message embedding vector under each relation r, i.e., a center node has exactly one comprehensive neighbor embedding vector per relation
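A pure-Python sketch of intra-relation aggregation for one center node under one relation, written in the style of a GAT layer: each head scores neighbors with LeakyReLU(a^T [W h_i || W h_j]), normalizes the scores with a softmax, and returns the attention-weighted sum of the transformed neighbors; the heads are then averaged. The random weights stand in for learned parameters and the helper names are hypothetical; this is not the repository's implementation.

```python
import math
import random


def matvec(W, x):
    return [sum(w * v for w, v in zip(row, x)) for row in W]


def leaky_relu(x, slope=0.2):
    return x if x > 0 else slope * x


def attention_head(h_center, h_neighbors, W, a):
    """One GAT head: attention-weighted sum of transformed neighbor embeddings."""
    z_i = matvec(W, h_center)
    z_js = [matvec(W, h) for h in h_neighbors]
    # Unnormalized score e_ij = LeakyReLU(a^T [z_i || z_j]); `+` concatenates lists.
    scores = [leaky_relu(sum(x * y for x, y in zip(a, z_i + z_j))) for z_j in z_js]
    # Softmax over the neighbors (subtract the max for numerical stability).
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    alphas = [e / sum(exps) for e in exps]
    out = [sum(alpha * z_j[d] for alpha, z_j in zip(alphas, z_js)) for d in range(len(z_i))]
    return out, alphas


def multi_head_aggregate(h_center, h_neighbors, heads):
    """Average the outputs of K independent heads (concatenation is the other option)."""
    outputs = [attention_head(h_center, h_neighbors, W, a)[0] for W, a in heads]
    return [sum(vals) / len(outputs) for vals in zip(*outputs)]


random.seed(0)
in_dim, out_dim, num_heads = 3, 2, 4
heads = [
    ([[random.uniform(-1, 1) for _ in range(in_dim)] for _ in range(out_dim)],
     [random.uniform(-1, 1) for _ in range(2 * out_dim)])
    for _ in range(num_heads)
]
h_center = [1.0, 0.0, 0.5]
h_neighbors = [[0.9, 0.1, 0.4], [0.0, 1.0, 0.0]]  # neighbors kept by the top-p selector
h_r = multi_head_aggregate(h_center, h_neighbors, heads)
print(len(h_r))  # 2
```

The result h_r is the single comprehensive neighbor embedding of the center node under this relation.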
inter-relation aggregation
- concatenate the center node's relation embeddings
- the preserving threshold of each relation is used as the weight of its relation embedding during inter-relation aggregation (relations correspond to different attributes of the heterogeneous graph)
input: the comprehensive neighbor embedding vector of center node m_i under relation r (m_i has one such vector for each relation)
mid1: splicing aggregation operator, e.g. concatenation, sum, MLP
mid2: then concatenate with the inter-relation embedding from the previous layer
model: GAT (multi-head attention mechanism of the Graph Attention Network)
output: the final embedding representation of center node m_i at the l-th layer.
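A minimal sketch of the inter-relation step, assuming each relation embedding is simply scaled by its preserving threshold and then concatenated (or summed), and that the node's layer l-1 embedding is prepended when given. The subsequent learned transformation is omitted and the function name is hypothetical.

```python
def inter_relation_aggregate(relation_embeddings, thresholds, previous=None, mode="concat"):
    """Combine one center node's per-relation embeddings at layer l.

    relation_embeddings: {relation: embedding}; thresholds: {relation: preserving threshold p_r}.
    previous: optional embedding of the same node from layer l-1, concatenated in front.
    """
    relations = sorted(relation_embeddings)  # fixed order keeps the concatenation stable
    # Weight each relation embedding by its preserving threshold.
    scaled = [[thresholds[r] * x for x in relation_embeddings[r]] for r in relations]
    if mode == "concat":
        combined = [x for vec in scaled for x in vec]
    elif mode == "sum":
        combined = [sum(vals) for vals in zip(*scaled)]
    else:
        raise ValueError(f"unknown mode: {mode}")
    return (list(previous) if previous is not None else []) + combined


h = {"entity": [1.0, 2.0], "user": [0.5, 0.5], "word": [2.0, 0.0]}
p = {"entity": 0.5, "user": 1.0, "word": 0.25}
print(inter_relation_aggregate(h, p))              # [0.5, 1.0, 0.5, 0.5, 0.5, 0.0]
print(inter_relation_aggregate(h, p, mode="sum"))  # [1.5, 1.5]
print(inter_relation_aggregate(h, p, previous=[9.0], mode="sum"))  # [9.0, 1.5, 1.5]
```

Concatenation keeps the relations separable for the next layer, while summation keeps the dimension fixed; which one a model uses is a design choice.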
4.3 Balanced Sampling Strategy based Contrastive Learning Mechanism(BasCL)
problem: the number of event classes in incremental event detection constantly changes
solution: contrastive learning
-> it focuses on learning the common features of similar instances and distinguishing the differences between dissimilar instances.
problem: long-tail problem in social event detection
solution: contrastive learning
besides, contrastive learning captures more cluster-like structural information, which benefits the downstream event clustering task.
- solution: triplet loss -> to balance a large number of negative samples against a small number of positive samples of the same event class.
the embedding space is periodically updated -> we first sample a positive sample m_i+ and a negative sample m_i- to construct the triplet loss, and move the message embedding toward the positive sample <- Euclidean distance
- solution: global-local pair loss -> to preserve graph structure information while detecting long-tail events incrementally -> makes better use of similar structural information by minimizing the cross-entropy between the global summary and the local message representations.
solution: a Balanced sampling strategy based Contrastive Learning mechanism (BasCL)
-> used to train the GNN
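A minimal sketch of the two BasCL loss terms, under stated assumptions: the "balanced" sampler draws exactly one positive and one negative per anchor, the triplet loss uses Euclidean distance with a margin, and the global-local term is written as a DGI-style binary cross-entropy with a mean readout and a dot-product discriminator. These choices and all names are illustrative assumptions, not the paper's exact formulation; the GNN encoder and gradients are omitted.

```python
import math
import random


def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def balanced_triplets(labels, rng):
    """For every anchor, sample exactly one positive (same event) and one negative (other event)."""
    by_class = {}
    for idx, y in enumerate(labels):
        by_class.setdefault(y, []).append(idx)
    triplets = []
    for anchor, y in enumerate(labels):
        positives = [i for i in by_class[y] if i != anchor]
        negatives = [i for i, y2 in enumerate(labels) if y2 != y]
        if positives and negatives:  # a singleton event cannot form a triplet
            triplets.append((anchor, rng.choice(positives), rng.choice(negatives)))
    return triplets


def triplet_loss(emb, triplets, margin=1.0):
    """Mean of max(0, d(a, p) - d(a, n) + margin) with Euclidean distance."""
    losses = [max(0.0, euclidean(emb[a], emb[p]) - euclidean(emb[a], emb[n]) + margin)
              for a, p, n in triplets]
    return sum(losses) / len(losses)


def global_local_loss(emb, corrupted_emb):
    """BCE: (summary, real node) pairs -> 1, (summary, corrupted node) pairs -> 0."""
    dim = len(emb[0])
    # Global summary: sigmoid of the mean node embedding (DGI-style readout; an assumption).
    summary = [sigmoid(sum(e[d] for e in emb) / len(emb)) for d in range(dim)]

    def score(h):
        return sigmoid(sum(x * s for x, s in zip(h, summary)))

    eps = 1e-12  # avoids log(0)
    pos = sum(math.log(score(h) + eps) for h in emb)
    neg = sum(math.log(1.0 - score(h) + eps) for h in corrupted_emb)
    return -(pos + neg) / (len(emb) + len(corrupted_emb))


labels = [0, 0, 1, 1]
emb = [[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [5.1, 5.0]]
triplets = balanced_triplets(labels, random.Random(0))
print(len(triplets), triplet_loss(emb, triplets))  # 4 0.0
corrupted = [[-x for x in h] for h in emb]  # stand-in for embeddings of a corrupted graph
print(global_local_loss(emb, corrupted) > 0)  # True
```

Because every anchor contributes exactly one positive and one negative, small events weigh as much per message as large ones, which is the intuition behind balancing against the long tail.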
4.4 DRL-DBSCAN: Deep Reinforcement Learning (DRL) guided DBSCAN model
DBSCAN: automatically adjusts the number of classes
problem: DBSCAN still has two parameters (the distance parameter ε and the minimum sample number parameter minPts) that must be tuned manually, so it cannot adapt to message blocks with different data distributions in the constantly changing message input.
solution: DRL-DBSCAN, a deep-reinforcement-learning-guided DBSCAN, obtains a stable social event clustering through multi-round parameter interaction with DBSCAN <- based on the learned social message embeddings
-> performs social event clustering detection automatically
multi-agent DRL: Twin Delayed Deep Deterministic policy gradient algorithm (TD3)
- agent: parameter adjustment system
- environment: incremental social data
- process: Markov decision process (S_clu, A_clu, R_clu)
- S_clu: state -- clustering situation
- A_clu: action -- parameter adjustment step size, similar to a learning rate; t-Distributed Stochastic Neighbor Embedding (t-SNE) reduces the embedding dimensionality to prevent the curse of dimensionality and speed up DBSCAN
- R_clu: reward -- Variance Ratio Criterion (Calinski-Harabasz index) to stimulate the agent
- Optimization: Twin Delayed Deep Deterministic policy gradient algorithm
to learn the optimal parameters: minPts (the minimum number of samples) and ε (the minimum distance between two clusters)
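A compact sketch of the DRL-DBSCAN interaction loop, with one loudly stated substitution: the TD3 agent is replaced by a greedy search that, in each round, tries small steps on ε and minPts and keeps the move that most increases the Variance Ratio Criterion (Calinski-Harabasz) reward. DBSCAN and the index are written from scratch; excluding noise points from the reward is an assumption, the t-SNE step is skipped, and all function names are hypothetical.

```python
import math


def dist(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def dbscan(points, eps, min_pts):
    """Plain DBSCAN (min_pts counts the point itself); returns labels, -1 marks noise."""
    n = len(points)
    labels = [None] * n

    def region(i):
        return [j for j in range(n) if dist(points[i], points[j]) <= eps]

    cluster = -1
    for i in range(n):
        if labels[i] is not None:
            continue
        neighbors = region(i)
        if len(neighbors) < min_pts:
            labels[i] = -1
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in neighbors if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # noise reached from a core point becomes a border point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            neighbors_j = region(j)
            if len(neighbors_j) >= min_pts:  # only core points expand the cluster
                queue.extend(neighbors_j)
    return labels


def calinski_harabasz(points, labels):
    """Variance Ratio Criterion over non-noise points; 0.0 when it is undefined."""
    clusters = {}
    for p, y in zip(points, labels):
        if y != -1:
            clusters.setdefault(y, []).append(p)
    kept = [p for ps in clusters.values() for p in ps]
    k, n = len(clusters), len(kept)
    if k < 2 or n <= k:
        return 0.0

    def mean(ps):
        return [sum(c) / len(ps) for c in zip(*ps)]

    overall = mean(kept)
    between = sum(len(ps) * dist(mean(ps), overall) ** 2 for ps in clusters.values())
    within = sum(dist(p, mean(ps)) ** 2 for ps in clusters.values() for p in ps)
    if within == 0:
        return math.inf
    return (between / (k - 1)) / (within / (n - k))


def tune_dbscan(points, eps=0.5, min_pts=2, eps_step=0.25, rounds=10):
    """Greedy stand-in for the TD3 agent: each round tries small parameter moves
    and keeps the move with the highest Calinski-Harabasz reward."""
    best = calinski_harabasz(points, dbscan(points, eps, min_pts))
    for _ in range(rounds):
        moves = [(eps + eps_step, min_pts), (max(eps_step, eps - eps_step), min_pts),
                 (eps, min_pts + 1), (eps, max(2, min_pts - 1))]
        reward, new_eps, new_min = max(
            (calinski_harabasz(points, dbscan(points, e, m)), e, m) for e, m in moves)
        if reward <= best:
            break  # no move improves the reward
        best, eps, min_pts = reward, new_eps, new_min
    return eps, min_pts, best


points = [(0, 0), (0, 0.3), (0.3, 0), (5, 5), (5, 5.3), (5.3, 5), (10, 0)]
print(dbscan(points, 0.5, 2))  # [0, 0, 0, 1, 1, 1, -1]
print(tune_dbscan(points))
```

The greedy loop only mirrors the multi-round state -> action -> reward interaction; a learned TD3 policy would instead generalize its parameter steps across message blocks.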
4.5 transferring: cross-lingual social message embedding method(Crlme)
transferring the parameters of MarGNN to improve embedding performance on target-language (non-English) messages
problem: non-English languages have insufficient original information, and their models cannot simply reuse the training process of the English model.
solution: we directly inherit the parameters θ preserved during training of the English model f_E as the parameters θ of the non-English model f_NoE when detecting non-English events.
LNMAP model -> maps non-English messages m into the English semantic space
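A minimal sketch of the two Crlme ingredients, assuming model parameters are stored in a plain dict and using a linear map as a stand-in for LNMAP (which learns a non-linear mapping). The parameter names and the mapping matrix are hypothetical.

```python
import copy


def linear_map(M, x):
    """Map a non-English vector into the English semantic space: returns M @ x."""
    return [sum(m * v for m, v in zip(row, x)) for row in M]


# Hypothetical parameters theta of the trained English model f_E.
english_params = {"intra_W": [[0.2, -0.1], [0.0, 0.3]],
                  "thresholds": {"user": 0.6, "entity": 0.8}}

# Parameter inheritance: f_NoE starts from a copy of theta instead of from scratch,
# so later fine-tuning on non-English blocks does not overwrite the English model.
french_params = copy.deepcopy(english_params)

# Hypothetical 2x2 mapping matrix; LNMAP itself learns a non-linear mapping.
M_fr_to_en = [[0.0, 1.0], [1.0, 0.0]]
french_vec = [0.4, 0.9]
print(linear_map(M_fr_to_en, french_vec))  # [0.9, 0.4]
```

A deep copy (rather than a shared reference) is what lets the non-English model be fine-tuned independently in the maintenance stage.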
4.6 Maintenance Strategies
1) All message strategy
keeping all the messages.
- detection stage, insert newly arrived message block into G
- maintenance stage, continue the training process using all the messages in G.
impractical -> the graph eventually exceeds the embedding space capacity of the message encoder
2) Relevant message strategy
keeping messages that are related to the newly arrived ones.
- detection stage, insert the newly arrived message block into G
- maintenance stage, first remove messages that are not connected to any messages that arrived during the last time window and then continue training using all the messages in G
3) Latest message strategy
keeping the latest message block
- detection stage, use only the newly arrived message block to reconstruct G
- maintenance stage, continue training with all the messages in G, which only involves the latest message block.
proposed FinEvent
- initialize a weighted multi-relational graph G
- when new messages arrive, update the graph
- inserting the new messages node
- establishing a connection
- regularly delete expired nodes and edges
- neighbor selector of MarGNN
- performing model single-language detection
- aggregation module
- BasCL
- DRL-DBSCAN
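A toy sketch of the three maintenance strategies, assuming the graph is a dict holding each message's arrival window plus a set of undirected edges (a hypothetical structure, not FinEvent's). It only shows which messages survive into re-training; for the latest-message strategy FinEvent rebuilds G from the new block during detection, which leaves the same node set as the filter below.

```python
import copy


def insert_block(graph, block, edges):
    """Detection stage: add new messages (message -> arrival window) and their edges."""
    graph["nodes"].update(block)
    graph["edges"].update(edges)


def maintain(graph, strategy, latest_window):
    """Maintenance stage: keep the messages that a strategy retains for re-training."""
    nodes, edges = graph["nodes"], graph["edges"]
    recent = {m for m, w in nodes.items() if w == latest_window}
    if strategy == "all":
        keep = set(nodes)
    elif strategy == "relevant":
        # Recent messages plus every message directly connected to one of them.
        keep = recent | {m for e in edges if e & recent for m in e}
    elif strategy == "latest":
        keep = recent
    else:
        raise ValueError(f"unknown strategy: {strategy}")
    graph["nodes"] = {m: w for m, w in nodes.items() if m in keep}
    graph["edges"] = {e for e in edges if e <= keep}
    return graph


graph = {"nodes": {}, "edges": set()}
insert_block(graph, {"m1": 0, "m2": 0, "m3": 0}, {frozenset({"m1", "m2"})})
insert_block(graph, {"m4": 1}, {frozenset({"m2", "m4"})})
for strategy in ("all", "relevant", "latest"):
    kept = maintain(copy.deepcopy(graph), strategy, latest_window=1)
    print(strategy, sorted(kept["nodes"]))
# all ['m1', 'm2', 'm3', 'm4']
# relevant ['m2', 'm4']
# latest ['m4']
```

The relevant-message strategy keeps m2 because it links to the newly arrived m4, while m1 and m3 are dropped.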
5. Experiment
5.1 Data
Twitter dataset(Building a large-scale corpus for evaluating event detection on twitter)
- 68841 manually labeled tweets
- 503 event classes
- 4 weeks(29 days)
- 3 relations:
- M-U-M(message-user-message)
- M-L-M(message-location-message)
- M-E-M(message-entity-message)
French Twitter dataset:
- 64513 labeled tweets
- 257 event classes
- 3 weeks(23 days)
5.2 Baselines
- word2vec
- LDA
- WMD
- BERT
- BiLSTM
- PP-GCN
- EventX
- KPGNN
5.3 Experiment Setting
- python 3.7.3
- pytorch 1.8.1
- 64-core Intel Xeon CPU E5-2680 v4 @ 2.40GHz with 512GB RAM and 1x NVIDIA Tesla P100-PCIE GPU
5.4 Evaluation Metrics
- NMI: normalized mutual information
- AMI: adjusted mutual information
- ARI: adjusted rand index
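Since NMI is also the reward signal for the neighbor-selection agents, a small from-scratch implementation may help. This sketch normalizes the mutual information by the arithmetic mean of the two entropies, an assumed convention that matches the default of scikit-learn's normalized_mutual_info_score.

```python
import math
from collections import Counter


def entropy(labels):
    n = len(labels)
    return -sum(c / n * math.log(c / n) for c in Counter(labels).values())


def mutual_information(a, b):
    n = len(a)
    joint = Counter(zip(a, b))
    ca, cb = Counter(a), Counter(b)
    return sum(nij / n * math.log(n * nij / (ca[x] * cb[y])) for (x, y), nij in joint.items())


def nmi(labels_true, labels_pred):
    """NMI with arithmetic-mean normalization: MI / ((H(U) + H(V)) / 2)."""
    h_true, h_pred = entropy(labels_true), entropy(labels_pred)
    if h_true == 0 and h_pred == 0:
        return 1.0  # both partitions are a single cluster
    return mutual_information(labels_true, labels_pred) / ((h_true + h_pred) / 2)


print(nmi([0, 0, 1, 1], [1, 1, 0, 0]))  # 1.0 (same partition, renamed labels)
print(nmi([0, 0, 1, 1], [0, 1, 0, 1]))  # 0.0 (independent partitions)
```

NMI ignores cluster label names, which is why it suits event clustering where predicted cluster IDs never match ground-truth event IDs.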
6. Evaluation
6.1 offline evaluation
6.2 online(incremental evaluation)
6.2.1 Ablation Study
6.3 study on RL process
6.3.1 preserving thresholds
6.3.2 DRL-DBSCAN
6.4 Cross-lingual Transferring Evaluation
6.5 Time Analysis
7. Related work
- social event detection method
- document-pivot(DP) methods
- feature-pivot(FP) methods
- their application scenarios
- offline
- online
- different techniques and mechanisms
- incremental clustering
- community detection
- topic models
problem:
- These methods are limited by the latest knowledge, as they ignore, to some extent, the rich semantic and structural information contained in social streams
- have too few parameters to preserve the learned knowledge
8. Conclusion
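FinEvent: a reinforced, incremental, and cross-lingual social event detection architecture for streaming social messages.
9. FinEvent Code Analysis
FinEvent first uses intra_agg to compute the meta-path based word_embedding, user_id_embedding, and entity_embedding separately, then uses inter_agg to merge these three meta-path embeddings into one comprehensive final embedding.
1) problem 1: word order is ignored
When building the word adjacency matrix, the sampled words are connected directly in the graph without considering word order.
e.g. "I love Jassica" and "Jassica loves me" describe two different things, but without word order they collapse into the same one!
2) The adjacency matrix is the most basic representation of a graph; here it is converted from a two-dimensional coordinate (edge index) matrix.
3) In FinEvent, a mask means an index list (train_mask = train_idx), not a boolean mask as in HAN.
4) problem 2: in cal_similarity_node_edge, r_data[1] looks like a mistake -> it should be r_data[0]
5) filtered_multi-r_data reduces entity_neighbor from over 480,000 to just over 100,000
7) In validation, extract_features recomputes the pre embeddings.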
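Three tiny, hypothetical helpers illustrate notes 1) to 3); they are not taken from the repository. An order-free co-occurrence graph cannot tell "Tom loves Jassica" from "Jassica loves Tom", a coordinate (edge index) matrix converts directly into an adjacency matrix, and an index list converts into a boolean mask.

```python
from itertools import combinations


def cooccurrence_edges(sentence):
    """Connect every pair of distinct words in a sentence; word order is lost."""
    words = sorted(set(sentence.lower().split()))
    return {frozenset(pair) for pair in combinations(words, 2)}


def coo_to_dense(edge_index, num_nodes):
    """Turn a 2 x E coordinate list [rows, cols] into a dense 0/1 adjacency matrix."""
    adj = [[0] * num_nodes for _ in range(num_nodes)]
    for i, j in zip(*edge_index):
        adj[i][j] = 1
    return adj


def index_to_bool_mask(indices, num_nodes):
    """Convert an index list (train_mask = train_idx style) into a boolean mask (HAN style)."""
    mask = [False] * num_nodes
    for i in indices:
        mask[i] = True
    return mask


print(cooccurrence_edges("Tom loves Jassica") == cooccurrence_edges("Jassica loves Tom"))  # True
print(coo_to_dense([[0, 1], [1, 2]], 3))  # [[0, 1, 0], [0, 0, 1], [0, 0, 0]]
print(index_to_bool_mask([0, 2], 4))      # [True, False, True, False]
```

Mixing the two mask conventions silently selects the wrong nodes, so code ported from HAN needs an explicit conversion like the last helper.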