Extracting Triple Relations with Large Language Models

一只肥泡泡

Last modified 2025-07-31 23:35:13


License: CC 4.0 BY-SA

Tags: knowledge graph, language model

First published 2025-07-31 23:32:58

Article link: https://blog.csdn.net/weixin_47903162/article/details/149815764


Large language models can extract logical relations from text, and those relations can then support reasoning.

!pip install snownlp
!pip install langchain
!pip install langchain_community
!pip install dashscope

1 Text Processing and Splitting

1.1 Extraction with SnowNLP

from snownlp import SnowNLP

# Initialize the list of triples
triplets = []
texts = '''
李明浏览了数码相机的页面，购买了一台数码相机。
李明咨询了客服关于数码相机的保修信息。
李明浏览了户外旅行用品的页面，购买了一个帐篷。
李明咨询了客服关于便携式炉具的使用方法。
李明参与了摄影爱好者的线上社区活动。
李明分享了他的摄影作品到社区，并获得的好评。
'''.strip().split('\n')  # strip() removes leading/trailing whitespace; split('\n') splits the text into lines

Extract the triples:

for sentence in texts:
  s = SnowNLP(sentence)
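  words = [word for word, tag in s.tags if tag in ('nr', 'n', 'v')]  # keep person names (nr), nouns (n) and verbs (v)
  # Assume the format (entity 1, relation, entity 2): the 1st and 2nd kept words are the entities, the 3rd is the relation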
  if len(words) >= 3:
    triplets.append((words[0], words[2], words[1]))

for triplet in triplets:
  print(triplet)

The resulting triples:
('李', '浏览', '明')
('李', '服', '明')
('李', '浏览', '明')
('李', '咨询', '明')
('李', '参与', '明')
('李', '分享', '明')
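SnowNLP does not extract the triples correctly: the name 李明 is split into 李 and 明, and the second sentence even yields 服 as the "relation". Next, let's look at what an LLM extracts through deeper semantic parsing.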
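To see why the output looks like this, the sketch below applies the same filter-and-position heuristic to a hand-written tag list for the first sentence. The tags are hypothetical (not real SnowNLP output), chosen only to reproduce the first triple above; `filter_words` and `merge_names` are illustrative helpers, not part of SnowNLP.

```python
# Hypothetical (word, POS tag) pairs for "李明浏览了数码相机的页面，购买了一台数码相机。"
# Not actual SnowNLP output: it only mimics the name being split into 李 / 明.
tagged = [("李", "nr"), ("明", "nr"), ("浏览", "v"), ("了", "u"),
          ("数码相机", "n"), ("的", "u"), ("页面", "n"), ("，", "w"),
          ("购买", "v"), ("了", "u"), ("一", "m"), ("台", "q"),
          ("数码相机", "n"), ("。", "w")]

def filter_words(tagged, keep=("nr", "n", "v")):
    """Keep words whose tag is in `keep`, as the loop above does."""
    return [word for word, tag in tagged if tag in keep]

def merge_names(tagged):
    """Merge runs of consecutive 'nr' (person-name) tokens into a single token."""
    merged = []
    for word, tag in tagged:
        if tag == "nr" and merged and merged[-1][1] == "nr":
            merged[-1] = (merged[-1][0] + word, "nr")
        else:
            merged.append((word, tag))
    return merged

words = filter_words(tagged)
print((words[0], words[2], words[1]))  # ('李', '浏览', '明')

words = filter_words(merge_names(tagged))
print((words[0], words[1], words[2]))  # ('李明', '浏览', '数码相机')
```

Merging name tokens repairs the subject, but a fixed positional rule still produces at most one triple per sentence, so the second relation (购买) in this sentence is lost either way.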

1.2 Extraction with an LLM

Load the LLM

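import os
from langchain_community.llms import Tongyi

DASHSCOPE_API_KEY = "xxxxxx"  # replace with your own DashScope API key
os.environ["DASHSCOPE_API_KEY"] = DASHSCOPE_API_KEY
llm = Tongyi()  # Tongyi picks up the key from the DASHSCOPE_API_KEY environment variable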

Create the knowledge graph index

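from langchain.indexes import GraphIndexCreator
from langchain.chains import GraphQAChain
from langchain.graphs.networkx_graph import KnowledgeTriple, NetworkxEntityGraph

# GraphIndexCreator asks the LLM to extract triples from a text and returns them as a graph
index_creator = GraphIndexCreator(llm=llm)
# Empty graph that accumulates the triples from all sentences (no LLM call needed)
final_graph = NetworkxEntityGraph()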

texts = '''
李明浏览了数码相机的页面，购买了一台数码相机。
李明咨询了客服关于数码相机的保修信息。
李明浏览了户外旅行用品的页面，购买了一个帐篷。
李明咨询了客服关于便携式炉具的使用方法。
李明参与了摄影爱好者的线上社区活动。
李明分享了他的摄影作品到社区，并获得的好评。
'''
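`GraphIndexCreator.from_text()` sends the text to the LLM with a triple-extraction prompt and parses the reply into a graph. To our understanding of LangChain's implementation, the prompt asks for triples written as `(subject, predicate, object)` separated by `<|>`, and for `NONE` when nothing is found. The following is a simplified, standalone sketch of that parsing step, not LangChain's own code; `parse_triples` here and the sample reply are hypothetical.

```python
from typing import List, Tuple

TRIPLE_DELIMITER = "<|>"  # assumed from LangChain's triple-extraction prompt

def parse_triples(llm_output: str) -> List[Tuple[str, str, str]]:
    """Turn '(subject, predicate, object)<|>(...)' text into (s, p, o) tuples."""
    text = llm_output.strip()
    if not text or text == "NONE":
        return []
    triples = []
    for chunk in text.split(TRIPLE_DELIMITER):
        parts = [part.strip() for part in chunk.strip().strip("()").split(", ")]
        if len(parts) != 3:
            continue  # skip malformed chunks instead of failing
        triples.append((parts[0], parts[1], parts[2]))
    return triples

# Hypothetical LLM reply for the first sentence
reply = "(李明, 浏览了, 数码相机的页面)<|>(李明, 购买了一台, 数码相机)"
for subj, pred, obj in parse_triples(reply):
    print(subj, pred, obj)
```

Because the sketch splits on an ASCII comma followed by a space, a reply that uses full-width Chinese commas (，) does not split into three parts, and such chunks are silently dropped.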

Split the text and generate the triples:

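for text in texts.strip().split('\n'):  # strip() drops the empty first/last lines, avoiding useless LLM calls
  triplets = index_creator.from_text(text)
  # get_triples() yields (subject, object, relation) tuples
  for (node1, node2, relation) in triplets.get_triples():
    # KnowledgeTriple's fields are (subject, predicate, object_), so this call stores the relation
    # text as a node; final_graph.get_triples() therefore returns (subject, relation, object)
    final_graph.add_triple(KnowledgeTriple(node1, node2, relation))
    print("====================")
    print(node1, '  ', relation, '  ', node2)
triplets = final_graph.get_triples()
for triplet in triplets:
  print(triplet)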

Output:

====================
李明    浏览了    数码相机的页面
====================
李明    购买了一台    数码相机
====================
李明    咨询了    客服
====================
客服    提供    保修信息
====================
保修信息    关于    数码相机
====================
李明    浏览了    户外旅行用品的页面
====================
李明    购买了一个    帐篷
====================
李明    咨询了    客服
====================
客服    提供了关于    便携式炉具的使用方法
====================
李明    参与了    摄影爱好者线上社区活动
====================
李明    分享了    他的摄影作品
====================
李明    获得了    好评
====================
他的摄影作品    到    社区
('李明', '浏览了', '户外旅行用品的页面')
('李明', '购买了一台', '数码相机')
('李明', '咨询了', '客服')
('李明', '购买了一个', '帐篷')
('李明', '参与了', '摄影爱好者线上社区活动')
('李明', '分享了', '他的摄影作品')
('李明', '获得了', '好评')
('客服', '提供', '保修信息')
('客服', '提供了关于', '便携式炉具的使用方法')
('保修信息', '关于', '数码相机')
('他的摄影作品', '到', '社区')
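The per-sentence output lists 13 triples, but the final graph holds only 11. One loss is expected: 李明 咨询了 客服 appears twice, and a networkx `DiGraph` keeps one edge per (source, target) pair. The other comes from argument order: `KnowledgeTriple` takes `(subject, predicate, object_)`, while `get_triples()` returns `(subject, object, relation)`, so `KnowledgeTriple(node1, node2, relation)` turns the relation text into the target node. Both 浏览了 triples then share the edge 李明 → 浏览了, and 数码相机的页面 is overwritten. The plain-Python sketch below mimics that edge storage with a dict (the helper `store` is hypothetical) to reproduce the counts.

```python
# Per-sentence triples as printed above, in (subject, relation, object) order
printed = [
    ("李明", "浏览了", "数码相机的页面"),
    ("李明", "购买了一台", "数码相机"),
    ("李明", "咨询了", "客服"),
    ("客服", "提供", "保修信息"),
    ("保修信息", "关于", "数码相机"),
    ("李明", "浏览了", "户外旅行用品的页面"),
    ("李明", "购买了一个", "帐篷"),
    ("李明", "咨询了", "客服"),
    ("客服", "提供了关于", "便携式炉具的使用方法"),
    ("李明", "参与了", "摄影爱好者线上社区活动"),
    ("李明", "分享了", "他的摄影作品"),
    ("李明", "获得了", "好评"),
    ("他的摄影作品", "到", "社区"),
]

def store(triples, to_edge):
    """Mimic a DiGraph: one edge per (source, target) pair; a later triple overwrites the label."""
    edges = {}
    for s, r, o in triples:
        source, target, label = to_edge(s, r, o)
        edges[(source, target)] = label
    return edges

# Intended storage: edge subject -> object, labelled with the relation
correct = store(printed, lambda s, r, o: (s, o, r))
# What KnowledgeTriple(node1, node2, relation) produces: the relation text is the target node
swapped = store(printed, lambda s, r, o: (s, r, o))

print(len(printed), len(correct), len(swapped))  # 13 12 11
print(swapped[("李明", "浏览了")])                 # 户外旅行用品的页面
```

Calling `KnowledgeTriple(node1, relation, node2)` should keep all 12 distinct facts. Note, though, that `get_triples()` would then return `(subject, object, relation)`, so the unpacking order in the Section 2 drawing code would have to change to match.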

2 Drawing the Knowledge Graph from the Triples

Draw the knowledge graph:

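import networkx as nx
import matplotlib.pyplot as plt

# One directed edge per triple, labelled with its relation.
# final_graph.get_triples() yields (subject, relation, object) here because of how
# KnowledgeTriple was called in Section 1.2
G = nx.DiGraph()
G.add_edges_from((source, target, {'relation': relation}) for source, relation, target in final_graph.get_triples())

plt.figure(figsize=(10, 5), dpi=300)
pos = nx.spring_layout(G, k=0.1, seed=1)  # fixed seed for a reproducible layout
edge_labels = nx.get_edge_attributes(G, 'relation')
# font_family must name an installed CJK font, otherwise the Chinese labels will not render
nx.draw_networkx_edge_labels(G, pos, edge_labels=edge_labels, font_size=8, font_family='WenQuanYi Micro Hei')
nx.draw_networkx(G, pos, node_size=1500, node_color='lightblue', linewidths=0.25, font_size=10, font_weight='bold', with_labels=True, font_family='WenQuanYi Micro Hei')
plt.axis('off')
plt.show()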
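The drawing code assumes the WenQuanYi Micro Hei font is installed; without a CJK font, matplotlib typically renders the Chinese labels as empty boxes. A small helper can pick the first installed font from a preference list. This is a minimal sketch: the candidate font names and the helper `pick_cjk_font` are assumptions, to be adjusted to your system.

```python
from typing import Iterable, Optional, Sequence

# Preference order is an assumption; adjust it to the fonts on your machine
CJK_FONT_CANDIDATES = ("WenQuanYi Micro Hei", "Noto Sans CJK SC", "SimHei", "Microsoft YaHei")

def pick_cjk_font(installed: Iterable[str],
                  candidates: Sequence[str] = CJK_FONT_CANDIDATES) -> Optional[str]:
    """Return the first candidate that appears among the installed font names, or None."""
    names = set(installed)
    return next((name for name in candidates if name in names), None)

print(pick_cjk_font(["DejaVu Sans", "SimHei"]))  # SimHei
print(pick_cjk_font(["DejaVu Sans"]))            # None
```

With matplotlib, the installed names can be collected as `{f.name for f in matplotlib.font_manager.fontManager.ttflist}`, and the result passed as `font_family` to both drawing calls.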

3 Testing Question Answering over the Knowledge Graph

chain = GraphQAChain.from_llm(llm, graph=final_graph, verbose=True)
chain.run('李明未来可能购买什么类型的产品？')

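GraphQAChain's built-in prompts are written in English, so the answer may come back in English. Asking for Chinese output in the question itself works around this: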

chain.run('中文输出，李明未来可能购买什么类型的产品？')

Output:

> Entering new GraphQAChain chain...
Entities Extracted:
李明
Full Context:
李明 户外旅行用品的页面 浏览了
李明 数码相机 购买了一台
李明 客服 咨询了
李明 帐篷 购买了一个
李明 摄影爱好者线上社区活动 参与了
李明 他的摄影作品 分享了
李明 好评 获得了

> Finished chain.
根据提供的知识 triplet，李明已经购买了数码相机和帐篷，参与了摄影爱好者线上社区活动，并分享了他的摄影作品。这表明他对摄影和户外活动有兴趣。因此，李明未来可能购买与摄影或户外活动相关的产品，例如：\n\n- 更高端的摄影器材（如镜头、三脚架、闪光灯等）\n- 户外装备（如睡袋、登山包、野餐用具等）\n- 图像处理软件或相关配件\n\n这些产品类型符合他在摄影和户外活动方面的兴趣。
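The verbose log shows the two steps GraphQAChain takes: it asks the LLM to extract the entities in the question (李明 here), then collects the graph facts directly attached to each entity and places them, one per line, into a QA prompt together with the question. The standalone sketch below mimics the second step on a few hand-copied facts written in (subject, relation, object) order; `entity_knowledge`, `build_prompt` and the Chinese template `QA_TEMPLATE_ZH` are hypothetical names, not LangChain APIs. (In the article's Full Context, the relation and object columns appear swapped because of how the triples were added in Section 1.2.)

```python
# A few facts from the article's graph, as (source, relation, target)
edges = [
    ("李明", "浏览了", "户外旅行用品的页面"),
    ("李明", "购买了一台", "数码相机"),
    ("李明", "购买了一个", "帐篷"),
    ("客服", "提供", "保修信息"),
]

# Hypothetical Chinese QA template with two slots: {context} and {question}
QA_TEMPLATE_ZH = (
    "请根据以下知识三元组，用中文回答最后的问题。"
    "如果不知道答案，请直接说不知道，不要编造。\n\n"
    "{context}\n\n问题：{question}\n回答："
)

def entity_knowledge(edges, entity):
    """Depth-1 facts: one 'source relation target' line per edge leaving `entity`."""
    return [f"{s} {r} {t}" for s, r, t in edges if s == entity]

def build_prompt(edges, entities, question):
    """Collect the facts for every extracted entity and fill in the QA template."""
    lines = []
    for entity in entities:
        lines.extend(entity_knowledge(edges, entity))
    return QA_TEMPLATE_ZH.format(context="\n".join(lines), question=question)

print(build_prompt(edges, ["李明"], "李明未来可能购买什么类型的产品？"))
```

In the LangChain versions we are aware of, `GraphQAChain.from_llm` also accepts a `qa_prompt` argument (a prompt with `context` and `question` variables), so a template like this could be passed as `qa_prompt=PromptTemplate.from_template(QA_TEMPLATE_ZH)` instead of prefixing every question with 中文输出; check this against your installed version.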

Reference: Langchain实战：大模型应用开发实例 (LangChain in Action: LLM Application Development Examples)