A Mathematical Understanding of the Attention Mechanism

luoganttcc

Published 2020-11-02 21:05:46

Category: keras

Article link: https://blog.csdn.net/luoganttcc/article/details/109457218

Reference link

Attention is really just a weighted sum.
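To make the "weighted sum" idea concrete, here is a minimal NumPy sketch for a single query vector. The names `query`, `keys`, `values`, and the helper `softmax` are illustrative assumptions and do not come from the original post, which uses one matrix as both key and value.

```python
import numpy as np

def softmax(x):
    # Subtract the max before exponentiating for numerical stability.
    e = np.exp(x - np.max(x))
    return e / e.sum()

# Hypothetical toy data: one query, three key/value pairs.
query = np.array([1.0, 0.0])
keys = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
values = np.array([[10.0, 0.0], [0.0, 10.0], [5.0, 5.0]])

scores = keys @ query        # one dot-product score per key
weights = softmax(scores)    # non-negative weights that sum to 1
output = weights @ values    # weighted sum of the value rows

print('weights =', weights)
print('output  =', output)
```

The output is a convex combination of the rows of `values`: keys that match the query more closely contribute more.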

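import numpy as np
import tensorflow as tf

# Query matrix a: shape (3, 4), holding the values 0..11 as floats
a = np.arange(3 * 4, dtype=np.float64).reshape((3, 4))
# b is passed as the value; with no key given, Keras also uses it as the key
b = a + 3.0
# The layer takes its inputs as [query, value]
katten = tf.keras.layers.Attention()([a, b])

print('keras attention=', katten)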

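At the algorithmic level, attention comes down to three matrix operations: a matrix product, a row-wise softmax, and a second matrix product.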

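# Step 1: dot-product scores between every row of a and every row of b, shape (3, 3)
weight = a @ b.T
# Step 2: softmax over the last axis turns each row of scores into weights that sum to 1
weight1 = tf.nn.softmax(weight)
# Step 3: each output row is a weighted sum of the rows of b
attent = weight1 @ b

print('my attention=', attent)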

keras attention= tf.Tensor(
[[11. 12. 13. 14.]
 [11. 12. 13. 14.]
 [11. 12. 13. 14.]], shape=(3, 4), dtype=float32)
my attention= tf.Tensor(
[[11. 12. 13. 14.]
 [11. 12. 13. 14.]
 [11. 12. 13. 14.]], shape=(3, 4), dtype=float64)
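
Every output row equals the last row of b. A likely explanation is that the unscaled dot-product scores are far apart, so the softmax puts almost all of its weight on the last row. The NumPy-only sketch below reproduces the three steps and prints the weights to check this; the helper `softmax_rows` is an illustrative assumption, not part of the original post.

```python
import numpy as np

def softmax_rows(x):
    # Row-wise softmax; subtract each row's max for numerical stability.
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

a = np.arange(3 * 4, dtype=np.float64).reshape((3, 4))
b = a + 3.0

weight = a @ b.T                 # raw scores, shape (3, 3)
weight1 = softmax_rows(weight)   # attention weights, each row sums to 1
attent = weight1 @ b             # weighted sum of the rows of b

print('scores =\n', weight)
print('weights =\n', weight1)
print('attention =\n', attent)
```

Because neighbouring scores in each row differ by 24 or more, each row of weights is numerically almost one-hot on the last column, which is why the result looks like three copies of [11, 12, 13, 14].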