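TF multiplication: multiply, matmul, and *

"*" and tf.multiply both perform element-wise multiplication: the two matrices or vectors have the same shape, corresponding elements are multiplied, and the shape of the result is unchanged.
multiply(x, y, name=None): element-wise multiplication
1) Note: x and y must have the same data type (both int or both float); otherwise an error is raised.
2) If y is a scalar and x is a vector or matrix, every element of x is multiplied by y: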

import tensorflow as tf

x2 = tf.constant([[1.0, 1.1, 1.2], [1.3, 1.4, 1.5], [1.6, 1.7, 1.8]])
y2 = tf.constant(2.0)  # this value must also be a float; an int tensor would raise an error
z2 = tf.multiply(x2, y2)

Result:
[[ 2. 2.20000005 2.4000001 ]
 [ 2.5999999 2.79999995 3. ]
 [ 3.20000005 3.4000001 3.5999999 ]]
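The data type requirement from point 1) can be checked directly. The following is a minimal sketch, assuming TensorFlow 2.x with eager execution and default type promotion settings; the variable names are illustrative only. It shows that multiplying a float32 tensor by an int32 tensor fails, and that an explicit tf.cast resolves the mismatch.

```python
import tensorflow as tf

x = tf.constant([1.0, 2.0, 3.0])   # float32 tensor
y_int = tf.constant(2)             # int32 tensor

# Multiplying tensors with different dtypes is expected to raise an error.
try:
    tf.multiply(x, y_int)
    mismatch_raised = False
except (tf.errors.InvalidArgumentError, TypeError, ValueError):
    mismatch_raised = True

# Casting one operand to the other's dtype makes the multiplication valid.
z = tf.multiply(x, tf.cast(y_int, x.dtype))

print("mismatch raised:", mismatch_raised)
print("after cast:", z.numpy())
```

Casting explicitly keeps the choice of result dtype visible in the code instead of relying on implicit conversion.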
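3) If y is a vector and x is a matrix, the shapes must be compatible: if y is a row vector, its number of elements must equal the number of columns of x; if y is a column vector, its number of elements must equal the number of rows of x: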

x2 = tf.constant([[1.0, 2.0, 3.0], [1.0, 2.0, 3.0], [1.0, 2.0, 3.0], [1.0, 2.0, 3.0]])  # 4*3
y2 = tf.constant([1.0, 1, 2])  # 1*3
z2 = tf.multiply(x2, y2)  # equivalent to z2 = x2 * y2
print("Column counts match; the row vector is replicated along the row axis:", z2)
y3 = tf.constant([[1.0], [1], [2], [3]])  # 4*1
z3 = tf.multiply(x2, y3)  # equivalent to z3 = x2 * y3
print("Row counts match; the column vector is replicated along the column axis:", z3)

Column counts match; the row vector is replicated along the row axis: tf.Tensor(
[[1. 2. 6.]
 [1. 2. 6.]
 [1. 2. 6.]
 [1. 2. 6.]], shape=(4, 3), dtype=float32)
Row counts match; the column vector is replicated along the column axis: tf.Tensor(
[[1. 2. 3.]
 [1. 2. 3.]
 [2. 4. 6.]
 [3. 6. 9.]], shape=(4, 3), dtype=float32)
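TensorFlow follows NumPy-style broadcasting, where shapes are aligned starting from the last axis. The sketch below, assuming TensorFlow 2.x, shows which vector shapes are compatible with a 4*3 matrix; the tensors are made up for illustration and tf.broadcast_static_shape is used only to test compatibility without performing the multiplication.

```python
import tensorflow as tf

x = tf.ones([4, 3])

row = tf.constant([1.0, 1.0, 2.0])               # shape (3,): matches the columns of x
col = tf.constant([[1.0], [1.0], [2.0], [3.0]])  # shape (4, 1): matches the rows of x

print((x * row).shape)  # (4, 3)
print((x * col).shape)  # (4, 3)

# A flat vector of shape (4,) is aligned with the last axis of x (size 3),
# so it cannot be broadcast even though x has 4 rows.
try:
    tf.broadcast_static_shape(x.shape, tf.TensorShape([4]))
    compatible = True
except ValueError:
    compatible = False

print("shape (4,) compatible with (4, 3):", compatible)
```

To scale the rows of x with a flat length-4 vector, it first has to be reshaped to (4, 1), for example with tf.reshape(v, [-1, 1]).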

#coding=utf-8
import tensorflow as tf

if __name__ == '__main__':
    # Element-wise product of two vectors
    a = tf.constant([1, 2, 3])
    b = tf.constant([2, 3, 4])
    res_ab = a * b
    print("res_ab", res_ab)

    # Element-wise product of two matrices with the same shape
    m_a = tf.constant([[1, 2, 3], [1, 2, 3]])
    m_b = tf.constant([[2, 3, 4], [2, 3, 4]])
    res_mab = m_a * m_b
    print("res_mab", res_mab)

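tf.matmul follows the usual mathematical definition of matrix multiplication. Note that for tensors with more than two dimensions, matmul multiplies over the last two dimensions only; the leading dimensions act as batch dimensions.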

#coding=utf-8
import tensorflow as tf

if __name__ == '__main__':
    print("--------------matmul-------------------")
    m_a = tf.constant([[1, 2, 3], [1, 2, 3]])
    m_b = tf.constant([[2, 3, 4], [2, 3, 4]])
    # (2, 3) times the transpose of (2, 3) gives a (2, 2) result
    mult_res = tf.matmul(m_a, m_b, transpose_b=True)
    print("mult_res:", mult_res)
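The statement that matmul works on the last two dimensions can be illustrated with a small batched example. This is a sketch assuming TensorFlow 2.x; the shapes and random inputs are chosen only for illustration.

```python
import tensorflow as tf

# 2-D case from the text: (2, 3) times the transpose of (2, 3) -> (2, 2)
m_a = tf.constant([[1, 2, 3], [1, 2, 3]])
m_b = tf.constant([[2, 3, 4], [2, 3, 4]])
mult_res = tf.matmul(m_a, m_b, transpose_b=True)
print(mult_res.numpy())  # [[20 20] [20 20]], since 1*2 + 2*3 + 3*4 = 20

# 3-D case: the leading axis is a batch axis, the last two axes are multiplied
a = tf.random.normal([2, 3, 4])   # a batch of two 3x4 matrices
b = tf.random.normal([2, 4, 5])   # a batch of two 4x5 matrices
c = tf.matmul(a, b)
print(c.shape)                    # (2, 3, 5)

# Each batch entry equals the ordinary 2-D product of the matching slices
c0 = tf.matmul(a[0], b[0])
print(bool(tf.reduce_all(tf.abs(c[0] - c0) < 1e-4)))
```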

MultiHeadAttention implementation code

q = self.wq(q)  # (batch_size, seq_len, d_model)
k = self.wk(k)  # (batch_size, seq_len, d_model)
v = self.wv(v)  # (batch_size, seq_len, d_model)

q = self.split_heads(q, batch_size)  # (batch_size, num_heads, seq_len_q, depth)
k = self.split_heads(k, batch_size)  # (batch_size, num_heads, seq_len_k, depth)
v = self.split_heads(v, batch_size)  # (batch_size, num_heads, seq_len_v, depth)

# scaled_attention.shape == (batch_size, num_heads, seq_len_q, depth)
# attention_weights.shape == (batch_size, num_heads, seq_len_q, seq_len_k)
scaled_attention, attention_weights = scaled_dot_product_attention(q, k, v, mask)

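Following the paper, scaled dot-product attention has to be computed for several heads: each head's attention weights are applied to its values, and the per-head results are then concatenated. A straightforward implementation would run the heads one after another.
Matrix reshaping and batched matrix multiplication let the heads run in parallel instead. The embedding of x is mapped by a fully connected layer to a vector of length num_heads * depth, which is split into a tensor of shape [batch_size, seq_len, num_heads, depth] and then transposed into the layout that scaled_dot_product_attention can process. q, k, and v are all handled this way. The attention computation then yields a result of shape [batch_size, num_heads, seq_len, depth]; one more transpose and reshape removes the num_heads dimension, so that depth becomes num_heads * depth.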
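The reshaping described above can be sketched as follows. The source only shows the call to self.split_heads, so the body of split_heads here is an assumption modeled on the description (reshape, then transpose), and the sizes are arbitrary illustrative values. TensorFlow 2.x is assumed.

```python
import tensorflow as tf

batch_size, seq_len, num_heads, depth = 2, 5, 4, 8
d_model = num_heads * depth

# Stand-in for the output of the fully connected projection
x = tf.random.normal([batch_size, seq_len, d_model])

def split_heads(t, batch_size):
    """Hypothetical helper: (batch, seq_len, d_model) -> (batch, num_heads, seq_len, depth)."""
    t = tf.reshape(t, (batch_size, -1, num_heads, depth))
    return tf.transpose(t, perm=[0, 2, 1, 3])

heads = split_heads(x, batch_size)
print(heads.shape)   # (2, 4, 5, 8)

# After attention, the heads are merged back: swap the axes again,
# then fold num_heads and depth into a single axis of size d_model.
merged = tf.transpose(heads, perm=[0, 2, 1, 3])          # (batch, seq_len, num_heads, depth)
merged = tf.reshape(merged, (batch_size, -1, d_model))   # (batch, seq_len, d_model)
print(merged.shape)  # (2, 5, 32)
```

Because num_heads sits in a leading axis after the transpose, tf.matmul treats it as a batch axis and all heads are processed in one call.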

# Scaled dot-product attention
def scaled_dot_product_attention(q, k, v, mask):
    '''
    Args:
        q: shape == (..., seq_len_q, depth)
        k: shape == (..., seq_len_k, depth)
        v: shape == (..., seq_len_v, depth_v)
        seq_len_k = seq_len_v
        mask: shape == (..., seq_len_q, seq_len_k), applied to the dot-product scores
    Returns:
        output: weighted sum
        attention_weights: weights of attention
    '''
    # shape == (..., seq_len_q, seq_len_k)
    # Inner products between the embedding vectors.
    # matmul multiplies over the last two dimensions; the leading dimensions are left as they are.
    matmul_qk = tf.matmul(q, k, transpose_b=True)
    dk = tf.cast(tf.shape(k)[-1], tf.float32)
    scaled_attention_logits = matmul_qk / tf.math.sqrt(dk)
    if mask is not None:
        # -1e9 is a very large negative number, so masked positions
        # end up close to 0 after the softmax
        scaled_attention_logits += (mask * -1e9)
    # shape == (..., seq_len_q, seq_len_k)
    attention_weights = tf.nn.softmax(scaled_attention_logits, axis=-1)
    # shape == (..., seq_len_q, depth_v)
    output = tf.matmul(attention_weights, v)
    return output, attention_weights


def print_scaled_dot_attention(q, k, v):
    temp_out, temp_att = scaled_dot_product_attention(q, k, v, None)
    print("Attention weights are:")
    print(temp_att)
    print("Outputs are:")
    print(temp_out)
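The effect of the mask term can be seen in isolation. The sketch below, assuming TensorFlow 2.x and made-up logits, adds mask * -1e9 to a row of scores and applies the softmax; the masked position receives a weight of essentially zero while the remaining weights still sum to 1.

```python
import tensorflow as tf

logits = tf.constant([[1.0, 2.0, 3.0]])
mask = tf.constant([[0.0, 0.0, 1.0]])   # 1 marks a position that should be hidden

# Adding a very large negative number pushes the masked logit far below the others
masked_logits = logits + mask * -1e9
weights = tf.nn.softmax(masked_logits, axis=-1)

print(weights.numpy())  # approximately [[0.269 0.731 0.]]
```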

What scaled_dot_product_attention shows about matrix multiplication as a combination of vectors:

weight = tf.constant([[1, 2, 3, 1], [4, 5, 6, 1], [7, 8, 9, 1]], dtype=tf.float32) # (3 ,4)
value = tf.constant([[1, 1, 1], [1, 1, 1], [1, 1, 1], [1, 1, 1]], dtype=tf.float32) # (4, 3)
weight_value = tf.matmul(weight, value)
print(weight_value)

tf.Tensor(
[[ 7. 7. 7.]
[16. 16. 16.]
[25. 25. 25.]], shape=(3, 3), dtype=float32)

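Here weight, of shape (3, 4), is matrix-multiplied with value, of shape (4, 3).
Intuitively, each row of the result is a weighted sum of the row vectors of value: the column index of weight corresponds to the row index of value, each single weight scales one row vector, and the 4 scaled rows are then added together. This agrees with the geometric meaning of matrix multiplication.

This is the principle behind the weight computation.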
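The weighted-sum reading of matrix multiplication can be verified numerically. In this sketch (TensorFlow 2.x assumed), weight is the same as in the text, but value is replaced with distinct rows, which is a change made here purely so that the contribution of each row is visible.

```python
import tensorflow as tf

weight = tf.constant([[1, 2, 3, 1], [4, 5, 6, 1], [7, 8, 9, 1]], dtype=tf.float32)   # (3, 4)
# Illustrative value matrix with distinct rows (not the all-ones matrix from the text)
value = tf.constant([[1, 0, 0], [0, 1, 0], [0, 0, 1], [1, 1, 1]], dtype=tf.float32)  # (4, 3)

product = tf.matmul(weight, value)   # (3, 3)

# Row 0 of the product, built by hand: each weight scales one row of value,
# and the four scaled rows are added together.
row0 = tf.add_n([weight[0, j] * value[j] for j in range(4)])

print(product.numpy())  # [[2. 3. 4.] [5. 6. 7.] [8. 9. 10.]]
print(row0.numpy())     # [2. 3. 4.]
```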

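The other part is the scaled dot product.
"scale" refers to dividing the dot products by sqrt(dk) before the softmax; it is not the softmax itself.
"dot_product" refers to computing the weights from dot products.
Multiplying two matrices computes the dot product, a measure of similarity, between each row vector of the first matrix and each column vector of the second. Note that q does not need to be transposed, while k must be transposed before the matmul.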
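A small example makes the row-by-row dot products concrete. This is a sketch assuming TensorFlow 2.x, with q and k values invented for illustration; it checks one entry of the score matrix against an explicit dot product and then applies the sqrt(dk) scaling.

```python
import tensorflow as tf

q = tf.constant([[1.0, 0.0], [0.0, 2.0]])               # (seq_len_q=2, depth=2)
k = tf.constant([[1.0, 1.0], [3.0, 0.0], [0.0, 1.0]])   # (seq_len_k=3, depth=2)

# transpose_b=True transposes k, so scores[i, j] is the dot product of q[i] and k[j]
scores = tf.matmul(q, k, transpose_b=True)               # (2, 3)
print(scores.numpy())   # [[1. 3. 0.] [2. 0. 2.]]

# The same entry computed as an explicit dot product
dot_01 = tf.reduce_sum(q[0] * k[1])
print(float(dot_01))    # 3.0

# The "scale" step: divide by the square root of the depth of k
dk = tf.cast(tf.shape(k)[-1], tf.float32)
scaled = scores / tf.math.sqrt(dk)
print(scaled.shape)     # (2, 3)
```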

import tensorflow as tf

w = tf.Variable([[0.4], [1.2]], dtype=tf.float32)  # w.shape: [2, 1]
x = tf.Variable([list(range(1, 6)), list(range(5, 10))], dtype=tf.float32)  # x.shape: [2, 5]
y = w * x  # same as y = tf.multiply(w, x); y.shape: [2, 5]

# TF 2.x runs eagerly, so no Session or variable initializer is needed
print(w.numpy())
print(x.numpy())
print(y.numpy())

