Avatrin
Trainable matrices in Tensorflow & Keras for Attention
Hi,
Several attention mechanisms require trainable matrices and vectors, and I have been trying to learn how to implement them in TensorFlow with Keras. Every implementation I see uses the Dense layer from Keras, but I tend to get lost trying to understand why, and what the implementations do afterwards. Sometimes they seem contradictory (for instance, the shapes of the matrices here and here).
Are there any good explanations of how the Dense layer works, and especially of how it behaves when it is treated as a matrix rather than as a fully connected layer? I mean, some tutorials use matmul, but how does that work when the batch size is larger than one?
I hope somebody here can help me.
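For concreteness, here is a minimal NumPy sketch of the point in question: a Dense layer with `use_bias=False` and no activation is just a trainable kernel matrix `W` of shape `(input_dim, units)`, and `matmul` broadcasts that matrix over every leading batch (and sequence) dimension. The sizes below are arbitrary choices for illustration.

```python
import numpy as np

# Dense(units, use_bias=False) stores a kernel W of shape (input_dim, units)
# and computes outputs = inputs @ W. We mimic that with plain NumPy.
batch_size, input_dim, units = 4, 8, 3

rng = np.random.default_rng(0)
inputs = rng.standard_normal((batch_size, input_dim))  # a batch of row vectors
W = rng.standard_normal((input_dim, units))            # the trainable "kernel"

# One matrix multiplication applies W to every row of the batch at once:
outputs = inputs @ W
print(outputs.shape)  # (4, 3): one output vector per batch element

# With an extra leading dimension (e.g. timesteps per example, as in attention),
# matmul broadcasts over all leading dimensions, so the same W is applied
# to every timestep of every example:
seq_len = 5
seq_inputs = rng.standard_normal((batch_size, seq_len, input_dim))
seq_outputs = seq_inputs @ W
print(seq_outputs.shape)  # (4, 5, 3)

# Check: row 0 of the batched result equals the single-vector product.
assert np.allclose(outputs[0], inputs[0] @ W)
```

This is why the batch size never enters the shape of the weight matrix: the multiplication is vectorized over the batch axis, so the same `W` (and hence the same trainable parameters) is shared across all examples.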