Normalization seems different from the paper "Attention Is All You Need"

In the paper, the layer normalization comes after the multi-head attention and feed-forward sublayers; in torchnlp, it comes before them:
```python
x = inputs

# Layer Normalization
x_norm = self.layer_norm_mha(x)

# Multi-head attention
y = self.multi_head_attention(x_norm, x_norm, x_norm)

# Dropout and residual
x = self.dropout(x + y)

# Layer Normalization
x_norm = self.layer_norm_ffn(x)

# Positionwise Feedforward
y = self.positionwise_feed_forward(x_norm)

# Dropout and residual
y = self.dropout(x + y)
```
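For contrast, here is a minimal sketch (not the torchnlp code; module and parameter names are my own, built on standard `torch.nn` primitives) of the two sublayer orderings. Post-norm applies LayerNorm after the residual addition, as described in "Attention Is All You Need"; pre-norm, which matches the snippet above, applies it to the sublayer input and leaves the residual path unnormalized.

```python
import torch
import torch.nn as nn


class PostNormEncoderLayer(nn.Module):
    """Ordering from the original paper: sublayer -> dropout -> residual -> LayerNorm."""

    def __init__(self, d_model=512, n_heads=8, d_ff=2048, dropout=0.1):
        super().__init__()
        self.mha = nn.MultiheadAttention(d_model, n_heads, dropout=dropout, batch_first=True)
        self.ffn = nn.Sequential(nn.Linear(d_model, d_ff), nn.ReLU(), nn.Linear(d_ff, d_model))
        self.norm_mha = nn.LayerNorm(d_model)
        self.norm_ffn = nn.LayerNorm(d_model)
        self.dropout = nn.Dropout(dropout)

    def forward(self, x):
        # Multi-head attention, then dropout + residual, then LayerNorm
        y, _ = self.mha(x, x, x)
        x = self.norm_mha(x + self.dropout(y))
        # Feed-forward, then dropout + residual, then LayerNorm
        y = self.ffn(x)
        return self.norm_ffn(x + self.dropout(y))


class PreNormEncoderLayer(nn.Module):
    """Ordering in the quoted snippet: LayerNorm -> sublayer -> dropout -> residual."""

    def __init__(self, d_model=512, n_heads=8, d_ff=2048, dropout=0.1):
        super().__init__()
        self.mha = nn.MultiheadAttention(d_model, n_heads, dropout=dropout, batch_first=True)
        self.ffn = nn.Sequential(nn.Linear(d_model, d_ff), nn.ReLU(), nn.Linear(d_ff, d_model))
        self.norm_mha = nn.LayerNorm(d_model)
        self.norm_ffn = nn.LayerNorm(d_model)
        self.dropout = nn.Dropout(dropout)

    def forward(self, x):
        # LayerNorm on the input, then multi-head attention, then dropout + residual
        x_norm = self.norm_mha(x)
        y, _ = self.mha(x_norm, x_norm, x_norm)
        x = x + self.dropout(y)
        # LayerNorm on the input, then feed-forward, then dropout + residual
        y = self.ffn(self.norm_ffn(x))
        return x + self.dropout(y)


x = torch.randn(2, 10, 512)  # (batch, seq_len, d_model)
print(PostNormEncoderLayer()(x).shape, PreNormEncoderLayer()(x).shape)
```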
Yes, it's from the updated Transformer model. You can find the TensorFlow version maintained by the authors here.