Normalization seems different from the paper "Attention Is All You Need"

In the paper, the layer normalization comes after the multi-head attention and feed-forward sublayers; in torchnlp, it comes before them:
```python
x = inputs

# Layer Normalization
x_norm = self.layer_norm_mha(x)

# Multi-head attention
y = self.multi_head_attention(x_norm, x_norm, x_norm)

# Dropout and residual
x = self.dropout(x + y)

# Layer Normalization
x_norm = self.layer_norm_ffn(x)

# Positionwise Feedforward
y = self.positionwise_feed_forward(x_norm)

# Dropout and residual
y = self.dropout(x + y)
```
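For contrast, here is a minimal sketch (not the torchnlp code; module and parameter names are my own, built on standard `torch.nn` primitives) of the two sublayer orderings. Post-norm applies LayerNorm after the residual addition, as described in "Attention Is All You Need"; pre-norm, which matches the snippet above, applies it to the sublayer input and leaves the residual path unnormalized.

```python
import torch
import torch.nn as nn


class PostNormEncoderLayer(nn.Module):
    """Ordering from the original paper: sublayer -> dropout -> residual -> LayerNorm."""

    def __init__(self, d_model=512, n_heads=8, d_ff=2048, dropout=0.1):
        super().__init__()
        self.mha = nn.MultiheadAttention(d_model, n_heads, dropout=dropout, batch_first=True)
        self.ffn = nn.Sequential(nn.Linear(d_model, d_ff), nn.ReLU(), nn.Linear(d_ff, d_model))
        self.norm_mha = nn.LayerNorm(d_model)
        self.norm_ffn = nn.LayerNorm(d_model)
        self.dropout = nn.Dropout(dropout)

    def forward(self, x):
        # Multi-head attention, then dropout + residual, then LayerNorm
        y, _ = self.mha(x, x, x)
        x = self.norm_mha(x + self.dropout(y))
        # Feed-forward, then dropout + residual, then LayerNorm
        y = self.ffn(x)
        return self.norm_ffn(x + self.dropout(y))


class PreNormEncoderLayer(nn.Module):
    """Ordering in the quoted snippet: LayerNorm -> sublayer -> dropout -> residual."""

    def __init__(self, d_model=512, n_heads=8, d_ff=2048, dropout=0.1):
        super().__init__()
        self.mha = nn.MultiheadAttention(d_model, n_heads, dropout=dropout, batch_first=True)
        self.ffn = nn.Sequential(nn.Linear(d_model, d_ff), nn.ReLU(), nn.Linear(d_ff, d_model))
        self.norm_mha = nn.LayerNorm(d_model)
        self.norm_ffn = nn.LayerNorm(d_model)
        self.dropout = nn.Dropout(dropout)

    def forward(self, x):
        # LayerNorm on the input, then multi-head attention, then dropout + residual
        x_norm = self.norm_mha(x)
        y, _ = self.mha(x_norm, x_norm, x_norm)
        x = x + self.dropout(y)
        # LayerNorm on the input, then feed-forward, then dropout + residual
        y = self.ffn(self.norm_ffn(x))
        return x + self.dropout(y)


x = torch.randn(2, 10, 512)  # (batch, seq_len, d_model)
print(PostNormEncoderLayer()(x).shape, PreNormEncoderLayer()(x).shape)
```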
Yes, it's from the updated Transformer model. You can find the TensorFlow version maintained by the authors here.