Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Tutorial1 - BiLSTMPOSTagger,don't have pack pad? #8

Open
actforjason opened this issue Mar 11, 2021 · 3 comments
Open

Tutorial1 - BiLSTMPOSTagger,don't have pack pad? #8

actforjason opened this issue Mar 11, 2021 · 3 comments

Comments

@actforjason
Copy link

For varying-length sequences,Why don't have pack pad?

def forward(self, text):
        embedded = self.dropout(self.embedding(text))

        outputs, (hidden, cell) = self.lstm(embedded)
       
        predictions = self.fc(self.dropout(outputs))

        return predictions
@actforjason
Copy link
Author

Oh,there is a line
self.embedding = nn.Embedding(input_dim, embedding_dim, padding_idx=pad_idx)
point out padding_idx=pad_idx,but why isn't pack_padded_sequence&pad_packed_sequence needed?

@bentrevett
Copy link
Owner

You only need to use pack_padded_sequence and pad_packed_sequence if you want to do something with the final hidden (and cell) states, denoted in the code as hidden and cell, respectively from outputs, (hidden, cell) = self.lstm(embedded). This is because the packing and padding allows us to get the final hidden state from the last non-pad element in each sequence within the batch.

When doing POS tagging we want to do something with a sequence of hidden states, which we get from the outputs variable from outputs, (hidden, cell) = self.lstm(embedded). Yes, some of those hidden states will be from padded tokens, but I don't believe there is a nice way in PyTorch to get a sequence of hidden states up to a certain point. However, this doesn't really matter because when we pass the padding tag index to the CrossEntropyLoss we're telling PyTorch not to calculate losses over these pad tokens. We're only losing the time taken to run the LSTM over the pad tokens, which should be minimized by the BucketIterator that is designed to reduce the amount of padding within a batch.

The padding_idx=pad_idx argument to nn.Embedding does something different. It means whenever a token that has been numericalized to pad_idx is passed to the nn.Embedding it will return a tensor of zeros at that position, see: https://stackoverflow.com/a/61173091

@actforjason
Copy link
Author

actforjason commented Mar 13, 2021

Thank you for your reply.

It means whenever a token that has been numericalized to pad_idx is passed to the nn.Embedding it will return a tensor of zeros at that position.

But why we need a tensor of zeros at the position of pad_idx ?Will the parts of pad that haven't been packed but still here affect the training of LSTM (as well as bidirectional=True)?OR, what's the purpose of PACK other than reducing training time?

Besides,can I use pack_padded_sequence and pad_packed_sequence instead of padding_idx=pad_idx

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants