
Memory leak on python 3.10.* #35434

Open

KhoiTrant68 opened this issue Dec 27, 2024 · 1 comment

KhoiTrant68 commented Dec 27, 2024

System Info

A memory leak is observed when using the KVEmbedding class on Python 3.10.*. The same code does not leak when running on Python 3.8.11. The issue may stem from differences in how Python 3.10.* handles memory allocation and deallocation, or from compatibility issues with the libraries used.


Setup:

  1. Environment:

    • Python 3.8.11 (No memory leak observed)
    • Python 3.10.* (Memory leak occurs)
  2. Dependencies:

    • tokenizers==0.20.3
    • torch==2.0.1+cu117
    • torchvision==0.15.2+cu117
    • tqdm==4.67.0
    • transformers==4.46.0

Attempts to Resolve:
We tried various strategies to address the memory leak, but none were successful. These include:

  1. Explicit Garbage Collection:
    • Used gc.collect() to manually invoke garbage collection after each batch.
  2. Variable Deletion:
    • Explicitly deleted intermediate variables with del to release memory.
  3. CUDA Cache Management:
    • Used torch.cuda.empty_cache() to free up GPU memory.
  4. Library Versions:
    • Tried multiple versions of the tokenizers and transformers libraries, but observed no improvement.

Despite these efforts, the memory leak persisted on Python 3.10.* (see the sketch below for where the cleanup calls were placed).
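
This is a minimal sketch of the per-batch cleanup we tried, written as a standalone variant of the embedding method from the Reproduction section further down; kv stands for a KVEmbedding instance, and the numbered comments match the list above. None of these calls stopped the memory growth.

import gc

import torch
import torch.nn.functional as F


def embedding_with_cleanup(kv, l_transcription, batch_size=32):
    # Same logic as KVEmbedding.embedding, plus the attempted per-batch cleanup.
    batch_dict = kv.tokenizer(
        l_transcription,
        max_length=512,
        padding=True,
        truncation=True,
        return_tensors="pt",
    ).to(kv.device)

    input_ids, attention_mask = batch_dict["input_ids"], batch_dict["attention_mask"]
    num_batches = (len(input_ids) + batch_size - 1) // batch_size
    embeddings_list = []

    with torch.no_grad():
        for i in range(num_batches):
            start, end = i * batch_size, (i + 1) * batch_size
            outputs = kv.model(
                input_ids=input_ids[start:end],
                attention_mask=attention_mask[start:end],
            )
            embeddings = kv.average_pool(outputs.last_hidden_state, attention_mask[start:end])
            embeddings_list.append(abs(F.normalize(embeddings, p=2, dim=1)))

            del outputs, embeddings           # 2. explicit variable deletion
            gc.collect()                      # 1. explicit garbage collection
            if torch.cuda.is_available():
                torch.cuda.empty_cache()      # 3. CUDA cache management

    return torch.cat(embeddings_list, dim=0).detach().cpu().numpy()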


Call for Assistance: We have exhausted our efforts to identify and resolve the memory leak issue. If anyone with expertise in Python memory management, PyTorch, or Hugging Face Transformers can assist, we would greatly appreciate your help.

Who can help?

@sgugger @thomwolf @ArthurZucker

Information

  • The official example scripts
  • My own modified scripts

Tasks

  • An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
  • My own task or dataset (give details below)

Reproduction

import torch
import torch.nn.functional as F
from transformers import AutoModel
from transformers import AutoTokenizer


class KVEmbedding:
    def __init__(self, device):
        self.device = device

        # Load tokenizer and model from pretrained multilingual-e5-small
        self.tokenizer = AutoTokenizer.from_pretrained("intfloat/multilingual-e5-small")
        self.model = AutoModel.from_pretrained("intfloat/multilingual-e5-small").to(self.device)

        self.model.eval()  # Set model to evaluation mode

    def average_pool(self, last_hidden_states, attention_mask):
        # Apply mask to hidden states, set masked positions to 0
        last_hidden = last_hidden_states.masked_fill(~attention_mask[..., None].bool(), 0.0)
        # Average the hidden states along the sequence dimension
        return last_hidden.sum(dim=1) / attention_mask.sum(dim=1)[..., None]

    def embedding(self, l_transcription, batch_size=32):
        # Tokenize input transcriptions
        batch_dict = self.tokenizer(
            l_transcription,
            max_length=512,
            padding=True,
            truncation=True,
            return_tensors="pt",
        ).to(self.device)

        # Create batches
        input_ids, attention_mask = batch_dict["input_ids"], batch_dict["attention_mask"]
        num_batches = (len(input_ids) + batch_size - 1) // batch_size
        embeddings_list = []

        with torch.no_grad():
            for i in range(num_batches):
                start, end = i * batch_size, (i + 1) * batch_size
                batch_input_ids, batch_attention_mask = input_ids[start:end], attention_mask[start:end]
                outputs = self.model(input_ids=batch_input_ids, attention_mask=batch_attention_mask)
                embeddings = self.average_pool(outputs.last_hidden_state, batch_attention_mask)
                embeddings = abs(F.normalize(embeddings, p=2, dim=1))
                embeddings_list.append(embeddings)

        # Concatenate all batch embeddings
        all_embeddings = torch.cat(embeddings_list, dim=0).detach().cpu().numpy()
        return all_embeddings
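
For completeness, here is a minimal driver of the kind used to observe the growth. It is not part of the original class: the input strings and iteration count are placeholders, and the stdlib resource module is used only to read the process peak RSS on Linux. With the leak described above, the printed value keeps climbing on Python 3.10.*, while on 3.8.11 it levels off after the first few calls.

import resource

import torch

if __name__ == "__main__":
    device = "cuda" if torch.cuda.is_available() else "cpu"
    kv = KVEmbedding(device)

    # Placeholder inputs; any reasonably sized batch of strings shows the trend.
    texts = ["query: example transcription"] * 256

    for step in range(100):
        kv.embedding(texts, batch_size=32)
        # ru_maxrss is the peak resident set size, in kilobytes on Linux.
        peak_kb = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
        print(f"step {step:3d}  peak RSS: {peak_kb / 1024:.1f} MB")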

Expected behavior

No memory leaks occur on Python 3.10.*.

@duchieuphan2k1

Same issue here. Any updates?
