
boto3 client with smart_open memory leak #3313

Open
jia-zhengwei opened this issue Nov 29, 2024 · 2 comments
Labels
bug, p2, response-requested, third-party

Comments

@jia-zhengwei

Describe the bug

Reading a remote Parquet file from OSS through boto3 leaks memory.

Expected Behavior

Memory usage should stay flat across repeated calls (no memory leak).

Current Behavior

As the method is called repeatedly, memory usage keeps growing.

Reproduction Steps

Run the following code many times:

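import boto3
import pyarrow.parquet as pq
import smart_open
from botocore.config import Config


class ParquetReader:  # hypothetical name for the reporter's class
    def read(self, access_key_id, secret_access_key, endpoint, bucket_name, file_path):
        # A new client is created on every call.
        self.client = boto3.client(
            "s3",
            aws_access_key_id=access_key_id,
            aws_secret_access_key=secret_access_key,
            endpoint_url=endpoint,
            config=Config(
                # signature_version is a top-level Config option, not a key of the s3 dict.
                signature_version="s3v4",
                s3={"addressing_style": "virtual"},
            ),
        )
        try:
            with smart_open.open(
                f"s3://{bucket_name}/{file_path}",
                mode="rb",
                transport_params={"client": self.client},
            ) as f:
                parquet_file = pq.ParquetFile(f)
                ...
        finally:
            # Close the client's underlying connections even if reading fails.
            self.client.close()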
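The snippet above creates and closes a new client on every call. A pattern worth testing (not confirmed in this thread to avoid the growth) is to create the client once and reuse it; boto3 documents low-level clients as thread safe, so one instance can usually be shared. Below is a minimal sketch under that assumption: ReusableClient is a hypothetical helper, and the factory argument stands in for the boto3.client(...) call.

```python
import threading


class ReusableClient:
    """Lazily create one client with `factory` and hand out the same instance."""

    def __init__(self, factory):
        self._factory = factory
        self._client = None
        self._lock = threading.Lock()  # guard creation when used from several threads

    def get(self):
        with self._lock:
            if self._client is None:
                self._client = self._factory()
            return self._client

    def close(self):
        # Close the shared client once, e.g. at application shutdown.
        with self._lock:
            if self._client is not None:
                self._client.close()
                self._client = None


# Hypothetical usage with boto3 and smart_open (not executed here):
# s3 = ReusableClient(lambda: boto3.client("s3", endpoint_url=endpoint, ...))
# with smart_open.open(uri, mode="rb", transport_params={"client": s3.get()}) as f:
#     ...
```

Every call to get() returns the same client, so only one connection pool exists for the lifetime of the helper.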

Possible Solution

I suspect it may be related to the smart_open object not being closed.

Additional Information/Context

None.

SDK version used

boto3==1.34.54 botocore==1.34.54

Environment details (OS name and version, etc.)

CentOS

jia-zhengwei added the bug and needs-triage labels on Nov 29, 2024
tim-finnigan self-assigned this on Dec 4, 2024
@tim-finnigan (Contributor)

Thanks for reaching out. Can you please share your debug logs (with any sensitive info redacted) by adding boto3.set_stream_logger('') to your script? That will help with further investigation. Can you also share more details about the memory leak? A memory profiling analysis would be helpful as well, if you're able to provide one. Otherwise, if the issue is with smart_open, you may want to open an issue in that repository.
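Since a memory profiling analysis was requested, here is a minimal sketch using Python's standard tracemalloc module. It calls a function repeatedly and reports which source lines gained the most traced memory between a snapshot taken before and after the loop. The names profile_repeated_calls and run_reproduction_once are hypothetical. Note that tracemalloc only traces allocations made through Python's own allocators; memory that C libraries allocate directly (for example Arrow's memory pool behind pyarrow) will not appear, so RSS that keeps climbing while these numbers stay flat would point toward native memory.

```python
import tracemalloc


def profile_repeated_calls(fn, iterations=100, top=5):
    """Call fn repeatedly and return the source lines whose traced Python
    allocations grew the most between a snapshot before and after the loop."""
    tracemalloc.start(25)  # store up to 25 frames per allocation traceback
    try:
        fn()  # warm-up call so one-time imports and caches are not counted
        before = tracemalloc.take_snapshot()
        for _ in range(iterations):
            fn()
        after = tracemalloc.take_snapshot()
    finally:
        tracemalloc.stop()
    # Ignore allocations made by the tracemalloc module itself.
    ignore = (tracemalloc.Filter(False, tracemalloc.__file__),)
    before = before.filter_traces(ignore)
    after = after.filter_traces(ignore)
    # StatisticDiff entries, largest absolute size change first.
    return after.compare_to(before, "lineno")[:top]


# Hypothetical usage: wrap one run of the reproduction code in a
# zero-argument function, then print the top growth sites.
# for stat in profile_repeated_calls(run_reproduction_once, iterations=50):
#     print(stat)
```

A line whose size_diff grows in proportion to the number of iterations is the first place to look.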

tim-finnigan added the response-requested, third-party, and p2 labels and removed the needs-triage label on Dec 4, 2024
@jia-zhengwei (Author)

jia-zhengwei commented Dec 9, 2024

After adding logging with boto3.set_stream_logger("boto3", level=logging.DEBUG), no new log output appears.
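One possible reason for seeing no new output: set_stream_logger("boto3", ...) attaches its handler to the "boto3" logger, while most request-level debug messages come from botocore modules, which log under "botocore.*" and are not children of "boto3". The maintainer's suggestion, set_stream_logger(''), targets the root logger instead. The sketch below simulates both cases with the standard logging module only; ListHandler, demo_logger_scope, and the message text are hypothetical, and "botocore.endpoint" is used as a representative botocore logger name.

```python
import logging


class ListHandler(logging.Handler):
    """Collect log records in a list so they can be inspected."""

    def __init__(self):
        super().__init__()
        self.records = []

    def emit(self, record):
        self.records.append(record)


def demo_logger_scope():
    # Stand-ins for set_stream_logger("boto3") and set_stream_logger(""):
    # each attaches a handler to the named logger and sets it to DEBUG.
    boto3_logger = logging.getLogger("boto3")
    root_logger = logging.getLogger("")
    boto3_handler, root_handler = ListHandler(), ListHandler()
    old_boto3_level, old_root_level = boto3_logger.level, root_logger.level
    boto3_logger.addHandler(boto3_handler)
    boto3_logger.setLevel(logging.DEBUG)
    root_logger.addHandler(root_handler)
    root_logger.setLevel(logging.DEBUG)
    try:
        # A record from a "botocore.*" logger propagates to the root logger,
        # but never passes through the "boto3" logger.
        logging.getLogger("botocore.endpoint").debug("example debug message")
    finally:
        boto3_logger.removeHandler(boto3_handler)
        root_logger.removeHandler(root_handler)
        boto3_logger.setLevel(old_boto3_level)
        root_logger.setLevel(old_root_level)
    return len(boto3_handler.records), len(root_handler.records)
```

In a fresh interpreter demo_logger_scope() returns (0, 1): only the root-level handler sees the botocore record.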

It's not a smart_open bug.

I have a class that initializes a new boto3 client on each call, and every call increases memory usage.

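import logging
import os
import threading
import time

import psutil

logger = logging.getLogger(__name__)


def print_memory_used():
    """Start a daemon thread that logs this process's memory usage once per second."""

    def monitor_memory():
        pid = os.getpid()
        process = psutil.Process(pid)

        while True:
            memory_info = process.memory_info()
            memory_usage_mb = memory_info.rss / (1024**2)  # resident set size in MB
            logger.info(
                f"[Memory Monitor] Process ID: {pid}, Memory Usage: {memory_usage_mb:.2f} MB"  # noqa
            )
            time.sleep(1)

    # daemon=True so the monitor does not keep the process alive on exit.
    memory_thread = threading.Thread(target=monitor_memory, daemon=True)
    memory_thread.start()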
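The monitor above samples RSS once per second in a background thread, which makes it hard to attribute growth to a specific call. A minimal synchronous alternative, assuming Linux (the reporter is on CentOS) so that /proc/self/statm is available, records the RSS change after each call. current_rss_bytes and rss_growth_per_call are hypothetical names; the second field of /proc/self/statm is the resident set size in pages.

```python
import os


def current_rss_bytes():
    """Current resident set size of this process, read from /proc (Linux only)."""
    with open("/proc/self/statm") as f:
        resident_pages = int(f.read().split()[1])
    return resident_pages * os.sysconf("SC_PAGE_SIZE")


def rss_growth_per_call(fn, iterations=20):
    """Run fn repeatedly and return the RSS change in bytes after each call."""
    deltas = []
    previous = current_rss_bytes()
    for _ in range(iterations):
        fn()
        now = current_rss_bytes()
        deltas.append(now - previous)
        previous = now
    return deltas
```

Deltas that stay positive call after call, rather than flattening out after the first few, would be consistent with a leak; a large first delta alone is often just caches warming up.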
