Describe the bug

sglang/python/sglang/srt/model_executor/cuda_graph_runner.py
Lines 103 to 105 in 2125898

Since torch 2.5, we should not need such a large cache size limit. Could someone double-check and remove the override?
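For reference, the override in question is roughly of this form (a sketch of the usual dynamo override; the exact value set in cuda_graph_runner.py may differ):

```python
import torch._dynamo

# torch.compile caches one compiled entry per guarded input configuration;
# the stock cache_size_limit is small, so capturing CUDA graphs over many
# batch sizes used to require raising it. Illustrative value only.
torch._dynamo.config.cache_size_limit = 1024
```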
Reproduction

NA

Environment

NA
It was added in #2069. As I remember, we need to set it; otherwise something with FlashInfer will fail.
We could test whether a reduced limit is also compatible, rather than deleting the override outright.
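A minimal way to probe that (a sketch, assuming the server is launched with torch.compile enabled; the candidate value is arbitrary):

```python
import torch
import torch._dynamo as dynamo

dynamo.config.cache_size_limit = 64       # candidate value to test in place of the current override
torch._logging.set_logs(recompiles=True)  # surface recompile reasons / cache-limit hits in the logs

# ... then run the FlashInfer + torch.compile workload as usual and watch for
# "cache size limit reached" recompile warnings or failures.
```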
I see. Maybe a better way is to make the FlashInfer kernels torch.compile-compatible?
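For what that could look like: since torch 2.4, an opaque kernel can be wrapped as a custom op so that dynamo traces through it without graph breaks or extra guards. A minimal sketch (the op name, shapes, and placeholder body are hypothetical, not FlashInfer's actual API):

```python
import torch

@torch.library.custom_op("sgl_demo::fused_attention", mutates_args=())
def fused_attention(q: torch.Tensor, k: torch.Tensor, v: torch.Tensor) -> torch.Tensor:
    # Stand-in for a call into an opaque (e.g. FlashInfer) kernel.
    return torch.softmax(q @ k.transpose(-1, -2), dim=-1) @ v

@fused_attention.register_fake
def _(q, k, v):
    # Shape/dtype propagation so torch.compile can trace the op
    # without executing the real kernel.
    return torch.empty_like(q)

@torch.compile(fullgraph=True)
def attn_block(q, k, v):
    return fused_attention(q, k, v)
```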