1. I have searched related issues but cannot get the expected help.
2. The bug has not been fixed in the latest version.
3. Please note that if the bug-related issue you submitted lacks corresponding environment info and a minimal reproducible demo, it will be challenging for us to reproduce and resolve the issue, reducing the likelihood of receiving feedback.
Checklist
Describe the bug
I wonder whether the problem is caused by return_logprob.
I ran into a similar problem when using vLLM and found these related issues:
vllm-project/vllm#5067
vllm-project/vllm#1532
vllm-project/vllm#5907
https://github.com/vllm-project/vllm/pull/5355
So maybe the cause is the same: the CUDA memory used to compute prompt_logprobs is not counted during the profiling run.
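To give a feel for why uncounted logprob memory could matter, here is a rough back-of-the-envelope sketch. It assumes the server materialises one full row of logits for every prompt position whose logprob is requested, which may not match what sglang actually does. The vocabulary size of 151,936 is the value I believe the Qwen2.5 configs use, and the 1,000 positions per request is a purely hypothetical number, since the real prompt length is not given in this report.

```python
def logits_bytes(num_tokens: int, vocab_size: int, bytes_per_value: int = 4) -> int:
    """Memory needed to hold one logits row per token (float32 by default)."""
    return num_tokens * vocab_size * bytes_per_value


VOCAB_SIZE = 151_936          # assumption: Qwen2.5 vocabulary size
NUM_REQUESTS = 50             # from the reproduction script
POSITIONS_PER_REQUEST = 1000  # hypothetical: positions after logprob_start_len

total = logits_bytes(NUM_REQUESTS * POSITIONS_PER_REQUEST, VOCAB_SIZE)
print(f"{total / 2**30:.2f} GiB of float32 logits")
```

Under these assumptions the logits alone would take tens of GiB if all 50 requests were batched together, which is memory that a profiling run without logprobs would never see.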
Reproduction
import json
import threading
import time
from queue import Queue

import requests
from transformers import AutoTokenizer

model_name = "./models/Qwen2.5-0.5B-Instruct"
tokenizer = AutoTokenizer.from_pretrained(model_name)

prompt = "Give me a short introduction to large language model."
# NOTE: prompt300 and prompt600 are not defined in the original report.
# The two placeholders below are an ASSUMPTION, added only so the script runs;
# they just need to make the prompt longer than logprob_start_len tokens.
prompt300 = prompt * 300
prompt600 = prompt * 600

messages = [
    {"role": "system", "content": "You are Qwen, created by Alibaba Cloud. You are a helpful assistant."},
    {"role": "user", "content": prompt300},
    {"role": "assistant", "content": prompt600},
]
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=False,
)


def send_request(index, text, queue):
    # Ask the server for one new token plus prompt logprobs from token 3000 on.
    response = requests.post(
        "http://10.48.2.2:30000/generate",
        json={
            "text": text,
            "sampling_params": {
                "temperature": 0,
                "max_new_tokens": 1,
            },
            "return_logprob": True,
            "logprob_start_len": 3000,
        },
    )
    # Hand the response back to the main thread, tagged with its index.
    queue.put((index, response))


start_time = time.time()
results_queue = Queue()
threads = []
# Send 50 requests concurrently.
for i in range(50):
    thread = threading.Thread(target=send_request, args=(i, text, results_queue))
    thread.start()
    threads.append(thread)
for thread in threads:
    thread.join()
end_time = time.time()
print(end_time - start_time)

# Collect the responses in request order.
completion_list = [None for _ in range(50)]
for _ in range(50):
    index, result = results_queue.get()
    completion_list[index] = result
res_json = json.loads(completion_list[0].text)
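As a side note, the thread-and-queue bookkeeping above can be written more compactly with the standard library's concurrent.futures. This is only a sketch: `fan_out` and `fake_send` are hypothetical names, and `fake_send` stands in for the real HTTP call so that the snippet runs without a server.

```python
from concurrent.futures import ThreadPoolExecutor


def fan_out(send, payload, n_requests=50):
    """Call send(index, payload) n_requests times concurrently.

    Results are returned in index order, so no queue is needed.
    """
    with ThreadPoolExecutor(max_workers=n_requests) as pool:
        futures = [pool.submit(send, i, payload) for i in range(n_requests)]
        # result() blocks until each call finishes and re-raises its exception.
        return [future.result() for future in futures]


def fake_send(index, payload):
    # Hypothetical stand-in for the requests.post call in the reproduction.
    return {"index": index, "n_chars": len(payload)}


results = fan_out(fake_send, "hello", n_requests=5)
print(results[0])
```

One difference from the original script is that an exception raised inside a worker surfaces in the main thread through `result()` instead of being lost.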
Environment
2024-12-27 10:51:32,563 - modelscope - INFO - PyTorch version 2.4.0 Found.
2024-12-27 10:51:32,564 - modelscope - INFO - Loading ast index from /home/yanhui_he/.cache/modelscope/ast_indexer
2024-12-27 10:51:32,717 - modelscope - INFO - Loading done! Current index file version is 1.13.3, with md5 cac1c2695a261ce83ddea2be8560cb8b and a total number of 972 components indexed
WARNING 12-27 10:51:34 cuda.py:22] You are using a deprecated pynvml package. Please install nvidia-ml-py instead, and make sure to uninstall pynvml. When both of them are installed, pynvml will take precedence and cause errors. See https://pypi.org/project/pynvml for more information.
Python: 3.10.13 (main, Sep 11 2023, 13:44:35) [GCC 11.2.0]
CUDA available: True
GPU 0,1,2,3,4,5,6,7,8,9: NVIDIA A100 80GB PCIe
GPU 0,1,2,3,4,5,6,7,8,9 Compute Capability: 8.0
CUDA_HOME: /usr/local/cuda-12.1
NVCC: Cuda compilation tools, release 12.1, V12.1.105
CUDA Driver Version: 550.54.15
PyTorch: 2.4.0+cu121
sglang: 0.4.1
flashinfer: 0.1.6+cu121torch2.4
triton: 3.0.0
transformers: 4.44.0
torchao: 0.7.0
numpy: 1.26.4
aiohttp: 3.9.5
fastapi: 0.115.0
hf_transfer: Module Not Found
huggingface_hub: 0.24.5
interegular: 0.3.3
modelscope: 1.13.3
orjson: 3.9.15
packaging: 23.2
psutil: 5.9.6
pydantic: 2.9.2
multipart: 0.0.9
zmq: 26.0.3
uvicorn: 0.29.0
uvloop: 0.19.0
vllm: 0.6.1.post2
xgrammar: Module Not Found
openai: 1.47.1
anthropic: 0.25.8
litellm: Module Not Found
decord: Module Not Found
NVIDIA Topology:
      GPU0  GPU1  GPU2  GPU3  GPU4  GPU5  GPU6  GPU7  GPU8  GPU9  CPU Affinity  NUMA Affinity  GPU NUMA ID
GPU0  X     NV12  PXB   PXB   PXB   SYS   SYS   SYS   SYS   SYS   0-31,64-95    0              N/A
GPU1  NV12  X     PIX   PXB   PXB   SYS   SYS   SYS   SYS   SYS   0-31,64-95    0              N/A
GPU2  PXB   PIX   X     PXB   PXB   SYS   SYS   NV12  SYS   SYS   0-31,64-95    0              N/A
GPU3  PXB   PXB   PXB   X     NV12  SYS   SYS   SYS   SYS   SYS   0-31,64-95    0              N/A
GPU4  PXB   PXB   PXB   NV12  X     SYS   SYS   SYS   SYS   SYS   0-31,64-95    0              N/A
GPU5  SYS   SYS   SYS   SYS   SYS   X     NV12  PXB   PXB   PXB   32-63,96-127  1              N/A
GPU6  SYS   SYS   SYS   SYS   SYS   NV12  X     PIX   PXB   PXB   32-63,96-127  1              N/A
GPU7  SYS   SYS   NV12  SYS   SYS   PXB   PIX   X     PXB   PXB   32-63,96-127  1              N/A
GPU8  SYS   SYS   SYS   SYS   SYS   PXB   PXB   PXB   X     NV12  32-63,96-127  1              N/A
GPU9  SYS   SYS   SYS   SYS   SYS   PXB   PXB   PXB   NV12  X     32-63,96-127  1              N/A
Legend:
X = Self
SYS = Connection traversing PCIe as well as the SMP interconnect between NUMA nodes (e.g., QPI/UPI)
NODE = Connection traversing PCIe as well as the interconnect between PCIe Host Bridges within a NUMA node
PHB = Connection traversing PCIe as well as a PCIe Host Bridge (typically the CPU)
PXB = Connection traversing multiple PCIe bridges (without traversing the PCIe Host Bridge)
PIX = Connection traversing at most a single PCIe bridge
NV# = Connection traversing a bonded set of # NVLinks
ulimit soft: 655350