
[Bug] LLVM/Triton issue #2613

Open
NilayYadav opened this issue Dec 27, 2024 · 0 comments
Checklist

  1. I have searched related issues but cannot get the expected help.
  2. The bug has not been fixed in the latest version.
  3. Please note that if the bug-related issue you submitted lacks corresponding environment info and a minimal reproducible demo, it will be challenging for us to reproduce and resolve the issue, reducing the likelihood of receiving feedback.
  4. If the issue you raised is not a bug but a question, please raise a discussion at https://github.com/sgl-project/sglang/discussions/new/choose. Otherwise, it will be closed.
  5. Please use English; otherwise the issue will be closed.

Describe the bug

I'm facing an issue with LLVM/Triton when trying to run any model on Beam cloud using SGLang. Interestingly, the same setup, including dependencies and the Docker image, works without issues on other cloud platforms.

Here's the error message I'm receiving: python3.10: /source/llvm-project/llvm/lib/Support/MemoryBuffer.cpp:54: void llvm::MemoryBuffer::init(const char *, const char *, bool): Assertion `(!RequiresNullTerminator || BufEnd[0] == 0) && "Buffer is not null terminated!"' failed.

Full Error:

[2024-12-27 08:57:46] server_args=ServerArgs(model_path='meta-llama/Meta-Llama-3.1-8B-Instruct', 
tokenizer_path='meta-llama/Meta-Llama-3.1-8B-Instruct', tokenizer_mode='auto', skip_tokenizer_init=False, load_format='auto', 
trust_remote_code=False, dtype='auto', kv_cache_dtype='auto', quantization=None, context_length=None, device='cuda', 
served_model_name='meta-llama/Meta-Llama-3.1-8B-Instruct', chat_template=None, is_embedding=False, revision=None, host='0.0.0.0', 
port=30000, mem_fraction_static=0.88, max_running_requests=None, max_total_tokens=None, chunked_prefill_size=2048, 
max_prefill_tokens=16384, schedule_policy='lpm', schedule_conservativeness=1.0, cpu_offload_gb=0, tp_size=1, stream_interval=1, 
random_seed=531270821, constrained_json_whitespace_pattern=None, watchdog_timeout=300, download_dir=None, base_gpu_id=0, 
log_level='info', log_level_http=None, log_requests=False, show_time_cost=False, enable_metrics=False, decode_log_interval=40, 
api_key=None, file_storage_pth='SGLang_storage', enable_cache_report=False, dp_size=1, load_balance_method='round_robin', 
dist_init_addr=None, nnodes=1, node_rank=0, json_model_override_args='{}', enable_double_sparsity=False, ds_channel_config_path=None, 
ds_heavy_channel_num=32, ds_heavy_token_num=256, ds_heavy_channel_type='qk', ds_sparse_decode_threshold=4096, lora_paths=None, 
max_loras_per_batch=8, attention_backend='flashinfer', sampling_backend='flashinfer', grammar_backend='outlines', 
disable_radix_cache=False, disable_jump_forward=False, disable_cuda_graph=False, disable_cuda_graph_padding=False, 
disable_outlines_disk_cache=False, disable_custom_all_reduce=False, disable_mla=False, disable_overlap_schedule=False, 
enable_mixed_chunk=False, enable_dp_attention=False, enable_torch_compile=False, torch_compile_max_bs=32, cuda_graph_max_bs=8, 
torchao_config='', enable_nan_detection=False, enable_p2p_check=False, triton_attention_reduce_in_fp32=False, 
num_continuous_decode_steps=1, delete_ckpt_after_loading=False)
config.json: 100% 855/855 [00:00<00:00, 5.50MB/s]
tokenizer_config.json: 100% 55.4k/55.4k [00:00<00:00, 55.5MB/s]
tokenizer.json: 100% 9.09M/9.09M [00:00<00:00, 42.0MB/s]
special_tokens_map.json: 100% 296/296 [00:00<00:00, 2.73MB/s]
[2024-12-27 08:57:55 TP0] Init torch distributed begin.
[rank0]:[W1227 08:57:55.030727993 ProcessGroupGloo.cpp:712] Warning: Unable to resolve hostname to a (local) address. Using the 
loopback address as fallback. Manually set the network interface to bind to with GLOO_SOCKET_IFNAME. (function operator())
[2024-12-27 08:57:55 TP0] Load weight begin. avail mem=21.96 GB
[2024-12-27 08:57:56 TP0] Using model weights format ['*.safetensors']
model-00001-of-00004.safetensors: 100% 4.98G/4.98G [00:05<00:00, 863MB/s] 
model-00002-of-00004.safetensors: 100% 5.00G/5.00G [00:05<00:00, 868MB/s] 
model-00003-of-00004.safetensors: 100% 4.92G/4.92G [00:05<00:00, 847MB/s] 
model-00004-of-00004.safetensors: 100% 1.17G/1.17G [00:02<00:00, 510MB/s] 
model.safetensors.index.json: 100% 23.9k/23.9k [00:00<00:00, 193MB/s]
Loading safetensors checkpoint shards: 100% 4/4 [00:02<00:00,  1.48it/s]
[2024-12-27 08:58:18 TP0] Load weight end. type=LlamaForCausalLM, dtype=torch.bfloat16, avail mem=6.92 GB
[2024-12-27 08:58:18 TP0] Memory pool end. avail mem=1.54 GB
[2024-12-27 08:58:18 TP0] Capture cuda graph begin. This can take up to several minutes.
python: /source/llvm-project/llvm/lib/Support/MemoryBuffer.cpp:54: void llvm::MemoryBuffer::init(const char *, const char *, bool): 
Assertion `(!RequiresNullTerminator || BufEnd[0] == 0) && "Buffer is not null terminated!"' failed.
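The assertion comes from LLVM's MemoryBuffer::init, which fires when a file opened with RequiresNullTerminator is memory-mapped without a trailing null byte. Triton memory-maps files from its on-disk kernel cache during compilation, so one hedged workaround, assuming Beam's container filesystem (e.g. an overlay or FUSE mount) produces such mappings, is to redirect the cache to a plain tmpfs before launching the server. The /dev/shm path below is an assumption, not a Beam-specific requirement:

```shell
# Hedged workaround sketch: point Triton's kernel cache at tmpfs so the
# files LLVM memory-maps come from a regular in-memory filesystem rather
# than the container's overlay/FUSE mount. TRITON_CACHE_DIR is honored by
# Triton (default is ~/.triton/cache); the /dev/shm path is an assumption.
export TRITON_CACHE_DIR=/dev/shm/triton_cache
mkdir -p "$TRITON_CACHE_DIR"
echo "Triton cache redirected to $TRITON_CACHE_DIR"
```

If the assertion no longer fires with the cache on tmpfs, that would point at the backing filesystem rather than at SGLang itself.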

More details (Python Version, Platform, CUDA Version, LLVM Version, LD_LIBRARY_PATH, Installed Packages, Nvidia SMI and Triton path):

{"message": "{'Python Version': '3.11.11 (main, Dec  4 2024, 08:55:07) [GCC 11.4.0]', 'Platform': 
'Linux-6.5.0-1018-oracle-x86_64-with-glibc2.35', 'CUDA Version': 'Error: /bin/sh: 1: nvcc: not found\\n', 'LLVM Version': 'Error: 
/bin/sh: 1: llvm-config: not found\\n', 'LD_LIBRARY_PATH': 
'/usr/lib/x86_64-linux-gnu:/usr/lib/worker/x86_64-linux-gnu:/usr/local/nvidia/lib64:/usr/local/cuda-12.3/targets/x86_64-linux/lib:$LD_
LIBRARY_PATH', 'Installed Packages': 'Package                           Version\\n--------------------------------- 
-------------------------\\naiohappyeyeballs                  2.4.4\\naiohttp                           3.11.11\\naiosignal           
1.3.2\\nannotated-types                   0.7.0\\nanthropic                         0.42.0\\nanyio                             
4.7.0\\nasttokens                         3.0.0\\nattrs                             24.3.0\\nbetterproto-beta9                 
2.0.0b7\\nblinker                           1.4\\nbson                              0.5.10\\ncertifi                           
2024.12.14\\ncharset-normalizer                3.4.1\\nclick                             8.1.8\\ncloudpickle                       
3.1.0\\ncompressed-tensors                0.6.0\\ncryptography                      3.4.8\\ncuda-python                       
12.6.2.post1\\ndatasets                          3.2.0\\ndbus-python                       1.2.18\\ndecorator                         
5.1.1\\ndecord                            0.6.0\\ndill                              0.3.8\\ndiskcache                         
5.6.3\\ndistlib                           0.3.9\\ndistro                            1.9.0\\ndistro-info                       
1.1+ubuntu0.2\\ndnspython                         2.7.0\\neinops                            0.8.0\\nemail_validator                   
2.2.0\\nexecuting                         2.1.0\\nfastapi                           0.115.6\\nfastapi-cli                       
0.0.7\\nfilelock                          3.16.1\\nflashinfer                        0.2.0.post1+cu124torch2.4\\nfrozenlist           
1.5.0\\nfsspec                            2024.9.0\\ngguf                              0.10.0\\ngrpcio                            
1.60.0\\ngrpclib                           0.4.7\\ngunicorn                          20.1.0\\nh11                               
0.14.0\\nh2                                4.1.0\\nhf_transfer                       0.1.8\\nhpack                             
4.0.0\\nhttpcore                          1.0.7\\nhttplib2                          0.20.2\\nhttptools                         
0.6.4\\nhttpx                             0.27.2\\nhuggingface-hub                   0.27.0\\nhyperframe                        
6.0.1\\nidna                              3.10\\nimportlib_metadata                8.5.0\\niniconfig                         
2.0.0\\ninteregular                       0.3.3\\nipython                           8.31.0\\njedi                              
0.19.2\\njeepney                           0.7.1\\nJinja2                            3.1.5\\njiter                             
0.8.2\\njsonschema                        4.23.0\\njsonschema-specifications         2024.10.1\\nkeyring                           
23.5.0\\nlark                              1.2.2\\nlaunchpadlib                      1.10.16\\nlazr.restfulclient                
0.14.4\\nlazr.uri                          1.0.6\\nlitellm                           1.55.12\\nllvmlite                          
0.43.0\\nlm-format-enforcer                0.10.6\\nmarkdown-it-py                    3.0.0\\nMarkupSafe                        
3.0.2\\nmatplotlib-inline                 0.1.7\\nmdurl                             0.1.2\\nmistral_common                    
1.5.1\\nmodelscope                        1.21.0\\nmore-itertools                    8.10.0\\nmpmath                            
1.3.0\\nmsgpack                           1.1.0\\nmsgspec                           0.18.6\\nmultidict                         
cio                      1.6.0\\nnetworkx                          3.4.2\\nnumba                             0.60.0\\nnumpy           
1.26.4\\nnvidia-cublas-cu12                12.1.3.1\\nnvidia-cuda-cupti-cu12            12.1.105\\nnvidia-cuda-nvrtc-cu12            
12.1.105\\nnvidia-cuda-runtime-cu12          12.1.105\\nnvidia-cudnn-cu12                 9.1.0.70\\nnvidia-cufft-cu12                
11.0.2.54\\nnvidia-curand-cu12                10.3.2.106\\nnvidia-cusolver-cu12              11.4.5.107\\nnvidia-cusparse-cu12        
12.1.0.106\\nnvidia-ml-py                      12.560.30\\nnvidia-nccl-cu12                  2.20.5\\nnvidia-nvjitlink-cu12           
12.6.85\\nnvidia-nvtx-cu12                  12.1.105\\noauthlib                          3.2.0\\nopenai                            
1.58.1\\nopencv-python-headless            4.10.0.84\\norjson                            3.10.12\\noutlines                          
0.0.46\\npackaging                         24.2\\npandas                            2.2.3\\nparso                             
0.8.4\\npartial-json-parser               0.2.1.1.post4\\npexpect                           4.9.0\\npillow                            
10.4.0\\npip                               24.3.1\\npluggy                            1.5.0\\nprometheus_client                 
0.21.1\\nprometheus-fastapi-instrumentator 7.0.0\\nprompt_toolkit                    3.0.48\\npropcache                         
0.2.1\\nprotobuf                          5.29.2\\npsutil                            6.1.1\\nptyprocess                        
0.7.0\\npure_eval                         0.2.3\\npy-cpuinfo                        9.0.0\\npyairports                        
2.1.1\\npyarrow                           18.1.0\\npybind11                          2.13.6\\npycountry                         
24.6.1\\npydantic                          2.10.4\\npydantic_core                     2.27.2\\nPygments                          
2.18.0\\nPyGObject                         3.42.1\\nPyJWT                             2.3.0\\npyparsing                         
2.4.7\\npytest                            8.3.4\\npython-apt                        2.4.0+ubuntu4\\npython-dateutil                   
2.9.0.post0\\npython-dotenv                     1.0.1\\npython-multipart                  0.0.20\\npytz                              
2024.2\\nPyYAML                            6.0.2\\npyzmq                             26.2.0\\nray                               
2.40.0\\nreferencing                       0.35.1\\nregex                             2024.11.6\\nrequests                          
2.32.3\\nrich                              13.9.4\\nrich-toolkit                      0.12.0\\nrpds-py                           
0.22.3\\nsafetensors                       0.4.5\\nSecretStorage                     3.3.1\\nsentencepiece                     
0.2.0\\nsetuptools                        70.3.0\\nsglang                            0.4.0\\nshellingham                       
1.5.4\\nsix                               1.17.0\\nsniffio                           1.3.1\\nstack-data                        
0.6.3\\nstarlette                         0.41.3\\nsympy                             1.13.3\\ntiktoken                          
0.7.0\\ntokenizers                        0.21.0\\ntorch                             2.4.0\\ntorchao                           
0.7.0\\ntorchvision                       0.19.0\\ntqdm                              4.67.1\\ntraitlets                         
5.14.3\\ntransformers                      4.47.1\\ntriton                            3.0.0\\ntyper                             
0.15.1\\ntyping_extensions                 4.12.2\\ntzdata                            2024.2\\nunattended-upgrades               
0.1\\nurllib3                           2.3.0\\nuvicorn                           0.34.0\\nuvloop                            
0.21.0\\nvllm                              0.6.3.post1\\nwadllib                           1.3.6\\nwatchdog                          
\\nwcwidth                           0.2.13\\nwebsockets                        14.1\\nwheel                             
0.45.1\\nxformers                          0.0.27.post2\\nxgrammar                          0.1.8\\nxxhash                            
3.5.0\\nyarl                              1.18.3\\nzipp                              3.21.0\\n', 'Nvidia SMI': 'Fri Dec 27 08:56:22 
2024       \\n+---------------------------------------------------------------------------------------+\\n| NVIDIA-SMI 535.161.07     
Driver Version: 535.161.07   CUDA Version: 12.3     
|\\n|-----------------------------------------+----------------------+----------------------+\\n| GPU  Name                 
Persistence-M | Bus-Id        Disp.A | Volatile Uncorr. ECC |\\n| Fan  Temp   Perf          Pwr:Usage/Cap |         Memory-Usage | 
GPU-Util  Compute M. |\\n|                                         |                      |               MIG M. 
|\\n|=========================================+======================+======================|\\n|   0  NVIDIA A10                     
On  | 00000000:31:00.0 Off |                    0 |\\n|  0%   43C    P0              57W / 150W |      0MiB / 23028MiB |      0%      
Default |\\n|                                         |                      |                  N/A 
|\\n+-----------------------------------------+----------------------+----------------------+\\n                                      
\\n+---------------------------------------------------------------------------------------+\\n| Processes:                           
|\\n|  GPU   GI   CI        PID   Type   Process name                            GPU Memory |\\n|        ID   ID                      
Usage      |\\n|=======================================================================================|\\n| 

Reproduction

python -m sglang.launch_server --model-path meta-llama/Meta-Llama-3.1-8B-Instruct \
--port 30000 --host 0.0.0.0
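A hedged diagnostic variant of the repro: the log dies immediately after "Capture cuda graph begin", so rerunning with CUDA graph capture disabled can show whether graph capture is what triggers the failing Triton/LLVM compile. The flag spelling is inferred from the disable_cuda_graph field visible in the ServerArgs dump above and is an assumption about the CLI:

```shell
# Hypothetical isolation step: same launch, but with CUDA graph capture
# disabled (flag name inferred from ServerArgs.disable_cuda_graph).
python -m sglang.launch_server --model-path meta-llama/Meta-Llama-3.1-8B-Instruct \
    --port 30000 --host 0.0.0.0 --disable-cuda-graph
```

If the server starts cleanly with this flag, the crash is scoped to kernels compiled during graph capture; if it still aborts, the problem lies in Triton compilation generally.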

Environment

[2024-12-27 09:00:42] INFO _client.py:1038: HTTP Request: GET 
https://raw.githubusercontent.com/BerriAI/litellm/main/model_prices_and_context_window.json "HTTP/1.1 200 OK"
Python: 3.11.11 (main, Dec  4 2024, 08:55:07) [GCC 11.4.0]
CUDA available: True
GPU 0: NVIDIA A10
GPU 0 Compute Capability: 8.6
CUDA_HOME: None
PyTorch: 2.4.0+cu121
sglang: 0.4.0
flashinfer: 0.2.0.post1+cu124torch2.4
triton: 3.0.0
transformers: 4.47.1
torchao: 0.7.0
numpy: 1.26.4
aiohttp: 3.11.11
fastapi: 0.115.6
hf_transfer: 0.1.8
huggingface_hub: 0.27.0
interegular: 0.3.3
modelscope: 1.21.0
orjson: 3.10.12
packaging: 24.2
psutil: 6.1.1
pydantic: 2.10.4
multipart: 0.0.20
zmq: 26.2.0
uvicorn: 0.34.0
uvloop: 0.21.0
vllm: 0.6.3.post1
openai: 1.58.1
anthropic: 0.42.0
decord: 0.6.0
NVIDIA Topology: 
        GPU0 NIC0    NIC1    CPU Affinity    NUMA Affinity   GPU NUMA ID
GPU0     X      NODE    SYS     0-31,64-95      0               N/A
NIC0    NODE     X      SYS                             
NIC1    SYS     SYS      X                              

Legend:

  X    = Self
  SYS  = Connection traversing PCIe as well as the SMP interconnect between NUMA nodes (e.g., QPI/UPI)
  NODE = Connection traversing PCIe as well as the interconnect between PCIe Host Bridges within a NUMA node
  PHB  = Connection traversing PCIe as well as a PCIe Host Bridge (typically the CPU)
  PXB  = Connection traversing multiple PCIe bridges (without traversing the PCIe Host Bridge)
  PIX  = Connection traversing at most a single PCIe bridge
  NV#  = Connection traversing a bonded set of # NVLinks

NIC Legend:

  NIC0: 
  NIC1: 


ulimit soft: 1048576