Failure when starting the model using TGI 3 #2819

Open
hahmad2008 opened this issue Dec 10, 2024 · 0 comments
System Info

I tried to serve Llama 3.1 8B with TGI on an A10 (24 GB) GPU with a 4k context length.
Command:

 docker run --gpus all -it --rm -p 8000:80 ghcr.io/huggingface/text-generation-inference:3.0.0   --model-id NousResearch/Meta-Llama-3.1-8B-Instruct --max-total-tokens 4096  --dtype bfloat16 
  • The same command works with the image ghcr.io/huggingface/text-generation-inference:2.2.0.

With 3.0.0, however, I get the following error:

2024-12-10T21:24:12.674619Z  INFO text_generation_launcher: Starting Webserver
2024-12-10T21:24:12.849356Z  INFO text_generation_router_v3: backends/v3/src/lib.rs:125: Warming up model
2024-12-10T21:25:42.531534Z ERROR warmup{max_input_length=None max_prefill_tokens=8192 max_total_tokens=Some(4096) max_batch_size=None}:warmup: text_generation_router_v3::client: backends/v3/src/client/mod.rs:45: Server error: transport error
Error: Backend(Warmup(Generation("transport error")))
2024-12-10T21:25:42.679824Z ERROR text_generation_launcher: Webserver Crashed
2024-12-10T21:25:42.684321Z  INFO text_generation_launcher: Shutting down shards
2024-12-10T21:25:42.698301Z ERROR shard-manager: text_generation_launcher: Shard complete standard error output:

2024-12-10 21:23:52.620 | INFO     | text_generation_server.utils.import_utils:<module>:80 - Detected system cuda
/opt/conda/lib/python3.11/site-packages/text_generation_server/layers/gptq/triton.py:242: FutureWarning: `torch.cuda.amp.custom_fwd(args...)` is deprecated. Please use `torch.amp.custom_fwd(args..., device_type='cuda')` instead.
  @custom_fwd(cast_inputs=torch.float16)
/opt/conda/lib/python3.11/site-packages/mamba_ssm/ops/selective_scan_interface.py:158: FutureWarning: `torch.cuda.amp.custom_fwd(args...)` is deprecated. Please use `torch.amp.custom_fwd(args..., device_type='cuda')` instead.
  @custom_fwd
/opt/conda/lib/python3.11/site-packages/mamba_ssm/ops/selective_scan_interface.py:231: FutureWarning: `torch.cuda.amp.custom_bwd(args...)` is deprecated. Please use `torch.amp.custom_bwd(args..., device_type='cuda')` instead.
  @custom_bwd
/opt/conda/lib/python3.11/site-packages/mamba_ssm/ops/triton/layernorm.py:507: FutureWarning: `torch.cuda.amp.custom_fwd(args...)` is deprecated. Please use `torch.amp.custom_fwd(args..., device_type='cuda')` instead.
  @custom_fwd
/opt/conda/lib/python3.11/site-packages/mamba_ssm/ops/triton/layernorm.py:566: FutureWarning: `torch.cuda.amp.custom_bwd(args...)` is deprecated. Please use `torch.amp.custom_bwd(args..., device_type='cuda')` instead.
  @custom_bwd
/opt/conda/lib/python3.11/site-packages/torch/distributed/c10d_logger.py:79: FutureWarning: You are using a Backend <class 'text_generation_server.utils.dist.FakeGroup'> as a ProcessGroup. This usage is deprecated since PyTorch 2.0. Please use a public API of PyTorch Distributed instead.
  return func(*args, **kwargs) rank=0
2024-12-10T21:25:42.700830Z ERROR shard-manager: text_generation_launcher: Shard process was signaled to shutdown with signal 9 rank=0

Information

  • Docker
  • The CLI directly

Tasks

  • An officially supported command
  • My own modifications

Reproduction

 docker run --gpus all -it --rm -p 8000:80 ghcr.io/huggingface/text-generation-inference:3.0.0   --model-id NousResearch/Meta-Llama-3.1-8B-Instruct --max-total-tokens 4096  --dtype bfloat16 

Expected behavior

The model should be served successfully, as it is with image 2.2.0.
