Entire system crashes when get to warm up model #2853

Open
1 of 4 tasks
ad-astra-video opened this issue Dec 17, 2024 · 1 comment

ad-astra-video commented Dec 17, 2024

System Info

model=meta-llama/Llama-3.3-70B-Instruct
# share a volume with the Docker container to avoid downloading weights every run
volume=/srv/ai/data/tgi

docker run --gpus "1,2,3,4" --shm-size 1g -e HF_TOKEN=[TOKEN] -p 8080:80 -v $volume:/data \
ghcr.io/huggingface/text-generation-inference:3.0.0 \
--model-id $model \
--quantize eetq \
--cuda-memory-fraction 0.95

Hardware: 4x RTX 3090 Ti, EPYC CPU, 256 GB RAM

Information

  • Docker
  • The CLI directly

Tasks

  • An officially supported command
  • My own modifications

Reproduction

Run the Docker command above. The launcher logs end as follows:

2024-12-17T17:23:53.961980Z  INFO text_generation_launcher: Using prefill chunking = True
2024-12-17T17:23:54.547663Z  INFO shard-manager: text_generation_launcher: Waiting for shard to be ready... rank=3
2024-12-17T17:23:54.547663Z  INFO shard-manager: text_generation_launcher: Waiting for shard to be ready... rank=0
2024-12-17T17:23:54.558361Z  INFO shard-manager: text_generation_launcher: Waiting for shard to be ready... rank=1
2024-12-17T17:23:54.572348Z  INFO shard-manager: text_generation_launcher: Waiting for shard to be ready... rank=2
2024-12-17T17:23:54.821433Z  INFO text_generation_launcher: Server started at unix:///tmp/text-generation-server-1
2024-12-17T17:23:54.821492Z  INFO text_generation_launcher: Server started at unix:///tmp/text-generation-server-3
2024-12-17T17:23:54.821530Z  INFO text_generation_launcher: Server started at unix:///tmp/text-generation-server-2
2024-12-17T17:23:54.847944Z  INFO shard-manager: text_generation_launcher: Shard ready in 150.41845764s rank=3
2024-12-17T17:23:54.858639Z  INFO shard-manager: text_generation_launcher: Shard ready in 150.432820265s rank=1
2024-12-17T17:23:54.872643Z  INFO shard-manager: text_generation_launcher: Shard ready in 150.439607673s rank=2
2024-12-17T17:23:55.047221Z  INFO text_generation_launcher: Server started at unix:///tmp/text-generation-server-0
2024-12-17T17:23:55.048286Z  INFO shard-manager: text_generation_launcher: Shard ready in 150.622573521s rank=0
2024-12-17T17:23:55.115403Z  INFO text_generation_launcher: Starting Webserver
2024-12-17T17:23:55.210971Z  INFO text_generation_router_v3: backends/v3/src/lib.rs:125: Warming up model
2024-12-17T17:23:55.231460Z  INFO text_generation_launcher: Using optimized Triton indexing kernels.

After this the whole server dies and I have to manually power cycle it.

Full logs from a second attempt with a smaller model and CUDA graphs disabled (same crash):
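
The exact command for this second run is not recorded here. Judging from the Args dump in the logs, it was probably close to the sketch below. The Docker options (`--gpus`, `--shm-size`, port, volume) and the image tag are assumed to be unchanged from the first command, and `--cuda-graphs 0` is inferred from `cuda_graphs: Some([0])`.

```shell
# Assumed reconstruction of the second run, not a captured command.
# Only the model ID and the --cuda-graphs flag differ from the first command.
model=Qwen/Qwen2.5-32B-Instruct
volume=/srv/ai/data/tgi

docker run --gpus "1,2,3,4" --shm-size 1g -e HF_TOKEN=[TOKEN] -p 8080:80 -v $volume:/data \
    ghcr.io/huggingface/text-generation-inference:3.0.0 \
    --model-id $model \
    --quantize eetq \
    --cuda-graphs 0 \
    --cuda-memory-fraction 0.95
```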

2024-12-17T18:03:27.087401Z  INFO text_generation_launcher: Args {
    model_id: "Qwen/Qwen2.5-32B-Instruct",
    revision: None,
    validation_workers: 2,
    sharded: None,
    num_shard: None,
    quantize: Some(
        Eetq,
    ),
    speculate: None,
    dtype: None,
    kv_cache_dtype: None,
    trust_remote_code: false,
    max_concurrent_requests: 128,
    max_best_of: 2,
    max_stop_sequences: 4,
    max_top_n_tokens: 5,
    max_input_tokens: None,
    max_input_length: None,
    max_total_tokens: None,
    waiting_served_ratio: 0.3,
    max_batch_prefill_tokens: None,
    max_batch_total_tokens: None,
    max_waiting_tokens: 20,
    max_batch_size: None,
    cuda_graphs: Some(
        [
            0,
        ],
    ),
    hostname: "4eee9dca0df9",
    port: 80,
    shard_uds_path: "/tmp/text-generation-server",
    master_addr: "localhost",
    master_port: 29500,
    huggingface_hub_cache: None,
    weights_cache_override: None,
    disable_custom_kernels: false,
    cuda_memory_fraction: 0.95,
    rope_scaling: None,
    rope_factor: None,
    json_output: false,
    otlp_endpoint: None,
    otlp_service_name: "text-generation-inference.router",
    cors_allow_origin: [],
    api_key: None,
    watermark_gamma: None,
    watermark_delta: None,
    ngrok: false,
    ngrok_authtoken: None,
    ngrok_edge: None,
    tokenizer_config_path: None,
    disable_grammar_support: false,
    env: false,
    max_client_batch_size: 4,
    lora_adapters: None,
    usage_stats: On,
    payload_limit: 2000000,
    enable_prefill_logprobs: false,
}
2024-12-17T18:03:27.088023Z  INFO hf_hub: Token file not found "/data/token"
2024-12-17T18:03:28.994330Z  INFO text_generation_launcher: Using attention flashinfer - Prefix caching true
2024-12-17T18:03:28.994349Z  INFO text_generation_launcher: Sharding model on 4 processes
2024-12-17T18:03:29.030950Z  WARN text_generation_launcher: Unkown compute for card nvidia-geforce-rtx-3090-ti
2024-12-17T18:03:29.064926Z  INFO text_generation_launcher: Default `max_batch_prefill_tokens` to 4096
2024-12-17T18:03:29.065078Z  INFO download: text_generation_launcher: Starting check and download process for Qwen/Qwen2.5-32B-Instruct
2024-12-17T18:03:32.104130Z  INFO text_generation_launcher: Files are already present on the host. Skipping download.
2024-12-17T18:03:32.680081Z  INFO download: text_generation_launcher: Successfully downloaded weights for Qwen/Qwen2.5-32B-Instruct
2024-12-17T18:03:32.680348Z  INFO shard-manager: text_generation_launcher: Starting shard rank=1
2024-12-17T18:03:32.680364Z  INFO shard-manager: text_generation_launcher: Starting shard rank=0
2024-12-17T18:03:32.680439Z  INFO shard-manager: text_generation_launcher: Starting shard rank=3
2024-12-17T18:03:32.686107Z  INFO shard-manager: text_generation_launcher: Starting shard rank=2
2024-12-17T18:03:35.215815Z  INFO text_generation_launcher: Using prefix caching = True
2024-12-17T18:03:35.215842Z  INFO text_generation_launcher: Using Attention = flashinfer
2024-12-17T18:03:42.713034Z  INFO shard-manager: text_generation_launcher: Waiting for shard to be ready... rank=3
2024-12-17T18:03:42.714007Z  INFO shard-manager: text_generation_launcher: Waiting for shard to be ready... rank=0
2024-12-17T18:03:42.714678Z  INFO shard-manager: text_generation_launcher: Waiting for shard to be ready... rank=1
2024-12-17T18:03:42.721143Z  INFO shard-manager: text_generation_launcher: Waiting for shard to be ready... rank=2
2024-12-17T18:03:52.722256Z  INFO shard-manager: text_generation_launcher: Waiting for shard to be ready... rank=3
2024-12-17T18:03:52.723416Z  INFO shard-manager: text_generation_launcher: Waiting for shard to be ready... rank=0
2024-12-17T18:03:52.723960Z  INFO shard-manager: text_generation_launcher: Waiting for shard to be ready... rank=1
2024-12-17T18:03:52.730231Z  INFO shard-manager: text_generation_launcher: Waiting for shard to be ready... rank=2
2024-12-17T18:04:02.731685Z  INFO shard-manager: text_generation_launcher: Waiting for shard to be ready... rank=3
2024-12-17T18:04:02.733008Z  INFO shard-manager: text_generation_launcher: Waiting for shard to be ready... rank=0
2024-12-17T18:04:02.733511Z  INFO shard-manager: text_generation_launcher: Waiting for shard to be ready... rank=1
2024-12-17T18:04:02.739340Z  INFO shard-manager: text_generation_launcher: Waiting for shard to be ready... rank=2
2024-12-17T18:04:12.740983Z  INFO shard-manager: text_generation_launcher: Waiting for shard to be ready... rank=3
2024-12-17T18:04:12.742778Z  INFO shard-manager: text_generation_launcher: Waiting for shard to be ready... rank=0
2024-12-17T18:04:12.743260Z  INFO shard-manager: text_generation_launcher: Waiting for shard to be ready... rank=1
2024-12-17T18:04:12.748509Z  INFO shard-manager: text_generation_launcher: Waiting for shard to be ready... rank=2
2024-12-17T18:04:22.750201Z  INFO shard-manager: text_generation_launcher: Waiting for shard to be ready... rank=3
2024-12-17T18:04:22.752482Z  INFO shard-manager: text_generation_launcher: Waiting for shard to be ready... rank=0
2024-12-17T18:04:22.753057Z  INFO shard-manager: text_generation_launcher: Waiting for shard to be ready... rank=1
2024-12-17T18:04:22.757785Z  INFO shard-manager: text_generation_launcher: Waiting for shard to be ready... rank=2
2024-12-17T18:04:32.759340Z  INFO shard-manager: text_generation_launcher: Waiting for shard to be ready... rank=3
2024-12-17T18:04:32.762067Z  INFO shard-manager: text_generation_launcher: Waiting for shard to be ready... rank=0
2024-12-17T18:04:32.762852Z  INFO shard-manager: text_generation_launcher: Waiting for shard to be ready... rank=1
2024-12-17T18:04:32.767034Z  INFO shard-manager: text_generation_launcher: Waiting for shard to be ready... rank=2
2024-12-17T18:04:42.768492Z  INFO shard-manager: text_generation_launcher: Waiting for shard to be ready... rank=3
2024-12-17T18:04:42.771758Z  INFO shard-manager: text_generation_launcher: Waiting for shard to be ready... rank=0
2024-12-17T18:04:42.772535Z  INFO shard-manager: text_generation_launcher: Waiting for shard to be ready... rank=1
2024-12-17T18:04:42.776268Z  INFO shard-manager: text_generation_launcher: Waiting for shard to be ready... rank=2
2024-12-17T18:04:52.777706Z  INFO shard-manager: text_generation_launcher: Waiting for shard to be ready... rank=3
2024-12-17T18:04:52.781362Z  INFO shard-manager: text_generation_launcher: Waiting for shard to be ready... rank=0
2024-12-17T18:04:52.782289Z  INFO shard-manager: text_generation_launcher: Waiting for shard to be ready... rank=1
2024-12-17T18:04:52.785605Z  INFO shard-manager: text_generation_launcher: Waiting for shard to be ready... rank=2
2024-12-17T18:05:02.786995Z  INFO shard-manager: text_generation_launcher: Waiting for shard to be ready... rank=3
2024-12-17T18:05:02.790997Z  INFO shard-manager: text_generation_launcher: Waiting for shard to be ready... rank=0
2024-12-17T18:05:02.792054Z  INFO shard-manager: text_generation_launcher: Waiting for shard to be ready... rank=1
2024-12-17T18:05:02.794933Z  INFO shard-manager: text_generation_launcher: Waiting for shard to be ready... rank=2
2024-12-17T18:05:12.796209Z  INFO shard-manager: text_generation_launcher: Waiting for shard to be ready... rank=3
2024-12-17T18:05:12.800615Z  INFO shard-manager: text_generation_launcher: Waiting for shard to be ready... rank=0
2024-12-17T18:05:12.802012Z  INFO shard-manager: text_generation_launcher: Waiting for shard to be ready... rank=1
2024-12-17T18:05:12.804257Z  INFO shard-manager: text_generation_launcher: Waiting for shard to be ready... rank=2
2024-12-17T18:05:22.805536Z  INFO shard-manager: text_generation_launcher: Waiting for shard to be ready... rank=3
2024-12-17T18:05:22.810307Z  INFO shard-manager: text_generation_launcher: Waiting for shard to be ready... rank=0
2024-12-17T18:05:22.811833Z  INFO shard-manager: text_generation_launcher: Waiting for shard to be ready... rank=1
2024-12-17T18:05:22.813416Z  INFO shard-manager: text_generation_launcher: Waiting for shard to be ready... rank=2
2024-12-17T18:05:32.814759Z  INFO shard-manager: text_generation_launcher: Waiting for shard to be ready... rank=3
2024-12-17T18:05:32.819792Z  INFO shard-manager: text_generation_launcher: Waiting for shard to be ready... rank=0
2024-12-17T18:05:32.821590Z  INFO shard-manager: text_generation_launcher: Waiting for shard to be ready... rank=1
2024-12-17T18:05:32.821834Z  INFO shard-manager: text_generation_launcher: Waiting for shard to be ready... rank=2
2024-12-17T18:05:42.824027Z  INFO shard-manager: text_generation_launcher: Waiting for shard to be ready... rank=3
2024-12-17T18:05:42.829566Z  INFO shard-manager: text_generation_launcher: Waiting for shard to be ready... rank=0
2024-12-17T18:05:42.830560Z  INFO shard-manager: text_generation_launcher: Waiting for shard to be ready... rank=2
2024-12-17T18:05:42.831422Z  INFO shard-manager: text_generation_launcher: Waiting for shard to be ready... rank=1
2024-12-17T18:05:52.833387Z  INFO shard-manager: text_generation_launcher: Waiting for shard to be ready... rank=3
2024-12-17T18:05:52.839573Z  INFO shard-manager: text_generation_launcher: Waiting for shard to be ready... rank=0
2024-12-17T18:05:52.840175Z  INFO shard-manager: text_generation_launcher: Waiting for shard to be ready... rank=2
2024-12-17T18:05:52.841278Z  INFO shard-manager: text_generation_launcher: Waiting for shard to be ready... rank=1
2024-12-17T18:06:01.763800Z  INFO text_generation_launcher: Using prefill chunking = True
2024-12-17T18:06:02.627022Z  INFO text_generation_launcher: Server started at unix:///tmp/text-generation-server-1
2024-12-17T18:06:02.627076Z  INFO text_generation_launcher: Server started at unix:///tmp/text-generation-server-3
2024-12-17T18:06:02.627110Z  INFO text_generation_launcher: Server started at unix:///tmp/text-generation-server-0
2024-12-17T18:06:02.642621Z  INFO shard-manager: text_generation_launcher: Shard ready in 149.940971364s rank=3
2024-12-17T18:06:02.649583Z  INFO shard-manager: text_generation_launcher: Shard ready in 149.948711278s rank=0
2024-12-17T18:06:02.650706Z  INFO shard-manager: text_generation_launcher: Shard ready in 149.949875248s rank=1
2024-12-17T18:06:02.848613Z  INFO text_generation_launcher: Server started at unix:///tmp/text-generation-server-2
2024-12-17T18:06:02.849891Z  INFO shard-manager: text_generation_launcher: Shard ready in 150.143446295s rank=2
2024-12-17T18:06:02.909856Z  INFO text_generation_launcher: Starting Webserver
2024-12-17T18:06:03.001599Z  INFO text_generation_router_v3: backends/v3/src/lib.rs:125: Warming up model
2024-12-17T18:06:03.023245Z  INFO text_generation_launcher: Using optimized Triton indexing kernels.

Expected behavior

The model warms up and the server starts without crashing the host system.
