[Feature] DeepSeek V3 optimization #2591

Open
4 of 15 tasks
zhyncs opened this issue Dec 26, 2024 · 7 comments
Labels: enhancement (New feature or request), high priority, performance, quant (LLM Quantization)

Comments

@zhyncs (Member) commented Dec 26, 2024

Checklist

Usage

User Guide for Existing System

https://github.com/sgl-project/sglang/tree/main/benchmark/deepseek_v3

Features

Related resources: (none provided)

@zhyncs zhyncs added enhancement New feature or request performance quant LLM Quantization labels Dec 26, 2024
@zhyncs zhyncs pinned this issue Dec 26, 2024
@libratiger (Contributor) commented:
Very quick response! I understand that the overlap scheduler is model-independent and a general optimization that should be supported by default. Are some model-specific optimizations still needed on top of it?

@merrymercy (Contributor) commented Dec 26, 2024

The overlap scheduler is model-independent, but it is not yet supported when using DP (data-parallel) attention. We have a private branch for this and will upstream it soon.
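For readers unfamiliar with the term, the idea behind an overlap scheduler can be sketched as follows. This is a conceptual illustration, not SGLang's actual implementation: CPU-side batch preparation for step i+1 is hidden behind the (simulated) GPU execution of step i.

```python
# Conceptual sketch of overlap scheduling (hypothetical, not SGLang code):
# prepare the next batch on a worker thread while the current batch "runs".
from concurrent.futures import ThreadPoolExecutor
import time

def prepare_batch(i):
    # Stand-in for CPU work: tokenization, building attention metadata, etc.
    time.sleep(0.01)
    return f"batch-{i}"

def run_on_gpu(batch):
    # Stand-in for the GPU forward pass.
    time.sleep(0.01)
    return f"done-{batch}"

def overlapped_run(n):
    results = []
    with ThreadPoolExecutor(max_workers=1) as pool:
        next_batch = pool.submit(prepare_batch, 0)
        for i in range(n):
            batch = next_batch.result()
            if i + 1 < n:
                # Kick off preparation of batch i+1 before running batch i,
                # so CPU prep overlaps with "GPU" execution.
                next_batch = pool.submit(prepare_batch, i + 1)
            results.append(run_on_gpu(batch))
    return results

print(overlapped_run(3))  # ['done-batch-0', 'done-batch-1', 'done-batch-2']
```

In the ideal case, per-step latency drops from prep_time + gpu_time to max(prep_time, gpu_time).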

@fengyang95 commented Dec 26, 2024

Is the memory of an 8-GPU instance sufficient? The model is very large.

@zhyncs (Member, Author) commented Dec 26, 2024

> Is the memory sufficient for an 8 GPUs instance? This model size is too large.

The 671B model fits on 8× H200 with FP8 weights: 671 GB < 141 GB × 8 = 1128 GB.
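The arithmetic behind this is a back-of-envelope weights-only estimate (it ignores KV cache, activations, and CUDA graph overhead, which eat into the headroom):

```python
# Weights-only memory check for DeepSeek V3 on 8x H200 (rough estimate).
PARAMS_B = 671          # total parameters, in billions
BYTES_PER_PARAM = 1     # FP8: one byte per parameter
GPU_MEM_GB = 141        # H200 HBM per GPU
NUM_GPUS = 8

weights_gb = PARAMS_B * BYTES_PER_PARAM   # ~671 GB of weights
total_gb = GPU_MEM_GB * NUM_GPUS          # 1128 GB aggregate HBM

print(weights_gb < total_gb)  # True: weights fit, with ~457 GB left over
```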

@zhyncs (Member, Author) commented Dec 26, 2024

Hi @fengyang95, you can also consider multi-node deployment. If you do not have GPUs with large enough memory, please try multi-node tensor parallelism (help 1, help 2).
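As a sketch of what a two-node tensor-parallel launch might look like (the IP address and TP degree are placeholders; verify the flags against the current SGLang documentation for your version):

```shell
# Hypothetical two-node setup, 8 GPUs per node, TP=16.
# Node 0 (master, assumed IP 10.0.0.1):
python -m sglang.launch_server --model-path deepseek-ai/DeepSeek-V3 \
    --tp 16 --dist-init-addr 10.0.0.1:5000 --nnodes 2 --node-rank 0 \
    --trust-remote-code

# Node 1:
python -m sglang.launch_server --model-path deepseek-ai/DeepSeek-V3 \
    --tp 16 --dist-init-addr 10.0.0.1:5000 --nnodes 2 --node-rank 1 \
    --trust-remote-code
```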

@zhyncs (Member, Author) commented Dec 26, 2024

FYI: due to the tight schedule, SGLang v0.4.1 currently provides only preliminary support for DeepSeek V3. To make it run more cost-efficiently, we need to complete most of the optimizations listed above. If you are interested in any of them, feel free to join the SGLang Slack for discussion or contribute a PR. We hope to finish these optimizations quickly and appreciate any discussion and contributions.

@zhyncs (Member, Author) commented Dec 27, 2024

Update: SGLang v0.4.1.post1 supports CUDA Graph for DeepSeek V3; please use the latest version.

```shell
pip install "sglang[all]==0.4.1.post1" --find-links https://flashinfer.ai/whl/cu124/torch2.4/flashinfer
```
