v0.15.4 Patch Release
What's Changed
- Update version.txt after 0.15.3 release by @loadams in #6652
- Fix expert grad scaling problem with ZeRO optimizer by @wyooyw in #6546
- Add attribute check for language_model when replacing the last linear module by @Yejing-Lai in #6650
- Fix init_device_mesh for torch 2.4 by @Lzhang-hub in #6614
- Fix dynamo issue by @oraluben in #6527
- Add sequence parallel support for uneven heads by @inkcherry in #6392
- Add fallback for is_compiling by @tohtana in #6663 (see the sketch after this list)
- Update profiler registration check by @loadams in #6668
- Add support for H100/sm_90 arch compilation by @loadams in #6669
- Update Gaudi2 docker image by @loadams in #6677
- Update Gaudi2 docker version to latest release (1.18) by @raza-sikander in #6648
- Update base docker image for A6000 GPU tests by @loadams in #6681
- Remove packages that no longer need to be updated in the latest container by @loadams in #6682
- Fix training of pipeline-based PEFT LoRA models by @xuanhua in #5477
- Update checkout action to latest version by @loadams in #5021
- Add attribute check to support git-base AutoTP by @Yejing-Lai in #6688
- Fix memcpy issue on backward for ZeRO-Infinity by @xylian86 in #6670
- Free memory in universal checkpointing tests by @tohtana in #6693
- Explicitly set device when reusing dist env by @tohtana in #6696 (see the sketch after this list)
- Update URL in README Pipeline Status for Huawei Ascend NPU by @xuedinge233 in #6706
- Pin transformers to 4.45.2 in nv-ds-chat workflow by @loadams in #6710
- [Bug Fix] Support threads_per_head < 64 for wavefront size of 64 by @jagadish-amd in #6622
- Use one param coordinator for both train/inference scenarios by @tohtana in #6662
- Update yapf version by @loadams in #6721
- Update flake8 version by @loadams in #6722
- Update supported Python versions by @loadams in #5676
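
For the `is_compiling` fallback (#6663), here is a minimal sketch of the kind of version-tolerant check involved. It only assumes the public `torch.compiler` and private `torch._dynamo` attributes and is not the exact code merged in the PR:

```python
import torch


def is_compiling() -> bool:
    # Prefer the public API available in newer PyTorch releases.
    compiler = getattr(torch, "compiler", None)
    if compiler is not None and hasattr(compiler, "is_compiling"):
        return compiler.is_compiling()
    # Fall back to the older private dynamo helper when present.
    dynamo = getattr(torch, "_dynamo", None)
    if dynamo is not None and hasattr(dynamo, "is_compiling"):
        return dynamo.is_compiling()
    # No compiler support detected; assume eager execution.
    return False
```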
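
For the explicit device selection when reusing a distributed environment (#6696), a minimal sketch of pinning each rank to its local GPU before initialization; the use of `LOCAL_RANK` and this surrounding setup are assumptions for illustration, not the change made in the PR:

```python
import os

import torch
import deepspeed

# Pin this process to its local GPU before (re)using the distributed env,
# so collectives and allocations land on the intended device.
local_rank = int(os.environ.get("LOCAL_RANK", "0"))
if torch.cuda.is_available():
    torch.cuda.set_device(local_rank)

# Initialize (or reuse) the distributed backend after the device is set,
# rather than relying on implicit device inference.
deepspeed.init_distributed()
```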
New Contributors
Full Changelog: v0.15.3...v0.15.4