
Intel Neural Compressor Release 3.2

Released by @thuang6 on 28 Dec 13:17
  • Highlights
  • Features
  • Improvements
  • Bug Fixes
  • Validated Hardware
  • Validated Configurations

Highlights

  • Aligned with the Habana 1.19 release, including improvements to FP8 and INT4 quantization for the Intel® Gaudi® AI accelerator
  • INT4 weight-only quantization on Intel® Arc™ B-Series Graphics GPUs (code-named BattleMage)

Features

  • Save and load FP8 checkpoints on Gaudi
  • Load vLLM/llm-compressor-compatible FP8 checkpoints on Gaudi
  • Support arbitrary scale methods on Gaudi
  • AutoRound INT4 weight-only quantization on Gaudi
  • Block-wise calibration for LLMs on Gaudi
  • INT4 weight-only quantization on BattleMage
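
The release notes do not spell out the INT4 scheme, but weight-only INT4 quantization of the kind listed above typically maps each group of weights to signed 4-bit integers with a per-group scale. As a rough illustration only (plain Python, symmetric round-to-nearest, a toy group size of 4; this is not Intel Neural Compressor's implementation, and methods such as AutoRound additionally tune the rounding):

```python
# Illustrative symmetric INT4 weight-only quantization (round-to-nearest).
# Hypothetical sketch, NOT Intel Neural Compressor's implementation;
# group size and scheme are simplified for readability.

def quantize_int4(weights, group_size=4):
    """Quantize a flat list of float weights to signed INT4 per group."""
    qweights, scales = [], []
    for i in range(0, len(weights), group_size):
        group = weights[i:i + group_size]
        # Symmetric scale: the largest magnitude maps to the INT4 extreme (7);
        # an all-zero group falls back to scale 1.0 to avoid division by zero.
        scale = max(abs(w) for w in group) / 7 or 1.0
        scales.append(scale)
        # Clamp to the signed 4-bit range [-8, 7].
        qweights.extend(max(-8, min(7, round(w / scale))) for w in group)
    return qweights, scales

def dequantize_int4(qweights, scales, group_size=4):
    """Recover approximate float weights from INT4 values and group scales."""
    return [q * scales[i // group_size] for i, q in enumerate(qweights)]

w = [0.12, -0.70, 0.35, 0.05, 1.40, -0.20, 0.90, -1.10]
qw, s = quantize_int4(w)
w_hat = dequantize_int4(qw, s)
```

The reconstruction error per weight is bounded by half the group's scale, which is why smaller group sizes (at the cost of more scale metadata) generally improve accuracy.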

Improvements

  • Improve FP8 performance by representing scales as scalar tensors on Gaudi
  • Integrate AutoRound 0.4.2 with VLM quantization improvements
  • Improve safetensors loading for layer-wise quantization in the Transformers-like API
  • Improve non-contiguous weight saving in the Transformers-like API

Bug Fixes

  • Fix a layer-wise quantization issue in GPTQ on client GPUs
  • Fix an out-of-memory issue with the glm-4-9b model on BattleMage

Validated Hardware

  • Intel Gaudi AI Accelerators (Gaudi 2 and 3)
  • Intel Xeon Scalable Processors (4th, 5th, and 6th Gen)
  • Intel Core Ultra Processors (Series 1 and 2)
  • Intel Data Center GPU Max Series (1100)
  • Intel Arc B-Series Graphics GPU (B580)

Validated Configurations

  • CentOS 8.4 & Ubuntu 22.04 & Windows 11
  • Python 3.9, 3.10, 3.11, 3.12
  • PyTorch/IPEX 2.3, 2.4, 2.5