Releases: huggingface/optimum-intel
v1.21.0: SD3, Flux, MiniCPM, NanoLlava, VLM Quantization, XPU, PagedAttention
What's Changed
OpenVINO
Diffusers
VLMs Modeling
- MiniCPMv support by @eaidova in #972
- NanoLlava support by @eaidova in #969
- Phi3v support by @eaidova in #977
NNCF
- Quantization support for CausalVisualLMs by @nikita-savelyevv in #951
- NF4 data type support for OV weight compression by @l-bat in #988
- NNCF 2.14 new features support by @nikita-savelyevv in #997
IPEX
INC
- Layer-wise quantization support by @changwangss in #1040
New Contributors
- @emmanuel-ferdman made their first contribution in #974
- @mvafin made their first contribution in #1033
Full Changelog: v1.20.0...v1.21.0
v1.20.1: Patch release
- Fix lora unscaling in diffusion pipelines by @eaidova in #937
- Fix compatibility with diffusers < 0.25.0 by @eaidova in #952
- Allow using SDPA in CLIP models by @eaidova in #941
- Updated OVPipelinePart to have separate ov_config by @e-ddykim in #957
- Symbol use in optimum: fix misprint by @jane-intel in #948
- Fix temporary directory saving by @eaidova in #959
- Disable warning about tokenizers version for ov tokenizers >= 2024.5 by @eaidova in #962
- Restore original model_index.json after save_pretrained call by @eaidova in #961
- Add v4.46 transformers support by @echarlaix in #960
v1.20.0: multi-modal and OpenCLIP models support, transformers v4.45
OpenVINO
Multi-modal models support
Adding OVModelForVisionCausalLM by @eaidova in #883
OpenCLIP models support
Adding OpenCLIP models support by @sbalandi in #857
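from optimum.intel import OVModelCLIPVisual, OVModelCLIPText
# `model_name_or_path`, `processor`, `tokenizer` and `image` are assumed to be
# defined beforehand (for example, an OpenCLIP preprocessing transform and tokenizer)
visual_model = OVModelCLIPVisual.from_pretrained(model_name_or_path)
text_model = OVModelCLIPText.from_pretrained(model_name_or_path)
# Preprocess the image and add a batch dimension
image = processor(image).unsqueeze(0)
text = tokenizer(["a diagram", "a dog", "a cat"])
image_features = visual_model(image).image_features
text_features = text_model(text).text_features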
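The image and text features from the snippet above are usually compared with a scaled cosine similarity followed by a softmax, as in the OpenCLIP examples. The sketch below is not part of optimum-intel: it uses plain Python lists as stand-ins for the feature tensors, and the helper names and the `logit_scale` default of 100.0 are assumptions made for illustration.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length so a dot product equals cosine similarity."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def label_probabilities(image_vec, text_vecs, logit_scale=100.0):
    """Return a softmax over scaled cosine similarities, one entry per text."""
    image_vec = l2_normalize(image_vec)
    logits = [
        logit_scale * sum(a * b for a, b in zip(image_vec, l2_normalize(text_vec)))
        for text_vec in text_vecs
    ]
    # Subtract the largest logit before exponentiating for numerical stability.
    top = max(logits)
    exps = [math.exp(logit - top) for logit in logits]
    total = sum(exps)
    return [e / total for e in exps]


# Stand-in features: one image vector and three text vectors (hypothetical values).
image_vec = [0.2, 0.9, 0.1]
text_vecs = [[0.1, 1.0, 0.0], [1.0, 0.0, 0.0], [0.0, 0.0, 1.0]]
probs = label_probabilities(image_vec, text_vecs)
print(probs)
```

With real model outputs the same computation would normally be done with tensor operations rather than Python lists.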
Diffusion pipeline
Adding OVDiffusionPipeline to simplify diffusers model loading by @IlyasMoutawwakil in #889
- from optimum.intel import OVStableDiffusionXLPipeline
+ from optimum.intel import OVDiffusionPipeline
model_id = "stabilityai/stable-diffusion-xl-base-1.0"
- pipeline = OVStableDiffusionXLPipeline.from_pretrained(model_id)
+ pipeline = OVDiffusionPipeline.from_pretrained(model_id)
image = pipeline("sailing ship in storm by Leonardo da Vinci").images[0]
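One way a generic entry point like this can pick the concrete pipeline is by reading the `_class_name` field that diffusers stores in `model_index.json`. The sketch below illustrates that idea only. The mapping table and the `resolve_ov_pipeline_name` helper are hypothetical, and the library's actual resolution logic may differ.

```python
import json
import tempfile
from pathlib import Path

# Hypothetical lookup table for illustration; the library keeps its own registry.
OV_PIPELINE_NAMES = {
    "StableDiffusionPipeline": "OVStableDiffusionPipeline",
    "StableDiffusionXLPipeline": "OVStableDiffusionXLPipeline",
    "LatentConsistencyModelPipeline": "OVLatentConsistencyModelPipeline",
}


def resolve_ov_pipeline_name(model_dir):
    """Map the diffusers class named in model_index.json to an OpenVINO pipeline name."""
    config = json.loads((Path(model_dir) / "model_index.json").read_text())
    class_name = config["_class_name"]
    if class_name not in OV_PIPELINE_NAMES:
        raise ValueError(f"No OpenVINO pipeline registered for {class_name}")
    return OV_PIPELINE_NAMES[class_name]


# Demo on a temporary directory standing in for a downloaded model repository.
with tempfile.TemporaryDirectory() as model_dir:
    index = {"_class_name": "StableDiffusionXLPipeline"}
    (Path(model_dir) / "model_index.json").write_text(json.dumps(index))
    print(resolve_ov_pipeline_name(model_dir))
```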
NNCF GPTQ support
GPTQ support by @nikita-savelyevv in #912
Transformers v4.45
Transformers v4.45 support by @echarlaix in #902
Subfolder
Remove the restriction for the model's config to be in the model's subfolder by @tomaarsen in #933
New Contributors
- @jane-intel made their first contribution in #696
- @andreyanufr made their first contribution in #903
- @MaximProshin made their first contribution in #905
- @tomaarsen made their first contribution in #931
v1.19.0: SentenceTransformers OpenVINO support
- Support SentenceTransformers models inference by @aleksandr-mokrov in #865
from optimum.intel import OVSentenceTransformer
model_id = "sentence-transformers/all-mpnet-base-v2"
model = OVSentenceTransformer.from_pretrained(model_id, export=True)
sentences = ["This is an example sentence", "Each sentence is converted"]
embeddings = model.encode(sentences)
- Infer if the model needs to be exported or not by @echarlaix in #825
from optimum.intel import OVModelForCausalLM
- model = OVModelForCausalLM.from_pretrained("gpt2", export=True)
+ model = OVModelForCausalLM.from_pretrained("gpt2")
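For a local directory, the decision to export can be approximated by checking whether an OpenVINO IR file is already present. This is a minimal sketch under stated assumptions, not the library's implementation: the `needs_export` helper is hypothetical, `openvino_model.xml` is assumed to be the default IR file name, and the real logic also has to handle models hosted on the Hub.

```python
import tempfile
from pathlib import Path

# Assumed default file name of an exported OpenVINO IR model.
OV_XML_FILE_NAME = "openvino_model.xml"


def needs_export(model_dir):
    """Return True when the directory holds no OpenVINO IR file yet."""
    return not (Path(model_dir) / OV_XML_FILE_NAME).is_file()


# Demo: an empty directory needs export, one with an IR file does not.
with tempfile.TemporaryDirectory() as model_dir:
    print(needs_export(model_dir))
    (Path(model_dir) / OV_XML_FILE_NAME).write_text("<net/>")
    print(needs_export(model_dir))
```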
Compatible with transformers>=4.36,<=4.44
Full Changelog: v1.18.0...v1.19.0
v1.18.3: Patch release
Full Changelog: v1.18.2...v1.18.3
v1.18.2: Patch release
- Fix model patching for internlm2 by @eaidova in #814
- Fix loading models from cache by @eaidova in #820
- Disable tpp for un-verified models by @jiqing-feng in #822
- Update default NNCF configurations by @KodiaqQ in #824
- Fix update causal mask for transformers 4.42 by @eaidova in #852
- Fix bf16 inference accuracy for mistral, phi3, dbrx by @eaidova in #833
- Revert rotary embedding patching for recovering gpu accuracy by @eaidova in #855
- Support transformers 4.43 by @IlyasMoutawwakil in #856
Full Changelog: v1.18.1...v1.18.2
v1.18.1: Patch release
- OV configurations alignment by @KodiaqQ in #787
- Enable transformers v4.42.0 by @echarlaix in #793
- Deprecate onnx/ort model export and quantization by @IlyasMoutawwakil in #795
- Free memory after model export by @eaidova in #800
- Update config import path for neural-compressor v2.6 by @changwangss in #801
- Pin library name to transformers for feature extraction by @IlyasMoutawwakil in #804
Full Changelog: v1.18.0...v1.18.1
v1.18.0: Arctic, Jais, OpenVINO pipelines
OpenVINO
- Enable Arctic, Jais export by @eaidova in #726
- Enable GLM-4 export by @eaidova in #776
- Move data-driven quantization after model export for text-generation models by @nikita-savelyevv in #721
- Create default token_type_ids when needed for inference by @echarlaix in #757
- Resolve default int4 config for local models by @eaidova in #760
- Update to NNCF 2.11 by @nikita-savelyevv in #763
- Fix quantization config by @echarlaix in #773
- Expose trust remote code argument when generating calibration dataset for datasets >= v2.20.0 by @echarlaix in #767
- Add pipelines by @echarlaix in #740
from optimum.intel.pipelines import pipeline
# Load openvino model
ov_pipe = pipeline("text-generation", "helenai/gpt2-ov", accelerator="openvino")
# Load pytorch model and convert it to openvino before inference
pipe = pipeline("text-generation", "gpt2", accelerator="openvino")
IPEX
- Enable IPEX patching for llama for >= v2.3 by @jiqing-feng in #725
- Refactor llama modeling for IPEX patching by @faaany in #728
- Refactor model loading by @jiqing-feng in #752
v1.17.2: Patch release
- Fix compatibility with transformers < v4.39.0 by @echarlaix in #754
v1.17.1: Patch release
- Add setuptools to fix issue with Python 3.12 by @helena-intel in #747
- Disable warnings by @helena-intel in #748
- Fix Windows TemporaryDirectory issue by @helena-intel in #749
- Fix generation config loading and saving by @eaidova in #750