@endomorphosis may I ask how you exported the Qwen/Qwen2-7B model? Are you following this guide for NPU inference? If possible, please check whether the issue is still present on OpenVINO 2024.6 (download here).
In addition, please provide more details about your environment (CPU SKU, Windows 10/11, NPU driver version, Python version, etc.).
I upgraded the packages to the latest versions and tried both int8 and int4 quantization. Both work on GPU, but on NPU they currently fail silently and cause the entire process to exit (on Windows).
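For reference, a minimal sketch of what an NPU-oriented export command might look like. It assumes the `optimum-cli export openvino` tool from optimum-intel and symmetric int4 weight compression (`--sym`, `--ratio 1.0`). The group size and the output directory name are illustrative assumptions, not values confirmed in this thread, so check them against the guide. The helper only builds and prints the command; it does not run the export.

```python
import shlex


def build_export_command(model_id, output_dir, group_size=128):
    """Build an optimum-cli command for symmetric int4 weight compression.

    group_size is an assumption; the NPU guide may recommend another value
    (for example channel-wise quantization) for larger models.
    """
    return [
        "optimum-cli", "export", "openvino",
        "-m", model_id,
        "--weight-format", "int4",
        "--sym",                      # symmetric quantization
        "--ratio", "1.0",             # compress all eligible layers to int4
        "--group-size", str(group_size),
        output_dir,
    ]


if __name__ == "__main__":
    # "qwen2-7b-int4-sym" is a hypothetical output directory name.
    cmd = build_export_command("Qwen/Qwen2-7B", "qwen2-7b-int4-sym")
    print(shlex.join(cmd))
```

Posting the exact command used for the export makes it easier to tell whether the quantization settings differ from the ones the guide expects.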
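One way to gather most of these details is a short script like the sketch below. It uses only the Python standard library. The package list is an assumption about what is installed in this setup, and the NPU driver version is not covered here, so it still has to be read from Windows Device Manager.

```python
import platform
import sys
from importlib import metadata

# Assumed package names; adjust to whatever is actually installed.
PACKAGES = ("openvino", "openvino-genai", "optimum-intel", "nncf")


def collect_environment(packages=PACKAGES):
    """Return a dict with OS, CPU, Python, and package version details."""
    info = {
        "os": platform.platform(),
        "cpu": platform.processor() or "unknown",
        "python": sys.version.split()[0],
    }
    for name in packages:
        try:
            info[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            info[name] = "not installed"
    return info


if __name__ == "__main__":
    for key, value in collect_environment().items():
        print(f"{key}: {value}")
```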
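Because the failure takes down the whole process, one way to capture it is to load the pipeline in a child process and inspect its exit code and output. The sketch below is an illustration only: `run_isolated` and `LOAD_SNIPPET` are hypothetical names, and the model path is a placeholder.

```python
import subprocess
import sys


def run_isolated(code, timeout=None):
    """Run a Python snippet in a child process and return (returncode, stdout, stderr)."""
    proc = subprocess.run(
        [sys.executable, "-c", code],
        capture_output=True,
        text=True,
        timeout=timeout,
    )
    return proc.returncode, proc.stdout, proc.stderr


# Template for the child process; the path is filled in by the caller.
LOAD_SNIPPET = """
import openvino_genai as ov_genai
ov_genai.LLMPipeline({path!r}, "NPU")
print("loaded")
"""

if __name__ == "__main__":
    # "path/to/exported/model" is a placeholder, not a path from this issue.
    rc, out, err = run_isolated(LOAD_SNIPPET.format(path="path/to/exported/model"))
    print("exit code:", rc)
    print(err or out)
```

A non-zero exit code together with the captured stderr gives something concrete to attach to the report, even when the parent script would otherwise exit without a traceback.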
OpenVINO Version
2024.5
Operating System
Windows System
Device used for inference
NPU
Framework
None
Model used
Qwen2 7b
Issue description
[ERROR] 21:40:27.063 [vpux-compiler] Got Diagnostic at loc(fused<{name = "self.model.layers.1.self_attn.k_proj.weight/fq_weights_1", type = "Multiply"}>["self.model.layers.1.self_attn.k_proj.weight/fq_weights_1", "artificial_fq", "fq_in"]) : failed to legalize operation 'IE.FakeQuantize' that was explicitly marked illegal
loc(fused<{name = "self.model.layers.1.self_attn.k_proj.weight/fq_weights_1", type = "Multiply"}>["self.model.layers.1.self_attn.k_proj.weight/fq_weights_1", "artificial_fq", "fq_in"]): error: failed to legalize operation 'IE.FakeQuantize' that was explicitly marked illegal
[ERROR] 21:40:27.065 [vpux-compiler] Failed Pass HandleFakeQuantHasNegativeScales on Operation loc(fused<{name = "main", type = "Func"}>["main"])
[ERROR] 21:40:27.065 [vpux-compiler] Failed Pass mlir::detail::OpToOpPassAdaptor on Operation loc(fused<{name = "module", type = "Module"}>["module"])
Step-by-step reproduction
import openvino_genai as ov_genai
ov_model = ov_genai.LLMPipeline(model_dst_path, device=device)
Relevant log output
Issue submission checklist