IR model under 2024.4.1 and 2022.1 #27601

Open · Jsy0220 opened this issue Nov 19, 2024 · 8 comments

Jsy0220 commented Nov 19, 2024

Hi, I want to upgrade the OpenVINO version from 2022.1.0 to 2024.4.1 and have some questions:

  1. Do I need to re-export the IR model using OVC from 2024.4.1, or can I just use the old one generated by MO from 2022.1.0?
  2. Is compress_to_fp16 a new option in 2024.4.1 compared to 2022.1.0? Does it affect inference time and memory at runtime?
  3. I found that the results of the same IR model (generated by MO from 2022.1.0) differ between 2022.1.0 and 2024.4.1. Is that normal?
slyalin (Contributor) commented Nov 19, 2024

  1. The old IR should work. Try it first. I think you have already tried, judging by question 3.
  2. compress_to_fp16 compresses the weights in the IR to FP16 format. It affects the size of the IR stored on disk, saving 50% on all compressible FP32 weights. It may slightly affect accuracy; inference performance is expected to be the same (except, possibly, loading time, depending on the model) when comparing an IR with compression against one without (and not 2022 vs. 2024). If you experience accuracy issues, try disabling the compression (see the sketch below).
  3. There is no expectation of bit exactness of the inference results, but the results should still be valid. Do you have valid results?
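
For reference, a minimal sketch (hypothetical file names) of toggling FP16 weight compression when re-saving a model with the current C++ API; ov::save_model compresses to FP16 by default, so passing false keeps the FP32 weights:

```cpp
#include <openvino/openvino.hpp>

int main() {
    ov::Core core;
    // Hypothetical path: an old FP32 IR, e.g. one generated by MO 2022.1.0.
    auto model = core.read_model("model_fp32.xml");

    // Re-save with FP16 weight compression (the default) ...
    ov::save_model(model, "model_fp16.xml", /*compress_to_fp16=*/true);
    // ... or without it, if accuracy issues show up.
    ov::save_model(model, "model_fp32_out.xml", /*compress_to_fp16=*/false);
    return 0;
}
```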

Jsy0220 (Author) commented Nov 19, 2024

@slyalin Okay, thank you!
For Q3, just to confirm: the two results differ at the binary level, with about a 1e-5 floating-point error for my model. I think that is acceptable.
One more question: is there any performance impact from using the old IR in 2024? I want to share the IR between the two versions during the upgrade transition.

slyalin (Contributor) commented Nov 19, 2024

> Is there any performance impact from using the old IR in 2024?

Theoretically, a two-year gap can affect performance because the IR conversion/transformation pipeline has changed since then. But I would consider that a bug unless it involves some unavoidable edge case, so using the old IR, if it works in both runtimes, should be OK. If you can re-convert the IR with the latest version and compare, please do so and share your results. A comparison could look like the sketch below.
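
A minimal sketch (hypothetical file names, and assuming a single static-shape FP32 input and output) of running the same input through the old and the re-converted IR and checking the maximum absolute difference:

```cpp
#include <openvino/openvino.hpp>
#include <algorithm>
#include <cmath>
#include <iostream>
#include <vector>

int main() {
    ov::Core core;
    auto run = [&core](const std::string& path, const ov::Tensor& input) {
        auto compiled = core.compile_model(core.read_model(path), "CPU");
        auto request = compiled.create_infer_request();
        request.set_input_tensor(input);
        request.infer();
        auto out = request.get_output_tensor();
        // Copy the data out so it outlives the request.
        return std::vector<float>(out.data<float>(), out.data<float>() + out.get_size());
    };

    // Build a dummy input from the old IR's input shape.
    auto shape = core.read_model("model_2022.xml")->input().get_shape();
    ov::Tensor input(ov::element::f32, shape);
    std::fill_n(input.data<float>(), input.get_size(), 0.5f);

    auto out_old = run("model_2022.xml", input);  // IR from MO 2022.1.0
    auto out_new = run("model_2024.xml", input);  // IR re-converted with OVC 2024.4.1

    float max_diff = 0.0f;
    for (size_t i = 0; i < out_old.size(); ++i)
        max_diff = std::max(max_diff, std::fabs(out_old[i] - out_new[i]));
    std::cout << "max abs diff: " << max_diff << "\n";  // small numerical noise is expected
    return 0;
}
```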

Jsy0220 (Author) commented Nov 19, 2024

@slyalin Okay, thank you.

rkazants removed their assignment Nov 19, 2024
Jsy0220 (Author) commented Nov 21, 2024

@slyalin Hi, I want to ask more about the performance configuration for CPU. In 2022.1.0 I configured OpenVINO for single-threaded inference with the following settings:
[screenshot: 2022.1.0 single-threaded CPU configuration]
I found that the configs above are deprecated in 2024.4.1, so:

  1. Is this mapping right?
     InferenceEngine::PluginConfigParams::KEY_CPU_THREADS_NUM -> ov::inference_num_threads
     InferenceEngine::PluginConfigParams::KEY_CPU_THROUGHPUT_STREAMS -> ov::num_streams
     InferenceEngine::PluginConfigParams::KEY_CPU_BIND_THREAD -> ???
     Which property can be used to replace InferenceEngine::PluginConfigParams::KEY_CPU_BIND_THREAD?
  2. What does InferenceEngine::PluginConfigParams::KEY_CPU_BIND_THREAD do? I found that it defaults to YES on Linux.

dmitry-gorokhov (Contributor) commented:

Hi @Jsy0220,
InferenceEngine::PluginConfigParams::KEY_CPU_BIND_THREAD should be replaced by ov::hint::enable_cpu_pinning.
Please check https://docs.openvino.ai/2024/openvino-workflow/running-inference/inference-devices-and-modes/cpu-device/performance-hint-and-thread-scheduling.html for more details regarding OV threading properties.
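
To make the mapping concrete, a minimal sketch (hypothetical model path) of the single-threaded configuration with the 2024.x properties; the old 2022.1.0 keys are noted in comments:

```cpp
#include <openvino/openvino.hpp>

int main() {
    ov::Core core;
    auto model = core.read_model("model.xml");  // hypothetical IR path

    auto compiled = core.compile_model(model, "CPU",
        ov::inference_num_threads(1),         // was KEY_CPU_THREADS_NUM
        ov::num_streams(1),                   // was KEY_CPU_THROUGHPUT_STREAMS
        ov::hint::enable_cpu_pinning(true));  // was KEY_CPU_BIND_THREAD

    auto request = compiled.create_infer_request();
    // ... set inputs and call request.infer() ...
    return 0;
}
```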

Jsy0220 (Author) commented Nov 21, 2024

@dmitry-gorokhov Okay, and two more questions:

  1. What is the relationship between ov::affinity and ov::hint::enable_cpu_pinning? Can I replace it with ov::affinity?
  2. I found that it defaults to YES on Linux. Suppose there are two app threads, each running OpenVINO with a single-threaded config. If ov::hint::enable_cpu_pinning is YES, the two threads seem to run on one core and CPU usage is about 100%; otherwise CPU usage can be about 200%. Is that normal? And in which cases should it be set to YES or NO?

dmitry-gorokhov (Contributor) commented:

@Jsy0220

  1. ov::affinity was deprecated and replaced with ov::hint::enable_cpu_pinning. ov::Affinity::NONE maps to ov::hint::enable_cpu_pinning == false, and the other values map to ov::hint::enable_cpu_pinning == true.
  2. You mean you have two threads in the app, and each thread creates and runs its own compiled_model?
     In that case the behavior is normal. Each compiled_model doesn't know about the other, so each tries to pin its threads starting from core 0. When pinning is disabled, the OS is responsible for thread scheduling and dispatches the threads to different physical cores (see the sketch after this list).
     We are working on a solution that allows a compiled_model to reserve some cores, preventing another compiled_model from being pinned to the same cores. You can already try it: Reserving CPU resource in CPU inference #27321.
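
A minimal sketch (hypothetical model path) of the two-instance scenario described above: each app thread compiles and runs its own model, and disabling pinning lets the OS spread the two inference threads across physical cores instead of both instances pinning from core 0:

```cpp
#include <openvino/openvino.hpp>
#include <thread>

int main() {
    ov::Core core;

    auto worker = [&core]() {
        auto model = core.read_model("model.xml");  // hypothetical IR path
        // One inference thread per instance; pinning off, so the OS schedules it.
        auto compiled = core.compile_model(model, "CPU",
            ov::inference_num_threads(1),
            ov::hint::enable_cpu_pinning(false));
        auto request = compiled.create_infer_request();
        // ... set inputs and call request.infer() in a loop ...
    };

    std::thread t1(worker), t2(worker);
    t1.join();
    t2.join();
    return 0;
}
```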
