IR model under 2024.4.1 and 2022.1 #27601

Open · Jsy0220 opened this issue Nov 19, 2024 · 8 comments

Jsy0220 commented Nov 19, 2024

Hi, I want to upgrade the OpenVINO version from 2022.1.0 to 2024.4.1 and have some questions:

  1. Do I need to re-export the IR model using OVC from 2024.4.1, or can I just use the old one generated by MO from 2022.1.0?
  2. Is compress_to_fp16 a new option in 2024.4.1 compared to 2022.1.0? Does it affect inference time and memory at runtime?
  3. I found that the results of the same IR model (generated by MO from 2022.1.0) differ between 2022.1.0 and 2024.4.1. Is that normal?
slyalin (Contributor) commented Nov 19, 2024

  1. The old IR should work. Try it first. I think you have already tried, judging by question 3.
  2. compress_to_fp16 compresses the weights in the IR to FP16 format. It affects the size of the IR stored on disk, saving 50% on all compressible FP32 weights. It may slightly affect accuracy; inference performance is expected to be the same (except, possibly, loading time, depending on the model) when comparing an IR with compression against one without (and not 2022 vs. 2024). If you experience accuracy issues, try disabling the compression (see the sketch below).
  3. There is no expectation of bit exactness of the inference results, but the results should still be valid. Do you have valid results?
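
For reference, a minimal sketch (hypothetical file names) of toggling FP16 weight compression when re-saving a model with the current C++ API; ov::save_model compresses to FP16 by default, so passing false keeps the FP32 weights:

```cpp
#include <openvino/openvino.hpp>

int main() {
    ov::Core core;
    // Hypothetical path: an old FP32 IR, e.g. one generated by MO 2022.1.0.
    auto model = core.read_model("model_fp32.xml");

    // Re-save with FP16 weight compression (the default) ...
    ov::save_model(model, "model_fp16.xml", /*compress_to_fp16=*/true);
    // ... or without it, if accuracy issues show up.
    ov::save_model(model, "model_fp32_out.xml", /*compress_to_fp16=*/false);
    return 0;
}
```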

Jsy0220 (Author) commented Nov 19, 2024

@slyalin Okay, thank you!
For Q3, just to confirm: the two results differ at the binary level, with about a 1e-5 floating-point error for my model. I think that is acceptable.
One more question: is there any performance impact from using the old IR in 2024? I want to share the IR between the two versions during the upgrade transition.

slyalin (Contributor) commented Nov 19, 2024

> Is there any performance impact from using the old IR in 2024?

Theoretically, a two-year gap can affect performance because the IR conversion/transformation pipeline has changed since then. But I would consider that a bug unless it involves some unavoidable edge case, so using the old IR, if it works in both runtimes, should be OK. If you can re-convert the IR with the latest version and compare, please do so and share your results. A comparison could look like the sketch below.
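
A minimal sketch (hypothetical file names, and assuming a single static-shape FP32 input and output) of running the same input through the old and the re-converted IR and checking the maximum absolute difference:

```cpp
#include <openvino/openvino.hpp>
#include <algorithm>
#include <cmath>
#include <iostream>
#include <vector>

int main() {
    ov::Core core;
    auto run = [&core](const std::string& path, const ov::Tensor& input) {
        auto compiled = core.compile_model(core.read_model(path), "CPU");
        auto request = compiled.create_infer_request();
        request.set_input_tensor(input);
        request.infer();
        auto out = request.get_output_tensor();
        // Copy the data out so it outlives the request.
        return std::vector<float>(out.data<float>(), out.data<float>() + out.get_size());
    };

    // Build a dummy input from the old IR's input shape.
    auto shape = core.read_model("model_2022.xml")->input().get_shape();
    ov::Tensor input(ov::element::f32, shape);
    std::fill_n(input.data<float>(), input.get_size(), 0.5f);

    auto out_old = run("model_2022.xml", input);  // IR from MO 2022.1.0
    auto out_new = run("model_2024.xml", input);  // IR re-converted with OVC 2024.4.1

    float max_diff = 0.0f;
    for (size_t i = 0; i < out_old.size(); ++i)
        max_diff = std::max(max_diff, std::fabs(out_old[i] - out_new[i]));
    std::cout << "max abs diff: " << max_diff << "\n";  // small numerical noise is expected
    return 0;
}
```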

Jsy0220 (Author) commented Nov 19, 2024

@slyalin Okay, thank you.

rkazants removed their assignment Nov 19, 2024
Jsy0220 (Author) commented Nov 21, 2024

@slyalin Hi, I want to ask more about the performance configuration for CPU. In 2022.1.0 I configured OpenVINO for single-threaded inference with the following settings:
[screenshot: 2022.1.0 single-threaded CPU configuration]
I found that the configs above are deprecated in 2024.4.1, so:

  1. Is this mapping right?
     InferenceEngine::PluginConfigParams::KEY_CPU_THREADS_NUM -> ov::inference_num_threads
     InferenceEngine::PluginConfigParams::KEY_CPU_THROUGHPUT_STREAMS -> ov::num_streams
     InferenceEngine::PluginConfigParams::KEY_CPU_BIND_THREAD -> ???
     Which property can be used to replace InferenceEngine::PluginConfigParams::KEY_CPU_BIND_THREAD?
  2. What does InferenceEngine::PluginConfigParams::KEY_CPU_BIND_THREAD do? I found that it defaults to YES on Linux.

dmitry-gorokhov (Contributor) commented:

Hi @Jsy0220,
InferenceEngine::PluginConfigParams::KEY_CPU_BIND_THREAD should be replaced by ov::hint::enable_cpu_pinning.
Please check https://docs.openvino.ai/2024/openvino-workflow/running-inference/inference-devices-and-modes/cpu-device/performance-hint-and-thread-scheduling.html for more details regarding OV threading properties.
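
To make the mapping concrete, a minimal sketch (hypothetical model path) of the single-threaded configuration with the 2024.x properties; the old 2022.1.0 keys are noted in comments:

```cpp
#include <openvino/openvino.hpp>

int main() {
    ov::Core core;
    auto model = core.read_model("model.xml");  // hypothetical IR path

    auto compiled = core.compile_model(model, "CPU",
        ov::inference_num_threads(1),         // was KEY_CPU_THREADS_NUM
        ov::num_streams(1),                   // was KEY_CPU_THROUGHPUT_STREAMS
        ov::hint::enable_cpu_pinning(true));  // was KEY_CPU_BIND_THREAD

    auto request = compiled.create_infer_request();
    // ... set inputs and call request.infer() ...
    return 0;
}
```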

Jsy0220 (Author) commented Nov 21, 2024

@dmitry-gorokhov Okay, and two more questions:

  1. What is the relationship between ov::affinity and ov::hint::enable_cpu_pinning? Can I replace it with ov::affinity?
  2. I found that it defaults to YES on Linux. Suppose there are two app threads, each running OpenVINO with a single-threaded config. If ov::hint::enable_cpu_pinning is YES, the two threads seem to run on one core and CPU usage is about 100%; otherwise CPU usage can be about 200%. Is that normal? And in which cases should it be set to YES or NO?

dmitry-gorokhov (Contributor) commented:

@Jsy0220

  1. ov::affinity was deprecated and replaced with ov::hint::enable_cpu_pinning. ov::Affinity::NONE maps to ov::hint::enable_cpu_pinning == false, and the other values map to ov::hint::enable_cpu_pinning == true.
  2. You mean you have two threads in the app, and each thread creates and runs its own compiled_model?
     In that case the behavior is normal. Each compiled_model doesn't know about the other, so each tries to pin its threads starting from core 0. When pinning is disabled, the OS is responsible for thread scheduling and dispatches the threads to different physical cores (see the sketch after this list).
     We are working on a solution that allows a compiled_model to reserve some cores, preventing another compiled_model from being pinned to the same cores. You can already try it: Reserving CPU resource in CPU inference #27321.
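
A minimal sketch (hypothetical model path) of the two-instance scenario described above: each app thread compiles and runs its own model, and disabling pinning lets the OS spread the two inference threads across physical cores instead of both instances pinning from core 0:

```cpp
#include <openvino/openvino.hpp>
#include <thread>

int main() {
    ov::Core core;

    auto worker = [&core]() {
        auto model = core.read_model("model.xml");  // hypothetical IR path
        // One inference thread per instance; pinning off, so the OS schedules it.
        auto compiled = core.compile_model(model, "CPU",
            ov::inference_num_threads(1),
            ov::hint::enable_cpu_pinning(false));
        auto request = compiled.create_infer_request();
        // ... set inputs and call request.infer() in a loop ...
    };

    std::thread t1(worker), t2(worker);
    t1.join();
    t2.join();
    return 0;
}
```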
