NEPooling3dLayer performance issue #1107
Comments
I prepared a benchdnn reference reproducer and checked it on an Ampere server. Benchdnn
The last benchdnn command gives me ACL
The last command gives me
Hi @alvoron
Could you please try rebuilding the library with
Hope this helps.
I rebuilt ACL:
and got
P.S.
That's why I set
Hi @alvoron
Can you please point us to the actual reference implementation you're using? How do you take the measurements for both backends, reference and ACL? Are you using a single binary for both?
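When the two backends are timed in different binaries or with different loops, the numbers can diverge for reasons unrelated to the kernel itself, such as first-run allocations, worker-thread start-up, or a missing warm-up. Below is a minimal sketch of a shared timing harness. It assumes both the ACL call and the reference call can be wrapped in a callable. `median_runtime_us` is a hypothetical helper and is not part of ACL or benchdnn.

```cpp
#include <algorithm>
#include <cassert>
#include <chrono>
#include <cstddef>
#include <functional>
#include <vector>

// Hypothetical helper (not part of ACL or benchdnn): runs `op` `warmup` times
// untimed, then `iters` timed runs, and returns the middle sample in
// microseconds. The median keeps a single slow outlier run from
// dominating the result the way it would dominate a mean.
double median_runtime_us(const std::function<void()>& op, int warmup = 5, int iters = 50) {
    assert(iters > 0 && "need at least one timed run");
    for (int i = 0; i < warmup; ++i) {
        op();  // untimed: warms caches, allocations and worker threads
    }
    std::vector<double> samples;
    samples.reserve(static_cast<std::size_t>(iters));
    for (int i = 0; i < iters; ++i) {
        const auto start = std::chrono::steady_clock::now();
        op();
        const auto stop = std::chrono::steady_clock::now();
        samples.push_back(std::chrono::duration<double, std::micro>(stop - start).count());
    }
    std::sort(samples.begin(), samples.end());
    return samples[samples.size() / 2];  // upper middle for even counts
}
```

For example, you could call `median_runtime_us([&] { pool.run(); })` on a configured `NEPooling3dLayer` named `pool`, then wrap the reference function in the same way. Build both into one binary with the same optimization flags so the comparison covers only the kernels.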
Hi @alvoron
I made some changes to our validation suite to assess the performance; see the results below. The Neon backend is much faster than our reference code.
May we refer to
I assume my benchdnn command is equivalent to the ACL kernel configuration. Please let me know if I missed something.
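One quick sanity check for that assumption is that both configurations imply the same output shape. The sketch below assumes floor rounding and symmetric padding; both backends have to use the same rounding mode for the shapes to match. `pooled_extent` and `pooled_shape` are hypothetical helpers, and the actual shapes from the reproducer are not shown here.

```cpp
#include <cassert>
#include <cstdint>

// Output extent of one spatial dimension under floor rounding:
// out = (in + pad_before + pad_after - kernel) / stride + 1
std::int64_t pooled_extent(std::int64_t in, std::int64_t kernel, std::int64_t stride,
                           std::int64_t pad_before, std::int64_t pad_after) {
    assert(kernel > 0 && stride > 0);
    const std::int64_t span = in + pad_before + pad_after - kernel;
    assert(span >= 0 && "window larger than padded input");
    return span / stride + 1;  // integer division == floor for span >= 0
}

// Hypothetical shape triple for depth, height and width.
struct Pool3dShape {
    std::int64_t d, h, w;
};

// Applies pooled_extent to each spatial dimension with symmetric padding.
Pool3dShape pooled_shape(Pool3dShape in, Pool3dShape kernel, Pool3dShape stride, Pool3dShape pad) {
    return {pooled_extent(in.d, kernel.d, stride.d, pad.d, pad.d),
            pooled_extent(in.h, kernel.h, stride.h, pad.h, pad.h),
            pooled_extent(in.w, kernel.w, stride.w, pad.w, pad.w)};
}
```

Matching shapes is necessary but not sufficient. The pooling type (max or average), data type, memory layout (NDHWC or NCDHW) and thread count must also agree before the two timings can be compared.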
Hi @alvoron
I tried this on an Ampere Altra using
See below for the output of the build without ACL:
Can you please double-check on your side?
Hi @alvoron
Please see below.
Hope this helps.
@morgolock
Below is a single test run with
I'm wondering why your oneDNN results are much slower than mine: 5.19 ms vs 0.8 ms.
Output of 'strings libarm_compute.so | grep arm_compute_version':
arm_compute_version=v24.02.1 Build options: {'neon': '1', 'opencl': '0', 'openmp': '0', 'cppthreads': '1', 'arch': 'armv8.6-a', 'Werror': 'false', 'validation_tests': '1', 'os': 'macos', 'build': 'native', 'fixed_format_kernels': '1'} Git hash=b'f2eda6665c12d568e179f5b0e7a24ccdc0ac824d'
Platform:
Apple M2 Pro
Operating System:
macOS 13.4
Problem description:
NEPooling3dLayer shows roughly twice the latency of a reference C++ pooling implementation: 6.5 ms vs 3.5 ms.
Reproducer
How the reproducer was built
The reproducer takes ~6500 microseconds on my M2 Pro, which is roughly twice as slow as the reference C++ pooling implementation.
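The reference C++ pooling behind the 3.5 ms figure is not included here. For anyone who wants a baseline to compare against, below is a minimal sketch of a naive reference 3D max pooling. It assumes NDHWC layout, a batch of 1, a cubic kernel of size `k` with stride `s`, no padding, and floor rounding. `Dims` and `max_pool3d_ndhwc` are hypothetical names, and this is not the reporter's implementation.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <limits>
#include <vector>

// Hypothetical extents of a single-batch NDHWC tensor.
struct Dims {
    std::size_t d, h, w, c;
};

// Naive 3D max pooling: NDHWC, batch 1, k x k x k window, stride s,
// no padding, floor rounding. Writes the output shape to `out_dims`.
std::vector<float> max_pool3d_ndhwc(const std::vector<float>& in, Dims in_dims,
                                    std::size_t k, std::size_t s, Dims& out_dims) {
    assert(k > 0 && s > 0);
    assert(in_dims.d >= k && in_dims.h >= k && in_dims.w >= k);
    assert(in.size() == in_dims.d * in_dims.h * in_dims.w * in_dims.c);
    out_dims = {(in_dims.d - k) / s + 1, (in_dims.h - k) / s + 1,
                (in_dims.w - k) / s + 1, in_dims.c};
    std::vector<float> out(out_dims.d * out_dims.h * out_dims.w * out_dims.c,
                           -std::numeric_limits<float>::infinity());
    for (std::size_t od = 0; od < out_dims.d; ++od)
        for (std::size_t oh = 0; oh < out_dims.h; ++oh)
            for (std::size_t ow = 0; ow < out_dims.w; ++ow) {
                float* dst = &out[((od * out_dims.h + oh) * out_dims.w + ow) * out_dims.c];
                // Visit every input voxel in the window. Channels are innermost
                // in NDHWC, so the channel loop runs over contiguous memory.
                for (std::size_t kd = 0; kd < k; ++kd)
                    for (std::size_t kh = 0; kh < k; ++kh)
                        for (std::size_t kw = 0; kw < k; ++kw) {
                            const std::size_t id = od * s + kd, ih = oh * s + kh, iw = ow * s + kw;
                            const float* src =
                                &in[((id * in_dims.h + ih) * in_dims.w + iw) * in_dims.c];
                            for (std::size_t c = 0; c < in_dims.c; ++c)
                                dst[c] = std::max(dst[c], src[c]);
                        }
            }
    return out;
}
```

The channels-innermost loop order is what lets a compiler auto-vectorize even this naive version. That matters when a "reference" kernel turns out faster than an optimized backend, so compare both at the same optimization level.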
Could you please review NEPooling3dLayer for potential performance issues?