Releases: DLTcollab/sse2neon
v1.8.0
What's Changed
- Fix Clang showing incorrect GCC version warning by @brechtvl in #623
- Restore options for precision of div/rcp/sqrt/rsqrt by @brechtvl in #626
- Optimize CRC intrinsics for targets lacking the CRC extension by @Cuda-Chen in #627
- test: Avoid errors when cross-compiling with gcc-8.3/9.2 by @howjmay in #625
- Improve unsupported target message by @ankith26 in #630
- Fix with _mm_div_ps when SSE2NEON_PRECISE_DIV=1 by @sergeyvfx in #631
- Use unaligned data types for unaligned intrinsics by @Logikable in #632
- fix: Fix uninitialized parameters by @howjmay in #636
- fix: Disable optimization to avoid potential errors by @howjmay in #640
- Fix minor typos in the sse2neon header by @ankith26 in #641
- fix: Fix strict-aliasing errors by @howjmay in #638
- fix test_mm_dp_pd test by @alexorlov124 in #643
- Add support for clang-cl on Windows by @anthony-linaro in #633
- Fix performance regression after OPTNONE changes by @sergeyvfx in #646
- Allow optimization and use fesetround(), fegetround() by @howjmay in #642
- CI: Bump dependency by @jserv in #650
- Allow specifying -DSSE2NEON_SUPPRESS_WARNINGS to suppress the #warning about optimization issues by @rouault in #651
- Add _MM_SHUFFLE2() macro for shuffle parameter for _mm_shuffle_pd() by @rouault in #652
- README.md: mention GDAL by @rouault in #654
- CI: Update Arm GNU Toolchain and use Ubuntu 24.04 by @jserv in #656
- Fix undefined _mm_{malloc,free} with LLVM/MinGW by @jserv in #657
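As a rough illustration of the _MM_SHUFFLE2() entry above, the sketch below shows how the two-bit immediate for _mm_shuffle_pd() is usually encoded. The macro body follows the conventional definition from Intel's emmintrin.h and is assumed, not copied from #652. shuffle_pd_model and check_shuffle2 are hypothetical helpers written in plain C so the example builds without sse2neon.h; they are not part of the library.

```c
#include <assert.h>

/* Conventional encoding of the immediate; guarded in case a header
 * (for example sse2neon.h v1.8.0 or later) already defines it. */
#ifndef _MM_SHUFFLE2
#define _MM_SHUFFLE2(fp1, fp0) (((fp1) << 1) | (fp0))
#endif

/* Hypothetical scalar model of _mm_shuffle_pd(a, b, imm):
 * bit 0 of imm selects which element of a goes to lane 0,
 * bit 1 of imm selects which element of b goes to lane 1. */
static void shuffle_pd_model(const double a[2], const double b[2],
                             int imm, double dst[2])
{
    dst[0] = a[imm & 1];
    dst[1] = b[(imm >> 1) & 1];
}

/* Small self-check: take the low element of a and the high element of b. */
static void check_shuffle2(void)
{
    const double a[2] = {1.0, 2.0};
    const double b[2] = {3.0, 4.0};
    double dst[2];

    shuffle_pd_model(a, b, _MM_SHUFFLE2(1, 0), dst);
    assert(dst[0] == 1.0);
    assert(dst[1] == 4.0);
}
```

With the real header, the same immediate would be passed as the third argument of _mm_shuffle_pd(); the scalar model only mirrors the lane selection.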
New Contributors
- @ankith26 made their first contribution in #630
- @sergeyvfx made their first contribution in #631
- @Logikable made their first contribution in #632
- @alexorlov124 made their first contribution in #643
Full Changelog: v1.7.0...v1.8.0
v1.7.0
What's Changed
- refactor: Add missing ARM64 implementation by @howjmay in #576
- test: Build/run with crypto and/or crc by @howjmay in #574
- doc: Describe the right coverage of SSE2NEON_PRECISE_MINMAX by @howjmay in #578
- refactor: Reimplement _mm_movelh_ps for Arm64 by @howjmay in #579
- tests: Cover all immediate numbers by @howjmay in #584
- test: Use macro to validate results by @howjmay in #585
- Improve precision of _mm_{rsqrt,sqrt,rcp,div}_{ps,ss} conversions by @Cuda-Chen in #580
- Fix MSVC compile issues by @toxieainc in #588
- Tweak MSVC ifdef guard for _BitScanForward64 by @aqrit in #592
- Add notice that NEON handles certain IEEE single-precision values by @Cuda-Chen in #593
- Add infinity test in test_mm_{max,min}_{pd,sd} by @Cuda-Chen in #594
- Remove Kahan algorithm in _mm_dp_ps by @Cuda-Chen in #597
- MSVC support by @anthony-linaro in #596
- test: Cover all the valid imm range in tests by @howjmay in #586
- Add test running for MSVC to CI by @anthony-linaro in #598
- Align result to SSE when input is 0.0f/-0.0f in _mm_rsqrt_{ps, ss} by @Cuda-Chen in #599
- fix: Fix exceeding width of type warning by @howjmay in #601
- docs: Fix the typos by @howjmay in #603
- docs: Fix the typos by @spacemiqote in #605
- Fix build for gcc-13 and 32 bit arm systems. by @balister in #609
- Fix unused parameters warning by @anakinxc in #610
- Fixed gcc strict prototype and other build errors by @mnjdhl in #611
- Fix _mm_cmplt_sd and _mm_cmpnlt_sd test cases by @Cuda-Chen in #612
- disambiguate vector type to avoid errors depending on lax conversion … by @JoachimSchurig in #614
- docs: fix typo failback by @howjmay in #616
- Introduce fast and deterministic RNG by @Cuda-Chen in #615
- fix: Fix typo nand by @howjmay in #617
- fix: Fix MSVC warnings by @howjmay in #604
- Add A32 support in CI by @Cuda-Chen in #620
- Fix _mm_test_mix_ones_zeros and _mm_testnzc_si128 by @aqrit in #621
New Contributors
- @anthony-linaro made their first contribution in #596
- @spacemiqote made their first contribution in #605
- @anakinxc made their first contribution in #610
- @mnjdhl made their first contribution in #611
- @JoachimSchurig made their first contribution in #614
Full Changelog: v1.6.0...v1.7.0
v1.6.0
What's Changed
- 100% intrinsics coverage for SSE, SSE2, SSE3, SSSE3, SSE4.1, SSE4.2 and AES extension.
- Implement _rdtsc by @Cuda-Chen in #532
- Improve _mm_srai_epi32 to handle complex arguments by @Developer-Ecosystem-Engineering in #533
- Implement _mm_cmpestri and _mm_cmpestrm by @Cuda-Chen in #534
- Implement five _mm_cmpestr by @Cuda-Chen in #552
- Implement _mm_cmpistri and _mm_cmpistrm by @Cuda-Chen in #553
- Implement five _mm_cmpistr by @Cuda-Chen in #555
- tests: Fix warnings raised by clang++ by @Cuda-Chen in #540
- Exclude _mm_malloc/free definitions on Windows by @invertego in #541
- Remove designated initialization of an array by @invertego in #542
- Reintroduce ext-based implementations for shift intrinsics by @AymenQ in #543
- Improve performance of float-to-integer intrinsics by @AymenQ in #546
- Support __builtin_shuffle as an alternative to __builtin_shufflevector by @AymenQ in #545
- Improve performance of various intrinsics by @AymenQ in #549
- Vectorize _mm_minpos_epu16 by @AymenQ in #551
- Align _mm_prefetch behavior to document by @howjmay in #550
- Add clang/Windows build by @invertego in #556
- Test all valid immediates in _mm_dp_pd by @Cuda-Chen in #557
- Optimize _mm_aesenclast_si128 for Arm64 by @howjmay in #561
- Implement _mm_aesdec_si128 by @howjmay in #559
- Implement _mm_aesdeclast_si128 by @howjmay in #565
- Implement _mm_aesimc_si128 by @howjmay in #567
- Optimize aeskeygenassist_si128 for Arm64 by @howjmay in #569
- Update Intel intrinsics document links by @howjmay in #570
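For readers unfamiliar with _mm_minpos_epu16, which the list above reports as vectorized, here is a minimal scalar sketch of the result it is expected to produce according to the SSE4.1 definition. minpos_epu16_model is a hypothetical name used for illustration; this is not the NEON code from #551, only a reference model a vectorized version could be compared against.

```c
#include <stdint.h>

/* Hypothetical scalar reference for _mm_minpos_epu16.
 * Returns the low 32 bits of the result: bits 15:0 hold the minimum of the
 * eight unsigned 16-bit inputs, bits 18:16 hold the index of its first
 * occurrence. The intrinsic leaves all remaining bits of the 128-bit
 * result zero. */
static uint32_t minpos_epu16_model(const uint16_t v[8])
{
    uint16_t min = v[0];
    uint32_t idx = 0;

    for (uint32_t i = 1; i < 8; i++) {
        if (v[i] < min) { /* strict '<' keeps the lowest index on ties */
            min = v[i];
            idx = i;
        }
    }
    return (idx << 16) | min;
}
```

Keeping the tie-breaking rule explicit (lowest index wins) is the detail most likely to differ between a scalar and a vectorized implementation, so it is worth covering in any comparison test.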
New Contributors
- @Cuda-Chen made their first contribution in #532
- @Developer-Ecosystem-Engineering made their first contribution in #533
- @balister made their first contribution in #535
- @invertego made their first contribution in #541
- @AymenQ made their first contribution in #543
Full Changelog: v1.5.1...v1.6.0
v1.5.1
What's Changed
- fix: Fix dividing zero error in validateFloatError by @howjmay in #515
- Fix compilation with standardized C compilers by @jserv in #516
- Fix _mm_storel_epi64 by @andrewevstyukhin in #517
- Add support for 32-bit targets on ARMv8 architectures by @jonathanhue in #520
- Use CRC and directed rounding intrinsics on A32 by @jonathanhue in #522
- fix: Fix alignment in tests by @howjmay in #523
New Contributors
- @sleepybishop made their first contribution in #508
- @luzpaz made their first contribution in #509
- @andrewevstyukhin made their first contribution in #517
- @jonathanhue made their first contribution in #520
Full Changelog: v1.5.0...v1.5.1