Releases · DLTcollab/sse2neon

25 Dec 16:09

jserv

v1.8.0

3cf6976

v1.8.0 Latest

What's Changed

Fix Clang showing incorrect GCC version warning by @brechtvl in #623
Restore options for precision of div/rcp/sqrt/rsqrt by @brechtvl in #626
Optimize CRC intrinsics for targets lacking the CRC extension by @Cuda-Chen in #627
test: Avoid errors when cross compile by gcc-8.3/9.2 by @howjmay in #625
Improve unsupported target message by @ankith26 in #630
Fix _mm_div_ps when SSE2NEON_PRECISE_DIV=1 by @sergeyvfx in #631
Use unaligned data types for unaligned intrinsics by @Logikable in #632
fix: Fix uninitialized parameters by @howjmay in #636
fix: Disable optimization to avoid potential errors by @howjmay in #640
Fix minor typos in the sse2neon header by @ankith26 in #641
fix: Fix strict-aliasing errors by @howjmay in #638
fix test_mm_dp_pd test by @alexorlov124 in #643
Add support for clang-cl on Windows by @anthony-linaro in #633
Fix performance regression after OPTNONE changes by @sergeyvfx in #646
Allow optimization and use fesetround(), fegetround() by @howjmay in #642
CI: Bump dependency by @jserv in #650
Allow specifying -DSSE2NEON_SUPPRESS_WARNINGS to suppress the #warning about optimization issues by @rouault in #651
Add _MM_SHUFFLE2() macro for shuffle parameter for _mm_shuffle_pd() by @rouault in #652
README.md: mention GDAL by @rouault in #654
CI: Update Arm GNU Toolchain and use Ubuntu 24.04 by @jserv in #656
Fix undefined _mm_{malloc,free} with LLVM/MinGW by @jserv in #657
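
For readers unfamiliar with the shuffle parameter mentioned in #652, the following is a minimal scalar sketch of how a two-bit selector drives _mm_shuffle_pd(). It assumes the conventional x86 encoding, where bit 0 picks the lane taken from the first operand and bit 1 picks the lane taken from the second. The names SKETCH_SHUFFLE2, sketch_f64x2, and sketch_shuffle_pd are hypothetical and exist only for illustration; this is not the code in sse2neon.h.

```c
#include <assert.h>

/* Hypothetical stand-in for _MM_SHUFFLE2(fp1, fp0):
 * packs two 1-bit lane selectors into a 2-bit immediate. */
#define SKETCH_SHUFFLE2(fp1, fp0) (((fp1) << 1) | (fp0))

/* Hypothetical two-lane double vector, lane 0 stored first. */
typedef struct {
    double lane[2];
} sketch_f64x2;

/* Scalar model of _mm_shuffle_pd(a, b, imm8), assuming the usual
 * x86 semantics: result lane 0 comes from a, result lane 1 from b. */
static sketch_f64x2 sketch_shuffle_pd(sketch_f64x2 a, sketch_f64x2 b, int imm8)
{
    sketch_f64x2 r;
    assert(imm8 >= 0 && imm8 <= 3);      /* only two selector bits are valid */
    r.lane[0] = a.lane[imm8 & 1];        /* bit 0 selects a[0] or a[1] */
    r.lane[1] = b.lane[(imm8 >> 1) & 1]; /* bit 1 selects b[0] or b[1] */
    return r;
}
```

With a = {1.0, 2.0} and b = {3.0, 4.0}, the selector SKETCH_SHUFFLE2(0, 1) takes the upper lane of a and the lower lane of b, giving {2.0, 3.0}. In real code the equivalent call would be _mm_shuffle_pd(a, b, _MM_SHUFFLE2(0, 1)).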

New Contributors

@ankith26 made their first contribution in #630
@sergeyvfx made their first contribution in #631
@Logikable made their first contribution in #632
@alexorlov124 made their first contribution in #643

Full Changelog: v1.7.0...v1.8.0

Contributors

sergeyvfx, brechtvl, and 8 other contributors

25 Dec 20:41

jserv

v1.7.0

1a577cf

v1.7.0

What's Changed

refactor: Add missing ARM64 implementation by @howjmay in #576
test: Build/run with crypto and/or crc by @howjmay in #574
doc: Describe the right coverage of SSE2NEON_PRECISE_MINMAX by @howjmay in #578
refactor: Reimplement _mm_movelh_ps for Arm64 by @howjmay in #579
tests: Cover all immediate numbers by @howjmay in #584
test: Use macro for validate results by @howjmay in #585
Improve precision of _mm_{rsqrt,sqrt,rcp,div}_{ps,ss} conversions by @Cuda-Chen in #580
Fix MSVC compile issues by @toxieainc in #588
Tweak MSVC ifdef guard for _BitScanForward64 by @aqrit in #592
Add notice that NEON handles certain IEEE single-precision values by @Cuda-Chen in #593
Add infinity test in test_mm_{max,min}_{pd,sd} by @Cuda-Chen in #594
Remove Kahan algorithm in _mm_dp_ps by @Cuda-Chen in #597
MSVC support by @anthony-linaro in #596
test: Cover all the valid imm range in tests by @howjmay in #586
Add test running for MSVC to CI by @anthony-linaro in #598
Align result to SSE when input is 0.0f/-0.0f in _mm_rsqrt_{ps,ss} by @Cuda-Chen in #599
fix: Fix exceeding width of type warning by @howjmay in #601
docs: Fix the typos by @howjmay in #603
docs: Fix the typos by @spacemiqote in #605
Fix build for gcc-13 and 32-bit Arm systems by @balister in #609
Fix unused parameters warning by @anakinxc in #610
Fixed gcc strict prototype and other build errors by @mnjdhl in #611
Fix _mm_cmplt_sd and _mm_cmpnlt_sd test cases by @Cuda-Chen in #612
disambiguate vector type to avoid errors depending on lax conversion … by @JoachimSchurig in #614
docs: fix typo failback by @howjmay in #616
Introduce fast and deterministic RNG by @Cuda-Chen in #615
fix: Fix typo nand by @howjmay in #617
fix: Fix MSVC warnings by @howjmay in #604
Add A32 support in CI by @Cuda-Chen in #620
Fix _mm_test_mix_ones_zeros and _mm_testnzc_si128 by @aqrit in #621

New Contributors

@anthony-linaro made their first contribution in #596
@spacemiqote made their first contribution in #605
@anakinxc made their first contribution in #610
@mnjdhl made their first contribution in #611
@JoachimSchurig made their first contribution in #614

Full Changelog: v1.6.0...v1.7.0

Contributors

balister, aqrit, and 8 other contributors

26 Dec 08:02

jserv

v1.6.0

31cb30b

v1.6.0

What's Changed

100% intrinsics coverage for SSE, SSE2, SSE3, SSSE3, SSE4.1, SSE4.2 and AES extension.
Implement _rdtsc by @Cuda-Chen in #532
Improve _mm_srai_epi32 to handle complex arguments by @Developer-Ecosystem-Engineering in #533
Implement _mm_cmpestri and _mm_cmpestrm by @Cuda-Chen in #534
Implement five _mm_cmpestr by @Cuda-Chen in #552
Implement _mm_cmpistri and _mm_cmpistrm by @Cuda-Chen in #553
Implement five _mm_cmpistr by @Cuda-Chen in #555
tests: Fix warnings raised by clang++ by @Cuda-Chen in #540
Exclude _mm_malloc/free definitions on Windows by @invertego in #541
Remove designated initialization of an array by @invertego in #542
Reintroduce ext-based implementations for shift intrinsics by @AymenQ in #543
Improve performance of float-to-integer intrinsics by @AymenQ in #546
Support __builtin_shuffle as an alternative to __builtin_shufflevector by @AymenQ in #545
Improve performance of various intrinsics by @AymenQ in #549
Vectorize _mm_minpos_epu16 by @AymenQ in #551
Align _mm_prefetch behavior to document by @howjmay in #550
Add clang/Windows build by @invertego in #556
Test all valid immediates in _mm_dp_pd by @Cuda-Chen in #557
Optimize _mm_aesenclast_si128 for Arm64 by @howjmay in #561
Implement _mm_aesdec_si128 by @howjmay in #559
Implement _mm_aesdeclast_si128 by @howjmay in #565
Implement _mm_aesimc_si128 by @howjmay in #567
Optimize aeskeygenassist_si128 for Arm64 by @howjmay in #569
Update Intel intrinsics document links by @howjmay in #570
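
As background for the _mm_minpos_epu16 entry (#551), here is a minimal scalar reference for what the intrinsic computes, assuming the documented SSE4.1 behavior: the smallest of eight unsigned 16-bit lanes goes to lane 0, the index of its first occurrence goes to lane 1, and the remaining lanes are zeroed. The function name sketch_minpos_epu16 and the plain-array interface are hypothetical; the vectorized NEON code in sse2neon.h is not reproduced here.

```c
#include <stdint.h>

/* Scalar reference for the assumed semantics of _mm_minpos_epu16.
 * in:  eight unsigned 16-bit lanes
 * out: lane 0 = minimum value, lane 1 = index of first minimum, lanes 2..7 = 0 */
static void sketch_minpos_epu16(const uint16_t in[8], uint16_t out[8])
{
    uint16_t min = in[0];
    uint16_t idx = 0;

    /* Strict '<' keeps the lowest index when the minimum repeats. */
    for (uint16_t i = 1; i < 8; i++) {
        if (in[i] < min) {
            min = in[i];
            idx = i;
        }
    }

    out[0] = min;
    out[1] = idx;
    for (int i = 2; i < 8; i++)
        out[i] = 0;
}
```

A scalar model like this can serve as an oracle when testing a vectorized implementation against randomly generated lanes.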