Releases · Tencent/ncnn

26 Dec 03:27

20241226

5285895

android ios macos linux windows webassembly watchos tvos visionos 预编译库 20241226 5285895 Latest

Latest

编译版本，默认配置，android-ndk-r27c，xcode 15.2，ubuntu-20.04，ubuntu-22.04，ubuntu-24.04，vs2015，vs2017，vs2019，vs2022，emscripten-3.1.28

file	content	arch
ncnn-full-source.zip	包含全部 submodule 代码的完整源码
ncnn-android.zip	android 静态库/动态库	armeabi-v7a + arm64-v8a + x86 + x86_64 + riscv64
ncnn-android-vulkan.zip	android 静态库/动态库，支持 GPU	armeabi-v7a + arm64-v8a + x86 + x86_64 + riscv64
ncnn-apple.zip	apple xcframework，ios + ios-simulator + macos + mac-catalyst + watchos + watchos-simulator + tvos + tvos-simulator + visionos + visionos-simulator	arm64 + arm64e + x86_64
ncnn-apple-vulkan.zip	apple xcframework，ios + ios-simulator + macos + mac-catalyst + watchos + watchos-simulator + tvos + tvos-simulator + visionos + visionos-simulator，支持 GPU	arm64 + arm64e + x86_64
ncnn-ios.zip	ios 静态库	arm64
ncnn-ios-vulkan.zip	ios 静态库，支持 GPU	arm64
ncnn-ios-simulator.zip	ios simulator 静态库	x86_64 + arm64
ncnn-ios-simulator-vulkan.zip	ios simulator 静态库，支持 GPU	x86_64 + arm64
ncnn-macos.zip	macos 静态库	x86_64 + arm64
ncnn-macos-vulkan.zip	macos 静态库，支持 GPU	x86_64 + arm64
ncnn-mac-catalyst.zip	mac catalyst 静态库	x86_64 + arm64
ncnn-mac-catalyst-vulkan.zip	mac catalyst 静态库，支持 GPU	x86_64 + arm64
ncnn-watchos.zip	watchos 静态库	armv7k + arm64_32
ncnn-watchos-simulator.zip	watchos simulator 静态库	x86_64 + arm64
ncnn-tvos.zip	tvos 静态库	x86_64 + arm64
ncnn-tvos-vulkan.zip	tvos 静态库，支持 GPU	x86_64 + arm64
ncnn-tvos-simulator.zip	tvos simulator 静态库	x86_64 + arm64
ncnn-tvos-simulator-vulkan.zip	tvos simulator 静态库，支持 GPU	x86_64 + arm64
ncnn-visionos.zip	visionos 静态库	arm64
ncnn-visionos-vulkan.zip	visionos 静态库，支持 GPU	arm64
ncnn-visionos-simulator.zip	visionos simulator 静态库	x86_64 + arm64
ncnn-visionos-simulator-vulkan.zip	visionos simulator 静态库，支持 GPU	x86_64 + arm64
ncnn-ubuntu.zip	ubuntu linux 静态库/动态库，支持 GPU，模型转换工具	x86_64
ncnn-windows.zip	windows 静态库/动态库，支持 GPU，模型转换工具	x86 + x64 + arm + arm64
ncnn-webassembly.zip	webassembly 静态库	wasm32 + simd + threads + simd-threads

embed 支持int8量化
gemm 支持int8量化
multiheadattention 支持int8量化
新增spectrogram和inverse spectrogram实现
arm rmsnorm neon优化
arm layernorm neon fp32/bf16s/fp16s优化
x86 rmsnorm sse2/avx/avx512优化
x86 layernorm sse2/avx/avx512优化
x86 gemm int8 sse2/xop/avx/avx512/vnni/vnniint8优化
更新riscv vector标准到1.0，重写全部ncnn riscv优化代码，自动探测rvv/zfh/zvfh/xtheadvector并分发
riscv gemm rvv优化支持128bit/256bit vlen
禁用x86倒数优化避免可能的精度损失
改善harmonyos cpu拓扑结构abi兼容性
暂时禁用mesa驱动的vulkan矩阵扩展支持
兼容ndk-21编译asimdfhm目标的错误导致的问题
兼容clang-18编译avx512bf16时编译器崩溃的问题
禁用msvc对windows arm平台exp/tanh的svml优化以解决计算错误
探测avxvnniint8/avxvnniint16/avxneconvert指令集
runtime cpu开启时仅使用ncnn cmake内置的编译参数
删除windows arm32支持(@Shironana817)
android默认启用16kb pagesize编译，android-api升级到21
vkCreateDevice失败时不直接崩溃(@Upliner)
为powerpc架构跳过0.5附近数值的unaryop round测试用例
pnnx更新到torch-2.5
pnnx支持从traced inputs自动设定inputshape
pnnx编译不再输出来自torch头文件的警告
pnnx重排pass level2内的全部顺序，并复用pattern
pnnx不再保存debug中间模型(@LJoson)
pnnx输出python脚本的onnx导出代码更新到export(@whyb)
pnnx合并t5-layernorm为rmsnorm
pnnx不再折叠具有动态shape的tensor
pnnx在输出的python脚本中使用隐含的int转换避免trace时常数化
pnnx转换Tensor.select为ncnn crop+squeeze
pnnx转换onnx constantofshape为torch.zeros/ones
pnnx修正onnx clip在可选min/max缺失时的转换问题
ci更新riscv64工具链
ci添加c908/spacemit-x60
ci webassembly兼容node>20
ci android添加riscv64目标并打包
添加vim3 vulkan跑分数据(@GIBEREZ)

New Contributors

@ankushgoel27 made their first contribution in #5709
@Shironana817 made their first contribution in #5811
@GIBEREZ made their first contribution in #5821

Full Changelog: 2024082...2024122

Contributors

Upliner, whyb, and 4 other contributors

Assets 42

ncnn-20241226-android-shared.zip

13.3 MB 2024-12-26T03:28:00Z
ncnn-20241226-android-vulkan-shared.zip

19.6 MB 2024-12-26T03:28:00Z
ncnn-20241226-android-vulkan.zip

27.6 MB 2024-12-26T03:28:00Z
ncnn-20241226-android.zip

15.5 MB 2024-12-26T03:28:00Z
ncnn-20241226-apple-vulkan.zip

81.5 MB 2024-12-26T03:28:00Z
ncnn-20241226-apple.zip

55.1 MB 2024-12-26T03:28:00Z
ncnn-20241226-full-source.zip

21.3 MB 2024-12-26T03:27:59Z
ncnn-20241226-ios-simulator-vulkan.zip

10.7 MB 2024-12-26T03:27:59Z
ncnn-20241226-ios-simulator.zip

7.01 MB 2024-12-26T03:27:59Z
ncnn-20241226-ios-vulkan.zip

4.29 MB 2024-12-26T03:27:59Z
Source code (zip)

2024-12-26T02:56:43Z
Source code (tar.gz)

2024-12-26T02:56:43Z

20 Aug 08:45

github-actions

20240820

a6d3ef5

android ios macos linux windows webassembly 预编译库 20240820 a6d3ef5

编译版本，默认配置，android-ndk-r27，xcode 15.2，ubuntu-20.04，ubuntu-22.04，ubuntu-24.04，vs2015，vs2017，vs2019，vs2022，emscripten-3.1.28

file	content	arch
ncnn-full-source.zip	包含全部 submodule 代码的完整源码
ncnn-android.zip	android 静态库/动态库	armeabi-v7a + arm64-v8a + x86 + x86_64
ncnn-android-vulkan.zip	android 静态库/动态库，支持 GPU	armeabi-v7a + arm64-v8a + x86 + x86_64
ncnn-apple.zip	apple xcframework，ios + ios-simulator + macos + mac-catalyst + watchos + watchos-simulator + tvos + tvos-simulator + visionos + visionos-simulator	arm64 + arm64e + x86_64
ncnn-apple-vulkan.zip	apple xcframework，ios + ios-simulator + macos + mac-catalyst + watchos + watchos-simulator + tvos + tvos-simulator + visionos + visionos-simulator，支持 GPU	arm64 + arm64e + x86_64
ncnn-ios.zip	ios 静态库	arm64
ncnn-ios-vulkan.zip	ios 静态库，支持 GPU	arm64
ncnn-ios-simulator.zip	ios simulator 静态库	x86_64 + arm64
ncnn-ios-simulator-vulkan.zip	ios simulator 静态库，支持 GPU	x86_64 + arm64
ncnn-macos.zip	macos 静态库	x86_64 + arm64
ncnn-macos-vulkan.zip	macos 静态库，支持 GPU	x86_64 + arm64
ncnn-mac-catalyst.zip	mac catalyst 静态库	x86_64 + arm64
ncnn-mac-catalyst-vulkan.zip	mac catalyst 静态库，支持 GPU	x86_64 + arm64
ncnn-watchos.zip	watchos 静态库	armv7k + arm64_32
ncnn-watchos-simulator.zip	watchos simulator 静态库	x86_64 + arm64
ncnn-tvos.zip	tvos 静态库	x86_64 + arm64
ncnn-tvos-vulkan.zip	tvos 静态库，支持 GPU	x86_64 + arm64
ncnn-tvos-simulator.zip	tvos simulator 静态库	x86_64 + arm64
ncnn-tvos-simulator-vulkan.zip	tvos simulator 静态库，支持 GPU	x86_64 + arm64
ncnn-visionos.zip	visionos 静态库	arm64
ncnn-visionos-vulkan.zip	visionos 静态库，支持 GPU	arm64
ncnn-visionos-simulator.zip	visionos simulator 静态库	x86_64 + arm64
ncnn-visionos-simulator-vulkan.zip	visionos simulator 静态库，支持 GPU	x86_64 + arm64
ncnn-ubuntu.zip	ubuntu linux 静态库/动态库，支持 GPU，模型转换工具	x86_64
ncnn-windows.zip	windows 静态库/动态库，支持 GPU，模型转换工具	x86 + x64 + arm + arm64
ncnn-webassembly.zip	webassembly 静态库	wasm32 + simd + threads + simd-threads

新增RMSNorm层和对应的pnnx转换，单元测试
x86 convolution tiled gemm优化
量化工具支持 rnn/lstm/gru 动态量化
x86 lstm int8 sse2/xop/avx2/avx512/avx512vnni/avxvnni优化
arm rnn/lstm/gru int8 neon/asimdhp/asimddp优化
multiheadattention支持qdim参数与embed_dim不同
multiheadattention支持scale参数
更新pybind11到2.12支持numpy2
添加wasi支持(@quink-black)
添加x86/arm convolution/slice/concat oom单元测试
onnx2ncnn工具添加警告和推荐使用pnnx的信息输出(@lll143653)
修复x86 avx512 vnni指令派发失效的问题
增强x86/arm计算内核在内存不足时的错误返回
仅在windows arm平台使用ruapu指令集探测
windows mingw编译时支持大小核和SMT探测
修复powerpc vsx计算abs可能的错误
修复arm vfpv4条件下可能的fp16s/bf16s同时启用的冲突
修复aarch64架构l2-cache很小时因gemm K分块可能的越界读错误
修复riscv v tanh计算错误(@zhangyang2057)
arm/convolution_3x3_pack1to8_fp16s使用ldr/str替代ld1/st1优化(@quink-black)
修复c_api无参数函数声明(@quink-black)
c_api添加set_vulkan_device接口(@Baiyuetribe)
pyncnn添加从python bytes内存加载模型的接口(@joeyballentine)
为VkAndroidHardwareBufferImageAllocator添加NCNN_PLATFORM_API宏(@Xyzhao1999)
修复mingw64编译时avx崩溃和termux编译错误(@TianZerL)
修复在关闭NCNN_BF16时arm riscv编译错误
修复x86-wsl编译时的无用变量警告(@Tabbleman)
create_gpu_instance()中不进行destroy_gpu_instance()(@Asd-g)
更新ruapu.h(@lazyparser)
修复ndk-r27在cmake阶段的编译错误(@Galasnow)
添加yolov8示例代码(@whyb)
pnnx支持转换dynamo导出的onnx
pnnx默认编译onnx2pnnx支持，支持转换conv/convtranspose/pad/linear/softmax/relu/resize/upsample/avgpool/maxpool/batchnorm/lrn/layernorm/instancenorm/groupnorm/rnn/lstm/gru/prelu/gelu/elu/leakyrelu/relu6/celu/hardshrink/hardsigmoid/hardswish/clip/multiheadattention/reducemin/reducemax/reducemean/reducesum/reduceprod/logsoftmax/logsigmoid/mish/selu/sigmoid/silu/softmin/softplus/softshrink/softsign/tanh/tanhshrink/expand/permute/repeat/reshape/select/slice/cat/ceil/chunk/flatten/floor/maximum/minimum/split/squeeze/stack/transpose/unbind/unsqueeze
pnnx支持转换onnx指定inputshape
pnnx转换onnx遇到动态shape时尝试折叠非动态轴相关的常量
pnnx转换onnx合并简单的shape运算pattern
pnnx清除onnx中无用的cast
pnnx接受bf16的模型转换和输入输出类型
pnnx转换torch.tile/torch.where/torch.logaddexp
pnnx转换无dilation参数的F.maxpool到ncnn
pnnx转换1到2个轴参数的torch.roll到ncnn
pnnx转换有dim参数的torch.max/torch.min时返回tuple并自动删除没有用到的indice输出
pnnx合并onnx sdpa和qdim mha
pnnx识别sdpa的batch轴
pnnx支持torch-2.3和torch-2.4
pnnx不再折叠有就地操作的别名tensor为常量
pnnx转换到的ncnn模型py自动替换long为int
ci添加windows clang
ci添加harmonyos
ci添加mingw(@TianZerL)
ci添加esp32和esp32编译文档(@luxincn)
重构release ci脚本
发布ubuntu 24.04预编译包
发布visionos/visionos-simulator vulkan预编译包
pypi发布python 3.13预编译包
更新pytorch/onnx模型转换文档(@whyb)
添加riscv-gnu-toolchain编译文档(@Tabbleman)
添加harmonyos vulkan编译文档(@cugxchen)
修正vulkan-notes文档的错误(@roachsinai)
更新qcom855plus跑分数据
添加RaspberryPi 5 GPU超频跑分数据(@CharlieYu4994)
添加EPYC7742和V100跑分数据(@sakria9)
添加Snapdragon 888跑分数据(@chainsx)
添加RaspberryPi 5 CPU超频跑分数据(@chainsx)
添加OrangePi 5Plus跑分数据(@inspireMeNow)
添加Snapdragon 765G跑分数据(@inspireMeNow)
添加CVITEK SG2000跑分数据(@inspireMeNow)
添加OrangePi CM4跑分数据(@py1066)
添加Axera AX630C跑分数据(@UOPiceman)
添加Kunpeng 920 7260跑分数据(@violet73)

New Contributors

@quink-black made their first contribution in #5436
@Tabbleman made their first contribution in #5444
@roachsinai made their first contribution in #5472
@Asd-g made their first contribution in #5437
@lazyparser made their first contribution in #5499
@CharlieYu4994 made their first contribution in #5518
@Xyzhao1999 made their first contribution in #5521
@sakria9 made their first contribution in #5528
@inspireMeNow made their first contribution in #5550
@py1066 made their first contribution in #5551
@UOPiceman made their first contribution in #5559
@luxincn made their first contribution in #5567
@zhangyang2057 made their first contribution in #5584
@violet73 made their first contribution in #5606

Full Changelog: 2024041...2024082

Contributors

lazyparser, whyb, and 20 other contributors

Assets 42

10 Apr 11:16

github-actions

20240410

56775de

android ios macos linux windows webassembly 预编译库 20240410 56775de

编译版本，默认配置，android-ndk-r26c，xcode 15.2，ubuntu-20.04，ubuntu-22.04，vs2015，vs2017，vs2019，vs2022，emscripten-3.1.28

file	content	arch
ncnn-full-source.zip	包含全部 submodule 代码的完整源码
ncnn-android.zip	android 静态库/动态库	armeabi-v7a + arm64-v8a + x86 + x86_64
ncnn-android-vulkan.zip	android 静态库/动态库，支持 GPU	armeabi-v7a + arm64-v8a + x86 + x86_64
ncnn-apple.zip	apple xcframework，ios + ios-simulator + macos + mac-catalyst + watchos + watchos-simulator + tvos + tvos-simulator + visionos + visionos-simulator	arm64 + arm64e + x86_64
ncnn-apple-vulkan.zip	apple xcframework，ios + ios-simulator + macos + mac-catalyst + watchos + watchos-simulator + tvos + tvos-simulator + visionos + visionos-simulator，支持 GPU	arm64 + arm64e + x86_64
ncnn-ios.zip	ios 静态库	arm64
ncnn-ios-vulkan.zip	ios 静态库，支持 GPU	arm64
ncnn-ios-simulator.zip	ios simulator 静态库	x86_64 + arm64
ncnn-ios-simulator-vulkan.zip	ios simulator 静态库，支持 GPU	x86_64 + arm64
ncnn-macos.zip	macos 静态库	x86_64 + arm64
ncnn-macos-vulkan.zip	macos 静态库，支持 GPU	x86_64 + arm64
ncnn-mac-catalyst.zip	mac catalyst 静态库	x86_64 + arm64
ncnn-mac-catalyst-vulkan.zip	mac catalyst 静态库，支持 GPU	x86_64 + arm64
ncnn-watchos.zip	watchos 静态库	armv7k + arm64_32
ncnn-watchos-simulator.zip	watchos simulator 静态库	x86_64 + arm64
ncnn-tvos.zip	tvos 静态库	x86_64 + arm64
ncnn-tvos-vulkan.zip	tvos 静态库，支持 GPU	x86_64 + arm64
ncnn-tvos-simulator.zip	tvos simulator 静态库	x86_64 + arm64
ncnn-tvos-simulator-vulkan.zip	tvos simulator 静态库，支持 GPU	x86_64 + arm64
ncnn-visionos.zip	visionos 静态库	arm64
ncnn-visionos-simulator.zip	visionos simulator 静态库	x86_64 + arm64
ncnn-ubuntu.zip	ubuntu linux 静态库/动态库，支持 GPU，模型转换工具	x86_64
ncnn-windows.zip	windows 静态库/动态库，支持 GPU，模型转换工具	x86 + x64 + arm + arm64
ncnn-webassembly.zip	webassembly 静态库	wasm32 + simd + threads + simd-threads

解耦合layer cpu和vulkan，不再使用virtual public继承
支持编译动态库时编译单元测试
单层特性掩码支持禁用多线程
extractor set_num_threads和set_vulkan_compute现在是无操作
gpu shader增加uniform类型改善adreno上fp16兼容性
检测vulkan矩阵扩展8x8x16配置，fp16a条件下默认使用fp16累加
更新stb_image rvv/neon优化
x86 mish avx512优化(@wnqn1597)
riscv gemm fp32 rvv优化(@Xinyu302)
加载模型上传权重时不保留无用的临时数据
c-api新增draw rectangle/text/circle/line接口(@Deepdive543443)
修复armv7平台加载fp16模型sigbus错误
修复reduction L2norm denormal产生inf的问题
修复arm平台pixel_resize rounding导致的数值误差
修复softmax arm fp16计算错误
修复risc-v rvv输出fp16没有自动转换的问题
修复destroy_gpu_instance在驱动加载不完整时crash的问题(@shatyuka)
destroy_gpu_instance等待全部设备idle(@whyb)
修复low-level api没有load_param直接create_pipeline可能的崩溃
修复ncnnoptimize在shape推断的崩溃
ncnnoptimize支持更多新算子，修复gemm权重丢失问题
被调试时候禁用signal指令集检测
windows-arm平台使用ruapu cpu指令集检测
arm vfpv4支持时启用自动转换fp16
在arm64架构中总是报告支持neon和vfpv4
simplevk寻找更多已知的vulkan驱动路径
修复旧cpp标准下risc-v rvv编译错误
修复某些老编译器在debug模式下编译错误
修复uwp平台编译
修复test_reduction运行时的警告
修复NCNN_PIXEL_DRAWING禁用时候编译错误(@shatyuka)
支持MSVC使用LLVM openmp运行时的配合编译(@shatyuka)
修复yolov8 python示例返回空发生错误(@dsplvd)
pnnx解耦torchscript加载，清理cxxabi hack，修复whole-archive链接
pnnx加载dynamo onnx，默认不启用编译
pnnx改善函数化，支持更多slice+inplace复合操作
pnnx转换torch.masked_select/torch.slice_scatter
pnnx支持超过4G的模型
pnnx macos编译universal wheel
pnnx添加entrypoint脚本
pnnx支持动态slice下标
pnnx转换softmin logsoftmax dtype参数
pnnx处理index_put传入空indices和标量数值
pnnx转换一些cudnn conv2d变种
pnnx合并完整slices为tensor_split
pnnx合并静态embedding
pnnx不消除会导致shape变化的数学操作
pnnx改善torch-2.1 mha attn_mask探测
pnnx修复无bias tensor的nn.Conv2d转换
pnnx转换torch.stack负数dim
pnnx添加torch.arange单元测试
pnnx修复图匹配失败时可能的越界访问问题
pnnx识别embedding输入的batch轴为0
pnnx python添加控制fp16参数(@MollySophia)
pnnx添加torch-2.2 ci
github ci使用4并行编译
更新cmake ios工具链，添加visionos ci，watchos支持arm64_32架构
添加apple a17和m3 cpu名称
不再编译apple平台32bit支持，不再编译ios arm64e架构，提升最低部署版本到ios-13
统一android python macos ci
不再打包和发布apple bitcode和32bit预编译包，新增visionos预编译包，新增tvos-gpu预编译包，更新openmp到18.1.2
改善a53/a55双发射文档(@luqiang-guo)
添加windows上protobuf>=22.0编译文档(@Galasnow)
更新macos编译文档(@lll143653)
清理无用的代码警告(@hokamilkv)
修正FAQ的拼写错误(@eltociear)
修正拼写错误(@hugo-syn)
修正拼写错误(@afredooo)
修正convolution_x86注释错误(@strongtz)
添加markdown文档代码辅助标志(@hugo-syn)
添加OneCloud跑分数据(@mizu-bai)
添加AWS c5.4xlarge跑分数据(@mizu-bai)
添加Xeon Phi 3120A跑分数据(@mizu-bai)
添加orangepi zero2跑分数据(@wonderfullook)
添加Dimensity 9300 MT6989跑分数据(@MollySophia)
添加PhytiumPi跑分数据(@HalfSweet)
添加remipi跑分数据(@dreamcmi)
添加radxa zero 3w跑分数据(@Qengineering)

New Contributors

@wonderfullook made their first contribution in #5277
@hugo-syn made their first contribution in #5301
@FartSimps0n made their first contribution in #5304
@HalfSweet made their first contribution in #5312
@strongtz made their first contribution in #5310
@afredooo made their first contribution in #5339
@shatyuka made their first contribution in #5346
@dsplvd made their first contribution in #5345
@Galasnow made their first contribution in #5359
@hokamilkv made their first contribution in #5365

Full Changelog: 2024010...2024041

Contributors

whyb, dsplvd, and 19 other contributors

Assets 38

02 Jan 04:06

github-actions

20240102

1e88fb8

android ios macos linux windows webassembly 预编译库 20240102 1e88fb8

编译版本，默认配置，android-ndk-r26b，xcode 13.4.1，ubuntu-20.04，ubuntu-22.04，vs2015，vs2017，vs2019，vs2022，emscripten-3.1.28

file	content	arch
ncnn-full-source.zip	包含全部 submodule 代码的完整源码
ncnn-android.zip	android 静态库/动态库	armeabi-v7a + arm64-v8a + x86 + x86_64
ncnn-android-vulkan.zip	android 静态库/动态库，支持 GPU	armeabi-v7a + arm64-v8a + x86 + x86_64
ncnn-apple.zip	apple xcframework，ios + ios-simulator + macos + mac-catalyst，with and w/o bitcode	armv7 + arm64 + arm64e + i386 + x86_64
ncnn-apple-vulkan.zip	apple xcframework，ios + ios-simulator + macos + mac-catalyst，支持 GPU，with and w/o bitcode	arm64 + arm64e + x86_64
ncnn-ios.zip	ios 静态库，with and w/o bitcode	armv7 + arm64 + arm64e
ncnn-ios-vulkan.zip	ios 静态库，支持 GPU，with and w/o bitcode	arm64 + arm64e
ncnn-ios-simulator.zip	ios simulator 静态库，with and w/o bitcode	i386 + x86_64 + arm64
ncnn-ios-simulator-vulkan.zip	ios simulator 静态库，支持 GPU，with and w/o bitcode	x86_64 + arm64
ncnn-macos.zip	macos 静态库	x86_64 + arm64
ncnn-macos-vulkan.zip	macos 静态库，支持 GPU	x86_64 + arm64
ncnn-mac-catalyst.zip	mac catalyst 静态库，with and w/o bitcode	x86_64 + arm64
ncnn-mac-catalyst-vulkan.zip	mac catalyst 静态库，支持 GPU，with and w/o bitcode	x86_64 + arm64
ncnn-ubuntu.zip	ubuntu linux 静态库/动态库，支持 GPU，模型转换工具	x86_64
ncnn-windows.zip	windows 静态库/动态库，支持 GPU，模型转换工具	x86 + x64 + arm + arm64
ncnn-webassembly.zip	webassembly 静态库	wasm32 + simd + threads + simd-threads

内建vulkan驱动加载功能，不依赖vulkan-sdk编译gpu功能，可直接加载显卡驱动文件
msvc编译启用arm neon指令加速，启用arm64 asimdhp编译
实现python pnnx pypi包和python调用接口/文档(@Hideousmon)
arm convolution int8 直接卷积重构支持任意elempack
优化 vulkan global pooling性能
优化resize bilinear性能
压缩字体数据减小二进制体积
deconvolution支持动态权重和对应pnnx转换
新增跑分数据rank card(@Qengineering)
支持big-endian架构平台，powerpc32位
添加woa linux ci
添加msvc禁用exceptions/rtti的编译开关
在macos上使用信号探测avx512指令集支持情况
支持寻找32位显卡驱动文件(@whyb)
启用benchmark编译打印4维shape(@Deepdive543443)
修复riscv-int8 sigmoid激活的测试失败问题(@MollySophia)
修复deconvolution x86 bias非对齐访问的问题
修复prelu x86 sse指令非对齐访问的问题(@aioa)
修复windows上openmp设置线程数为0的警告
修复在支持16bit/8bit的gpu上有关fp16sa shader使用fp16 shared变量的警告
修复nvidia vulkan驱动在程序退出的crash
修复vkimagemat from_android_hardware_buffer缺失的elemsize参数错误
修复simpleocv Mat模板ptr的偏移错误
添加更过的gpu相关python绑定接口(@joeyballentine)
android vulkan包的api版本降低到14/21
pnnx支持转换recompute_scale_factor=True的nn.Upsample
新增nn.Identity测试
修复pnnx路径切分的问题
修复pnnx生成ncnn py空格对齐(@cmdbug)
pnnx生成的py可以直接执行推理
python pnnx返回优化后的torch模型
删除无用的代码(@ningjiang233)
改善cmake toolchain文件(@zchrissirhcz)
新增watchos和tvos ci
修复linux sde ci的运行错误
更新POWER clang版本信息的文档(@JeremyRand)
更新有关vulkan/libomp-dev依赖的文档(@JeremyRand)
更新有关编译python模块CMAKE_TOOLCHAIN_FILE环境变量的文档(@JeremyRand)
修复Rasberry拼写错误(@JeremyRand)
FAQ新增有关pyncnn数据连续性的文档(@lll143653)
更新readme下载页表格
添加Nintendo 3DS编译信息(@Deepdive543443)
添加oncloud amlogic s805跑分数据(@mizu-bai)
添加树莓派5 gpu跑分数据(@FantasyGmm)
添加Jetson TX2跑分数据(@FantasyGmm)
添加8gen2跑分数据(@mahirumahiru)
添加2K2000跑分数据(@RevySR)
更新Jetson Orin Nano/树莓派5跑分数据(@Qengineering)
添加visionfive2跑分数据(@wzyforgit)

New Contributors

@Deepdive543443 made their first contribution in #5116
@ningjiang233 made their first contribution in #5139
@FantasyGmm made their first contribution in #5152
@mahirumahiru made their first contribution in #5180
@Qengineering made their first contribution in #5216
@lll143653 made their first contribution in #5220
@joeyballentine made their first contribution in #5165

Full Changelog: 2023102...2024010

Contributors

JeremyRand, whyb, and 15 other contributors

Assets 42

27 Oct 06:17

github-actions

20231027

3116e02

android ios macos linux windows webassembly 预编译库 20231027 3116e02

编译版本，默认配置，android-ndk-r25c，xcode 13.4.1，ubuntu-20.04，ubuntu-22.04，vs2015，vs2017，vs2019，vs2022，emscripten-3.1.28

file	content	arch
ncnn-full-source.zip	包含全部 submodule 代码的完整源码
ncnn-android.zip	android 静态库/动态库	armeabi-v7a + arm64-v8a + x86 + x86_64
ncnn-android-vulkan.zip	android 静态库/动态库，支持 GPU	armeabi-v7a + arm64-v8a + x86 + x86_64
ncnn-apple.zip	apple xcframework，ios + ios-simulator + macos + mac-catalyst，with and w/o bitcode	armv7 + arm64 + arm64e + i386 + x86_64
ncnn-apple-vulkan.zip	apple xcframework，ios + ios-simulator + macos + mac-catalyst，支持 GPU，with and w/o bitcode	arm64 + arm64e + x86_64
ncnn-ios.zip	ios 静态库，with and w/o bitcode	armv7 + arm64 + arm64e
ncnn-ios-vulkan.zip	ios 静态库，支持 GPU，with and w/o bitcode	arm64 + arm64e
ncnn-ios-simulator.zip	ios simulator 静态库，with and w/o bitcode	i386 + x86_64 + arm64
ncnn-ios-simulator-vulkan.zip	ios simulator 静态库，支持 GPU，with and w/o bitcode	x86_64 + arm64
ncnn-macos.zip	macos 静态库	x86_64 + arm64
ncnn-macos-vulkan.zip	macos 静态库，支持 GPU	x86_64 + arm64
ncnn-mac-catalyst.zip	mac catalyst 静态库，with and w/o bitcode	x86_64 + arm64
ncnn-mac-catalyst-vulkan.zip	mac catalyst 静态库，支持 GPU，with and w/o bitcode	x86_64 + arm64
ncnn-ubuntu.zip	ubuntu linux 静态库/动态库，支持 GPU，模型转换工具	x86_64
ncnn-windows.zip	windows 静态库/动态库，支持 GPU，模型转换工具	x86 + x64 + arm + arm64
ncnn-webassembly.zip	webassembly 静态库	wasm32 + simd + threads + simd-threads

x86 convolution int8 gemm重构支持任意elempack
x86 convolution int8 winograd重构支持任意elempack
arm convolution int8 gemm重构支持任意elempack
arm convolution int8 winograd重构支持任意elempack
gelu vulkan优化(@FhqTreap)
convolution1d vulkan优化(@FhqTreap)
gridsample x86优化(@Yoh-Z)
riscv gemm fp32优化(@Xinyu302)
新增erf/shrink和onnx转换(@brightening-eyes)
新增diag和pnnx转换(@wnqn1597)
新增celu和pnnx转换(@wnqn1597)
新增simplemath，允许不依赖libm编译使用数学函数(@HonestDeng)
pooling adaptive支持动态的输出尺寸和pnnx转换
elu selu支持4维输入输出
slice支持indices参数
memorydata支持tag参数和fp16存储
x86 selu shufflechannel优化(@wnqn1597)
修复convolution vulkan在固定shape时的结果错误
修复权重tag潜在的溢出(@lrw04)
按层加载模型减少内存占用(@daquexian)
修复老版本gcc编译avx2 gather的错误(@chainsx)
修复老版本gcc编译_mm256_set_m128的错误(@whyb)
修复新版本protobuf编译问题
修复老版本glibc round编译问题
修复c906工具链编译错误
pyncnn启用vulkan支持(@Hideousmon)
pyncnn添加load_param_mem接口(@JeremyRand @theflyingzamboni)
pnnx支持torch-2.1
pnnx消除moduleop的输出unpack
pnnx moduleop将权重shape作为参数写入param，内部权重顺序为使用顺序
pnnx改善reflect replicated pad匹配
pnnx合并conv3d-bn和deconv3d-bn
pnnx转换torch.narrow(@zyt1024)
pnnx转换torch.lgamma(@shudorcl)
pnnx转换torch.positive(@nicochen1118)
pnnx转换torch.cumprod(@Jiang-Weibo)
pnnx转换torch.mv/nn.ReplicationPad3d(@ShuRaymond)
pnnx转换F.pairwise_distance(@Marsyule)
pnnx转换torch.view_as_real/torch.view_as_complex(@Baiyuetribe)
修复pnnx与新版本protobuf编译问题(@HuPengsheet)
修复pnnx改变目录下划线的错误
onnx2ncnn支持celu转换(@brightening-eyes)
自动为pull request添加label
修复ohos工具链编译错误
改进codeformat脚本使用函数(@xiezheng-XD)
添加rk3566 rk3588s跑分数据(@chainsx)
添加Allwinner T527跑分数据(@YuzukiTsuru)
添加树莓派5b跑分数据(@Pillar1989)
添加RTX A3000跑分数据(@chainsx)
添加多款pc跑分数据(@whyb)

New Contributors

@chainsx made their first contribution in #4959
@xiezheng-XD made their first contribution in #4993
@zyt1024 made their first contribution in #4918
@FhqTreap made their first contribution in #5001
@shudorcl made their first contribution in #4976
@nicochen1118 made their first contribution in #4999
@Jiang-Weibo made their first contribution in #5002
@brightening-eyes made their first contribution in #5012
@wnqn1597 made their first contribution in #4935
@ShuRaymond made their first contribution in #4974
@HuPengsheet made their first contribution in #5034
@Marsyule made their first contribution in #4942
@Pillar1989 made their first contribution in #5058
@Xinyu302 made their first contribution in #4903
@Hideousmon made their first contribution in #5020
@HonestDeng made their first contribution in #4905

Full Changelog: 2023081...2023102

Contributors

JeremyRand, whyb, and 22 other contributors

Assets 38

16 Aug 05:54

github-actions

20230816

39721ee

android ios macos linux windows webassembly 预编译库 20230816 39721ee

编译版本，默认配置，android-ndk-r25c，xcode 13.4.1，ubuntu-20.04，ubuntu-22.04，vs2015，vs2017，vs2019，vs2022，emscripten-3.1.28

file	content	arch
ncnn-full-source.zip	包含全部 submodule 代码的完整源码
ncnn-android.zip	android 静态库/动态库	armeabi-v7a + arm64-v8a + x86 + x86_64
ncnn-android-vulkan.zip	android 静态库/动态库，支持 GPU	armeabi-v7a + arm64-v8a + x86 + x86_64
ncnn-apple.zip	apple xcframework，ios + ios-simulator + macos + mac-catalyst，with and w/o bitcode	armv7 + arm64 + arm64e + i386 + x86_64
ncnn-apple-vulkan.zip	apple xcframework，ios + ios-simulator + macos + mac-catalyst，支持 GPU，with and w/o bitcode	arm64 + arm64e + x86_64
ncnn-ios.zip	ios 静态库，with and w/o bitcode	armv7 + arm64 + arm64e
ncnn-ios-vulkan.zip	ios 静态库，支持 GPU，with and w/o bitcode	arm64 + arm64e
ncnn-ios-simulator.zip	ios simulator 静态库，with and w/o bitcode	i386 + x86_64 + arm64
ncnn-ios-simulator-vulkan.zip	ios simulator 静态库，支持 GPU，with and w/o bitcode	x86_64 + arm64
ncnn-macos.zip	macos 静态库	x86_64 + arm64
ncnn-macos-vulkan.zip	macos 静态库，支持 GPU	x86_64 + arm64
ncnn-mac-catalyst.zip	mac catalyst 静态库，with and w/o bitcode	x86_64 + arm64
ncnn-mac-catalyst-vulkan.zip	mac catalyst 静态库，支持 GPU，with and w/o bitcode	x86_64 + arm64
ncnn-ubuntu.zip	ubuntu linux 静态库/动态库，支持 GPU，模型转换工具	x86_64
ncnn-windows.zip	windows 静态库/动态库，支持 GPU，模型转换工具	x86 + x64 + arm + arm64
ncnn-webassembly.zip	webassembly 静态库	wasm32 + simd + threads + simd-threads

实现全部的binaryop explicit广播规则类型
x86直接卷积权重变换的avx2/avx512优化
x86 int8直接卷积支持任意elempack和sse2/xop/avx2/avx512/vnni优化
ppc64 power8/power9 vsx工具链支持，编译器检查和intrinsic翻译优化(@JeremyRand)
更新glslang并启用VK_KHR_cooperative_matrix扩展和优化
修复pyncnn自定义layer模型权重加载
c_api新增Mat border/layer_to_index api(@Mek101)
VkCompute::submit_and_wait现在能返回错误值(@Upliner)
修复老版本clang编译时too many microtasks问题
修复clang-cl cpuid函数兼容性(@charlescao460)
修复新版本protobuf c++17编译问题
修复老版本编辑器sleep递归调用错误(@whyb)
编译时检查loongarch lasx扩展支持并自动启用
清理multiheadattention arm优化代码
binaryop支持一维outer axis广播规则，保持旧的兼容行为
benchncnn支持从命令行参数中指定自定义模型和输入(@tpoisonooo)
macos平台静态编译链接需要的系统库(@Baiyuetribe)
更改amd集显上的显存分配策略为仅设备优先，修复在bios设置大显存时分配失败问题
onnx2ncnn遇到不支持transpose类型输出错误信息(@huoshuai-dot)
pnnx支持多算子到多算子的图变换
pnnx新增转换torch.round/trunc/fill/index_put/to/type_as/topk/fmod/cross/t/maximum/minimum
pnnx合并chinese-clip/sam-iamge-encoder attention结构
pnnx合并F.scaled_dot_product_attention
pnnx消除无用的expand/expand_as/type_as
pnnx修正fp16模型在优化时的权重变换错误
pnnx修正负数shape索引越界问题(@Justin62628)
pnnx修复转换后py文件执行时权限错误(@zhenjiaguo)
pnnx转换ncnn global pooling后自动添加reshape
pnnx转换非zero padding模式的卷积到ncnn
pnnx转换2维nn.Linear为ncnn gemm
pnnx转换torch.stack为ncnn concat+reshape
pnnx转换torch.t到ncnn permute(@XiaBing992)
pnnx转换logsigmoid/log_softmax为ncnn sigmoid/softmax+log(@lrw04)
pnnx修复slice_copy输出的类型信息
pnnx修复表达式中int64转换溢出问题
pnnx修复reshape表达式消除后的ghost结点
pnnx合并表达式时折叠shape为1的类似标量的权重
pnnx合并表达式支持max/min
pnnx改善图中有inplace操作时的输出结点连接探测，带来更多的常量折叠
添加ncnn glsl扩展文档以及中文版(@whyb)
修正faq文档错误(@KYShek)
改进cmake寻找vulkan提示用语(@zchrissirhcz)
更新vs2017编译步骤的细节(@brcarry)
新增Intel oneAPI编译步骤(@mizu-bai)
更新loongarch ci工具链，添加loongarch lsx覆盖率
更新python ci版本，新增python-3.12包
更新rpi3b+/rpi4b测试数据
更新huawei kunpeng 920测试数据(@MobtgZhang)
新增3A6000和TH1520 gpu测试数据(@Rabenda)
新增RDK X3 Module测试数据(@LJoson)
修复ios模拟器gpu badge(@732857315)

New Contributors

@charlescao460 made their first contribution in #4738
@Justin62628 made their first contribution in #4765
@zhenjiaguo made their first contribution in #4801
@732857315 made their first contribution in #4836
@KYShek made their first contribution in #4837
@JeremyRand made their first contribution in #4807
@Upliner made their first contribution in #4828
@Mek101 made their first contribution in #4855
@brcarry made their first contribution in #4872
@Rabenda made their first contribution in #4894
@Baiyuetribe made their first contribution in #4859
@XiaBing992 made their first contribution in #4940
@lrw04 made their first contribution in #4925

Full Changelog: 2023051...2023081

Contributors

Upliner, JeremyRand, and 18 other contributors

Assets 38

17 May 08:48

github-actions

20230517

903ec7c

android ios macos linux windows webassembly 预编译库 20230517 903ec7c

编译版本，默认配置，android-ndk-r25c，xcode 13.4.1，ubuntu-20.04，ubuntu-22.04，vs2015，vs2017，vs2019，vs2022，emscripten-3.1.28

file	content	arch
ncnn-full-source.zip	包含全部 submodule 代码的完整源码
ncnn-android.zip	android 静态库/动态库	armeabi-v7a + arm64-v8a + x86 + x86_64
ncnn-android-vulkan.zip	android 静态库/动态库，支持 GPU	armeabi-v7a + arm64-v8a + x86 + x86_64
ncnn-apple.zip	apple xcframework，ios + ios-simulator + macos + mac-catalyst，with and w/o bitcode	armv7 + arm64 + arm64e + i386 + x86_64
ncnn-apple-vulkan.zip	apple xcframework，ios + ios-simulator + macos + mac-catalyst，支持 GPU，with and w/o bitcode	arm64 + arm64e + x86_64
ncnn-ios.zip	ios 静态库，with and w/o bitcode	armv7 + arm64 + arm64e
ncnn-ios-vulkan.zip	ios 静态库，支持 GPU，with and w/o bitcode	arm64 + arm64e
ncnn-ios-simulator.zip	ios simulator 静态库，with and w/o bitcode	i386 + x86_64 + arm64
ncnn-ios-simulator-vulkan.zip	ios simulator 静态库，支持 GPU，with and w/o bitcode	x86_64 + arm64
ncnn-macos.zip	macos 静态库	x86_64 + arm64
ncnn-macos-vulkan.zip	macos 静态库，支持 GPU	x86_64 + arm64
ncnn-mac-catalyst.zip	mac catalyst 静态库，with and w/o bitcode	x86_64 + arm64
ncnn-mac-catalyst-vulkan.zip	mac catalyst 静态库，支持 GPU，with and w/o bitcode	x86_64 + arm64
ncnn-ubuntu.zip	ubuntu linux 静态库/动态库，支持 GPU，模型转换工具	x86_64
ncnn-windows.zip	windows 静态库/动态库，支持 GPU，模型转换工具	x86 + x64 + arm + arm64
ncnn-webassembly.zip	webassembly 静态库	wasm32 + simd + threads + simd-threads

arm convolution winograd重构支持任意elempack
arm convolution sgemm重构支持任意elempack
arm convolution直接卷积重构支持任意elempack
arm deconvolution/matmul 调用 gemm 完成计算
arm softmax支持任意elempack和bf16/fp16优化
arm multiheadattention fp16sa softmax优化
arm/x86 convolution1d直接卷积重构支持任意elempack和优化
粗糙的vulkan gemm和multiheadattention优化
multiheadattention支持输入attention mask
sigmoid/swish/clip/gelu/mish/tanh支持4d输入
减少double类型的使用(@zhiliu6)
arm a53/a55架构检测和流水线优化
允许注册自定义层替代内置实现
x86 asin/acos/atan/atan2 sse2/avx/avx512优化(@MouriNaruto)
sse_mathfunc迁移floor/ceil(@Yoh-Z)
x86 mathfun迁移abs(@Yoh-Z)
simpleocv新增cv::imdecode内存加载图片(@AlOa)
新增配合vulkan vma使用的三种扩展支持(@whyb)
新增获取vkinstance的接口(@whyb)
新增通用的sleep接口(@whyb)
innerproduct允许2维高度1的输入输出
修复multiheadattention分配内存存在的多线程竞争问题
修复在获取不到cache信息时的除0错误
修复scale avx512计算错误
修复exynos9810非法指令错误
老旧adreno驱动中禁用fp16a以解决计算错误
绕过n卡padding shader编译错误
移除platform.h中无用的aarch64判断(@dreamcmi)
修正modelwriter squeeze层参数id错误(@irexyc)
修复gcc-13编译错误(@hillwoodroc)
修复gcc-5.2 aarch64编译错误
修复aosp编译错误(@caofx0418)
修复n卡上benchmark退出时的crash(@triple-Mu)
修复获取cpu cache信息潜在的fd泄漏
优化lightmode循环条件(@MambaWong)
绕过新版moltenvk的兼容性问题
绕过n卡在multiheadattention softmax结果偶发nan的兼容性问题
调用cpu.h接口时强制初始化全局cpu信息
pnnx支持torch-2.0
pnnx支持complex数据类型
pnnx转换torch.baddbmm/torch.mm/torch.stft家族/torch.std/F.scaled_dot_product_attention
pnnx支持fp16权重的torchscript
pnnx支持非forward的其他函数入口
pnnx当只有一个动态维时候折叠reshape的shape表达式
pnnx识别常数常量和表达式中的折叠
pnnx自动删除maxpool无indices输出项
pnnx总是删除convtransposed output_size参数
pnnx合并gelu表达式
pnnx合并vit/clip/diffusers attention
修正pnnx的RNN/GRU省略输出项的python代码生成
修正pnnx转换ir时潜在的负INT_MAX下溢问题
修正pnnx fprintf类型不匹配(@kernelbin)
修复pnnx windows编译错误(@Yoh-Z)
pyncnn model zoo添加yolov7-tiny(@kennybradley)
pyncnn model zoo添加yolov8s(@triple-Mu)
macos pypi包使用完整版本号
改善wasm ci编译效率
更新ci swiftshader版本
更新cmake ios toolchain，新增ios-simulator arm64和mac catalyst ci
添加qnx toolchain和编译步骤(@zchrissirhcz)
删除ubuntu-18.04的ci
更新3A5000 benchmark数据(@wzyforgit)
新增2K1000LA benchmark数据(@lrzlin)
新增icpc icc benchmark数据(@mizu-bai)
新增Hyper-V Linux Guest benchmark数据(@MouriNaruto)
新增和更新op4lts/op5/VF2/FT2000/3A4000 benchmark数据(@MobtgZhang)
更新centos编译文档(@inisis)
更新windows msvc编译文档(@kernelbin)
faq新增关于cmake版本升级的内容(@inisis)
faq新增关于显卡节能模式的内容(@whyb)
修正citation和benchmark文档中的拼写错误(@zchrissirhcz)
修正pnnx代码和readme中的拼写错误(@jsyzdej @zchrissirhcz)

New Contributors

@whyb made their first contribution in #4614
@AlOa made their first contribution in #4557
@caofx0418 made their first contribution in #4622
@dreamcmi made their first contribution in #4628
@kernelbin made their first contribution in #4647
@lrzlin made their first contribution in #4656
@irexyc made their first contribution in #4658
@jsyzdej made their first contribution in #4665
@hillwoodroc made their first contribution in #4684
@MobtgZhang made their first contribution in #4687
@kennybradley made their first contribution in #4693

Full Changelog: 2023022...2023051

Contributors

zhiliu6, hillwoodroc, and 18 other contributors

Assets 38

23 Feb 04:56

github-actions

20230223

ff80ac2

android ios macos linux windows webassembly 预编译库 20230223 ff80ac2

编译版本，默认配置，android-ndk-r25c，xcode 14.0.1，ubuntu-18.04，ubuntu-20.04，ubuntu-22.04，vs2015，vs2017，vs2019，vs2022，emscripten-3.1.28

file	content	arch
ncnn-full-source.zip	包含全部 submodule 代码的完整源码
ncnn-android.zip	android 静态库/动态库	armeabi-v7a + arm64-v8a + x86 + x86_64
ncnn-android-vulkan.zip	android 静态库/动态库，支持 GPU	armeabi-v7a + arm64-v8a + x86 + x86_64
ncnn-ios.zip	ios 静态库，with and w/o bitcode	armv7 + arm64 + arm64e + i386 + x86_64
ncnn-ios-vulkan.zip	ios 静态库，支持 GPU，with and w/o bitcode	arm64 + arm64e + x86_64
ncnn-macos.zip	macos 静态库	x86_64 + arm64
ncnn-macos-vulkan.zip	macos 静态库，支持 GPU	x86_64 + arm64
ncnn-ubuntu.zip	ubuntu linux 静态库/动态库，支持 GPU，模型转换工具	x86_64
ncnn-windows.zip	windows 静态库/动态库，支持 GPU，模型转换工具	x86 + x64 + arm + arm64
ncnn-webassembly.zip	webassembly 静态库	wasm32 + simd + threads + simd-threads

扩充binaryop broadcast规则
新增copyto算子，对应于torch inplace slice copy操作
x86 gemm优化，新增 transpose_output 参数
x86 multiheadattention优化
x86 groupnorm优化(@EdVince)
arm gemm优化，包括fp16s/fp16sa
arm gelu优化(@EdVince)
arm multiheadattention优化(@EdVince)
新增获取cpu l2/l3 cache大小接口，通过sysconf/win32-api和linux sysfs
x86 gemm 依据l2 cache分块的优化
x86 convolution/deconvolution/deformableconv2d/matmul 调用 gemm 完成计算
x86 convolution winograd重构支持任意elempack
x86 convolution直接卷积重构支持任意elempack
x86公共的bfloat转换函数
slice/eltwise/concat支持4d输入
c api新增获取output indexes names接口
改善vulkan winograd f43 fp16计算数值稳定性
修复gpu信息bug bliz初始化问题(@weirdseed)
修正arm bfloat2float和float2bfloat命名相反的问题
更新riscv winograd f32系数，修复一些警告
更好的riscv rvv tanh实现
为ncnnoptimize/ncnn2int8添加新加的算子和参数
修复musl libc编译问题
更新stb image和image write，启用arm neon优化
更新emsdk版本到3.1.28，开启SIMPLEOCV(@ncnnnnn)
pnnx新增torch.cumsum转换(@csukuangfj)
pnnx新增torch.atan2/log10转换
pnnx自动替换pow(x,2)为square(x)
修正pnnx windows slice end参数问题(@Yoh-Z)
pnnx自动删除无用的Tensor.clone(@Yoh-Z)
pnnx自动展开模型输入tuple和list类型
pnnx转ncnn时分析binaryop broadcast规则并插入适当的reshape
pnnx折叠常数常量，修复常数转换MemoryData兼容性问题
pnnx合并pixel unshuffle(@Yoh-Z)
去除pnnx readme多余空行(@inisis)
去除pnnx无用的include(@XiangYyang)
修正pyncnn output_indexes接口错误(@wyushun)
修复最新macos vulkan sdk兼容性问题(@w1ndseeker)
删除python代码无用的import(@dianjiaogit)
修复macos ci的xcode版本和vulkan sdk安装问题
更新ci中已废弃的create release步骤
添加CITATION.cff(@tpoisonooo)
更新cpu benchmark数据(@wzyforgit)
修复README编译状态badge(@tpoisonooo)
修复README编译链接(@tuduweb)
修正拼写错误(@hwdef @hiteshhedwig)
添加ncnn-fortran例子(@mizu-bai)
添加sherpa-ncnn实时语音识别例子(@csukuangfj)

New Contributors

@mizu-bai made their first contribution in #4423
@inisis made their first contribution in #4428
@dianjiaogit made their first contribution in #4378
@wyushun made their first contribution in #4453
@w1ndseeker made their first contribution in #4472
@hiteshhedwig made their first contribution in #4486
@weirdseed made their first contribution in #4493
@XiangYyang made their first contribution in #4497
@tuduweb made their first contribution in #4530

Full Changelog: 2022112...2023022

Contributors

wyushun, csukuangfj, and 14 other contributors

Assets 28

28 Nov 05:45

github-actions

20221128

03550ba

android ios macos linux windows webassembly 预编译库 20221128 03550ba

编译版本，默认配置，android-ndk-r25b，xcode 12.4，ubuntu-18.04，ubuntu-20.04，ubuntu-22.04，vs2015，vs2017，vs2019，vs2022，emscripten-2.0.8

file	content	arch
ncnn-full-source.zip	包含全部 submodule 代码的完整源码
ncnn-android.zip	android 静态库/动态库	armeabi-v7a + arm64-v8a + x86 + x86_64
ncnn-android-vulkan.zip	android 静态库/动态库，支持 GPU	armeabi-v7a + arm64-v8a + x86 + x86_64
ncnn-ios.zip	ios 静态库，with and w/o bitcode	armv7 + arm64 + arm64e + i386 + x86_64
ncnn-ios-vulkan.zip	ios 静态库，支持 GPU，with and w/o bitcode	arm64 + arm64e + x86_64
ncnn-macos.zip	macos 静态库	x86_64 + arm64
ncnn-macos-vulkan.zip	macos 静态库，支持 GPU	x86_64 + arm64
ncnn-ubuntu.zip	ubuntu linux 静态库/动态库，支持 GPU，模型转换工具	x86_64
ncnn-windows.zip	windows 静态库/动态库，支持 GPU，模型转换工具	x86 + x64 + arm + arm64
ncnn-webassembly.zip	webassembly 静态库	wasm32 + simd + threads + simd-threads

新增loongarch64 lsx向量指令集优化，包括absval/batchnorm/bias/binaryop/cast/clip/concat/convolution1d/convolutiondepthwise/convolution/crop/deconvolutiondepthwise/deconvolution/dequantize/dropout/eltwise/flatten/hardsigmoid/hardswish/innerproduct/interp/mish/packing/padding/pooling/prelu/quantize/relu/requantize/sigmoid/slice/softmax/swish/tanh/unaryop算子(@junchao-loongson)
layernorm x86优化(@LinHeLurking @LRY89757)
batchnorm/elu/prelu/gelu x86优化(@LRY89757)
softmax arm neon优化(@luqiang-guo)
batchnorm/instancenorm riscv vector优化(@thelastlin)
deformableconv2d x86优化(@miemie2013)
elu vulkan优化(@Yoh-Z)
convolution int8 x86 sse2/avx2优化
更新riscv vector segment load/store(@thelastlin)
改善内存池回收机制(@LinHeLurking)
新增获取cpu物理核心数量api，默认线程数设为物理大核心数量
实现控制单层运算特性是否启用的参数
更通用的macos/ios cpu特性探测过程，a15/a16/m2启用bf16和i8mm指令集
统一innerproduct x86 fp32/fp16s内核代码
修复在android省电模式cpu离线导致openmp崩溃的问题
实现glu算子与对应的pnnx转换(@csukuangfj)
新增fold和unfold算子
新增gridsample算子与对应的pnnx转换(@LRY89757)
lstm支持proj_size参数
groupnorm支持1d/2d/4d输入计算
squeeze/expanddims支持4d输入输出
multiheadattention支持kdim vdim参数
修复convolutiondepthwise allocator的错误设置(@w8501)
修正windows arm环境中convolution权重为空的问题
修复onnx2ncnn blob名字超出255长度的问题(@ZhangGe6)
修正expanddims axes参数id错误的问题(@LiuYi-Up)
修正c api allocator无法工作的问题(@qiqikit)
更严格的编译器armv7 fp16功能检查和兼容
修复老版本gcc编译avx512代码的编译错误(@bestpower)
修复windows-arm64编译(@zchrissirhcz)
修复在老版本ndk引用ncnn链接atomic内置函数失败的问题
修复新版本pybind11编译错误(@tpoisonooo)
python模块支持mat.numpy()(@csukuangfj)
更新pybind11和glslang子模块
pyncnn发布python 3.11包和windows arm版本
pnnx支持pytorch 1.13
pnnx现已支持在cpu上加载gpu导出的torchscript
pnnx保存onnx-zero模型文件
pnnx转换时将常量存储在临时文件减少内存占用
pnnx新增命令行参数fp16=0/1控制是否用fp16保存onnx-zero/ncnn模型
pnnx支持大部分数学函数转换，新增nn.Softmax2d/nn.Fold/nn.Unfold/F.fold/F.unfold/bitwise_left_shift/bitwise_right_shift转换
pnnx改善和匹配inplace slice copy操作
融合更多静态的F.convND/F.linear为nn module
合并临接的reshape
合并pad到conv中
改善pnnx F.softmax转换对dtype兼容性(@EdVince)
修正pnnx softmax/normalize/slice负数axis转换错误的问题
修正pnnx slice end下标错误问题
修正pnnx转ncnn保存fp16权重没考虑对齐的问题
pnnx遇到动态size时不再折叠为常量
pnnx自动折叠new_full/full_like
yolov5示例支持yolov5 6.2(@shaoshengsong)
修复编译警告(@tpoisonooo @veahow)
删除无用空行(@MollySophia @Menci)
修正空格对齐(@tonori)
修正拼写错误(@LRY89757 @Zepan @eltociear)
忽略.xmake目录，CMakeSettings.json，Visual Studio CMake文件(@zchrissirhcz)
重构README(@septs)
改善README布局(@magicse)
添加一些示例项目链接(@magicse @shaoshengsong)
faq新增有关禁用fp16设置的内容(@MisakaBit)
更新riscv rvv ci
新增c906 ci
新增loongarch64 lsx ci
迁移部分github action ci到腾讯ci
新增TH1520 cmake toolchain(@luyanaa)
切分大型单元测试加快多进程测试速度
新增Intel Celeron M 420跑分(@MouriNaruto)
新增T-Head TH1520跑分(@YuzukiTsuru)
新增rock5b rk3588跑分(@hwdef)

New Contributors

@LinHeLurking made their first contribution in #4065
@septs made their first contribution in #4114
@w8501 made their first contribution in #4173
@MollySophia made their first contribution in #4187
@Menci made their first contribution in #4188
@magicse made their first contribution in #4193
@tonori made their first contribution in #4217
@YuzukiTsuru made their first contribution in #4240
@ZhangGe6 made their first contribution in #4236
@MisakaBit made their first contribution in #4248
@LiuYi-Up made their first contribution in #4259
@veahow made their first contribution in #4274
@csukuangfj made their first contribution in #4283
@Zepan made their first contribution in #4287
@bestpower made their first contribution in #4294
@shaoshengsong made their first contribution in #4328
@junchao-loongson made their first contribution in #4242
@eltociear made their first contribution in #4358

Full Changelog: 2022072...2022112

Contributors

Zepan, zchrissirhcz, and 28 other contributors

Assets 28

29 Jul 03:04

github-actions

20220729

b4ba207

android ios macos linux windows webassembly 预编译库 20220729 b4ba207

编译版本，默认配置，android-ndk-r24，xcode 12.4，ubuntu-18.04，ubuntu-20.04，ubuntu-22.04，vs2015，vs2017，vs2019，vs2022，emscripten-2.0.8

file	content	arch
ncnn-full-source.zip	包含全部 submodule 代码的完整源码
ncnn-android.zip	android 静态库/动态库	armeabi-v7a + arm64-v8a + x86 + x86_64
ncnn-android-vulkan.zip	android 静态库/动态库，支持 GPU	armeabi-v7a + arm64-v8a + x86 + x86_64
ncnn-ios.zip	ios 静态库，with and w/o bitcode	armv7 + arm64 + arm64e + i386 + x86_64
ncnn-ios-vulkan.zip	ios 静态库，支持 GPU，with and w/o bitcode	arm64 + arm64e + x86_64
ncnn-macos.zip	macos 静态库	x86_64 + arm64
ncnn-macos-vulkan.zip	macos 静态库，支持 GPU	x86_64 + arm64
ncnn-ubuntu.zip	ubuntu linux 静态库/动态库，支持 GPU，模型转换工具	x86_64
ncnn-windows.zip	windows 静态库/动态库，支持 GPU，模型转换工具	x86 + x64 + arm + arm64
ncnn-webassembly.zip	webassembly 静态库	wasm32 + simd + threads + simd-threads

batchnorm avx512 优化(@LRY89757)
新增DeformableConv2d层和单元测试(@miemie2013)
修复conv3x3 winograd tensorcore权重数据错乱导致结果出错的问题
修复memorydata 4维数据转换的问题
pnnx转换torchvision.ops.DeformConv2d到ncnn
pnnx自动删除无用的 mul + torch.ones 和 add + torch.zeros
pnnx修复动态shape时删除无用pad可能的崩溃问题
pnnx修复动态shape时错误删除upsample的问题
添加sse优化文档(@DC-Zhou)
更加严格的编译器riscv vector支持检查，删除rvv-0.7.1编译支持
更新ci中android ndk路径，使用android-ndk-r24打包

New Contributors

@DC-Zhou made their first contribution in #4053
@miemie2013 made their first contribution in #4070

Full Changelog: 2022072...2022072

Contributors

DC-Zhou, miemie2013, and LRY89757

Assets 28

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

New Contributors

Contributors

New Contributors

Contributors

New Contributors

Contributors

New Contributors

Contributors

New Contributors

Contributors

New Contributors

Contributors

New Contributors

Contributors

New Contributors

Contributors

New Contributors

Contributors

New Contributors

Contributors

Releases: Tencent/ncnn

android ios macos linux windows webassembly watchos tvos visionos 预编译库 20241226 5285895

New Contributors

Contributors

android ios macos linux windows webassembly 预编译库 20240820 a6d3ef5

New Contributors

Contributors

android ios macos linux windows webassembly 预编译库 20240410 56775de

New Contributors

Contributors

android ios macos linux windows webassembly 预编译库 20240102 1e88fb8

New Contributors

Contributors

android ios macos linux windows webassembly 预编译库 20231027 3116e02

New Contributors

Contributors

android ios macos linux windows webassembly 预编译库 20230816 39721ee

New Contributors

Contributors

android ios macos linux windows webassembly 预编译库 20230517 903ec7c

New Contributors

Contributors

android ios macos linux windows webassembly 预编译库 20230223 ff80ac2

New Contributors

Contributors

android ios macos linux windows webassembly 预编译库 20221128 03550ba

New Contributors

Contributors

android ios macos linux windows webassembly 预编译库 20220729 b4ba207

New Contributors

Contributors