[GSoC] Updates for Quantized models for QDQ method #266

Open: wants to merge 12 commits into base: main
3 changes: 2 additions & 1 deletion models/face_detection_yunet/README.md
100644 → 100755
@@ -8,13 +8,14 @@ Notes:
- This model can detect **faces of pixels between around 10x10 to 300x300** due to the training scheme.
- For details on training this model, please visit https://github.com/ShiqiYu/libfacedetection.train.
- This ONNX model has fixed input shape, but OpenCV DNN infers on the exact shape of input image. See https://github.com/opencv/opencv_zoo/issues/44 for more information.
- Quantization was done using the per-tensor method.
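
For readers unfamiliar with the terms, "per tensor" and "per channel" in these README notes describe how many quantization scales a weight tensor shares. The following is a minimal pure-Python sketch, assuming symmetric int8 quantization and a made-up 2x3 weight matrix; the helper names and values are illustrative and are not taken from any model in this PR.

```python
def symmetric_scale(values, qmax=127):
    """Scale that maps the largest magnitude in `values` onto qmax."""
    peak = max(abs(v) for v in values)
    return peak / qmax if peak > 0 else 1.0


def fake_quantize(values, scale, qmax=127):
    """Quantize to the int8 range and dequantize again, returning floats."""
    out = []
    for v in values:
        q = max(-qmax, min(qmax, round(v / scale)))
        out.append(q * scale)
    return out


def max_abs_error(original, reconstructed):
    """Largest absolute difference between two equally long lists."""
    return max(abs(a - b) for a, b in zip(original, reconstructed))


# Hypothetical weights: one output channel per row, with very different ranges.
weights = [
    [0.010, -0.020, 0.015],   # small-magnitude channel
    [1.500, -2.000, 0.750],   # large-magnitude channel
]

# Per tensor: a single scale shared by every channel.
flat = [w for row in weights for w in row]
tensor_scale = symmetric_scale(flat)
per_tensor = [fake_quantize(row, tensor_scale) for row in weights]

# Per channel: each row gets its own scale.
per_channel = [fake_quantize(row, symmetric_scale(row)) for row in weights]

# Compare the reconstruction error on the small-magnitude channel.
err_tensor = max_abs_error(weights[0], per_tensor[0])
err_channel = max_abs_error(weights[0], per_channel[0])
print(err_tensor, err_channel)
```

In this toy case the shared scale is dominated by the large channel, so the small channel is reconstructed less precisely than with its own scale. Whether that matters for a given model is an empirical question that the accuracy tables are meant to answer.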

Results of accuracy evaluation with [tools/eval](../../tools/eval).

| Models | Easy AP | Medium AP | Hard AP |
| ----------- | ------- | --------- | ------- |
| YuNet | 0.8871 | 0.8710 | 0.7681 |
| YuNet quant | 0.8838 | 0.8683 | 0.7676 |
| YuNet quant | 0.8809 | 0.8626 | 0.7493 |
Review comment from @fengyuentau (Member), Jul 10, 2024:

We should test the models using OpenCV DNN and report the resulting numbers. That is why DNN needs QDQ support first.
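
As background for the QDQ discussion: in the QDQ format the graph keeps float operators and wraps tensors in ONNX `QuantizeLinear` / `DequantizeLinear` node pairs, so an inference engine has to understand those pairs to run the model. The sketch below only mirrors the arithmetic of one such pair for a single uint8 value; the scale and zero point are made up for illustration and do not come from any model in this PR.

```python
def quantize_linear(x, scale, zero_point, qmin=0, qmax=255):
    """Mirror the ONNX QuantizeLinear formula for one value (uint8 range by default)."""
    q = round(x / scale) + zero_point   # Python's round() rounds half to even
    return max(qmin, min(qmax, q))      # saturate to the integer range


def dequantize_linear(q, scale, zero_point):
    """Mirror the ONNX DequantizeLinear formula for one value."""
    return (q - zero_point) * scale


scale, zero_point = 0.05, 128   # illustrative parameters only
for x in (-1.0, 0.0, 0.33, 100.0):
    q = quantize_linear(x, scale, zero_point)
    print(x, q, dequantize_linear(q, scale, zero_point))
```

Values inside the representable range come back within half a scale step of the input, while values outside it (such as 100.0 here) are clipped.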


\*: 'quant' stands for 'quantized'.

4 changes: 2 additions & 2 deletions models/face_detection_yunet/face_detection_yunet_2023mar_int8.onnx
100644 → 100755
Git LFS file not shown
3 changes: 2 additions & 1 deletion models/face_recognition_sface/README.md
100644 → 100755
@@ -8,13 +8,14 @@ Note:
- Model files encode MobileFaceNet instances trained on the SFace loss function, see the [SFace paper](https://arxiv.org/abs/2205.12010) for reference.
- ONNX file conversions from [original code base](https://github.com/zhongyy/SFace) thanks to [Chengrui Wang](https://github.com/crywang).
- (As of Sep 2021) Supporting 5-landmark warping for now, see below for details.
- Quantization was done using the per-tensor method.

Results of accuracy evaluation with [tools/eval](../../tools/eval).

| Models | Accuracy |
| ----------- | -------- |
| SFace | 0.9940 |
| SFace quant | 0.9932 |
| SFace quant | 0.9928 |

\*: 'quant' stands for 'quantized'.

4 changes: 2 additions & 2 deletions models/face_recognition_sface/face_recognition_sface_2021dec_int8.onnx
100644 → 100755
Git LFS file not shown
1 change: 1 addition & 0 deletions models/facial_expression_recognition/README.md
100644 → 100755
@@ -7,6 +7,7 @@ Note:
- Progressive Teacher is contributed by [Jing Jiang](https://scholar.google.com/citations?user=OCwcfAwAAAAJ&hl=zh-CN).
- [MobileFaceNet](https://link.springer.com/chapter/10.1007/978-3-319-97909-0_46) is used as the backbone and the model is able to classify seven basic facial expressions (angry, disgust, fearful, happy, neutral, sad, surprised).
- [facial_expression_recognition_mobilefacenet_2022july.onnx](https://github.com/opencv/opencv_zoo/raw/master/models/facial_expression_recognition/facial_expression_recognition_mobilefacenet_2022july.onnx) is implemented thanks to [Chengrui Wang](https://github.com/crywang).
- Quantization was done using the per-channel method.

Results of accuracy evaluation on [RAF-DB](http://whdeng.cn/RAF/model1.html).

Git LFS file not shown
1 change: 1 addition & 0 deletions models/handpose_estimation_mediapipe/README.md
100644 → 100755
@@ -14,6 +14,7 @@ This model is converted from TFlite to ONNX using following tools:
**Note**:
- The int8-quantized model may produce invalid results due to a significant drop of accuracy.
- Visit https://github.com/google/mediapipe/blob/master/docs/solutions/models.md#hands for models of larger scale.
- Quantization was done using the per-tensor method.

## Demo

4 changes: 2 additions & 2 deletions models/handpose_estimation_mediapipe/handpose_estimation_mediapipe_2023feb_int8.onnx
100644 → 100755
Git LFS file not shown
4 changes: 2 additions & 2 deletions models/human_segmentation_pphumanseg/README.md
100644 → 100755
@@ -1,6 +1,6 @@
# PPHumanSeg

This model is ported from [PaddleHub](https://github.com/PaddlePaddle/PaddleHub) using [this script from OpenCV](https://github.com/opencv/opencv/blob/master/samples/dnn/dnn_model_runner/dnn_conversion/paddlepaddle/paddle_humanseg.py).
This model is ported from [PaddleHub](https://github.com/PaddlePaddle/PaddleHub) using [this script from OpenCV](https://github.com/opencv/opencv/blob/master/samples/dnn/dnn_model_runner/dnn_conversion/paddlepaddle/paddle_humanseg.py). Quantization was done using the per-tensor method.

## Demo

@@ -47,7 +47,7 @@ Results of accuracy evaluation with [tools/eval](../../tools/eval).
| Models | Accuracy | mIoU |
| ------------------ | -------------- | ------------- |
| PPHumanSeg | 0.9581 | 0.8996 |
| PPHumanSeg quant | 0.4365 | 0.2788 |
| PPHumanSeg quant | 0.7261 | 0.3687 |


\*: 'quant' stands for 'quantized'.
4 changes: 2 additions & 2 deletions models/human_segmentation_pphumanseg/human_segmentation_pphumanseg_2023mar_int8.onnx
100644 → 100755
Git LFS file not shown
6 changes: 4 additions & 2 deletions models/image_classification_mobilenet/README.md
100644 → 100755
@@ -6,12 +6,14 @@ MobileNetV2: Inverted Residuals and Linear Bottlenecks

Results of accuracy evaluation with [tools/eval](../../tools/eval).

Quantization was done using the per-channel method for V1 and the per-tensor method for V2.

| Models | Top-1 Accuracy | Top-5 Accuracy |
| ------------------ | -------------- | -------------- |
| MobileNet V1 | 67.64 | 87.97 |
| MobileNet V1 quant | 55.53 | 78.74 |
| MobileNet V1 quant | 40.50 | 53.87 |
| MobileNet V2 | 69.44 | 89.23 |
| MobileNet V2 quant | 68.37 | 88.56 |
| MobileNet V2 quant | 58.10 | 87.40 |

\*: 'quant' stands for 'quantized'.
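
To make the size of the change in this table easier to read, the Top-1 gap between each quantized row and its float model can be computed directly from the numbers in the hunk. This is a small sketch; the labels "removed row" and "added row" assume the usual diff order, in which the removed row is listed first.

```python
# Top-1 accuracies copied from the table in this hunk.
float_top1 = {"MobileNet V1": 67.64, "MobileNet V2": 69.44}
quant_top1 = {
    "MobileNet V1": {"removed row": 55.53, "added row": 40.50},
    "MobileNet V2": {"removed row": 68.37, "added row": 58.10},
}


def top1_drop(model, row):
    """Absolute Top-1 drop (percentage points) of a quantized row vs. the float model."""
    return round(float_top1[model] - quant_top1[model][row], 2)


for model, rows in quant_top1.items():
    for row in rows:
        print(f"{model} ({row}): -{top1_drop(model, row):.2f} points")
```

The same calculation applies to the Top-5 column and to the other accuracy tables in this PR.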

4 changes: 2 additions & 2 deletions models/image_classification_mobilenet/image_classification_mobilenetv1_2022apr_int8.onnx
100644 → 100755
Git LFS file not shown
4 changes: 2 additions & 2 deletions models/image_classification_mobilenet/image_classification_mobilenetv2_2022apr_int8.onnx
100644 → 100755
Git LFS file not shown
1 change: 1 addition & 0 deletions models/license_plate_detection_yunet/README.md
100644 → 100755
@@ -3,6 +3,7 @@
This model is contributed by Dong Xu (徐栋) from [watrix.ai](watrix.ai) (银河水滴).

Please note that the model is trained with Chinese license plates, so the detection results of other license plates with this model may be limited.
Quantization was done using the per-tensor method.

## Demo

4 changes: 2 additions & 2 deletions models/license_plate_detection_yunet/license_plate_detection_lpd_yunet_2023mar_int8.onnx
100644 → 100755
Git LFS file not shown
1 change: 1 addition & 0 deletions models/object_detection_yolox/README.md
100644 → 100755
@@ -10,6 +10,7 @@ Key features of the YOLOX object detector

Note:
- This version of YoloX: YoloX_s
- Quantization was done using the per-tensor method.

## Demo

4 changes: 2 additions & 2 deletions models/object_detection_yolox/object_detection_yolox_2022nov_int8.onnx
100644 → 100755
Git LFS file not shown
1 change: 1 addition & 0 deletions models/palm_detection_mediapipe/README.md
100644 → 100755
@@ -9,6 +9,7 @@ SSD Anchors are generated from [GenMediaPipePalmDectionSSDAnchors](https://githu

**Note**:
- Visit https://github.com/google/mediapipe/blob/master/docs/solutions/models.md#hands for models of larger scale.
- Quantization was done using the per-tensor method.

## Demo

4 changes: 2 additions & 2 deletions models/palm_detection_mediapipe/palm_detection_mediapipe_2023feb_int8.onnx
100644 → 100755
Git LFS file not shown
1 change: 1 addition & 0 deletions models/pose_estimation_mediapipe/README.md
100644 → 100755
@@ -10,6 +10,7 @@ This model is converted from TFlite to ONNX using following tools:

**Note**:
- Visit https://github.com/google/mediapipe/blob/master/docs/solutions/models.md#pose for models of larger scale.
- Quantization was done using the per-channel method.
## Demo

### python
Git LFS file not shown
1 change: 1 addition & 0 deletions models/text_recognition_crnn/README.md
100644 → 100755
@@ -3,6 +3,7 @@
[An End-to-End Trainable Neural Network for Image-based Sequence Recognition and Its Application to Scene Text Recognition](https://arxiv.org/abs/1507.05717)

Results of accuracy evaluation with [tools/eval](../../tools/eval) at different text recognition datasets.
Quantization of the 2021 Sep English model was done using the per-channel method.

| Model name | ICDAR03(%) | IIIT5k(%) | CUTE80(%) |
| ------------ | ---------- | --------- | --------- |
Git LFS file not shown
58 changes: 49 additions & 9 deletions tools/quantize/quantize-ort.py
100644 → 100755
@@ -14,7 +14,7 @@
import onnxruntime
from onnxruntime.quantization import quantize_static, CalibrationDataReader, QuantType, QuantFormat, quant_pre_process

from transform import Compose, Resize, CenterCrop, Normalize, ColorConvert, HandAlign
from transform import Compose, Resize, CenterCrop, Normalize, ColorConvert, HandAlign, ImagePad

class DataReader(CalibrationDataReader):
def __init__(self, model_path, image_dir, transforms, data_dim):
@@ -79,7 +79,7 @@ def run(self):
quant_pre_process(new_model_path, new_model_path)
output_name = '{}_{}.onnx'.format(self.model_path[:-5], self.wt_type)
quantize_static(new_model_path, output_name, self.dr,
quant_format=QuantFormat.QOperator, # start from onnxruntime==1.11.0, quant_format is set to QuantFormat.QDQ by default, which performs fake quantization
quant_format=QuantFormat.QDQ, # QDQ is the default quant_format since onnxruntime==1.11.0 and performs fake quantization
per_channel=self.per_channel,
weight_type=self.type_dict[self.wt_type],
activation_type=self.type_dict[self.act_type],
@@ -91,22 +91,63 @@ def run(self):
models=dict(
yunet=Quantize(model_path='../../models/face_detection_yunet/face_detection_yunet_2023mar.onnx',
calibration_image_dir='../../benchmark/data/face_detection',
transforms=Compose([Resize(size=(160, 120))]),
transforms=Compose([Resize(size=(640, 640))]),
nodes_to_exclude=['MaxPool_5', 'MaxPool_18', 'MaxPool_25', 'MaxPool_32'],
),
), #COLOR_BGR2RGB
sface=Quantize(model_path='../../models/face_recognition_sface/face_recognition_sface_2021dec.onnx',
calibration_image_dir='../../benchmark/data/face_recognition',
transforms=Compose([Resize(size=(112, 112))])),
# Facial Expression Recognition net
facexpnet=Quantize(model_path='../../models/facial_expression_recognition/facial_expression_recognition_mobilefacenet_2022july.onnx',
calibration_image_dir='../../benchmark/data/facial_expression_recognition/fer_calibration',
transforms=Compose([Resize(size=(112, 112)),
ColorConvert(ctype=cv.COLOR_BGR2RGB),
Normalize(std=[255, 255, 255])
])),
# Object Detection NanoDet
nanonet=Quantize(model_path='../../models/object_detection_nanodet/object_detection_nanodet_2022nov.onnx',
calibration_image_dir='../../benchmark/data/object_detection',
transforms=Compose([Resize(size=(112, 112))])),
# object_detection_yolox
yolox=Quantize(model_path='../../models/object_detection_yolox/object_detection_yolox_2022nov.onnx',
calibration_image_dir='../../benchmark/data/object_detection',
transforms=Compose([Resize(size=(640, 640))])),
# object_tracking_vittrack
vittrack=Quantize(model_path='../../models/object_tracking_vittrack/object_tracking_vittrack_2023sep.onnx',
calibration_image_dir='../../benchmark/data/object_tracking_image',
transforms=Compose([Resize(size=(640, 640))])),

pphumanseg=Quantize(model_path='../../models/human_segmentation_pphumanseg/human_segmentation_pphumanseg_2023mar.onnx',
calibration_image_dir='../../benchmark/data/human_segmentation',
transforms=Compose([Resize(size=(192, 192))])),

mobilenetv1=Quantize(model_path='../../models/image_classification_mobilenet/image_classification_mobilenetv1_2022apr.onnx',
calibration_image_dir='../../benchmark/data/image_classification',
transforms=Compose([
Resize(size=(224, 224)),
Normalize(std=[255, 255, 255]),
Normalize(mean=[0.485, 0.456, 0.406],std=[0.229, 0.224, 0.225])
])),
mobilenetv2=Quantize(model_path='../../models/image_classification_mobilenet/image_classification_mobilenetv2_2022apr.onnx',
calibration_image_dir='../../benchmark/data/image_classification',
transforms=Compose([
Resize(size=(224, 224)),
Normalize(std=[255, 255, 255]),
Normalize(mean=[0.485, 0.456, 0.406],std=[0.229, 0.224, 0.225])
])),

ppresnet50=Quantize(model_path='../../models/image_classification_ppresnet/image_classification_ppresnet50_2022jan.onnx',
calibration_image_dir='../../benchmark/data/image_classification',
transforms=Compose([Resize(size=(224, 224))])),
# TBD: VitTrack
youtureid=Quantize(model_path='../../models/person_reid_youtureid/person_reid_youtu_2021nov.onnx',
calibration_image_dir='../../benchmark/data/person_reid',
transforms=Compose([Resize(size=(128, 256))])),
mppose=Quantize(model_path='../../models/pose_estimation_mediapipe/pose_estimation_mediapipe_2023mar.onnx',
calibration_image_dir='../../benchmark/data/person_detection',
transforms=Compose([Resize(size=(256, 256)),
ColorConvert(ctype=cv.COLOR_BGR2RGB),
Normalize(std=[255, 255, 255]),
]),data_dim="hwc"),
ppocrv3det_en=Quantize(model_path='../../models/text_detection_ppocr/text_detection_en_ppocrv3_2023may.onnx',
calibration_image_dir='../../benchmark/data/text',
transforms=Compose([Resize(size=(736, 736)),
@@ -122,18 +163,17 @@ def run(self):
calibration_image_dir='../../benchmark/data/text',
transforms=Compose([Resize(size=(100, 32))])),
mp_palmdet=Quantize(model_path='../../models/palm_detection_mediapipe/palm_detection_mediapipe_2023feb.onnx',
calibration_image_dir='path/to/dataset',
calibration_image_dir='../../benchmark/data/FreiHAND/evaluation/rgb',
transforms=Compose([Resize(size=(192, 192)), Normalize(std=[255, 255, 255]),
ColorConvert(ctype=cv.COLOR_BGR2RGB)]), data_dim='hwc'),
mp_handpose=Quantize(model_path='../../models/handpose_estimation_mediapipe/handpose_estimation_mediapipe_2023feb.onnx',
calibration_image_dir='path/to/dataset',
calibration_image_dir='../../benchmark/data/FreiHAND/evaluation/rgb',
transforms=Compose([HandAlign("mp_handpose"), Resize(size=(224, 224)), Normalize(std=[255, 255, 255]),
ColorConvert(ctype=cv.COLOR_BGR2RGB)]), data_dim='hwc'),
lpd_yunet=Quantize(model_path='../../models/license_plate_detection_yunet/license_plate_detection_lpd_yunet_2023mar.onnx',
calibration_image_dir='../../benchmark/data/license_plate_detection',
transforms=Compose([Resize(size=(320, 240))]),
nodes_to_exclude=['MaxPool_5', 'MaxPool_18', 'MaxPool_25', 'MaxPool_32', 'MaxPool_39'],
),
nodes_to_exclude=['MaxPool_5', 'MaxPool_18', 'MaxPool_25', 'MaxPool_32', 'MaxPool_39'],),
)

if __name__ == '__main__':
Expand Down
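
The MobileNet entries above chain two `Normalize` steps: first a division by 255, then a per-channel mean/std standardization. The sketch below shows how such a chain composes. The `Compose` and `Normalize` classes here are simplified stand-ins written for this illustration; the real ones live in the `transform` module, which this diff does not show, and may behave differently.

```python
class Normalize:
    """Stand-in: per-channel (value - mean) / std on a list of 3-channel pixels."""

    def __init__(self, mean=(0.0, 0.0, 0.0), std=(1.0, 1.0, 1.0)):
        self.mean = mean
        self.std = std

    def __call__(self, pixels):
        return [
            [(v - m) / s for v, m, s in zip(px, self.mean, self.std)]
            for px in pixels
        ]


class Compose:
    """Stand-in: apply each transform in order."""

    def __init__(self, transforms):
        self.transforms = transforms

    def __call__(self, data):
        for t in self.transforms:
            data = t(data)
        return data


pipeline = Compose([
    Normalize(std=[255, 255, 255]),              # scale 0..255 down to 0..1
    Normalize(mean=[0.485, 0.456, 0.406],
              std=[0.229, 0.224, 0.225]),        # then standardize each channel
])

pixels = [[255, 128, 0]]   # one made-up pixel
result = pipeline(pixels)
print(result)
```

Under these assumptions the chain is equivalent to `(x / 255 - mean) / std` per channel, so the calibration images would need to go through the same preprocessing as the images used at inference time.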