[ONNX] Add per channel quantization support for Onnx.QLinearConv op #3917
base: main
Conversation
LGTM
Thanks Vivek. I think you need to modify some of the output quantization handling in the per-channel case. Maybe store a bool that tracks if we are in the per-channel case so you can reuse it for the output.
It looks like this conversion automatically fuses the input and weight quantization with the convolution, so the only thing that fuse-quantized-ops is going to do is quantize the bias (which won't work currently in the per-channel case). I think it is fine, but we won't be able to check correctness e2e until we address the per-channel quantization, unfortunately.
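To illustrate that suggestion, a minimal sketch of such a flag, reusing the weightScaleShape / weightShape names from the diff below (not part of the patch):

```cpp
// Sketch only: remember whether the weight is per-channel quantized (1-D
// scale with one entry per output channel) so the same flag can drive the
// bias and output handling later in the lowering.
bool isPerChannelQuantized =
    weightScaleShape.size() == 1 &&
    weightScaleShape[0] != Torch::kUnknownSize &&
    weightScaleShape[0] == weightShape[0];
```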
    return failure();
  auto weightShape = weightTy.getSizes();
  auto weightScaleShape = weightScaleTy.getSizes();
  Value weightScaleScalar = extract(weightScale);
extract won't work if the weight scale isn't a single element. I'd put this in the else block below.
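For illustration only, a sketch of what the per-channel branch could do instead of extracting a scalar, assuming torch-mlir's Aten_MakePerChannelQuantizedTensorOp is available and the quantization axis is the output-channel dim 0 (perChannelWeightTy is a hypothetical result type, not from the patch):

```cpp
// Sketch only: keep weightScale / weightZp as 1-D tensors and quantize the
// weight along dim 0 instead of extracting a single scalar.
Value axis = rewriter.create<Torch::ConstantIntOp>(
    binder.getLoc(), rewriter.getI64IntegerAttr(0));
weight = rewriter.create<Torch::Aten_MakePerChannelQuantizedTensorOp>(
    binder.getLoc(), perChannelWeightTy, weight, weightScale, weightZp, axis);
```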
I see you use this below to handle the quantization of the output, but this must also be per-channel if the weight is per-channel.
  Value weightScaleScalar = extract(weightScale);
  if (weightScaleShape.size() == 1 &&
      weightScaleShape[0] != Torch::kUnknownSize &&
      weightScaleShape[0] == weightShape[0]) {
Additionally check that weightShape[0] != 1, since we don't want to lower to per-channel when there is only one channel.
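For concreteness, the extended guard could look like this (a sketch, reusing the names from the surrounding diff):

```cpp
// Sketch only: also require more than one output channel before taking the
// per-channel path.
if (weightScaleShape.size() == 1 &&
    weightScaleShape[0] != Torch::kUnknownSize &&
    weightShape[0] != 1 &&
    weightScaleShape[0] == weightShape[0]) {
```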
  } else {
    weightZp = extract(weightZp);
    weight = makePerTensor(weight, weightScaleScalar, weightZp);
  }
A bit of a nit, but I'd prefer an else if here with the conditions for makePerTensor, and then an else branch with an unreachable, just to be very clear about what assumptions are being made in each case.
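Roughly like this, as a sketch (isPerChannelQuantized is a placeholder for the per-channel condition above, not a name from the patch):

```cpp
if (isPerChannelQuantized) {
  // per-channel path: keep scale / zero point as 1-D tensors
} else if (weightScaleShape.empty() ||
           (weightScaleShape.size() == 1 && weightScaleShape[0] == 1)) {
  // scalar (per-tensor) weight scale
  weightZp = extract(weightZp);
  weight = makePerTensor(weight, weightScaleScalar, weightZp);
} else {
  llvm_unreachable("unexpected weight scale shape in QLinearConv lowering");
}
```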
  cTy = rewriter.getType<Torch::ValueTensorType>(
  outputTy = rewriter.getType<Torch::ValueTensorType>(
Okay, this is a bit subtle. The last optional input for this op is the int32 bias, assumed to be quantized via the product of input and weight scales. This implies that the quantization of the bias (and also the output of the convolution) is also per-channel if the weight was per-channel quantized. This part is fine, but we will need to case out the logic below.
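A rough sketch of how that casing could be arranged (isPerChannelQuantized is the hypothetical flag suggested above; the comments describe the intent rather than a finished implementation):

```cpp
// Sketch only: the int32 bias is quantized with scale = inputScale * weightScale,
// so the bias and the conv output must follow the same split as the weight.
if (isPerChannelQuantized) {
  // The effective output/bias scale is a 1-D tensor of length C_out:
  //   outScale[i] = inputScale * weightScale[i]
  // so the bias dequantization and the final output quantization need
  // per-channel handling along dim 0.
} else {
  // The existing per-tensor path keeps a single !torch.float output scale.
}
```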
  Value outScale = rewriter.create<Torch::AtenMulFloatOp>(
      binder.getLoc(), rewriter.getType<Torch::FloatType>(), aScale,
      bScale);
      binder.getLoc(), rewriter.getType<Torch::FloatType>(), inputScale,
This will possibly need to be a float x tensor mul.
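As a sketch of that, assuming the weightScale / inputScale / weightScaleTy names from this patch, the per-channel output scale could be computed with a tensor-times-scalar mul:

```cpp
// Sketch only: in the per-channel case, outScale is a 1-D tensor of
// per-channel scales rather than a single float.
Value outScale = rewriter.create<Torch::AtenMulScalarOp>(
    binder.getLoc(), weightScaleTy, weightScale, inputScale);
```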
This commit extends the OnnxToTorch lowering of the Onnx.QLinearConv op by adding support for per-channel quantization of the weight argument.
Signed-off-by: Vivek Khandelwal [email protected]