[Performance]: Why are the inference results in Python different from those in C++? #28188

Open
3 tasks done
hujhcv opened this issue Dec 24, 2024 · 2 comments
Labels: category: Python API, performance, support_request


hujhcv commented Dec 24, 2024

OpenVINO Version

No response

Operating System

Windows

Device used for inference

CPU

OpenVINO installation

PyPI

Programming Language

C++

Hardware Architecture

x86 (64 bits)

Model used

MobileNet V2

Model quantization

No

Target Platform

No response

Performance issue description

In Python, I ran MobileNet V2 on an image, and PyTorch and OpenVINO both returned the same result: Samoyed: 83.0%.


from PIL import Image
import torch
from torchvision import transforms
from torchvision.models import mobilenet_v2, MobileNet_V2_Weights

# Load the test image; convert("RGB") guarantees 3 channels
img = Image.open("E:/dog.jpg").convert("RGB")

preprocess = transforms.Compose([
    transforms.Resize(256),      # shorter side -> 256, aspect ratio kept
    transforms.CenterCrop(224),
    transforms.ToTensor(),       # HWC uint8 [0, 255] -> CHW float [0, 1]
    transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])

# IMAGENET1K_V1 is the weight set that pretrained=True loads
weights = MobileNet_V2_Weights.IMAGENET1K_V1
model = mobilenet_v2(weights=weights)
model.eval()
batch = preprocess(img).unsqueeze(0)  # shape [1, 3, 224, 224]

with torch.no_grad():
    prediction = model(batch).squeeze(0).softmax(0)
class_id = prediction.argmax().item()
score = prediction[class_id].item()
category_name = weights.meta["categories"][class_id]
print(f"{category_name}: {100 * score:.1f}% (with PyTorch)")

# OpenVINO model preparation and inference with the same post-processing

import openvino as ov
compiled_model = ov.compile_model(ov.convert_model(model, example_input=batch))

prediction = torch.tensor(compiled_model(batch.numpy())[0]).squeeze(0).softmax(0)
class_id = prediction.argmax().item()
score = prediction[class_id].item()
category_name = weights.meta["categories"][class_id]
print(f"{category_name}: {100 * score:.1f}% (with OpenVINO)")

I exported the PyTorch MobileNet model to an OpenVINO IR file using the following Python code.


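import torch
import torchvision
import openvino as ov

# Same weights as pretrained=True (IMAGENET1K_V1)
model = torchvision.models.mobilenet_v2(weights=torchvision.models.MobileNet_V2_Weights.IMAGENET1K_V1)
model.eval()
# Convert using a dummy 1x3x224x224 example input
ov_model = ov.convert_model(model, example_input=torch.rand(1, 3, 224, 224))
output_filename = "mobilenet_v2.xml"
# Note: save_model compresses weights to FP16 by default (compress_to_fp16=True)
ov.save_model(ov_model, output_filename)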

Then I ran inference on the same image with the following C++ code. The top-1 result was Samoyed: 68.7778%.


#include <algorithm>
#include <cmath>
#include <cstdint>
#include <exception>
#include <iostream>
#include <memory>
#include <vector>

#include <opencv2/opencv.hpp>
#include <openvino/openvino.hpp>

int main(int argc, char* argv[])
{
	try
	{
		ov::Core core; // OpenVINO core object
		std::shared_ptr<ov::Model> model = core.read_model("E:/mobilenet_v2.xml");

		// If the model has dynamic shapes, reshape it to the specified input shape
		if (model->is_dynamic())
		{
			model->reshape({ 1, 3, static_cast<long int>(224), static_cast<long int>(224) });
		}

		// Embed preprocessing in the model: u8 NHWC BGR input -> f32 NCHW RGB,
		// then x / 255, (x - mean) / std, i.e. ToTensor() + Normalize()
		ov::preprocess::PrePostProcessor ppp(model);
		ppp.input().tensor()
			.set_element_type(ov::element::u8)
			.set_layout("NHWC")
			.set_color_format(ov::preprocess::ColorFormat::BGR);
		ppp.input().preprocess()
			.convert_element_type(ov::element::f32)
			.convert_color(ov::preprocess::ColorFormat::RGB)
			.scale({ 255.0f, 255.0f, 255.0f })
			.mean({ 0.485f, 0.456f, 0.406f })
			.scale({ 0.229f, 0.224f, 0.225f });
		ppp.input().model().set_layout("NCHW");
		ppp.output().tensor().set_element_type(ov::element::f32);
		model = ppp.build(); // Build the preprocessed model

		// Compile the model for inference
		ov::CompiledModel compiled_model = core.compile_model(model, "CPU");
		ov::InferRequest inference_request = compiled_model.create_infer_request();

		// Number of classes from the output shape [1, 1000]
		const ov::Shape output_shape = model->output().get_shape();
		const int classesNum = static_cast<int>(output_shape[1]);

		// OpenCV decodes images in BGR order, matching set_color_format(BGR)
		cv::Mat img = cv::imread("E:/dog.jpg");
		if (img.empty())
		{
			std::cerr << "Failed to read image" << std::endl;
			return 1;
		}

		// Note: this stretches the image to 256x256, whereas transforms.Resize(256)
		// scales the shorter side to 256 and keeps the aspect ratio
		cv::Mat resizedImage;
		cv::resize(img, resizedImage, cv::Size(256, 256));

		// Center-crop 224x224 (copyTo makes the result a continuous buffer)
		const int cropSize = 224;
		const int startX = resizedImage.cols / 2 - cropSize / 2;
		const int startY = resizedImage.rows / 2 - cropSize / 2;
		cv::Mat croppedImage;
		resizedImage(cv::Rect(startX, startY, cropSize, cropSize)).copyTo(croppedImage);

		// The input tensor element type is u8 (set above), so wrap the raw 8-bit pixels
		std::uint8_t* input_data = croppedImage.data;
		ov::Tensor input_tensor(compiled_model.input().get_element_type(),
			compiled_model.input().get_shape(), input_data);
		inference_request.set_input_tensor(input_tensor);

		inference_request.infer();

		const float* tensorData = inference_request.get_output_tensor().data<const float>();

		// Numerically stable softmax: subtract the maximum logit before exp()
		std::vector<double> softmaxData(classesNum);
		float max_val = tensorData[0];
		for (int i = 1; i < classesNum; ++i)
		{
			max_val = std::max(max_val, tensorData[i]);
		}
		double sum_exp = 0.0;
		for (int i = 0; i < classesNum; ++i)
		{
			softmaxData[i] = std::exp(tensorData[i] - max_val);
			sum_exp += softmaxData[i];
		}
		for (int i = 0; i < classesNum; ++i)
		{
			softmaxData[i] /= sum_exp;
		}

		// Top-1 class
		double maxScore = 0;
		int maxScoreIndex = 0;
		for (int i = 0; i < classesNum; ++i)
		{
			if (maxScore < softmaxData[i])
			{
				maxScore = softmaxData[i];
				maxScoreIndex = i;
			}
		}

		std::cout << "Max: " << maxScore * 100 << "%  Index: " << maxScoreIndex << std::endl;
		return 0;
	}
	catch (const std::exception& ex)
	{
		std::cerr << ex.what() << std::endl;
		return 1;
	}
}

The difference in results (83.0% vs. 68.78%) is quite large. Is there a problem with my C++ code?
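The PrePostProcessor chain in the C++ code converts a u8 NHWC BGR image into f32 NCHW RGB, then divides by 255, subtracts the mean, and divides by the std. That should be equivalent to ToTensor() + Normalize(). Below is a minimal hand-written sketch of the same conversion. It is useful for dumping the exact values and comparing them with the Python `batch` tensor. It assumes a single image stored row-major with 3 interleaved channels. The function name is hypothetical and is not part of the OpenVINO API.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical reference for the PrePostProcessor chain:
// u8 HWC BGR -> f32 CHW RGB, then (x / 255 - mean[c]) / std[c].
// Assumes one image (N = 1), row-major, 3 interleaved channels.
std::vector<float> bgr_u8_to_normalized_nchw(const std::uint8_t* bgr, int height, int width)
{
    // ImageNet statistics in RGB order, as in transforms.Normalize
    const float mean[3] = { 0.485f, 0.456f, 0.406f };
    const float stdv[3] = { 0.229f, 0.224f, 0.225f };

    const std::size_t plane = static_cast<std::size_t>(height) * width;
    std::vector<float> out(3 * plane);

    for (int y = 0; y < height; ++y)
    {
        for (int x = 0; x < width; ++x)
        {
            const std::size_t pix = static_cast<std::size_t>(y) * width + x;
            const std::uint8_t* p = bgr + pix * 3;
            // BGR -> RGB: output channel c reads input channel 2 - c
            for (int c = 0; c < 3; ++c)
            {
                const float v = static_cast<float>(p[2 - c]) / 255.0f;
                out[c * plane + pix] = (v - mean[c]) / stdv[c];
            }
        }
    }
    return out;
}
```

The output is planar: all R values come first, then all G values, then all B values. This is the same NCHW order that `batch[0]` has in PyTorch.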

Step-by-step reproduction

No response

Issue submission checklist

  • I'm reporting a performance issue. It's not a question.
  • I checked the problem with the documentation, FAQ, open issues, Stack Overflow, etc., and have not found a solution.
  • There is reproducer code and related data files such as images, videos, models, etc.
hujhcv added the performance and support_request labels Dec 24, 2024
ilya-lavrenov added the category: Python API label Dec 24, 2024
adminaccount001 commented:

You should use PIL's resize instead of OpenCV's.
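Besides the interpolation kernel, the resize geometry also differs. transforms.Resize(256) scales the shorter side to 256 and keeps the aspect ratio. The C++ code instead calls `cv::resize(img, resizedImage, cv::Size(256, 256))`, which stretches non-square images. The sketch below computes torchvision's output size and center-crop offset, based on my reading of torchvision's size computation. The helper name and struct are hypothetical.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical plan for torchvision's Resize(shorter) + CenterCrop(crop)
struct ResizeCropPlan
{
    int resized_w, resized_h; // size after the shorter-side resize
    int crop_x, crop_y;       // top-left corner of the centered crop
};

ResizeCropPlan torchvision_resize_crop_plan(int w, int h, int shorter = 256, int crop = 224)
{
    ResizeCropPlan plan{};
    // Resize(int) scales the shorter side to `shorter` and keeps the aspect
    // ratio; the longer side is truncated toward zero, like Python's int()
    if (w <= h)
    {
        plan.resized_w = shorter;
        plan.resized_h = static_cast<int>(static_cast<double>(shorter) * h / w);
    }
    else
    {
        plan.resized_h = shorter;
        plan.resized_w = static_cast<int>(static_cast<double>(shorter) * w / h);
    }
    // CenterCrop uses Python's round(), which rounds halves to even;
    // std::nearbyint does the same under the default rounding mode
    plan.crop_x = static_cast<int>(std::nearbyint((plan.resized_w - crop) / 2.0));
    plan.crop_y = static_cast<int>(std::nearbyint((plan.resized_h - crop) / 2.0));
    return plan;
}
```

For a 500x375 image, the plan is a 341x256 resize followed by a crop starting at (58, 16). In the C++ code this would become `cv::resize(img, resizedImage, cv::Size(plan.resized_w, plan.resized_h))` followed by `cv::Rect(plan.crop_x, plan.crop_y, 224, 224)`. Even with matching geometry, pixel values can still differ slightly. As far as I know, PIL's bilinear filter antialiases when downscaling, while `cv::INTER_LINEAR` does not.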

hujhcv (Author) commented Dec 27, 2024

> you should use PIL resize instead of OpenCV

I also suspect that the difference in the model input might be caused by the different resize interpolation algorithms. However, I also ran inference on a plain gray image in which every pixel has the value 198. In that case the model input should be identical regardless of the interpolation algorithm, yet the Python and C++ results still differ.
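For the uniform gray test, both pipelines should feed the model one constant per channel, computed as (198 / 255 - mean[c]) / std[c] in RGB order. This assumes the same mean and std values used in both scripts. The minimal sketch below computes those expected values so each side can be checked against them. The function name is hypothetical.

```cpp
#include <array>
#include <cassert>
#include <cmath>

// Hypothetical helper: expected model input for a uniform gray image after
// ToTensor() + Normalize(), identical at every spatial position
std::array<float, 3> normalized_gray(int pixel)
{
    const float mean[3] = { 0.485f, 0.456f, 0.406f }; // RGB order
    const float stdv[3] = { 0.229f, 0.224f, 0.225f };
    std::array<float, 3> out{};
    for (int c = 0; c < 3; ++c)
    {
        out[c] = (static_cast<float>(pixel) / 255.0f - mean[c]) / stdv[c];
    }
    return out;
}
```

For pixel value 198 this gives roughly 1.2728 (R), 1.4307 (G) and 1.6465 (B). On the Python side, `preprocess(img)[:, 0, 0]` for the gray image should print the same three numbers.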


4 participants