
Gemini 2.0 Flash spatial reasoning severely degraded with image size 1024 #383

Open
fayezsalka opened this issue Dec 22, 2024 · 3 comments
Labels: component:other (Issues unrelated to examples/quickstarts), status:triaged (Issue/PR triaged to the corresponding sub-team), type:bug (Something isn't working)

Comments

@fayezsalka

Description of the bug:

I was trying to reproduce, using the spatial reasoning notebook, the results I get from the spatial reasoning applet in AI Studio. However, for the same prompt and the same image, the applet produces far better results with 2.0 Flash.

On further investigation, it seems that the notebook resizes the image to 1024 px, whereas the applet resizes it to 640 px. Using 1024 px images significantly degrades the quality of the generated bounding boxes: many of them are misplaced or detected incorrectly. Changing the notebook to cap the width at 640 px fixes the issue.
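A minimal sketch of the workaround described above, assuming the goal is to cap the image width at 640 px while preserving the aspect ratio and never upscaling. The function name `capped_size` is hypothetical and does not come from the notebook; it only computes target dimensions, which can then be passed to an image library's resize call.

```python
from __future__ import annotations


def capped_size(width: int, height: int, max_width: int = 640) -> tuple[int, int]:
    """Return (new_width, new_height) with the width capped at max_width.

    The aspect ratio is preserved, and images narrower than max_width are
    left unchanged (no upscaling).
    """
    if width <= 0 or height <= 0 or max_width <= 0:
        raise ValueError("dimensions must be positive")
    scale = min(1.0, max_width / width)
    # Round to whole pixels and keep at least 1 px on each side.
    return max(1, round(width * scale)), max(1, round(height * scale))


# Hypothetical 4032x3024 photo resized under the two settings in this issue.
print(capped_size(4032, 3024, max_width=640))   # (640, 480)
print(capped_size(4032, 3024, max_width=1024))  # (1024, 768)
```

With Pillow, `Image.thumbnail((640, 640))` performs a similar in-place, aspect-preserving downscale that never enlarges the image, although it caps the longer side rather than only the width.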

Any intuition on why an image size of 1024 degrades spatial reasoning?

Actual vs expected behavior:

No response

Any other information you'd like to share?

No response

@fayezsalka
Author

Here is a comparison between the 640 and 1024 image sizes:

640: [image]

1024: [image]

The degradation is not specific to this image or prompt. The same issue happens with all the spatial reasoning examples (Cupcakes, Fox, etc.).

@Giom-V
Collaborator

Giom-V commented Dec 23, 2024

Thanks for reporting this behavior! I'll check with the DeepMind researchers, but my guess is that there is too much information when the image is not resized.

@manojssmk manojssmk added type:bug Something isn't working status:triaged Issue/PR triaged to the corresponding sub-team component:other Issues unrelated to examples/quickstarts labels Dec 24, 2024
@lucianommartins
Contributor

Hi @fayezsalka,

Can you share a Colab with the experiment you ran? I tried to reproduce the behavior you are seeing using the following image, but it worked consistently with resize() set to each of [1024, 768, 640, 512, 256]:

[image]

Thanks!!
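One way to make the comparison across `resize()` values quantitative rather than visual is a sketch like the following. It assumes each run's boxes are available as `[y_min, x_min, y_max, x_max]` lists on a shared normalized scale (a 0-1000 scale, as in the spatial understanding examples, is assumed here), so boxes from different input sizes can be compared directly. The helper names `iou`, `mean_best_iou`, and `compare_sizes` are hypothetical. For each box in a reference run, it takes the best intersection-over-union (IoU) against another run and averages the result: a score near 1.0 means the runs agree, and a low score flags misplaced detections like the ones reported above.

```python
from __future__ import annotations

from typing import Sequence

# Assumed box layout: (y_min, x_min, y_max, x_max) on a shared normalized scale.
Box = Sequence[float]


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    y0, x0 = max(a[0], b[0]), max(a[1], b[1])
    y1, x1 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, y1 - y0) * max(0.0, x1 - x0)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def mean_best_iou(reference: list[Box], candidate: list[Box]) -> float:
    """Average, over reference boxes, of the best IoU with any candidate box."""
    if not reference:
        raise ValueError("reference run has no boxes")
    best = [max((iou(r, c) for c in candidate), default=0.0) for r in reference]
    return sum(best) / len(best)


def compare_sizes(boxes_by_size: dict[int, list[Box]], reference_size: int) -> dict[int, float]:
    """Score every resize() setting against the run at reference_size."""
    ref = boxes_by_size[reference_size]
    return {size: mean_best_iou(ref, boxes) for size, boxes in boxes_by_size.items()}


# Made-up boxes for illustration only: the 1024 run is shifted relative to 640.
runs = {640: [[100, 100, 300, 300]], 1024: [[200, 200, 400, 400]]}
print(compare_sizes(runs, reference_size=640))
# 640 scores 1.0 against itself; the shifted 1024 box scores about 0.14.
```

Matching each reference box to its best candidate keeps the sketch short; a stricter comparison would use one-to-one matching and also filter by label.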


4 participants