SPAM of artifact trash candidates #21335

Open
pkalemba opened this issue Dec 18, 2024 · 7 comments

Comments

@pkalemba

Hi

I'm running Harbor instance version 2.11, and we run GC each hour.

We are producing a lot of test images that are deleted after 24h.

But in the GC log I find a huge number of artifact trash candidates:
2024-12-18T14:10:04Z [INFO] [/jobservice/job/impl/gc/garbage_collection.go:599]: ID-5187 MediaType-application/vnd.docker.container.image.v1+json ManifestMediaType-application/vnd.docker.distribution.manifest.v2+json RepositoryName-XXXXXXX Digest-sha256:371c001e161686e2d85c07ecf486e4860020ae266ff31cb6ca9e4254fc57c8c3 CreationTime-2023-01-27 12:00:09

Right now a single GC log is ~20 MB.

Can I delete these entries somehow? Shouldn't GC delete them?
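
For anyone else hitting a GC log of this size, here is a minimal sketch for summarizing the trash candidate lines per repository. It assumes the line format shown in the log excerpt above. The helper names `parse_candidate` and `summarize` are made up for illustration and are not part of Harbor.

```python
import re
from collections import Counter

# Matches the "artifact trash candidates" line format quoted in this issue.
# Assumption: every field except CreationTime is free of spaces.
CANDIDATE_RE = re.compile(
    r"ID-(?P<id>\d+) "
    r"MediaType-(?P<media_type>\S+) "
    r"ManifestMediaType-(?P<manifest_media_type>\S+) "
    r"RepositoryName-(?P<repository>\S+) "
    r"Digest-(?P<digest>\S+) "
    r"CreationTime-(?P<created>.+)$"
)


def parse_candidate(line):
    """Return the fields of one trash candidate line, or None if it does not match."""
    match = CANDIDATE_RE.search(line.rstrip("\n"))
    return match.groupdict() if match else None


def summarize(lines):
    """Count trash candidates per repository name."""
    per_repo = Counter()
    for line in lines:
        record = parse_candidate(line)
        if record:
            per_repo[record["repository"]] += 1
    return per_repo
```

Passing an open log file to `summarize` and calling `.most_common(10)` on the result should show which repositories account for most of the entries.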

@wy65701436
Contributor

Hi @pkalemba, do you mean that Harbor deletes an artifact that shouldn't be removed? Did you check the audit log to see whether someone removed it?

@pkalemba
Author

@wy65701436 I looked at the code and the database, and it seems that I have leftovers in the artifact_trash table.
IMO this is connected with #20735.
For a long time, we had GC and retention jobs running every hour at minute 0.
Right now the artifact_trash table has 50k records, and honestly I don't know what to do with them. I don't see this digest on disk or in other tables.

@chlins
Member

chlins commented Dec 19, 2024

@pkalemba In your scenario, during GC execution, are there users or scheduled tasks, such as tag retention, deleting artifacts?

@pkalemba
Author

pkalemba commented Dec 19, 2024

@chlins Yes, it can happen.
Some time ago all retention tasks and GC started on the full hour. We have since spread them out, but there are leftovers.

Can someone tell me if it's safe to delete them?

@chlins
Member

chlins commented Dec 23, 2024

@pkalemba BTW, do you use an internal or an external database?

@chlins
Member

chlins commented Dec 23, 2024

@pkalemba Hello, could you help me with the following checks to verify whether this is the same issue as the one reported in #20711?

  1. Count the rows in the artifact and artifact_trash tables.
  2. If possible, run EXPLAIN ANALYZE on the SQL in your database and paste the output.
  3. Pick one or several records from artifact_trash and confirm that they are still present in artifact_blob but missing from blob and project_blob, and that their manifests are still present in the distribution storage (S3 or whatever storage you use).
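
Steps 1 and 3 could be scripted roughly as below. This is only a sketch under assumptions: the column names (`artifact_blob.digest_af`, `blob.id`, `blob.digest`, `project_blob.blob_id`) are my guess at the Harbor schema and should be checked against your database first. The function names are made up for illustration. The code takes any Python DB-API cursor and only runs SELECT statements. The storage check from step 3 is not covered here.

```python
# Assumed table and column names (not confirmed in this thread):
#   artifact_blob(digest_af, digest_blob), blob(id, digest), project_blob(blob_id)
COUNTED_TABLES = ("artifact", "artifact_trash")


def count_rows(cur, table):
    """Step 1: total number of rows in one of the two tables."""
    if table not in COUNTED_TABLES:  # fixed allow-list, so the f-string is safe
        raise ValueError(f"unexpected table: {table}")
    cur.execute(f"SELECT COUNT(*) FROM {table}")
    return cur.fetchone()[0]


def digest_presence(cur, digest, placeholder="%s"):
    """Step 3: report which tables still reference a trash digest.

    placeholder is the DB-API parameter marker: "%s" for psycopg2, "?" for sqlite3.
    """
    queries = {
        "artifact_blob": f"SELECT COUNT(*) FROM artifact_blob WHERE digest_af = {placeholder}",
        "blob": f'SELECT COUNT(*) FROM "blob" WHERE digest = {placeholder}',
        "project_blob": (
            "SELECT COUNT(*) FROM project_blob pb "
            'JOIN "blob" b ON b.id = pb.blob_id '
            f"WHERE b.digest = {placeholder}"
        ),
    }
    presence = {}
    for table, sql in queries.items():
        cur.execute(sql, (digest,))
        presence[table] = cur.fetchone()[0] > 0
    return presence


def looks_like_leftover(presence):
    """True when a digest is kept in artifact_blob but gone from blob and project_blob."""
    return presence["artifact_blob"] and not presence["blob"] and not presence["project_blob"]
```

With a cursor on the Harbor database, `looks_like_leftover(digest_presence(cur, digest))` returning True for a sampled artifact_trash digest would match the pattern described in step 3.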

@pkalemba
Author

@chlins Sure, I will share all the info I have.
I will do it on Friday, because I'm already on holiday break.
