Issue with Metadata Reconciliation Between Iceberg and Delta Tables During Snapshot Updates #586

Open
2 of 4 tasks
MrDerecho opened this issue Dec 3, 2024 · 1 comment
Labels
bug Something isn't working

MrDerecho commented Dec 3, 2024

Search before asking

  • I had searched in the issues and found no similar issues.

Please describe the bug 🐞

Description
I am encountering an issue while using xtable to perform updates from Iceberg to Delta tables. Here is the observed behavior:

Snapshot 0: The metadata of the Iceberg and Delta tables reconciles as expected.
Snapshot 1: Erroneous metadata is generated, including "add" and "remove" actions that did not actually occur. As a result, the Delta metadata reports a lower row count than the source Iceberg table.
Snapshot 2: The metadata reconciles again and accurately reflects the updated values.
Snapshot 3: The issue recurs, with similar discrepancies in the metadata.
Additional Context:
This behavior occurs consistently in 5 tables out of a sample of 30. The affected tables include the largest one, which has around 7 million files and 7.3 trillion records. That table is append-only, and the files that disappear or are removed in snapshot 1 are re-added in snapshot 2. The issue seems cyclical, occurring on every alternate snapshot. The only error/info messages found in the logs are "incremental sync is not safe from instant falling back to snapshot sync" and "truncated the string representation of a plan since it was too large".
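
A minimal sketch for checking the remove-then-re-add pattern described above. It assumes each Delta commit file under `_delta_log` has already been read into a list of JSON lines (one action per line, as in the Delta transaction log format) and that the commits are ordered by version. The names `parse_actions` and `find_readded_files` are hypothetical helpers for illustration, not part of xtable or Delta Lake, and checkpoint files are ignored.

```python
import json


def parse_actions(commit_lines):
    """Return (added_paths, removed_paths) for the lines of one commit file."""
    added, removed = set(), set()
    for line in commit_lines:
        line = line.strip()
        if not line:
            continue
        action = json.loads(line)
        if "add" in action:
            added.add(action["add"]["path"])
        elif "remove" in action:
            removed.add(action["remove"]["path"])
        # Other actions (commitInfo, metaData, protocol) are skipped.
    return added, removed


def find_readded_files(commits):
    """Map each version to the paths it removes that the next version adds back.

    `commits` is a list of commit files (each a list of JSON lines),
    ordered by version starting at 0.
    """
    parsed = [parse_actions(lines) for lines in commits]
    result = {}
    for version in range(len(parsed) - 1):
        _, removed = parsed[version]
        next_added, _ = parsed[version + 1]
        bounced = removed & next_added
        if bounced:
            result[version] = sorted(bounced)
    return result
```

For an append-only table, any non-empty result points at a commit whose "remove" actions were undone by the following commit, which matches the alternating behavior reported here.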
Steps to Reproduce:
Use xtable to perform updates from Iceberg to Delta tables.
Observe metadata reconciliation across snapshots.
Expected Behavior:
The metadata between the Iceberg and Delta tables should reconcile accurately across all snapshots, without erroneous "add" or "remove" actions.

Actual Behavior:
Alternate snapshots (e.g., snapshots 1 and 3) generate erroneous metadata with inaccurate "add" and "remove" actions, leading to a mismatch in row counts.
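
The row-count mismatch can be approximated from the Delta log alone. The sketch below replays "add" and "remove" actions in version order and sums `numRecords` from each live file's `stats` field (a JSON string in the Delta log format). It assumes every "add" action carries stats and that no checkpoint needs to be read; `replay_row_counts` and `versions_with_row_drop` are hypothetical names, not xtable APIs.

```python
import json


def replay_row_counts(commits):
    """Return the total row count after each commit.

    `commits` is a list of commit files (each a list of JSON lines),
    ordered by version starting at 0.
    """
    live = {}  # path -> numRecords for files currently in the table
    totals = []
    for commit_lines in commits:
        for line in commit_lines:
            if not line.strip():
                continue
            action = json.loads(line)
            if "add" in action:
                add = action["add"]
                # `stats` is a JSON-encoded string; treat missing stats as 0 rows.
                stats = json.loads(add["stats"]) if add.get("stats") else {}
                live[add["path"]] = stats.get("numRecords", 0)
            elif "remove" in action:
                live.pop(action["remove"]["path"], None)
        totals.append(sum(live.values()))
    return totals


def versions_with_row_drop(totals):
    """Versions whose total is lower than the previous version's total."""
    return [v for v in range(1, len(totals)) if totals[v] < totals[v - 1]]
```

On an append-only table the totals should never decrease, so any version returned by `versions_with_row_drop` is a candidate for the erroneous metadata described above. The totals can then be compared against the row counts recorded in the matching Iceberg snapshot summaries.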

Environment
Tool: xtable
Source: Apache Iceberg
Destination: Delta Lake
Additional Notes
The issue might be related to how snapshots are processed or metadata is generated.

Are you willing to submit PR?

  • I am willing to submit a PR!
  • I am willing to submit a PR but need help getting started!

Code of Conduct

MrDerecho added the bug label Dec 3, 2024
@vinishjail97
Contributor

Hi @MrDerecho, thanks for reporting the issue. Is it possible to share the Iceberg and Delta metadata folders for snapshots 1, 2, and 3? That would help in reproducing the problem through a unit test.
