Issue with Metadata Reconciliation Between Iceberg and Delta Tables During Snapshot Updates #586
Open
Labels
bug
Please describe the bug 🐞
Description
When using xtable to sync updates from an Iceberg table to a Delta table, the generated Delta metadata intermittently diverges from the source. Observed behavior:
Snapshot 0: The metadata of the Iceberg and Delta tables reconciles as expected.
Snapshot 1: Erroneous metadata is generated that includes "add" and "remove" actions that did not actually occur. This results in a lower row count in the Delta metadata than in the source Iceberg table.
Snapshot 2: The metadata reconciles again and accurately reflects the updated values.
Snapshot 3: The issue recurs, with discrepancies similar to those in snapshot 1.
Additional Context:
This behavior occurs consistently in 5 tables out of a sample of 30. It affects the largest of these tables, which has around 7 million files and 7.3 trillion records. This table is append-only, yet the files that disappear or are removed in snapshot 1 are re-added in snapshot 2. The issue appears to be cyclical, occurring on every alternate snapshot. The only error/info messages found in the logs are: "incremental sync is not safe from instant falling back to snapshot sync" and "truncated the string representation of a plan since it was too large"
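One way to confirm the remove/re-add cycle is to compare two consecutive Delta commit files. The following is a minimal sketch, assuming each commit file in `_delta_log` is newline-delimited JSON in which every line holds one action (`add`, `remove`, `commitInfo`, and so on), and assuming each source snapshot maps to one Delta commit. The helper names `action_paths` and `readded_paths` are made up for illustration and are not part of xtable.

```python
import json


def action_paths(lines, kind):
    """Collect the file paths of all actions of one kind ("add" or "remove")."""
    paths = set()
    for line in lines:
        if not line.strip():
            continue  # skip blank lines
        action = json.loads(line)
        if kind in action:
            paths.add(action[kind]["path"])
    return paths


def readded_paths(commit_n, commit_next):
    """Paths removed in one commit and added back in the following commit."""
    return action_paths(commit_n, "remove") & action_paths(commit_next, "add")
```

For an append-only table, a non-empty result for commits 1 and 2 would match the behavior described above. The inputs are the lines of the two commit files, for example the result of `open(path).readlines()` on each.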
Steps to Reproduce:
Use xtable to perform updates from Iceberg to Delta tables.
Observe metadata reconciliation across snapshots.
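To make the "observe" step concrete, the actions in a single Delta commit can be tallied directly from its log file. This is a sketch only, assuming the commit file is newline-delimited JSON and that `add` actions may carry a `stats` field, itself a JSON string with a `numRecords` entry. The function name `summarize_commit` is hypothetical.

```python
import json


def summarize_commit(lines):
    """Count add/remove actions and sum numRecords over the add actions."""
    summary = {"adds": 0, "removes": 0, "rows_added": 0}
    for line in lines:
        if not line.strip():
            continue  # skip blank lines
        action = json.loads(line)
        if "add" in action:
            summary["adds"] += 1
            stats = action["add"].get("stats")  # may be missing or null
            if stats:
                summary["rows_added"] += json.loads(stats).get("numRecords", 0)
        elif "remove" in action:
            summary["removes"] += 1
    return summary
```

Running this over the commit written for each snapshot and comparing the totals with the row count of the source Iceberg snapshot shows where the two diverge. Any `remove` action on an append-only table is worth a closer look.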
Expected Behavior:
The metadata between the Iceberg and Delta tables should reconcile accurately across all snapshots, without erroneous "add" or "remove" actions.
Actual Behavior:
Alternate snapshots (e.g., snapshots 1 and 3) generate erroneous metadata with inaccurate "add" and "remove" actions, leading to a mismatch in row counts.
Environment
Tool: xtable
Source: Apache Iceberg
Destination: Delta Lake
Additional Notes
The issue might be related to how snapshots are processed or metadata is generated.