[docs] CDC: Documentation updates for CDC #25372

Open
wants to merge 11 commits into
base: master
Original file line number Diff line number Diff line change
Expand Up @@ -112,3 +112,5 @@ For reference documentation, see [YugabyteDB Connector](./yugabytedb-connector/)
- Support for transaction savepoints is tracked in issue [10936](https://github.com/yugabyte/yugabyte-db/issues/10936).

- Support for enabling CDC on Read Replicas is tracked in issue [11116](https://github.com/yugabyte/yugabyte-db/issues/11116).

- Tablet splitting with logical replication is disabled starting with YugabyteDB versions 2024.1.4 and 2024.2.1. Support is tracked in issue [24918](https://github.com/yugabyte/yugabyte-db/issues/24918).
Expand Up @@ -80,6 +80,12 @@ Provide information about CDC service in YugabyteDB.
| cdcsdk_sent_lag_micros | `long` | The LAG metric is calculated by subtracting the timestamp of the latest record in the WAL of a tablet from the last record sent to the CDC connector. |
| cdcsdk_expiry_time_ms | `long` | The time left to read records from WAL is tracked by the Stream Expiry Time (ms). |

{{< note title="Note" >}}

The CDC service metrics are calculated for every tablet that is of interest for a replication slot. If you do not poll all the tables (and consequently all the tablets) in a database, the metrics continue to include the unpolled tablets until the [cdcsdk_tablet_not_of_interest_timeout_secs](../../../reference/configuration/yb-tserver.md#cdcsdk-tablet-not-of-interest-timeout-secs) interval elapses.

{{< /note >}}

## Connector metrics

<!-- TODO (Siddharth): Fix link to connector metrics section -->
Expand Up @@ -352,7 +352,7 @@ There are 4 possible values for `REPLICA IDENTITY`:

{{< note title="Note">}}

YugabyteDB supports the replica identity CHANGE only with the plugin `yboutput`.
The plugin `pgoutput` does not support replica identity CHANGE.

{{< /note >}}

Expand Down Expand Up @@ -1095,6 +1095,10 @@ YugabyteDB connector events are designed to work with [Kafka log compaction](htt

When a row is deleted, the _delete_ event value still works with log compaction, because Kafka can remove all earlier messages that have that same key. However, for Kafka to remove all messages that have that same key, the message value must be `null`. To make this possible, the YugabyteDB connector follows a _delete_ event with a special tombstone event that has the same key but a `null` value.
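The effect of the tombstone can be illustrated with a toy model of key-based compaction. This is only a sketch: the `compact` function and the sample keys and values are hypothetical, and real Kafka compaction runs asynchronously and keeps tombstones for a retention period before dropping them.

```python
from typing import Optional


def compact(log: list[tuple[str, Optional[str]]]) -> dict[str, str]:
    """Toy compaction: keep the latest value per key, drop keys whose latest value is null."""
    latest: dict[str, Optional[str]] = {}
    for key, value in log:
        latest[key] = value  # later messages overwrite earlier ones with the same key
    return {key: value for key, value in latest.items() if value is not None}


# Hypothetical message log: (key, value) pairs in arrival order.
log = [
    ("id=1", '{"op": "c"}'),
    ("id=2", '{"op": "c"}'),
    ("id=1", '{"op": "d"}'),  # delete event: same key, non-null value
    ("id=1", None),           # tombstone: same key, null value
]

with_tombstone = compact(log)
without_tombstone = compact(log[:3])
```

In this model, `without_tombstone` still contains the delete event for `id=1`, while `with_tombstone` no longer contains that key at all.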

## Updating or deleting a row which was inserted in the same transaction

If a row is updated or deleted in the same transaction in which it was inserted, CDC cannot retrieve the before-image values for the UPDATE or DELETE event. If the replica identity is not `CHANGE`, CDC throws an error while processing such events. To handle these updates and deletes with a non-`CHANGE` replica identity, set the YB-TServer flag [cdc_send_null_before_image_if_not_exists](../../../reference/configuration/yb-tserver.md#cdc-send-null-before-image-if-not-exists) to `true`. With this flag enabled, CDC sends a null before image instead of failing with an error.
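When the flag is enabled, downstream consumers need to tolerate a missing before image. The following is a minimal sketch of such a consumer, assuming a simplified Debezium-style event with `op`, `before`, and `after` fields held as flat dictionaries; the function name `describe_change` and this flat event shape are illustrative assumptions, and the actual connector payload may be structured differently.

```python
from typing import Any, Optional


def describe_change(event: dict[str, Any]) -> str:
    """Summarize a change event, tolerating a null before image."""
    op = event["op"]
    before: Optional[dict[str, Any]] = event.get("before")
    after: Optional[dict[str, Any]] = event.get("after")

    if op == "u":
        if before is None:
            # No before image (for example, the row was inserted in the same transaction).
            return f"update (no before image): new={after}"
        changed = sorted(k for k in after if before.get(k) != after[k])
        return f"update: changed={changed}"
    if op == "d":
        if before is None:
            return "delete (no before image)"
        return f"delete: old={before}"
    return f"{op}: {after}"
```

A consumer written this way falls back to the after image (or to the key alone, for deletes) instead of assuming the old row values are always present.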

<!-- YB Note skipping content for truncate and message events -->

## Data type mappings
Expand Up @@ -57,7 +57,7 @@ For reference documentation, see [YugabyteDB gRPC Connector](./debezium-connecto

* A single stream can be used to stream data from only one namespace.
* There should be a primary key on the table you want to stream the changes from.
* CDC is not supported on a target table for xCluster replication [11829](https://github.com/yugabyte/yugabyte-db/issues/11829).
* CDC is not supported on tables (either source or target) that are part of xCluster replication. Issues [25371](https://github.com/yugabyte/yugabyte-db/issues/25371) and [15534](https://github.com/yugabyte/yugabyte-db/issues/15534).
* Currently, CDC doesn't support schema evolution for changes that require table rewrites (for example, [ALTER TYPE](../../../api/ysql/the-sql-language/statements/ddl_alter_table/#alter-type-with-table-rewrite)), or DROP TABLE and TRUNCATE TABLE operations.
* YCQL tables aren't currently supported. Issue [11320](https://github.com/yugabyte/yugabyte-db/issues/11320).
* [Composite types](../../ysql-language-features/data-types#composite-types) are currently not supported. Issue [25221](https://github.com/yugabyte/yugabyte-db/issues/25221).
Expand Up @@ -433,6 +433,10 @@ For record type `MODIFIED_COLUMNS_OLD_AND_NEW_IMAGES`, the update and delete rec

</td> </tr> </table>

### Updating or deleting a row which was inserted in the same transaction

If a row is updated or deleted in the same transaction in which it was inserted, CDC cannot retrieve the before-image values for the UPDATE or DELETE event. If the before image record type is not `CHANGE`, CDC throws an error while processing such events. To handle these updates and deletes with a non-`CHANGE` before image record type, set the YB-TServer flag [cdc_send_null_before_image_if_not_exists](../../../reference/configuration/yb-tserver.md#cdc-send-null-before-image-if-not-exists) to `true`. With this flag enabled, CDC sends a null before image instead of failing with an error.

## Schema evolution

Table schema is needed for decoding and processing the changes and populating CDC records. Thus, older schemas are retained if CDC streams are lagging. Also, older schemas that are not needed for any of the existing active CDC streams are garbage collected. In addition, if before image is enabled, the schema needed for populating before image is also retained. The YugabyteDB source connector caches schema at the tablet level. This means that for every tablet the connector has a copy of the current schema for the tablet it is polling the changes for. As soon as a DDL command is executed on the source table, the CDC service emits a record with the new schema for all the tablets. The YugabyteDB source connector then reads those records and modifies its cached schema gracefully.
Expand Up @@ -81,6 +81,12 @@ Provide information about CDC service in YugabyteDB.
| cdcsdk_sent_lag_micros | `long` | The LAG metric is calculated by subtracting the timestamp of the latest record in the WAL of a tablet from the last record sent to the CDC connector. |
| cdcsdk_expiry_time_ms | `long` | The time left to read records from WAL is tracked by the Stream Expiry Time (ms). |

{{< note title="Note" >}}

The CDC service metrics are calculated for every tablet that is of interest for a particular stream. If you do not poll all the tables (and consequently all the tablets) in a database, the metrics continue to include the unpolled tablets until the [cdcsdk_tablet_not_of_interest_timeout_secs](../../../reference/configuration/yb-tserver.md#cdcsdk-tablet-not-of-interest-timeout-secs) interval elapses.

{{< /note >}}

### Snapshot metrics

The **MBean** is `debezium.yugabytedb:type=connector-metrics,server=<database.server.name>,task=<task.id>,context=snapshot`.
6 changes: 6 additions & 0 deletions docs/content/preview/reference/configuration/yb-master.md
Expand Up @@ -897,6 +897,12 @@ By default, TRUNCATE commands on tables with an active CDCSDK stream will fail.

Default: `false`

##### --enable_tablet_split_of_replication_slot_streamed_tables

Enables or disables automatic tablet splitting for tables that are streamed through a replication slot.

Default: `false`

## Metric export flags

YB-Master metrics are available in Prometheus format at `http://localhost:7000/prometheus-metrics`.
12 changes: 12 additions & 0 deletions docs/content/preview/reference/configuration/yb-tserver.md
Expand Up @@ -1374,6 +1374,18 @@ Determines the window in milliseconds in which if a client has consumed the chan

Default: `60000`

##### --cdc_send_null_before_image_if_not_exists

When set to `true`, the CDC service returns a null before image if it cannot find one.

Default: `false`

##### --cdcsdk_tablet_not_of_interest_timeout_secs

Timeout, in seconds, after which a tablet is inferred to be not of interest for CDC. To indicate that a tablet is of interest for CDC, poll it at least once within this interval after the stream or replication slot is created.

Default: `14400` (4 hours)
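The timeout rule can be read as the following simplified model. It is a sketch of the documented behavior only, not the server's implementation; the function `counted_in_metrics` and its parameters are hypothetical, and treating a tablet first polled after the deadline as not of interest is an assumption.

```python
from typing import Optional

DEFAULT_TIMEOUT_SECS = 14400  # documented default (4 hours)


def counted_in_metrics(
    created_at: float,
    first_polled_at: Optional[float],
    now: float,
    timeout_secs: int = DEFAULT_TIMEOUT_SECS,
) -> bool:
    """Return True if a tablet is still treated as of interest in this model.

    All times are seconds on the same clock; created_at is when the
    stream or replication slot was created.
    """
    deadline = created_at + timeout_secs
    if first_polled_at is not None and first_polled_at <= deadline:
        # Polled at least once within the interval: the tablet is of interest.
        return True
    # Never polled in time: counted only until the timeout elapses.
    return now <= deadline
```

In this model, an unpolled tablet is counted at 100 seconds after creation but not at 14401 seconds, while a tablet first polled at 60 seconds remains counted indefinitely.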

## File expiration based on TTL flags

##### --tablet_enable_ttl_file_filter
Expand Up @@ -80,6 +80,12 @@ Provide information about CDC service in YugabyteDB.
| cdcsdk_sent_lag_micros | `long` | The LAG metric is calculated by subtracting the timestamp of the latest record in the WAL of a tablet from the last record sent to the CDC connector. |
| cdcsdk_expiry_time_ms | `long` | The time left to read records from WAL is tracked by the Stream Expiry Time (ms). |

{{< note title="Note" >}}

The CDC service metrics are calculated for every tablet that is of interest for a replication slot. If you do not poll all the tables (and consequently all the tablets) in a database, the metrics continue to include the unpolled tablets until the [cdcsdk_tablet_not_of_interest_timeout_secs](../../../reference/configuration/yb-tserver.md#cdcsdk-tablet-not-of-interest-timeout-secs) interval elapses.

{{< /note >}}

## Connector metrics

<!-- TODO (Siddharth): Fix link to connector metrics section -->
Expand Up @@ -350,7 +350,7 @@ There are 4 possible values for `REPLICA IDENTITY`:

{{< note title="Note">}}

YugabyteDB supports the replica identity CHANGE only with the plugin `yboutput`.
The plugin `pgoutput` does not support replica identity CHANGE.

{{< /note >}}

Expand Down Expand Up @@ -1093,6 +1093,10 @@ YugabyteDB connector events are designed to work with [Kafka log compaction](htt

When a row is deleted, the _delete_ event value still works with log compaction, because Kafka can remove all earlier messages that have that same key. However, for Kafka to remove all messages that have that same key, the message value must be `null`. To make this possible, the YugabyteDB connector follows a _delete_ event with a special tombstone event that has the same key but a `null` value.

## Updating or deleting a row which was inserted in the same transaction

If a row is updated or deleted in the same transaction in which it was inserted, CDC cannot retrieve the before-image values for the UPDATE or DELETE event. If the replica identity is not `CHANGE`, CDC throws an error while processing such events. To handle these updates and deletes with a non-`CHANGE` replica identity, set the YB-TServer flag [cdc_send_null_before_image_if_not_exists](../../../reference/configuration/yb-tserver.md#cdc-send-null-before-image-if-not-exists) to `true`. With this flag enabled, CDC sends a null before image instead of failing with an error.

<!-- YB Note skipping content for truncate and message events -->

## Data type mappings
Expand Up @@ -55,7 +55,7 @@ For reference documentation, see [YugabyteDB gRPC Connector](./debezium-connecto

* A single stream can be used to stream data from only one namespace.
* There should be a primary key on the table you want to stream the changes from.
* CDC is not supported on a target table for xCluster replication [11829](https://github.com/yugabyte/yugabyte-db/issues/11829).
* CDC is not supported on tables (either source or target) that are part of xCluster replication. Issues [25371](https://github.com/yugabyte/yugabyte-db/issues/25371) and [15534](https://github.com/yugabyte/yugabyte-db/issues/15534).
* Currently, CDC doesn't support schema evolution for changes that require table rewrites (for example, [ALTER TYPE](../../../api/ysql/the-sql-language/statements/ddl_alter_table/#alter-type-with-table-rewrite)), or DROP TABLE and TRUNCATE TABLE operations.
* YCQL tables aren't currently supported. Issue [11320](https://github.com/yugabyte/yugabyte-db/issues/11320).

Expand Up @@ -431,6 +431,10 @@ For record type `MODIFIED_COLUMNS_OLD_AND_NEW_IMAGES`, the update and delete rec

</td> </tr> </table>

### Updating or deleting a row which was inserted in the same transaction

If a row is updated or deleted in the same transaction in which it was inserted, CDC cannot retrieve the before-image values for the UPDATE or DELETE event. If the before image record type is not `CHANGE`, CDC throws an error while processing such events. To handle these updates and deletes with a non-`CHANGE` before image record type, set the YB-TServer flag [cdc_send_null_before_image_if_not_exists](../../../reference/configuration/yb-tserver.md#cdc-send-null-before-image-if-not-exists) to `true`. With this flag enabled, CDC sends a null before image instead of failing with an error.

## Schema evolution

Table schema is needed for decoding and processing the changes and populating CDC records. Thus, older schemas are retained if CDC streams are lagging. Also, older schemas that are not needed for any of the existing active CDC streams are garbage collected. In addition, if before image is enabled, the schema needed for populating before image is also retained. The YugabyteDB source connector caches schema at the tablet level. This means that for every tablet the connector has a copy of the current schema for the tablet it is polling the changes for. As soon as a DDL command is executed on the source table, the CDC service emits a record with the new schema for all the tablets. The YugabyteDB source connector then reads those records and modifies its cached schema gracefully.
Expand Up @@ -79,6 +79,12 @@ Provide information about CDC service in YugabyteDB.
| cdcsdk_sent_lag_micros | `long` | The LAG metric is calculated by subtracting the timestamp of the latest record in the WAL of a tablet from the last record sent to the CDC connector. |
| cdcsdk_expiry_time_ms | `long` | The time left to read records from WAL is tracked by the Stream Expiry Time (ms). |

{{< note title="Note" >}}

The CDC service metrics are calculated for every tablet that is of interest for a particular stream. If you do not poll all the tables (and consequently all the tablets) in a database, the metrics continue to include the unpolled tablets until the [cdcsdk_tablet_not_of_interest_timeout_secs](../../../reference/configuration/yb-tserver.md#cdcsdk-tablet-not-of-interest-timeout-secs) interval elapses.

{{< /note >}}

### Snapshot metrics

The **MBean** is `debezium.yugabytedb:type=connector-metrics,server=<database.server.name>,task=<task.id>,context=snapshot`.
6 changes: 6 additions & 0 deletions docs/content/stable/reference/configuration/yb-master.md
Expand Up @@ -905,6 +905,12 @@ By default, TRUNCATE commands on tables with an active CDCSDK stream will fail.

Default: `false`

##### --enable_tablet_split_of_replication_slot_streamed_tables

Enables or disables automatic tablet splitting for tables that are streamed through a replication slot.

Default: `false`

## Metric export flags

YB-Master metrics are available in Prometheus format at `http://localhost:7000/prometheus-metrics`.
12 changes: 12 additions & 0 deletions docs/content/stable/reference/configuration/yb-tserver.md
Expand Up @@ -1382,6 +1382,18 @@ Determines the window in milliseconds in which if a client has consumed the chan

Default: `60000`

##### --cdc_send_null_before_image_if_not_exists

When set to `true`, the CDC service returns a null before image if it cannot find one.

Default: `false`

##### --cdcsdk_tablet_not_of_interest_timeout_secs

Timeout, in seconds, after which a tablet is inferred to be not of interest for CDC. To indicate that a tablet is of interest for CDC, poll it at least once within this interval after the stream or replication slot is created.

Default: `14400` (4 hours)

## File expiration based on TTL flags

##### --tablet_enable_ttl_file_filter