Change Data Capture (CDC)
Last Updated: October 20, 2018

Automatically stream database changes to external systems such as Kafka.

TigerGraph's Change Data Capture (CDC) service automatically captures data changes (inserts, updates, and deletes) and streams them to external Kafka topics in real time.

1. Key Capabilities

  • Automatic Streaming: Every committed change to the graph is serialized as a JSON message.
  • Resumable: If the external Kafka cluster becomes unavailable, TigerGraph pauses streaming and resumes from the last successful point once Kafka recovers.
  • Ordered: Changes are sequenced so that the data stream can be reproduced exactly, for example when debugging.

2. Message Format

CDC messages are delivered in a consistent JSON structure:

  • Operator: insert, delete, or update.
  • Content: The attribute values for the vertex or edge.
  • Metadata: Timestamps and unique message IDs (mid).
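A minimal sketch of how a consumer might decode one message into the three parts listed above. The JSON key names used here ("operator", "content", "mid", "timestamp") are assumptions made for illustration; check them against the messages your cluster actually emits before relying on them.

```python
import json
from dataclasses import dataclass
from typing import Any, Dict

# Operators named in the message format description.
VALID_OPERATORS = {"insert", "update", "delete"}


@dataclass(frozen=True)
class CdcMessage:
    operator: str            # insert, update, or delete
    content: Dict[str, Any]  # attribute values of the vertex or edge
    mid: str                 # unique message ID
    timestamp: Any           # left untyped: the exact format is not assumed


def parse_cdc_message(raw: bytes) -> CdcMessage:
    """Decode one Kafka record value into a CdcMessage.

    The key names below are hypothetical and may differ from the real payload.
    """
    doc = json.loads(raw)
    operator = doc["operator"]
    if operator not in VALID_OPERATORS:
        raise ValueError(f"unexpected operator: {operator!r}")
    return CdcMessage(
        operator=operator,
        content=doc.get("content", {}),
        mid=str(doc["mid"]),
        timestamp=doc.get("timestamp"),
    )
```

Rejecting unknown operators early keeps a malformed or unexpected record from being silently applied downstream.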

3. High Availability (HA)

As of version 4.1.0, the CDC service is HA-aware.

  • The CDC manager runs on the GPE Leader of each partition.
  • If a leader fails, the service migrates to a new live node automatically.
  • Deduplication: Use the mid field in Kafka consumers to handle potential duplicate messages produced during a leader switch.
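One way to apply the deduplication advice above is to remember recently seen mid values and drop repeats. The sketch below is consumer-side code, not part of TigerGraph; the class name, the "mid" dictionary key, and the window size are all assumptions for illustration.

```python
from collections import OrderedDict
from typing import Any, Dict, Iterable, Iterator


class MidDeduplicator:
    """Remembers the most recent message IDs in a bounded window."""

    def __init__(self, capacity: int = 100_000) -> None:
        if capacity <= 0:
            raise ValueError("capacity must be positive")
        self._capacity = capacity
        self._seen: "OrderedDict[str, None]" = OrderedDict()

    def is_duplicate(self, mid: str) -> bool:
        """Return True if mid was seen within the window, else record it."""
        if mid in self._seen:
            return True
        self._seen[mid] = None
        if len(self._seen) > self._capacity:
            # Evict the oldest ID so memory use stays bounded.
            self._seen.popitem(last=False)
        return False


def deduplicate(
    messages: Iterable[Dict[str, Any]], dedup: MidDeduplicator
) -> Iterator[Dict[str, Any]]:
    """Yield only messages whose mid has not been seen in the window."""
    for message in messages:
        if not dedup.is_duplicate(str(message["mid"])):
            yield message
```

The window only needs to cover the duplicates that can appear around a leader switch. An ID older than the window is treated as new, so choose the capacity generously.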

4. Reset Scenarios

The CDC service is reset (skipping historical data) during operations that reset the Graph Processing Engine (GPE):

  • gadmin backup and gadmin restore.
  • Cluster Expansion or Shrink.
  • GSQL DROP ALL or CLEAR GRAPH STORE.

[!CAUTION] Implicit Deletions: TigerGraph CDC does not currently generate messages for "implicit" edge deletions. For example, when a vertex is deleted, its connected edges are removed as well, but no individual edge-delete message is sent.
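Because no edge-delete messages arrive for implicit deletions, a downstream system that mirrors the graph may need to remove the connected edges itself when it processes a vertex delete. The sketch below shows that idea on a hypothetical in-memory replica; the GraphReplica class and its methods are illustrative assumptions, not a TigerGraph API.

```python
from typing import Hashable, Set, Tuple

Edge = Tuple[Hashable, Hashable]  # (source vertex ID, target vertex ID)


class GraphReplica:
    """Hypothetical downstream copy of the graph, kept in memory."""

    def __init__(self) -> None:
        self.vertices: Set[Hashable] = set()
        self.edges: Set[Edge] = set()

    def add_edge(self, src: Hashable, dst: Hashable) -> None:
        # For brevity, adding an edge also registers both endpoints.
        self.vertices.update((src, dst))
        self.edges.add((src, dst))

    def delete_vertex(self, vertex_id: Hashable) -> int:
        """Remove a vertex and every edge touching it.

        Returns the number of edges removed, since CDC sends no
        separate delete message for them.
        """
        self.vertices.discard(vertex_id)
        orphaned = {edge for edge in self.edges if vertex_id in edge}
        self.edges -= orphaned
        return len(orphaned)
```

A consumer would call delete_vertex when it receives a vertex delete message, so the replica does not retain edges that no longer exist in the source graph.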