Last Updated: October 20, 2018
Automatically streaming database changes to external systems like Kafka.
Change Data Capture (CDC)
TigerGraph's Change Data Capture (CDC) service automatically captures data changes (inserts, updates, and deletes) and streams them to external Kafka topics in real time.
1. Key Capabilities
- Automatic Streaming: Every committed change to the graph is serialized as a JSON message.
- Resumable: If the external Kafka cluster goes down, TigerGraph pauses streaming and resumes from the last successful point once Kafka recovers.
- Ordered: Changes are sequenced so that the data stream can be reproduced exactly, for example when debugging.
2. Message Format
CDC messages are delivered in a consistent JSON structure:
- Operator: `insert`, `delete`, or `update`.
- Content: The attribute values for the vertex or edge.
- Metadata: Timestamps and unique message IDs (`mid`).
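As a rough illustration of the structure above, the following Python sketch parses one CDC message into a small record. The JSON key names (`operator`, `content`, `mid`) and the sample payload are assumptions made for this example; check the actual messages on your Kafka topic before relying on them.

```python
import json
from dataclasses import dataclass
from typing import Any, Dict

# Operators named in this section.
VALID_OPERATORS = {"insert", "update", "delete"}


@dataclass
class CdcChange:
    operator: str
    content: Dict[str, Any]
    mid: str


def parse_cdc_message(raw: bytes) -> CdcChange:
    """Decode one CDC message; the key names used here are assumed, not confirmed."""
    doc = json.loads(raw)
    operator = doc["operator"]  # assumed key name
    if operator not in VALID_OPERATORS:
        raise ValueError(f"unexpected operator: {operator!r}")
    return CdcChange(
        operator=operator,
        content=doc.get("content", {}),  # assumed key name
        mid=str(doc["mid"]),             # message ID, normalized to a string
    )


# Hypothetical payload, for illustration only.
sample = b'{"operator": "insert", "mid": "42", "content": {"name": "Alice"}}'
change = parse_cdc_message(sample)
```

Rejecting unknown operators early keeps a consumer from silently mishandling a message type it was not written for.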
3. High Availability (HA)
As of version 4.1.0, the CDC service is HA-aware.
- The CDC manager runs on the GPE Leader of each partition.
- If a leader fails, the service migrates to a new live node automatically.
- Deduplication: Use the `mid` field in Kafka consumers to handle potential duplicate messages produced during a leader switch.
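The deduplication step can be sketched on the consumer side as a filter that remembers recently seen message IDs. This is a minimal sketch, assuming each decoded message is a dictionary with a `mid` key; the function name `dedupe_by_mid` and the `max_tracked` bound are hypothetical choices for this example, not part of TigerGraph or any Kafka client.

```python
from collections import OrderedDict
from typing import Any, Dict, Iterable, Iterator


def dedupe_by_mid(
    messages: Iterable[Dict[str, Any]], max_tracked: int = 10_000
) -> Iterator[Dict[str, Any]]:
    """Yield each message once, skipping any whose mid was seen recently."""
    seen: "OrderedDict[str, None]" = OrderedDict()
    for msg in messages:
        mid = str(msg["mid"])  # assumed key name
        if mid in seen:
            continue  # duplicate, e.g. re-sent after a leader switch
        seen[mid] = None
        if len(seen) > max_tracked:
            seen.popitem(last=False)  # evict the oldest tracked ID
        yield msg


# Hypothetical stream in which message "2" is delivered twice.
stream = [
    {"mid": "1", "operator": "insert"},
    {"mid": "2", "operator": "update"},
    {"mid": "2", "operator": "update"},
    {"mid": "3", "operator": "delete"},
]
unique = list(dedupe_by_mid(stream))
```

The bounded window keeps memory use constant, at the cost of missing a duplicate that arrives more than `max_tracked` messages after the original. A consumer that must survive its own restarts would need to persist the seen IDs instead of holding them in memory.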
4. Reset Scenarios
The CDC service is reset (skipping historical data) during operations that reset the Graph Processing Engine (GPE):
- `gadmin backup` and `gadmin restore`.
- Cluster Expansion or Shrink.
- GSQL `DROP ALL` or `CLEAR GRAPH STORE`.
[!CAUTION] Implicit Deletions: TigerGraph CDC does not currently generate messages for "implicit" edge deletions (e.g., when a vertex is deleted, its connected edges are gone, but no individual edge-delete message is sent).
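Because no edge-delete messages arrive for implicit deletions, a downstream system that mirrors the graph may need to cascade vertex deletions itself. The sketch below shows one possible approach using a toy in-memory replica; the `GraphReplica` class, its methods, and the representation of edges as `(source, target)` pairs are all hypothetical and are not part of TigerGraph.

```python
from typing import Set, Tuple


class GraphReplica:
    """Toy downstream copy of a graph (hypothetical, for illustration only)."""

    def __init__(self) -> None:
        self.vertices: Set[str] = set()
        self.edges: Set[Tuple[str, str]] = set()

    def add_vertex(self, vid: str) -> None:
        self.vertices.add(vid)

    def add_edge(self, src: str, dst: str) -> None:
        self.edges.add((src, dst))

    def delete_vertex(self, vid: str) -> None:
        """Remove a vertex and every edge that touches it."""
        self.vertices.discard(vid)
        # No edge-delete messages will arrive, so cascade locally.
        self.edges = {(s, t) for (s, t) in self.edges if s != vid and t != vid}


replica = GraphReplica()
for v in ("a", "b", "c"):
    replica.add_vertex(v)
replica.add_edge("a", "b")
replica.add_edge("b", "c")
replica.add_edge("a", "c")

# Applying a vertex-delete message for "b" also drops its two incident edges.
replica.delete_vertex("b")
```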