Graph Data Science (GDS)

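The pyTigerGraph.gds submodule provides machine learning tooling for TigerGraph. It computes graph features, splits vertices into training, validation, and test sets, and streams graph data in batches into ML frameworks such as PyTorch (via PyTorch Geometric) and DGL. Its tools are available from an existing connection object through conn.gds.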

1. The Featurizer

The Featurizer runs algorithms from the TigerGraph GDS Library (for example, PageRank) and writes the results back as vertex attributes, which you can then use as features for training.


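```python
from pyTigerGraph import TigerGraphConnection
from pyTigerGraph.gds.featurizer import Featurizer

conn = TigerGraphConnection(host="https://your-host", graphname="your_graph")  # placeholder connection details
f = Featurizer(conn)  # equivalently: f = conn.gds.featurizer()

# Run PageRank over User vertices and store each score in the attribute "pagerank_score".
# "Follows" is a placeholder edge type; check parameter names against your GDS library version.
params = {"v_type": "User", "e_type": "Follows", "result_attribute": "pagerank_score"}
f.runAlgorithm("tg_pagerank", params=params, feat_name="pagerank_score")
```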
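To make concrete what ends up in an attribute like pagerank_score, here is a minimal in-memory sketch of the PageRank power iteration on a small, hypothetical follower graph. It illustrates the algorithm only; it is not the tg_pagerank GSQL query. The function name pagerank, the damping factor of 0.85, and the uniform redistribution of mass from vertices with no out-edges are assumptions made for this sketch. Scores here are normalized to sum to 1, and the GSQL implementation may use a different scale, so compare rankings rather than exact values.

```python
def pagerank(edges, damping=0.85, max_change=1e-6, max_iter=100):
    """Power-iteration PageRank over a directed edge list; returns {vertex: score}.

    Illustrative sketch only (hypothetical helper, not the tg_pagerank query).
    """
    nodes = sorted({v for edge in edges for v in edge})
    out = {v: [] for v in nodes}
    for src, dst in edges:
        out[src].append(dst)
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(max_iter):
        # Mass held by vertices with no out-edges is spread evenly over all vertices.
        dangling = sum(rank[v] for v in nodes if not out[v])
        new = {v: (1 - damping) / n + damping * dangling / n for v in nodes}
        # Each vertex passes a damped share of its score to every out-neighbor.
        for v in nodes:
            for w in out[v]:
                new[w] += damping * rank[v] / len(out[v])
        delta = max(abs(new[v] - rank[v]) for v in nodes)
        rank = new
        if delta < max_change:
            break
    return rank


# Hypothetical "Follows" edges between User vertices.
follows = [("alice", "bob"), ("carol", "bob"), ("bob", "alice"), ("dave", "bob")]
scores = pagerank(follows)
for user, score in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(user, round(score, 4))
```

In the Featurizer workflow, each of these per-vertex scores would be stored on the vertex and later read back by a data loader as one input feature.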

2. Data Loaders

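GDS data loaders stream data from TigerGraph into Python in batches, so you can train on graphs that are too large to load into memory at once. Each loader is iterable and yields one batch at a time in the output format you request (for example, PyG or DGL).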

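NeighborLoader: Samples multi-hop neighborhoods around batches of seed vertices, the typical input for mini-batch GNN training.
EdgeLoader: Streams edges and their attributes in batches, for example for link prediction.
VertexLoader: Streams vertices and their attributes in batches, for vertex classification or regression.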
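Conceptually, a neighbor loader takes a batch of seed vertices and, for each hop, samples up to a fixed number of neighbors of the vertices reached so far. The sketch below shows that idea in plain Python on a hypothetical adjacency dictionary. It is not pyTigerGraph's implementation, which performs the sampling inside TigerGraph and returns framework-specific batches; the helper names batches and sample_subgraph and their parameters are illustrative.

```python
import random


def batches(items, batch_size):
    """Yield successive fixed-size batches (the last one may be smaller)."""
    for i in range(0, len(items), batch_size):
        yield items[i:i + batch_size]


def sample_subgraph(adj, seeds, num_neighbors, num_hops, rng):
    """Sample up to num_neighbors neighbors per expanded vertex for num_hops hops.

    Returns (vertices, edges) of the sampled subgraph. Hypothetical helper.
    """
    vertices = set(seeds)
    edges = set()
    frontier = list(seeds)
    for _ in range(num_hops):
        next_frontier = []
        for v in frontier:
            nbrs = adj.get(v, [])
            # Draw without replacement, capped at the vertex's degree.
            picked = rng.sample(nbrs, min(num_neighbors, len(nbrs)))
            for u in picked:
                edges.add((v, u))
                if u not in vertices:
                    # Only newly reached vertices are expanded on the next hop.
                    vertices.add(u)
                    next_frontier.append(u)
        frontier = next_frontier
    return vertices, edges


adj = {
    "u1": ["u2", "u3", "u4"],
    "u2": ["u1", "u5"],
    "u3": ["u1"],
    "u4": ["u1", "u5"],
    "u5": ["u2", "u4"],
}
rng = random.Random(0)
for seed_batch in batches(sorted(adj), batch_size=2):
    vs, es = sample_subgraph(adj, seed_batch, num_neighbors=2, num_hops=2, rng=rng)
    print(seed_batch, sorted(vs), len(es))
```

Capping the neighbors per hop keeps every batch bounded in size regardless of vertex degree, which is what makes batched training on large graphs feasible.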

3. Vertex Splitting

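Prepare your data for supervised learning by splitting vertices into training, validation, and test sets. The vertex splitter randomly flags vertices by setting BOOL vertex attributes that you name (for example, train_mask, val_mask, and test_mask), and these attributes must already exist in the schema.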


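```python
# Assumes conn is an existing TigerGraphConnection.
# Randomly flag 80% / 10% / 10% of User vertices by setting the BOOL attributes
# train_mask, val_mask, and test_mask (these attributes must already exist).
splitter = conn.gds.vertexSplitter(v_types=["User"], train_mask=0.8, val_mask=0.1, test_mask=0.1)
splitter.run()
```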
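The split itself is a random assignment. Below is a minimal sketch, assuming each vertex draws one uniform number and lands in the train, validation, or test bucket according to cumulative ratios, with each mask stored as a boolean per vertex. This is one plausible scheme, not necessarily the exact one pyTigerGraph uses; random_vertex_split is a hypothetical helper.

```python
import random


def random_vertex_split(vertex_ids, rng, **ratios):
    """Assign each vertex to at most one split; returns {mask_name: {vertex: bool}}.

    Hypothetical helper. Ratios must sum to at most 1; any leftover fraction
    of vertices ends up in no split (all masks False).
    """
    if sum(ratios.values()) > 1 + 1e-9:
        raise ValueError("split ratios must sum to at most 1")
    masks = {name: {} for name in ratios}
    for v in vertex_ids:
        r = rng.random()  # uniform in [0, 1)
        cumulative = 0.0
        chosen = None
        # Walk the ratios in order; the first cumulative bound above r wins.
        for name, ratio in ratios.items():
            cumulative += ratio
            if chosen is None and r < cumulative:
                chosen = name
        for name in ratios:
            masks[name][v] = name == chosen
    return masks


users = [f"user_{i}" for i in range(10_000)]
masks = random_vertex_split(users, random.Random(42), train_mask=0.8, val_mask=0.1, test_mask=0.1)
for name, mask in masks.items():
    print(name, sum(mask.values()) / len(users))
```

Because the buckets are disjoint, each vertex has at most one mask set to True, so a loader can later select training vertices simply by filtering on train_mask.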

4. ML Framework Integration

pyTigerGraph GDS loaders can return each batch directly as an object of:

PyTorch Geometric (PyG): Data objects, or HeteroData objects for heterogeneous graphs.
Deep Graph Library (DGL): DGL graph objects, ready to pass to a DGL model.

[!TIP] The GDS layer automatically maps TigerGraph vertex IDs to the contiguous, zero-based integer indices that most ML libraries expect, so you do not have to build this mapping yourself.
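The sketch below shows the kind of mapping involved: vertex IDs are assigned contiguous integer indices in order of first appearance, and edges are rewritten as two parallel index lists, the same layout as PyG's edge_index (a 2 x E tensor). It uses plain Python lists so it runs without PyTorch; the function name to_index_edges and the sample IDs are hypothetical, and this is not pyTigerGraph's internal code.

```python
def to_index_edges(edges):
    """Map arbitrary vertex IDs to 0..N-1 and return (id_to_index, edge_index).

    edge_index is [sources, targets], two parallel lists of integer indices.
    Hypothetical helper for illustration.
    """
    id_to_index = {}

    def index_of(vid):
        # Assign the next free integer the first time an ID is seen.
        if vid not in id_to_index:
            id_to_index[vid] = len(id_to_index)
        return id_to_index[vid]

    sources, targets = [], []
    for src, dst in edges:
        sources.append(index_of(src))
        targets.append(index_of(dst))
    return id_to_index, [sources, targets]


edges = [("alice", "bob"), ("bob", "carol"), ("carol", "alice"), ("dave", "bob")]
id_to_index, edge_index = to_index_edges(edges)
# Reverse map, needed to write model predictions back to the original vertices.
index_to_id = {i: vid for vid, i in id_to_index.items()}
print(id_to_index)  # {'alice': 0, 'bob': 1, 'carol': 2, 'dave': 3}
print(edge_index)   # [[0, 1, 2, 3], [1, 2, 0, 1]]
```

With PyTorch installed, torch.tensor(edge_index) would turn the nested list into the 2 x E integer tensor that PyG expects. Keeping the reverse map is what lets predictions be written back to the right vertices in TigerGraph.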
