Training ML models and featurizing data using the pyTigerGraph GDS layer.
Graph Data Science (GDS)
The pyTigerGraph.gds submodule is an abstraction layer for machine learning on graph data. It handles extracting graph features from TigerGraph and feeding them into ML frameworks such as PyTorch Geometric and DGL.
1. The Featurizer
The Featurizer allows you to run GDS Library algorithms and store the results as vertex attributes for training.
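```python
from pyTigerGraph.gds.featurizer import Featurizer

f = Featurizer(conn)  # conn: an existing TigerGraphConnection

# Run PageRank and save each score to the vertex attribute "pagerank_score".
# Keys must match the installed tg_pagerank query; "Follows" is a placeholder edge type.
params = {"v_type": "User", "e_type": "Follows", "result_attribute": "pagerank_score"}
f.runAlgorithm("tg_pagerank", params=params)
```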
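To make "run an algorithm and store the result as a vertex attribute" concrete, here is a minimal sketch of PageRank by power iteration in plain Python. It is not the GDS Library query and its normalization may differ from TigerGraph's. The function name, the toy graph, and the `attributes` dictionary are all assumptions made for illustration.

```python
from typing import Dict, List


def pagerank(adj: Dict[str, List[str]], damping: float = 0.85,
             max_iter: int = 25, max_change: float = 0.001) -> Dict[str, float]:
    """Power-iteration PageRank over an adjacency dict (every vertex must be a key)."""
    vertices = list(adj)
    n = len(vertices)
    scores = {v: 1.0 / n for v in vertices}
    for _ in range(max_iter):
        # Score held by vertices with no out-edges is spread evenly over all vertices.
        dangling = sum(scores[v] for v in vertices if not adj[v])
        new_scores = {v: (1.0 - damping) / n + damping * dangling / n for v in vertices}
        for v in vertices:
            for target in adj[v]:
                new_scores[target] += damping * scores[v] / len(adj[v])
        change = max(abs(new_scores[v] - scores[v]) for v in vertices)
        scores = new_scores
        if change < max_change:
            break
    return scores


# Hypothetical three-user follow graph.
graph = {"alice": ["bob"], "bob": ["carol"], "carol": ["alice"]}
scores = pagerank(graph)

# Store each score under an attribute name, as a featurizer would on the vertex.
attributes = {v: {"pagerank_score": s} for v, s in scores.items()}
```

In this symmetric cycle every vertex ends up with the same score, which makes the sketch easy to sanity-check before trying it on real data.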
2. Data Loaders
GDS Data Loaders stream data from TigerGraph into Python in batches, which is essential for training on graphs that are too large to fit in memory.
- NeighborLoader: Samples neighborhoods around target nodes (ideal for GNNs).
- EdgeLoader: Streams edges for link prediction tasks.
- VertexLoader: Streams vertices for classification/regression.
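The batching and neighborhood sampling described above can be sketched in plain Python. This is a conceptual illustration only, not how pyTigerGraph implements its loaders. The function names, the toy graph, and the parameters `batch_size`, `num_neighbors`, and `num_hops` are assumptions chosen for the example.

```python
import random
from typing import Dict, Iterator, List, Set, Tuple


def sample_neighborhood(adj: Dict[str, List[str]], seeds: List[str],
                        num_neighbors: int, num_hops: int,
                        rng: random.Random) -> Set[str]:
    """Collect seeds plus up to num_neighbors sampled neighbors per vertex, per hop."""
    visited = set(seeds)
    frontier = list(seeds)
    for _ in range(num_hops):
        next_frontier = []
        for v in frontier:
            nbrs = adj.get(v, [])
            picked = nbrs if len(nbrs) <= num_neighbors else rng.sample(nbrs, num_neighbors)
            for nbr in picked:
                if nbr not in visited:
                    visited.add(nbr)
                    next_frontier.append(nbr)
        frontier = next_frontier
    return visited


def neighbor_batches(adj: Dict[str, List[str]], targets: List[str], batch_size: int,
                     num_neighbors: int, num_hops: int,
                     seed: int = 0) -> Iterator[Tuple[List[str], Set[str]]]:
    """Yield (target vertices, sampled subgraph vertices) one batch at a time."""
    rng = random.Random(seed)
    for start in range(0, len(targets), batch_size):
        seeds = targets[start:start + batch_size]
        yield seeds, sample_neighborhood(adj, seeds, num_neighbors, num_hops, rng)


# Hypothetical toy graph; only one batch's subgraph exists at a time when iterating lazily.
adj = {"a": ["b", "c", "d"], "b": ["a"], "c": ["a", "e"], "d": ["a"], "e": ["c"]}
batches = list(neighbor_batches(adj, ["a", "e"], batch_size=1, num_neighbors=2, num_hops=1))
```

Because the generator yields one batch at a time, a training loop that consumes it never needs the full graph in memory, which is the property the loaders rely on.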
3. Vertex Splitting
Prepare your data for supervised learning by splitting vertices into training, validation, and testing sets.
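```python
# conn: an existing TigerGraphConnection. Each keyword names a boolean vertex
# attribute (assumed to exist in the schema) and the fraction of vertices to flag.
splitter = conn.gds.vertexSplitter(v_types=["User"], train_mask=0.8, val_mask=0.1, test_mask=0.1)
splitter.run()
```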
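As a rough picture of what a random vertex split produces, the sketch below assigns each vertex to exactly one of three sets and records the result as boolean flags. It is a plain-Python illustration under assumed names (`random_vertex_split`, `train_mask`, `val_mask`, `test_mask`), not the library's implementation, which may assign vertices differently.

```python
import random
from typing import Dict, Hashable, Iterable


def random_vertex_split(vertex_ids: Iterable[Hashable], train_frac: float,
                        val_frac: float, test_frac: float,
                        seed: int = 0) -> Dict[Hashable, Dict[str, bool]]:
    """Shuffle vertices and flag each one as train, validation, or test."""
    if abs(train_frac + val_frac + test_frac - 1.0) > 1e-9:
        raise ValueError("split fractions must sum to 1")
    shuffled = list(vertex_ids)
    random.Random(seed).shuffle(shuffled)
    n_train = int(round(len(shuffled) * train_frac))
    n_val = int(round(len(shuffled) * val_frac))
    masks = {}
    for i, vid in enumerate(shuffled):
        masks[vid] = {
            "train_mask": i < n_train,
            "val_mask": n_train <= i < n_train + n_val,
            "test_mask": i >= n_train + n_val,
        }
    return masks


# 100 hypothetical vertex IDs split 80/10/10.
masks = random_vertex_split(range(100), 0.8, 0.1, 0.1)
```

Storing the split as per-vertex boolean flags means a loader can later filter on one flag to select only the training, validation, or test vertices.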
4. ML Framework Integration
pyTigerGraph GDS provides native support for:
- PyTorch Geometric (PyG): Direct ingestion into `Data` or `HeteroData` objects.
- Deep Graph Library (DGL): Optimized streaming for DGL graph objects.
[!TIP] The GDS layer handles the complex task of mapping TigerGraph's internal IDs to the contiguous integer IDs required by most ML libraries automatically.
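To show what that ID mapping involves, here is a minimal sketch in plain Python. It is an illustration of the idea rather than the GDS layer's own code, and the helper names and sample vertex IDs are hypothetical.

```python
from typing import Dict, Hashable, Iterable, List, Tuple


def build_id_map(vertex_ids: Iterable[Hashable]) -> Dict[Hashable, int]:
    """Assign contiguous integers 0..n-1 to vertex IDs in first-seen order."""
    id_map: Dict[Hashable, int] = {}
    for vid in vertex_ids:
        if vid not in id_map:
            id_map[vid] = len(id_map)
    return id_map


def remap_edges(edges: List[Tuple[Hashable, Hashable]],
                id_map: Dict[Hashable, int]) -> List[Tuple[int, int]]:
    """Rewrite each (source, target) pair using the contiguous integer IDs."""
    return [(id_map[src], id_map[dst]) for src, dst in edges]


# Hypothetical edges keyed by arbitrary database IDs.
edges = [("u901", "u17"), ("u17", "u305"), ("u305", "u901")]
id_map = build_id_map(v for edge in edges for v in edge)
edge_index = remap_edges(edges, id_map)
```

The remapped pairs can then be used to index rows of a feature matrix, which is why ML libraries expect IDs in the range 0 to n-1.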