ML Workbench & GNNs
Last Updated: October 20, 2018

Training Graph Neural Networks (GNNs) using TigerGraph data.


TigerGraph provides specialized tools for data scientists to train machine learning models, specifically Graph Neural Networks (GNNs), directly on production-scale graph data.

1. What is ML Workbench?

The ML Workbench is a JupyterLab-based development environment that comes pre-installed with the necessary libraries for graph-based machine learning:

  • PyTorch Geometric (PyG)
  • Deep Graph Library (DGL)
  • pyTigerGraph (GDS Layer)

2. High-Level Architecture

The workbench acts as an orchestration layer between your Python environment and the TigerGraph database:

  1. Data Sampling: Efficiently pull subgraphs or neighborhoods from TigerGraph.
  2. Feature Extraction: Use GSQL algorithms (like PageRank) to generate features for your ML model.
  3. Model Training: Train models in PyTorch or DGL using the extracted graph features.
  4. Inference: Write the predicted labels or embeddings back to the graph.
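The four-step loop above can be sketched end to end. The snippet below is a minimal, self-contained illustration that substitutes a plain-Python toy graph for a live TigerGraph connection; in practice the sampling and write-back would go through pyTigerGraph, and all names here are illustrative.

```python
# Toy stand-in for the four-step pipeline: in a real deployment the graph
# would be queried from TigerGraph via pyTigerGraph, not an in-memory dict.
adjacency = {
    "A": ["B", "C"], "B": ["A", "C", "D"],
    "C": ["A", "B"], "D": ["B"],
}

# 1. Data sampling: pull the 1-hop neighborhood of a seed vertex.
def sample_neighborhood(graph, seed):
    return {seed, *graph[seed]}

# 2. Feature extraction: a simple degree feature per vertex
#    (in practice this might be PageRank computed by a GSQL algorithm).
features = {v: len(nbrs) for v, nbrs in adjacency.items()}

# 3. Model "training": a trivial rule standing in for a real PyTorch model.
def predict(feature):
    return "hub" if feature >= 3 else "leaf"

# 4. Inference: write predicted labels back onto the graph.
labels = {v: predict(f) for v, f in features.items()}

print(sorted(sample_neighborhood(adjacency, "A")))
print(labels)
```

Each numbered comment maps to the corresponding step in the architecture; only the data transport changes when the toy dict is replaced by a database connection.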

3. Graph Neural Networks (GNNs)

Unlike traditional ML models, which treat each record as an isolated feature vector, GNNs capture the topology of the data: each vertex's representation is computed by aggregating information from its neighbors. TigerGraph supports:

  • Graph Convolutional Networks (GCN)
  • GraphSAGE
  • Graph Attention Networks (GAT)
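The shared idea behind all three architectures is neighborhood aggregation. The sketch below shows only that aggregation step in plain Python, using mean pooling as GCN does; a real layer (in PyG or DGL) would also apply a learned weight matrix and a nonlinearity.

```python
# Minimal GCN-style message passing: each vertex's new feature is the
# mean of its own feature and its neighbors' features.
adjacency = {0: [1, 2], 1: [0], 2: [0]}
features = {0: 1.0, 1: 3.0, 2: 5.0}

def gcn_aggregate(adj, feats):
    out = {}
    for v, nbrs in adj.items():
        # Include the vertex's own feature (the "self-loop" in GCN).
        neighborhood = [feats[v]] + [feats[u] for u in nbrs]
        out[v] = sum(neighborhood) / len(neighborhood)
    return out

print(gcn_aggregate(adjacency, features))  # {0: 3.0, 1: 2.0, 2: 3.0}
```

GraphSAGE and GAT differ mainly in how this aggregation is done: GraphSAGE samples a fixed number of neighbors, while GAT weights neighbors with learned attention scores.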

4. The Transition to pyTigerGraph GDS

[!IMPORTANT] The standalone ML Workbench is being phased out in favor of the pyTigerGraph GDS (Graph Data Science) library.

All of the core functionality, including data loaders, neighborhood sampling, and vertex splitting, is now available directly via pip install pyTigerGraph. This lets you use your own Jupyter environment or cloud platforms such as SageMaker and Vertex AI.

5. Key ML Utilities

  • Data Loaders: Stream large graph data into PyG/DGL without loading everything into RAM.
  • Vertex Splitting: Automatically split your graph into training, validation, and test sets.
  • Neighborhood Sampling: Efficiently sample neighbors to train on massive graphs.
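Conceptually, the last two utilities behave like the sketch below. In pyTigerGraph these operations are server-assisted and stream from the database; here they are modeled on an in-memory vertex list and adjacency dict purely for illustration, and the function names are hypothetical.

```python
import random

def split_vertices(vertices, train=0.8, val=0.1, seed=42):
    """Randomly partition vertices into train/validation/test sets."""
    rng = random.Random(seed)
    shuffled = sorted(vertices)
    rng.shuffle(shuffled)
    n_train = int(len(shuffled) * train)
    n_val = int(len(shuffled) * val)
    return (shuffled[:n_train],
            shuffled[n_train:n_train + n_val],
            shuffled[n_train + n_val:])

def sample_neighbors(adj, vertex, fanout, rng=None):
    """Sample at most `fanout` neighbors of a vertex, as a neighbor
    loader would when batching a massive graph."""
    rng = rng or random.Random(0)
    nbrs = adj[vertex]
    return list(nbrs) if len(nbrs) <= fanout else rng.sample(nbrs, fanout)

vertices = list(range(10))
train, val, test = split_vertices(vertices)
print(len(train), len(val), len(test))  # 8 1 1
```

Capping the fanout is what keeps training tractable on massive graphs: each mini-batch touches a bounded neighborhood instead of a vertex's full (possibly huge) adjacency list.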

[!TIP] For new projects, start directly with the pyTigerGraph GDS library to ensure long-term compatibility and flexibility with modern AI stacks.