Best Practices
Last Updated: October 20, 2018

Optimization strategies for performance and scalability.

To get the most out of your TigerGraph cluster, follow these architectural and development best practices.

1. Schema Design

  • Primary IDs: Use strings for external IDs. TigerGraph handles the internal mapping.
  • Attributes vs. Edges: If you frequently filter on a value, consider modeling it as a separate vertex connected by an edge rather than as an attribute; edge traversals are faster than scanning attributes across many vertices.
  • Data Types: Use the most compact data type possible (e.g., INT instead of STRING for numeric IDs).
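As a sketch, a schema following these guidelines might look like the following (the vertex, edge, and attribute names are illustrative, not from an actual deployment):

```gsql
// Hypothetical schema: STRING primary ID, a compact INT attribute,
// and a frequently-filtered value (country) modeled as its own vertex
// connected by an edge so it can be reached by traversal.
CREATE VERTEX Person (PRIMARY_ID person_id STRING, age INT)
CREATE VERTEX Country (PRIMARY_ID country_code STRING)
CREATE UNDIRECTED EDGE lives_in (FROM Person, TO Country)
```

Here `age` is stored as `INT` rather than `STRING`, and `lives_in` lets a query start from a `Country` vertex and fan out to residents instead of filtering every `Person` on a country attribute.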

2. GSQL Optimization

  • Minimize Shuffle: Try to write queries that can be executed locally on each node before aggregating (Map-Reduce style).
  • Limit Result Sets: Always use LIMIT in your SELECT statements during exploration to avoid huge network payloads.
  • Accumulator Choice: Use SumAccum for counters and MaxAccum for finding best values. Avoid ListAccum on massive result sets as it can be memory-intensive.
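The accumulator and LIMIT advice above can be combined in a single exploratory query. This is a minimal sketch; the graph name `Social`, the `Person` vertex, and the `friend` edge are assumptions for illustration:

```gsql
// Hypothetical query: counts friends per person with SumAccum (a cheap
// per-vertex counter) and caps the result set with LIMIT so only the
// top 10 rows cross the network during exploration.
CREATE QUERY friend_counts() FOR GRAPH Social {
  SumAccum<INT> @friend_count;

  start = {Person.*};

  result = SELECT p
           FROM start:p -(friend:e)- Person:t
           ACCUM p.@friend_count += 1
           ORDER BY p.@friend_count DESC
           LIMIT 10;

  PRINT result;
}
```

The per-vertex `@friend_count` accumulates locally on each node before results are gathered, which fits the Map-Reduce style recommended above; a `ListAccum` collecting every neighbor ID would instead grow with the data and risk exhausting memory.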

3. Cluster Scaling

  • Vertical Scaling: Increasing RAM and CPU on existing nodes. Best for handling complex queries on the same dataset.
  • Horizontal Scaling: Adding more nodes. Best for increasing total storage capacity and handling more concurrent users (QPS).

4. Loading Performance

  • Batch Size: Optimize your loading job batch sizes. Too small increases overhead; too large can cause memory spikes.
  • Parallel Loading: Use multiple files and multiple loaders to saturate the system's I/O capacity.
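A loading job structured for parallelism might be sketched as follows (the job name, file variable, and column names are hypothetical):

```gsql
// Hypothetical loading job: the filename is a variable, so the same job
// can be launched against several input files at once, letting multiple
// loaders run in parallel and saturate I/O.
CREATE LOADING JOB load_people FOR GRAPH Social {
  DEFINE FILENAME people_file;
  LOAD people_file
       TO VERTEX Person VALUES ($"person_id", $"age")
       USING HEADER="true", SEPARATOR=",";
}
```

Splitting one large CSV into several parts and running the job against each part concurrently is one way to apply the parallel-loading advice above.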

[!IMPORTANT] Memory Allocation: Always reserve at least 20-30% of system RAM for the operating system and background processes to prevent the GPE (Graph Processing Engine) from crashing under high load.