Last Updated: October 20, 2018
Optimization strategies for performance and scalability.
Best Practices
To get the most out of your TigerGraph cluster, follow these architectural and development best practices.
1. Schema Design
- Primary IDs: Use strings for external IDs. TigerGraph handles the internal mapping.
- Attributes vs. Edges: If you need to filter on a value frequently, consider making it an edge to another vertex rather than just an attribute (for faster traversal).
- Data Types: Use the most compact data type possible (e.g., INT instead of STRING for numeric IDs).
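To make the data type point concrete, here is a rough Python sketch comparing the raw size of one numeric ID encoded as a fixed-width integer versus as text. It assumes an 8-byte signed integer and UTF-8 text. The sample ID is hypothetical, and the engine's actual on-disk and in-memory layout may differ, so treat the numbers as an illustration of the trend only.

```python
import struct

# Hypothetical 13-digit numeric ID, chosen only for illustration.
numeric_id = 1234567890123

# Assumption: a fixed-width 8-byte signed integer encoding.
as_int = struct.pack("<q", numeric_id)

# The same value stored as text costs one byte per digit in UTF-8.
as_string = str(numeric_id).encode("utf-8")

print(len(as_int), len(as_string))
```

The gap widens with longer IDs and is multiplied by every vertex and edge that stores the value.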
2. GSQL Optimization
- Minimize Shuffle: Try to write queries that can be executed locally on each node before aggregating (Map-Reduce style).
- Limit Result Sets: Always use LIMIT in your SELECT statements during exploration to avoid huge network payloads.
- Accumulator Choice: Use SumAccum for counters and MaxAccum for finding best values. Avoid ListAccum on massive result sets as it can be memory-intensive.
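The shuffle and accumulator advice can be illustrated outside GSQL with a small Python sketch. It is a conceptual model, not TigerGraph code: each inner list is a hypothetical stand-in for the values held on one node, and the helper names are invented for the example. Every node reduces its own data to a small (sum, max) pair, and only those pairs are merged, which mirrors how sum-style and max-style accumulators keep memory and network use constant where a growing list would not.

```python
from functools import reduce

# Hypothetical data: each inner list stands for the values held on one node.
partitions = [
    [3, 9, 4],
    [7, 1],
    [8, 2, 6],
]

def local_aggregate(values):
    """Reduce one partition to a (sum, max) pair on the node that owns it."""
    return sum(values), max(values)

def merge(a, b):
    """Combine two partial results; only these small pairs are exchanged."""
    return a[0] + b[0], max(a[1], b[1])

# "Map" step: runs independently per partition, no data movement needed.
partials = [local_aggregate(p) for p in partitions]

# "Reduce" step: merges one small pair per partition.
total, best = reduce(merge, partials)
print(total, best)
```

Collecting every value into a single list before aggregating would give the same answer, but the amount of data moved would grow with the dataset instead of with the number of nodes.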
3. Cluster Scaling
- Vertical Scaling: Increasing RAM and CPU on existing nodes. Best for handling complex queries on the same dataset.
- Horizontal Scaling: Adding more nodes. Best for increasing total storage capacity and handling more concurrent users (QPS).
4. Loading Performance
- Batch Size: Optimize your loading job batch sizes. Too small increases overhead; too large can cause memory spikes.
- Parallel Loading: Use multiple files and multiple loaders to saturate the system's I/O capacity.
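The batching and parallelism trade-off can be sketched in Python as follows. This is a minimal illustration under stated assumptions: `load_batch` is a hypothetical placeholder for whatever actually submits rows to a loading job, and the batch size and worker count are arbitrary values you would tune by measurement.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import islice

def batches(rows, batch_size):
    """Yield successive lists of at most batch_size rows."""
    it = iter(rows)
    while True:
        chunk = list(islice(it, batch_size))
        if not chunk:
            return
        yield chunk

def load_batch(batch):
    """Hypothetical stand-in for sending one batch to a loader.

    Here it only reports how many rows it received.
    """
    return len(batch)

def load_in_parallel(rows, batch_size, workers):
    """Send batches through a pool of workers and return the rows loaded."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return sum(pool.map(load_batch, batches(rows, batch_size)))

# Ten hypothetical edge rows in "source,target" form.
rows = [f"v{i},v{i + 1}" for i in range(10)]
loaded = load_in_parallel(rows, batch_size=4, workers=2)
print(loaded)
```

A smaller `batch_size` means more calls and more per-call overhead, while a larger one holds more rows in memory at once. Adding workers only helps until I/O is saturated.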
[!IMPORTANT] Memory Allocation: Always reserve at least 20-30% of system RAM for the operating system and background processes to prevent the GPE (Graph Processing Engine) from crashing under high load.
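As a quick worked example of the reservation rule, the Python sketch below computes how much RAM remains for graph workloads after setting aside a fraction for the operating system. The function name and the 128 GB machine are hypothetical. Only the 20-30% range comes from the guidance above.

```python
def usable_memory_gb(total_ram_gb, reserve_fraction=0.25):
    """Return the RAM left for graph workloads after the OS reserve.

    reserve_fraction is the share kept for the operating system and
    background processes (0.20 to 0.30 per the guidance above).
    """
    if not 0 <= reserve_fraction < 1:
        raise ValueError("reserve_fraction must be in [0, 1)")
    return total_ram_gb * (1 - reserve_fraction)

# Hypothetical 128 GB node at both ends of the recommended range.
low_reserve = usable_memory_gb(128, 0.20)
high_reserve = usable_memory_gb(128, 0.30)
print(low_reserve, high_reserve)
```

If the expected in-memory graph size plus query working memory exceeds the lower figure, that is a signal to scale the cluster before loading.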