# Troubleshooting

Diagnosing service failures, performance bottlenecks, and log analysis.
When issues arise, follow a systematic approach to isolate and resolve the root cause.
## 1. General Health Check
Always start with these baseline commands to rule out resource exhaustion:
```bash
gadmin status    # Ensure all services are UP
df -lh           # Check for disk space exhaustion
free -g          # Check for memory pressure
dmesg -T | tail  # Look for OOM (Out of Memory) kills
```
## 2. Navigating Log Files
Each service writes detailed logs to the directory defined by `System.LogRoot`.
| Service | Primary Log File | Use Case |
|---|---|---|
| GPE | gpe/log.INFO | Query execution, memory usage, graph errors. |
| RESTPP | restpp/log.INFO | REST API requests, input validation, loading jobs. |
| GSQL | gsql_server_log/GSQL_LOG | Query compilation and installation errors. |
| NGINX | nginx/nginx.access.log | Connectivity and authentication issues. |
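The primary logs in the table can be scanned for recent errors in one pass. A minimal sketch, assuming the service subdirectory layout shown above; the `scan_logs` helper name and the default error patterns are ours, and the real log root should be resolved with `gadmin config get System.LogRoot`:

```bash
# scan_logs: grep each service's primary log (paths from the table above)
# for recent error lines. Helper name and error patterns are illustrative.
scan_logs() {
  local root="$1"
  local f
  for f in gpe/log.INFO restpp/log.INFO gsql_server_log/GSQL_LOG; do
    [ -f "$root/$f" ] || continue
    echo "== $f =="
    grep -iE "error|fatal" "$root/$f" | tail -n 5
  done
}

# On a live cluster, resolve the real log root first:
#   scan_logs "$(gadmin config get System.LogRoot)"
```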
## 3. Common Issues

### Slow Query Performance
- **Huge JSON response:** If GPE CPU stays high after query execution has finished, the engine is likely serializing an oversized result; reduce the amount of data the query returns.
- **Insufficient memory:** Check whether the system is swapping to disk.
- **Logic bottlenecks:** Verify that your GSQL query is not traversing more edges than necessary.
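The swapping check can be scripted. A minimal sketch for Linux that reads `/proc/meminfo` directly; the `mem_check` helper name and the 10% warning threshold are our assumptions, not TigerGraph defaults:

```bash
# mem_check: report available memory as a percentage of total and warn
# when the node is close to swapping. Accepts an alternate meminfo-format
# file as an argument (useful for testing); defaults to /proc/meminfo.
mem_check() {
  awk '/^MemTotal:/     {total=$2}
       /^MemAvailable:/ {avail=$2}
       END {
         pct = 100 * avail / total
         printf "available: %.0f%%\n", pct
         if (pct < 10) print "WARNING: memory pressure -- expect swapping"
       }' "${1:-/proc/meminfo}"
}

mem_check   # e.g. "available: 42%"
```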
### Services "Not Ready"

If a service such as GSE is stuck in `not_ready`, it is usually "warming up" (loading data from disk into RAM). Check CPU usage to confirm that it is making progress.
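One quick way to confirm warm-up activity is to look at the busiest processes on the node. A sketch; the exact daemon names vary by TigerGraph version, so match them loosely:

```bash
# Show the ten busiest processes. A GSE/GPE daemon near the top while the
# service reports not_ready usually means data is still loading into RAM.
ps -eo pid,comm,%cpu --sort=-%cpu | head -n 11
```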
### Cluster Out of Sync
TigerGraph requires clocks across all nodes to be synchronized within 2 seconds. If they drift, schema changes and loading jobs will fail.
```bash
grun all "date"   # Compare time across nodes
```
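The output of `grun all "date"` still has to be compared by eye; a small helper makes the 2-second limit explicit. A sketch (`check_drift` is an illustrative name, and the ssh usage assumes passwordless access between nodes):

```bash
# check_drift LOCAL_EPOCH REMOTE_EPOCH -> "OK" or "DRIFT <n>s"
# Flags any absolute difference above the 2-second limit.
check_drift() {
  local d=$(( $2 - $1 ))
  [ "$d" -lt 0 ] && d=$(( -d ))
  if [ "$d" -gt 2 ]; then echo "DRIFT ${d}s"; else echo "OK"; fi
}

# On a live cluster (hostname is a placeholder):
#   check_drift "$(date +%s)" "$(ssh node2 date +%s)"
check_drift 1700000000 1700000001   # → OK
```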
## 4. Collecting Support Data
If you need to contact TigerGraph support, use the `gcollect` tool to gather all relevant logs and configs into a single bundle:

```bash
gcollect collect
```
> [!IMPORTANT]
> **Query Abortion:** If queries are being aborted unexpectedly, check the GPE log for the message `System Memory in Critical state`. This indicates the system is protecting itself from an OOM crash.