Server Troubleshooting
Last Updated: October 20, 2018

Diagnosing service failures, investigating performance bottlenecks, and analyzing logs.

Troubleshooting

When issues arise, follow a systematic approach to identify and resolve the root cause.

1. General Health Check

Always start with these baseline commands to rule out resource exhaustion:

gadmin status     # Ensure all services are UP
df -lh            # Check for disk space exhaustion
free -g           # Check for memory pressure
dmesg -T | tail   # Look for OOM (Out of Memory) kills
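The disk-space part of this baseline check can be automated. Below is a minimal sketch: the helper `flag_full` (an illustrative name, not a TigerGraph tool) reads `df -lP` output on stdin and prints any mount point at or above a usage threshold.

```shell
# Sketch: print mount points whose usage meets or exceeds a threshold.
# Reads `df -lP` output on stdin; the threshold is the first argument.
flag_full() {
  awk -v t="$1" 'NR > 1 {
    use = $5; sub("%", "", use)    # strip the % sign from the Use% column
    if (use + 0 >= t) print $6     # $6 is the mount point
  }'
}

# Example: df -lP | flag_full 90
```

Any mount point this prints is a candidate for the disk-space exhaustion described above.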

2. Navigating Log Files

Each service writes detailed logs to the directory defined by System.LogRoot.

Service   Primary Log File            Use Case
GPE       gpe/log.INFO                Query execution, memory usage, graph errors.
RESTPP    restpp/log.INFO             REST API requests, input validation, loading jobs.
GSQL      gsql_server_log/GSQL_LOG    Query compilation and installation errors.
NGINX     nginx/nginx.access.log      Connectivity and authentication issues.
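A quick way to work with these logs is a small helper that surfaces the most recent error lines. This is a sketch: `recent_errors` is an illustrative name, and the commented example assumes `System.LogRoot` can be read with `gadmin config get` (verify the exact command on your TigerGraph version).

```shell
# Sketch: show the most recent error lines from a service log file.
# Usage: recent_errors <logfile> [count]
recent_errors() {
  grep -i 'error' "$1" | tail -n "${2:-20}"
}

# Example (path assumes the default layout under System.LogRoot):
# recent_errors "$(gadmin config get System.LogRoot)/gpe/log.INFO" 50
```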

3. Common Issues

Slow Query Performance

  • Huge JSON Response: If GPE CPU stays high after the traversal has finished, the result set may be too large; serializing a huge JSON response is itself CPU-intensive.
  • Insufficient Memory: Check if the system is swapping to disk.
  • Logic Bottlenecks: Verify if your GSQL is traversing more edges than necessary.
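For the memory bullet, one concrete signal of swapping is nonzero used swap in `free -g`. The parser below is a sketch that reads `free` output on stdin.

```shell
# Sketch: print used swap, in whatever units `free` was invoked with.
# Reads `free` output on stdin; column 3 of the Swap: line is "used".
swap_used() {
  awk '/^Swap:/ { print $3 }'
}

# Example: free -g | swap_used   (nonzero output suggests memory pressure)
```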

Services "Not Ready"

If a service like GSE is stuck in not_ready, it is usually "warming up" (loading data from disk to RAM). Check CPU usage to confirm activity.
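One way to confirm warm-up activity is to watch the CPU share of the service process. A sketch follows; the pattern `gse` is an assumption about how the process appears in `ps` output on your system.

```shell
# Sketch: print name and CPU% for processes matching a pattern.
# Reads `ps -eo comm,%cpu` output on stdin.
proc_cpu() {
  awk -v p="$1" 'NR > 1 && $1 ~ p { print $1, $2 }'
}

# Example: ps -eo comm,%cpu | proc_cpu gse
```

Sustained high CPU during not_ready usually means the service is still loading data into RAM rather than hung.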

Cluster Out of Sync

TigerGraph requires clocks across all nodes to be synchronized within 2 seconds. If they drift, schema changes and loading jobs will fail.

grun all "date"   # Compare time across nodes
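The 2-second tolerance can be checked numerically by comparing epoch timestamps instead of eyeballing formatted dates. A sketch, fed by something like `grun all "date +%s"` (you may need to filter out the node-name prefixes `grun` adds to its output):

```shell
# Sketch: compute the maximum clock skew in seconds from epoch
# timestamps, one per line on stdin.
max_skew() {
  awk 'NR == 1 { min = $1; max = $1 }
       { if ($1 < min) min = $1; if ($1 > max) max = $1 }
       END { print max - min }'
}

# Example: grun all "date +%s" | grep -o "[0-9]\{10\}" | max_skew
# A result above 2 means the cluster clocks are out of tolerance.
```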

4. Collecting Support Data

If you need to contact TigerGraph support, use the gcollect tool to gather all relevant logs and configs into a single bundle:

gcollect collect

[!IMPORTANT] Query Abortion: If queries are being aborted unexpectedly, check the GPE log for the message System Memory in Critical state. This indicates the system is protecting itself from an OOM crash.
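To quantify how often this memory guard has fired, you can count occurrences of that message in the GPE log. A minimal sketch with an illustrative helper name:

```shell
# Sketch: count memory-protection events in a GPE log file.
critical_mem_events() {
  grep -c 'System Memory in Critical state' "$1"
}

# Example, assuming LOG_ROOT holds your System.LogRoot path:
# critical_mem_events "$LOG_ROOT/gpe/log.INFO"
```

A rising count over time suggests the workload routinely approaches the memory limit, not a one-off spike.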