gsql/ Creating Loading Jobs
Last Updated: October 20, 2018

Designing ETL pipelines to map, transform, and load external data into TigerGraph.

Creating Loading Jobs

Loading data is a two-step process in GSQL:

  1. Define the job with CREATE LOADING JOB.
  2. Execute the job with RUN LOADING JOB.

1. Job Structure

A loading job consists of DEFINE statements for variables and LOAD statements for data mapping.

gsqlterminal
CREATE LOADING JOB my_job FOR GRAPH MyGraph { DEFINE FILENAME f = "/path/to/data.csv"; LOAD f TO VERTEX Person VALUES ($0, $1, $2) USING SEPARATOR=",", HEADER="true"; }

2. The LOAD Statement

The LOAD statement describes how to parse input tokens ($0, $1, etc.) into vertex and edge attributes.

Destination Clauses

  • Vertex: TO VERTEX <type> VALUES (id, attr1, attr2...)
  • Edge: TO EDGE <type> VALUES (from_id, to_id, attr1...)
  • Temp Table: TO TEMP_TABLE <table> (cols...) VALUES (exprs...)

Filtering with WHERE

You can skip input lines based on conditions:

gsqlterminal
LOAD f TO VERTEX Person VALUES ($0, $1) WHERE to_int($2) > 18;

3. Configuration Options (USING)

Common parameters used in the USING clause:

  • SEPARATOR: The column delimiter (default: ,).
  • HEADER: Set to "true" if the first line is a header.
  • QUOTE: Define boundary markers ("double" or "single").
  • JSON_FILE: Set to "true" for JSON Lines loading.
  • VERTEX_MUST_EXIST: If true, won't create an edge if endpoints are missing.

4. Token & Reducer Functions

GSQL provides built-in functions to transform data during the load.

Token Functions (Transformation)

  • gsql_trim($0): Removes whitespace.
  • gsql_concat($0, "-", $1): Joins multiple tokens.
  • to_datetime($0): Converts strings to DATETIME objects.

Reducer Functions (Aggregation)

Use REDUCE to aggregate multiple input rows into a single attribute value.

  • REDUCE(add($1)): Sums values.
  • REDUCE(max($1)): Keeps the maximum value.
  • REDUCE(overwrite($1)): Always takes the latest value.

5. Advanced: TEMP_TABLE and flatten

Use a TEMP_TABLE to handle multi-valued fields or nested JSON arrays.

gsqlterminal
# Example: Splitting a comma-separated list into multiple edges CREATE LOADING JOB flatten_job FOR GRAPH MyGraph { LOAD "data.csv" TO TEMP_TABLE t1(id, genre) VALUES ($0, flatten($1, ",", 1)); LOAD TEMP_TABLE t1 TO EDGE Has_Genre VALUES ($"id", $"genre"); }

[!NOTE] flatten splits a single field into multiple rows, which can then be processed as independent vertices or edges.