Creating Loading Jobs

Loading data is a two-step process in GSQL:

Define the job with CREATE LOADING JOB.
Execute the job with RUN LOADING JOB.

1. Job Structure

A loading job consists of DEFINE statements for variables and LOAD statements for data mapping.


gsqlterminal
CREATE LOADING JOB my_job FOR GRAPH MyGraph {
   DEFINE FILENAME f = "/path/to/data.csv";
   
   LOAD f TO VERTEX Person VALUES ($0, $1, $2)
          USING SEPARATOR=",", HEADER="true";
}

2. The `LOAD` Statement

The LOAD statement describes how to parse input tokens ($0, $1, etc.) into vertex and edge attributes.

Destination Clauses

Vertex: TO VERTEX <type> VALUES (id, attr1, attr2...)
Edge: TO EDGE <type> VALUES (from_id, to_id, attr1...)
Temp Table: TO TEMP_TABLE <table> (cols...) VALUES (exprs...)

Filtering with `WHERE`

You can skip input lines based on conditions:


gsqlterminal
LOAD f TO VERTEX Person VALUES ($0, $1)
     WHERE to_int($2) > 18;

3. Configuration Options (`USING`)

Common parameters used in the USING clause:

SEPARATOR: The column delimiter (default: ,).
HEADER: Set to "true" if the first line is a header.
QUOTE: Define boundary markers ("double" or "single").
JSON_FILE: Set to "true" for JSON Lines loading.
VERTEX_MUST_EXIST: If true, won't create an edge if endpoints are missing.

4. Token & Reducer Functions

GSQL provides built-in functions to transform data during the load.

Token Functions (Transformation)

gsql_trim($0): Removes whitespace.
gsql_concat($0, "-", $1): Joins multiple tokens.
to_datetime($0): Converts strings to DATETIME objects.

Reducer Functions (Aggregation)

Use REDUCE to aggregate multiple input rows into a single attribute value.

REDUCE(add($1)): Sums values.
REDUCE(max($1)): Keeps the maximum value.
REDUCE(overwrite($1)): Always takes the latest value.

5. Advanced: `TEMP_TABLE` and `flatten`

Use a TEMP_TABLE to handle multi-valued fields or nested JSON arrays.


gsqlterminal
# Example: Splitting a comma-separated list into multiple edges
CREATE LOADING JOB flatten_job FOR GRAPH MyGraph {
  LOAD "data.csv" TO TEMP_TABLE t1(id, genre) 
       VALUES ($0, flatten($1, ",", 1));
  
  LOAD TEMP_TABLE t1 TO EDGE Has_Genre VALUES ($"id", $"genre");
}

[!NOTE] flatten splits a single field into multiple rows, which can then be processed as independent vertices or edges.

BOOK

Creating Loading Jobs

1. Job Structure

2. The `LOAD` Statement

Destination Clauses

Filtering with `WHERE`

3. Configuration Options (`USING`)

4. Token & Reducer Functions

Token Functions (Transformation)

Reducer Functions (Aggregation)

5. Advanced: `TEMP_TABLE` and `flatten`

On this page

Creating Loading Jobs

1. Job Structure

2. The LOAD Statement

Destination Clauses

Filtering with WHERE

3. Configuration Options (USING)

4. Token & Reducer Functions

Token Functions (Transformation)

Reducer Functions (Aggregation)

5. Advanced: TEMP_TABLE and flatten

On this page

2. The `LOAD` Statement

Filtering with `WHERE`

3. Configuration Options (`USING`)

5. Advanced: `TEMP_TABLE` and `flatten`