Designing ETL pipelines to map, transform, and load external data into TigerGraph.
Creating Loading Jobs
Loading data is a two-step process in GSQL:
- Define the job with
CREATE LOADING JOB. - Execute the job with
RUN LOADING JOB.
1. Job Structure
A loading job consists of DEFINE statements for variables and LOAD statements for data mapping.
gsqlterminalCREATE LOADING JOB my_job FOR GRAPH MyGraph { DEFINE FILENAME f = "/path/to/data.csv"; LOAD f TO VERTEX Person VALUES ($0, $1, $2) USING SEPARATOR=",", HEADER="true"; }
2. The LOAD Statement
The LOAD statement describes how to parse input tokens ($0, $1, etc.) into vertex and edge attributes.
Destination Clauses
- Vertex:
TO VERTEX <type> VALUES (id, attr1, attr2...) - Edge:
TO EDGE <type> VALUES (from_id, to_id, attr1...) - Temp Table:
TO TEMP_TABLE <table> (cols...) VALUES (exprs...)
Filtering with WHERE
You can skip input lines based on conditions:
gsqlterminalLOAD f TO VERTEX Person VALUES ($0, $1) WHERE to_int($2) > 18;
3. Configuration Options (USING)
Common parameters used in the USING clause:
SEPARATOR: The column delimiter (default:,).HEADER: Set to"true"if the first line is a header.QUOTE: Define boundary markers ("double"or"single").JSON_FILE: Set to"true"for JSON Lines loading.VERTEX_MUST_EXIST: If true, won't create an edge if endpoints are missing.
4. Token & Reducer Functions
GSQL provides built-in functions to transform data during the load.
Token Functions (Transformation)
gsql_trim($0): Removes whitespace.gsql_concat($0, "-", $1): Joins multiple tokens.to_datetime($0): Converts strings toDATETIMEobjects.
Reducer Functions (Aggregation)
Use REDUCE to aggregate multiple input rows into a single attribute value.
REDUCE(add($1)): Sums values.REDUCE(max($1)): Keeps the maximum value.REDUCE(overwrite($1)): Always takes the latest value.
5. Advanced: TEMP_TABLE and flatten
Use a TEMP_TABLE to handle multi-valued fields or nested JSON arrays.
gsqlterminal# Example: Splitting a comma-separated list into multiple edges CREATE LOADING JOB flatten_job FOR GRAPH MyGraph { LOAD "data.csv" TO TEMP_TABLE t1(id, genre) VALUES ($0, flatten($1, ",", 1)); LOAD TEMP_TABLE t1 TO EDGE Has_Genre VALUES ($"id", $"genre"); }
[!NOTE]
flattensplits a single field into multiple rows, which can then be processed as independent vertices or edges.
On this page
- Creating Loading Jobs
- 1. Job Structure
- 2. The `LOAD` Statement
- Destination Clauses
- Filtering with `WHERE`
- 3. Configuration Options (`USING`)
- 4. Token & Reducer Functions
- Token Functions (Transformation)
- Reducer Functions (Aggregation)
- 5. Advanced: `TEMP_TABLE` and `flatten`
- Example: Splitting a comma-separated list into multiple edges
TigerGraph Book
v1.0 Curated