Pipeline Anatomy

⚠️

These docs are still a work in progress! Contact us if you find anything missing or out of place.

Rigor aims to keep data pipelines as simple as practically possible. Each pipeline has as few moving parts as possible, which also means that much of the configuration and setup is streamlined.

Each pipeline consists of a language-specific file (.py) and a configuration file (.json). Your pipeline code does not need to live in a single file; however, the primary file must define a runner function and, if needed, a loader function.
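As a rough illustration of this requirement, the sketch below imports a pipeline file and checks that it defines a callable `runner` and, optionally, a callable `loader`. The helper name `check_pipeline_module` is hypothetical and not part of Rigor; Rigor's build process may perform this check differently, or not at all.

```python
import importlib.util
from pathlib import Path
from types import ModuleType


def check_pipeline_module(path: str) -> ModuleType:
    """Import a pipeline file and verify it exposes the expected functions.

    Hypothetical helper for illustration only; not a Rigor API.
    """
    spec = importlib.util.spec_from_file_location(Path(path).stem, path)
    if spec is None or spec.loader is None:
        raise ImportError(f"cannot load pipeline file: {path}")
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)

    # The runner function is always required.
    if not callable(getattr(module, "runner", None)):
        raise AttributeError(f"{path} must define a runner function")
    # The loader function is optional, but if present it must be callable.
    if hasattr(module, "loader") and not callable(module.loader):
        raise AttributeError(f"{path} defines 'loader' but it is not callable")
    return module
```

Helper code can still live in other modules that the primary file imports; only the primary file needs to expose `runner` (and `loader`, if used).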

runner.py

def loader():
    # Optional: prepare the inputs for the runner.
    return {"runner_inputs": []}

def runner(args):
    # Required: the main pipeline logic goes here.
    return
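To make the division of labour between the two functions concrete, here is a minimal local harness. It assumes, without confirmation from these docs, that each item in `runner_inputs` is passed to `runner` as `args` in a separate call, and that `runner` is called once with `None` when no loader is used. The `run_locally` function and this calling convention are hypothetical; the Rigor runtime may invoke these functions differently.

```python
from typing import Any, Callable, Optional


def loader() -> dict:
    # Example loader: produce one input per item the runner should process.
    return {"runner_inputs": [{"id": 1}, {"id": 2}]}


def runner(args: Any) -> Any:
    # Example runner: do the real work for a single input.
    return args["id"] * 10


def run_locally(
    runner_fn: Callable[[Any], Any],
    loader_fn: Optional[Callable[[], dict]] = None,
) -> list:
    """Hypothetical harness; assumes one runner call per loader input."""
    if loader_fn is None:
        # Assumption: without a loader, the runner is called once with no input.
        return [runner_fn(None)]
    inputs = loader_fn().get("runner_inputs", [])
    return [runner_fn(item) for item in inputs]


print(run_locally(runner, loader))  # → [10, 20]
```

Running the functions this way can be handy for quick local checks before the pipeline goes through the build process.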

The config file serves two purposes: it identifies the pipeline during the build process, and it defines the parameters under which the task runs. Config files are typically named after the primary pipeline file, but with a .config.json extension (for example, runner.py is paired with runner.config.json).
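As an illustration of how this naming convention lets tooling find pipelines, the sketch below scans a directory for `*.config.json` files and pairs each one with the file named in its `runFile` field. The `discover_pipelines` helper is hypothetical, and resolving `runFile` relative to the config's directory is an assumption; Rigor's actual build process is not described here.

```python
import json
from pathlib import Path


def discover_pipelines(root: str) -> dict:
    """Map each pipeline name to its config and run file (hypothetical helper)."""
    pipelines = {}
    for config_path in sorted(Path(root).glob("*.config.json")):
        config = json.loads(config_path.read_text())
        # Assumption: runFile is relative to the directory holding the config.
        run_file = config_path.parent / config["runFile"]
        if not run_file.exists():
            raise FileNotFoundError(f"{config_path.name}: missing {run_file.name}")
        pipelines[config["name"]] = {"config": config_path, "runFile": run_file}
    return pipelines
```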

runner.config.json

{
  "name": "Example to Bucket",
  "runFile": "runner.py",
  "timeout": 600,
  "cpus": 1,
  "memory": 2,
  "retries": 3,
  "loader": false
}
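Because the config is plain JSON, a malformed file fails before any of its settings can take effect. The sketch below loads a config and applies a few sanity checks. It treats every key in the example above as required; the type rules and the `load_config` name are assumptions rather than Rigor's actual schema. The units for `timeout` and `memory` are not stated here, so the sketch only checks that they are positive.

```python
import json

NUMBER = (int, float)

# Assumed schema, inferred from the example config; not an official Rigor spec.
EXPECTED_TYPES = {
    "name": (str,),
    "runFile": (str,),
    "timeout": NUMBER,
    "cpus": NUMBER,
    "memory": NUMBER,
    "retries": (int,),
    "loader": (bool,),
}


def load_config(text: str) -> dict:
    """Parse a pipeline config and check it against the assumed schema."""
    config = json.loads(text)  # raises json.JSONDecodeError on malformed JSON
    for key, expected in EXPECTED_TYPES.items():
        if key not in config:
            raise ValueError(f"missing required key: {key}")
        value = config[key]
        # bool is a subclass of int, so reject it explicitly for numeric fields.
        if isinstance(value, bool) and bool not in expected:
            raise TypeError(f"{key} must not be a boolean")
        if not isinstance(value, expected):
            names = " or ".join(t.__name__ for t in expected)
            raise TypeError(f"{key} must be {names}")
    for key in ("timeout", "cpus", "memory"):
        if config[key] <= 0:
            raise ValueError(f"{key} must be positive")
    if config["retries"] < 0:
        raise ValueError("retries cannot be negative")
    return config
```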

The config file contains both metadata about the pipeline (such as its name and runFile) and parameters that control how it runs (such as timeout, cpus, memory, and retries).
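One way to mirror this split in code is to load the config into two groups. The grouping below (name and runFile as metadata, everything else as run settings) is an assumption, as are the `PipelineMeta`, `RunSettings`, and `split_config` names; none of them are part of Rigor.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PipelineMeta:
    # Fields that identify the pipeline (assumed grouping).
    name: str
    run_file: str


@dataclass(frozen=True)
class RunSettings:
    # Fields that control how the task runs (assumed grouping; units not stated in these docs).
    timeout: float
    cpus: float
    memory: float
    retries: int
    loader: bool = False


def split_config(config: dict) -> tuple:
    """Split a parsed config dict into metadata and run settings (hypothetical)."""
    meta = PipelineMeta(name=config["name"], run_file=config["runFile"])
    settings = RunSettings(
        timeout=config["timeout"],
        cpus=config["cpus"],
        memory=config["memory"],
        retries=config["retries"],
        loader=config.get("loader", False),
    )
    return meta, settings
```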

For a full reference of the available parameters, see Defining Config.