Pipeline Anatomy
These docs are still a work in progress! Contact us if you find anything missing or out of place.
Rigor aims to keep data pipelines as simple as practically possible. Each pipeline has a minimal number of moving parts, which also streamlines much of the configuration and setup.
Pipelines are composed of a language-specific file (.py) and a configuration file (.json).
Your pipeline code does not need to be contained within a single file. However, the pipeline file must define the runner function and, if necessary, a loader function.
def loader():
    # do something
    return {"runner_inputs": []}

def runner(args):
    # do something
    return
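To make the relationship between the two functions more concrete, here is a minimal sketch that can be run locally. It assumes that each entry in "runner_inputs" is passed to runner as its args; the page above does not confirm this, so treat it as an illustration only. The input dictionaries, file names, and the run_locally helper are hypothetical and not part of Rigor.

```python
def loader():
    # Hypothetical loader: produce one input per unit of work.
    return {"runner_inputs": [{"source": "a.csv"}, {"source": "b.csv"}]}


def runner(args):
    # Hypothetical runner: handle a single input and return a short summary.
    return f"processed {args['source']}"


def run_locally():
    # Assumption: each entry in "runner_inputs" becomes the args of one
    # runner() call. This helper only mimics that for local experimentation.
    inputs = loader()["runner_inputs"]
    return [runner(item) for item in inputs]


if __name__ == "__main__":
    print(run_locally())
```

Keeping the loader free of heavy work and letting the runner handle one input at a time makes each piece easy to test on its own, whatever the platform does with the returned list.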
The config file both identifies the pipeline in the build process and defines the parameters under which the task will run. Config files are typically named the same as the primary pipeline file but with a .config.json extension.
{
    "name": "Example to Bucket",
    "runFile": "runner.py",
    "timeout": 600,
    "cpus": 1,
    "memory": 2,
    "retries": 3,
    "loader": false
}
Inside the config file you will find parameters that describe the pipeline's metadata as well as parameters that define how the pipeline will run.
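Because a malformed config file is easy to miss until build time, a small local check can help. The sketch below parses a config with Python's standard json module and compares each key against the types seen in the example above. Both the key list and the assumption that every key is required are inferred from that example rather than from a published schema, and check_config is a hypothetical helper, not part of Rigor.

```python
import json

# Assumption: expected keys and types are inferred from the example config.
EXPECTED_TYPES = {
    "name": str,
    "runFile": str,
    "timeout": int,
    "cpus": int,
    "memory": int,
    "retries": int,
    "loader": bool,
}

EXAMPLE_CONFIG = """
{
    "name": "Example to Bucket",
    "runFile": "runner.py",
    "timeout": 600,
    "cpus": 1,
    "memory": 2,
    "retries": 3,
    "loader": false
}
"""


def check_config(text):
    """Return a list of human-readable problems found in a config string."""
    config = json.loads(text)  # raises json.JSONDecodeError on invalid JSON
    problems = []
    for key, expected in EXPECTED_TYPES.items():
        if key not in config:
            problems.append(f"missing key: {key}")
        elif type(config[key]) is not expected:
            # Exact type comparison, so a boolean is not accepted as an int.
            actual = type(config[key]).__name__
            problems.append(f"{key}: expected {expected.__name__}, got {actual}")
    return problems


if __name__ == "__main__":
    print(check_config(EXAMPLE_CONFIG))
```

A syntax slip such as a missing colon after a key surfaces as a json.JSONDecodeError from json.loads, while missing keys or unexpected types are returned as a list of messages.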