Pipelines in Elyra can be run locally in JupyterLab, or remotely on Kubeflow Pipelines or Apache Airflow to take advantage of shared resources that speed up processing of compute-intensive tasks.
Note: Support for Apache Airflow is experimental.
This document outlines how to set up a new Elyra-enabled Apache Airflow environment or add Elyra support to an existing deployment.
This guide assumes a general working knowledge of Kubernetes and of administering a Kubernetes cluster.
In order to use Apache Airflow with Elyra, it must be configured to use a Git repository to store DAGs.
Create a branch (e.g. main) in your repository. This branch is referenced later for storing the DAGs.
Take note of the following information:
- Git API endpoint (e.g. https://api.github.com for github.com or https://gitlab.com for gitlab.com)
- Repository name (e.g. your-git-org/your-dag-repo)
- Repository DAG branch (e.g. main)
- Personal access token (e.g. 4d79206e616d6520697320426f6e642e204a616d657320426f6e64)
You need to provide this information in addition to your cloud object storage credentials when you create a runtime configuration in Elyra for the Apache Airflow deployment.
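Before entering these values in a runtime configuration, it can be useful to confirm that they work together. The following is a minimal sketch for github.com or GitHub Enterprise only, assuming the last value in the list above is a personal access token accepted by the GitHub REST API. The function names branch_url and check_dag_branch are hypothetical helpers, not part of Elyra or Airflow, and GitLab uses a different API path that this sketch does not cover.

```shell
# Build the GitHub REST API URL that describes one branch of a repository.
# Arguments: API endpoint, repository name (org/repo), branch name.
# A single trailing slash on the endpoint is dropped.
branch_url() {
  printf '%s/repos/%s/branches/%s\n' "${1%/}" "$2" "$3"
}

# Print the HTTP status code returned for the branch lookup.
# Arguments: API endpoint, repository name, branch name, personal access token.
# Requires curl and network access; defined here but not called.
check_dag_branch() {
  curl -s -o /dev/null -w '%{http_code}\n' \
    -H "Authorization: token $4" \
    "$(branch_url "$1" "$2" "$3")"
}
```

Calling check_dag_branch with the four values you noted should print 200 when the token can read the repository and the branch exists. Any other status suggests that one of the values needs another look.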
To deploy Apache Airflow on a new Kubernetes cluster:
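Create a Kubernetes secret that holds the SSH files used to access the DAG repository. The example below creates a secret named airflow-secret from three files. Replace the secret name, file names and locations as appropriate for your environment.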
kubectl create secret generic airflow-secret --from-file=id_rsa=.ssh/id_rsa --from-file=known_hosts=.ssh/known_hosts --from-file=id_rsa.pub=.ssh/id_rsa.pub -n airflow
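The kubectl command above is expected to fail if any of the three referenced files is missing, so a quick check beforehand can save a round trip. The following is a minimal sketch, assuming the files live in a .ssh directory as in the example. missing_ssh_files is a hypothetical helper name, not a kubectl or Elyra command.

```shell
# Print the name of each expected SSH file that is missing from the
# directory given as $1 (defaults to .ssh). Prints nothing when all
# three files are present.
missing_ssh_files() {
  dir="${1:-.ssh}"
  for name in id_rsa id_rsa.pub known_hosts; do
    [ -f "$dir/$name" ] || printf '%s\n' "$name"
  done
}
```

Run missing_ssh_files .ssh before creating the secret; no output means all three files were found. The example command also assumes that the airflow namespace already exists in the cluster.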
Customize the sample helm configuration (or customize an existing configuration). This sample configuration uses the KubernetesExecutor by default.
- Set git.url to the URL of the private repository you created earlier, e.g. ssh://git@github.com/your-git-org/your-dag-repo. Note: Make sure your SSH URL contains only forward slashes.
- Set git.ref to the DAG branch you created earlier, e.g. main.
- Set git.secret to the name of the secret you created, e.g. airflow-secret.
- Adjust git.gitSync.refreshTime as desired.
Example excerpt from a customized configuration:
## configs for the DAG git repository & sync container
##
git:
  ## url of the git repository
  ##
  ## EXAMPLE: (HTTP)
  ##   url: "https://github.com/torvalds/linux.git"
  ##
  ## EXAMPLE: (SSH)
  ##   url: "ssh://git@github.com:torvalds/linux.git"
  ##
  url: "ssh://git@github.com/your-git-org/your-dag-repo"
  ## the branch/tag/sha1 which we clone
  ##
  ref: "main"
  ## the name of a pre-created secret containing files for ~/.ssh/
  ##
  ## NOTE:
  ## - this is ONLY RELEVANT for SSH git repos
  ## - the secret commonly includes files: id_rsa, id_rsa.pub, known_hosts
  ## - known_hosts is NOT NEEDED if `git.sshKeyscan` is true
  ##
  secret: "airflow-secret"
  ...
  gitSync:
    ...
    refreshTime: 10
airflow:
  ## configs for the docker image of the web/scheduler/worker
  ##
  image:
    repository: elyra/airflow
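The requirement that the SSH URL contain only forward slashes is easy to miss, because clone URLs are often copied in the host:org/repo form. The following is a minimal sketch of a check for the git.url value, assuming the requirement means that neither a colon nor a backslash may act as a path separator. is_slash_only_ssh_url is a hypothetical helper name, and a numeric port after the host is deliberately accepted.

```shell
# Return 0 when $1 is an ssh:// URL whose path uses only forward slashes,
# e.g. ssh://git@github.com/your-git-org/your-dag-repo.
# Return 1 for backslashes, a missing ssh:// scheme, or the host:org/repo form.
is_slash_only_ssh_url() {
  case "$1" in
    *\\*) return 1 ;;          # contains a backslash
    ssh://*/*) ;;              # ssh scheme followed by at least one slash
    *) return 1 ;;             # anything else, e.g. git@host:org/repo
  esac
  rest="${1#ssh://}"           # drop the scheme
  authority="${rest%%/*}"      # user@host[:port], up to the first slash
  case "$authority" in
    *:*[!0-9]*) return 1 ;;    # colon followed by something other than a port
  esac
  return 0
}
```

With this sketch, the value used in the excerpt above passes, while the chart's own commented SSH example (with a colon after github.com) is rejected.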
The elyra/airflow container image is created using a Dockerfile and published on Docker Hub and quay.io.
Install Apache Airflow using the customized configuration:
helm install "airflow" stable/airflow --values path/to/your_customized_helm_values.yaml
Once Apache Airflow is deployed, you are ready to create and run pipelines, as described in the tutorial.
To enable running of notebook pipelines on an existing Apache Airflow deployment:
airflow.cfg.
Once Apache Airflow is deployed, you are ready to create and run pipelines, as described in the tutorial.