Pipelines consist of nodes that are implemented using components. A component typically implements only one unit of work, such as loading data, transforming data, training a model, or deploying a model to serve. The following depicts a basic pipeline in the Visual Pipeline Editor, which uses components to load a data file, split the file, truncate the resulting files, and count the number of records in each file.
The same pipeline could be implemented using a single component that performs all these tasks, but that component might not be as reusable. Consider, for example, that for another project the data resides in a different kind of storage. With fine-grained components you'd only have to replace the load data component with one that supports the other storage type and could retain everything else.
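The trade-off can be sketched in plain Python. This is an illustration only, not Elyra code: the functions below (`load_from_file`, `load_from_memory`, `split`, `truncate`, `count_records`, `run_pipeline`) are hypothetical stand-ins for components, and the "storage" is simply a local file or an in-memory list.

```python
from typing import Callable, List, Tuple

Records = List[str]


def load_from_file(path: str) -> Records:
    """Hypothetical 'load data' component for local files."""
    with open(path, encoding="utf-8") as handle:
        return [line.rstrip("\n") for line in handle]


def load_from_memory(rows: Records) -> Records:
    """Hypothetical replacement loader for a different kind of storage."""
    return list(rows)


def split(records: Records) -> Tuple[Records, Records]:
    """Split the records into two halves."""
    middle = len(records) // 2
    return records[:middle], records[middle:]


def truncate(records: Records, limit: int) -> Records:
    """Keep at most `limit` records."""
    return records[:limit]


def count_records(records: Records) -> int:
    """Count the records in one file."""
    return len(records)


def run_pipeline(loader: Callable[[], Records], limit: int = 2) -> Tuple[int, int]:
    """Load, split, truncate, and count. Only `loader` knows about storage."""
    first, second = split(loader())
    return count_records(truncate(first, limit)), count_records(truncate(second, limit))


# Switching the storage type only means passing a different loader.
counts = run_pipeline(lambda: load_from_memory(["a", "b", "c", "d", "e", "f", "g"]))
print(counts)
```

Because every step is its own function, replacing `load_from_memory` with `load_from_file` leaves the split, truncate, and count steps untouched.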
Elyra includes three generic components that allow for the processing of Jupyter notebooks, Python scripts, and R scripts. These components are called generic because they can be included in pipelines for any supported runtime type: local/JupyterLab, Kubeflow Pipelines, and Apache Airflow. Components are exposed in the pipeline editor via the palette.
Note: Refer to the Best practices topic in the User Guide to learn more about special considerations for generic components.
Custom components are commonly only implemented for one runtime type, such as Kubeflow Pipelines or Apache Airflow. (The local runtime type does not support custom components.)
There are many custom components available on the web that you can include in pipelines, but you can also create your own. Details on how to create a component can be found in the Kubeflow Pipelines documentation and the Apache Airflow documentation. Do note that in Apache Airflow components are called operators, but for the sake of consistency the Elyra documentation refers to them as components.
Note: Refer to the Requirements and best practices for custom pipeline components topic in the User Guide to learn more about special considerations for custom components.
Elyra does not include its own component repository. Instead, you can configure it to pull components from local or remote catalogs, such as filesystems, web resources, or source control systems. Elyra defines a connector API, which provides access to the catalogs' resources.
Elyra includes connectors for the following component catalog types:
Example: A filesystem component catalog that is configured using the `/users/jdoe/kubeflow_components/dev/my_component.yaml` path makes `my_component.yaml` available to Elyra.
Example: A directory component catalog that is configured using the `/users/jdoe/kubeflow_components/test` path makes all component files in that directory available to Elyra.
URL component catalogs provide access to components that are stored on the web and can be retrieved using anonymous HTTP `GET` requests.
Example: A URL component catalog that is configured using the `http://myserver:myport/mypath/my_component.yaml` URL makes the `my_component.yaml` component file available to Elyra.
Apache Airflow package catalogs provide access to Apache Airflow operators that are stored in Apache Airflow built distributions.
Apache Airflow provider package catalogs provide access to Apache Airflow operators that are stored in Apache Airflow provider packages.
Refer to section Built-in catalog connector reference for details about these connectors.
You can add support for other component catalogs by installing a connector from the catalog connector marketplace or by implementing your own catalog connector.
To help you get started with custom components, the Elyra community has selected a couple for Kubeflow Pipelines and makes them available using example catalogs.
Whether or not your Elyra includes the example components depends on how you deployed it:
- pip installations (`pip install elyra[all]`) include the example components. However, the catalog must be explicitly added to the palette.
- pip installations (`pip install elyra`) do not include the example components. The example catalog must be separately installed and explicitly added to the palette.

Installing and enabling the component examples catalogs
Follow the instructions in Kubeflow Pipelines component examples catalog.
Details and demo pipelines for some of the included components can be found in the Elyra examples repository:
Custom Airflow components imported from some types of component catalog connectors require additional configuration in order to be used in pipelines. See 'Best Practices for Custom Pipeline Components' for details.
Components are managed in Elyra using the JupyterLab UI or the Elyra command line interface.
Custom components can be added, modified, duplicated, and removed in the Pipeline Components panel.
To access the panel in JupyterLab:

- Click the `Open Pipeline Components` button in the pipeline editor toolbar.

  OR
- Select the `Pipeline Components` panel from the JupyterLab sidebar.

  OR
- Open the JupyterLab command palette (`Cmd/Ctrl + Shift + C`) and search for `Manage Pipeline Components`.

To add components from a catalog:
1. Open the `Pipeline Components` panel.
2. Click `+` in the Pipeline Components panel.

Elyra queries the catalog, loads the components, and adds them to the Visual Pipeline Editor palette.
Tip: check the log file for error messages if no components from the added catalog are displayed in the palette.
To modify a component catalog entry:

1. Open the `Pipeline Components` panel.
2. Click the `edit` (pencil) icon next to the entry name.

To duplicate a component catalog entry:
1. Open the `Pipeline Components` panel.

To remove a component catalog entry and its referenced component(s) from the Visual Pipeline Editor palette:
1. Open the `Pipeline Components` panel.
2. Click the `delete` (trash) icon next to the entry name.

Caution: Pipelines that utilize the referenced components are no longer valid after the catalog entry has been deleted.
Custom components can be added, modified, and removed using the `elyra-metadata` command line interface.
To list component catalog entries:
$ elyra-metadata list component-catalogs
Available metadata instances for component-catalogs (includes invalid):
Schema Instance Resource
------ -------- --------
elyra-kfp-examples-catalog kubeflow_pipelines_examples /.../Jupyter/metadata/component-catalogs/kubeflow_pipelines_examples.json
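If you want to process this listing in a script, the tabular output can be parsed with a few lines of Python. The sketch below assumes the layout shown above (a header row, a dashed separator, then one whitespace-separated row per instance) and that schema and instance names contain no whitespace. `parse_listing` is a hypothetical helper, not part of Elyra.

```python
from typing import Dict, List


def parse_listing(output: str) -> List[Dict[str, str]]:
    """Parse the listing text into one dict per metadata instance."""
    rows: List[Dict[str, str]] = []
    in_table = False
    for line in output.splitlines():
        stripped = line.strip()
        if not stripped:
            continue
        if set(stripped) <= {"-", " "}:
            # The dashed separator marks the start of the data rows.
            in_table = True
            continue
        if in_table:
            # Split into at most three fields so the path may contain spaces.
            parts = stripped.split(None, 2)
            if len(parts) != 3:
                continue
            schema, instance, resource = parts
            rows.append({"schema": schema, "instance": instance, "resource": resource})
    return rows


sample = """Available metadata instances for component-catalogs (includes invalid):
Schema Instance Resource
------ -------- --------
elyra-kfp-examples-catalog kubeflow_pipelines_examples /.../Jupyter/metadata/component-catalogs/kubeflow_pipelines_examples.json
"""
entries = parse_listing(sample)
print(entries[0]["instance"])
```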
To add a component catalog entry, run `elyra-metadata create component-catalogs`:
$ elyra-metadata create component-catalogs \
--display_name "filter components" \
--description "filter text in files" \
--runtime_type KUBEFLOW_PIPELINES \
--schema_name "url-catalog" \
--paths "['https://raw.githubusercontent.com/elyra-ai/examples/master/component-catalog-connectors/kfp-example-components-connector/kfp_examples_connector/resources/filter_text_using_shell_and_grep.yaml']" \
--categories '["filter content"]'
Refer to section Configuration properties for parameter descriptions.
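When catalog entries are created from a script, quoting the list-valued options by hand is error-prone. The following sketch assembles the argument list for the command shown above; it only uses the options that appear in that example. `build_create_command` is a hypothetical helper and the URL is a placeholder. The sketch prints the command rather than running it, so it does not require Elyra to be installed.

```python
import json
import shlex
from typing import List, Sequence


def build_create_command(
    display_name: str,
    description: str,
    runtime_type: str,
    schema_name: str,
    paths: Sequence[str],
    categories: Sequence[str],
) -> List[str]:
    """Return an argv list mirroring the documented `create` example."""
    return [
        "elyra-metadata", "create", "component-catalogs",
        "--display_name", display_name,
        "--description", description,
        "--runtime_type", runtime_type,
        "--schema_name", schema_name,
        # The example passes paths as a single-quoted list literal ...
        "--paths", repr(list(paths)),
        # ... and categories as a double-quoted (JSON style) list.
        "--categories", json.dumps(list(categories)),
    ]


argv = build_create_command(
    display_name="filter components",
    description="filter text in files",
    runtime_type="KUBEFLOW_PIPELINES",
    schema_name="url-catalog",
    paths=["https://example.com/my_component.yaml"],  # placeholder URL
    categories=["filter content"],
)
print(shlex.join(argv))
```

Passing `argv` to `subprocess.run` avoids shell quoting issues entirely, because each option value is handed over as a separate argument.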
To replace a component catalog entry, run `elyra-metadata update component-catalogs`:
$ elyra-metadata update component-catalogs \
--name "filter_components" \
--display_name "filter components" \
--description "filter text in files" \
--runtime_type KUBEFLOW_PIPELINES \
--schema_name "url-catalog" \
--paths "['https://raw.githubusercontent.com/elyra-ai/examples/master/component-catalog-connectors/kfp-example-components-connector/kfp_examples_connector/resources/filter_text_using_shell_and_grep.yaml']" \
--categories='["file operations"]'
Note: You must specify all property values, not only the ones that you want to modify.
Refer to section Configuration properties for parameter descriptions.
To export component catalogs:
$ elyra-metadata export component-catalogs \
--directory "/tmp/foo"
The above example will export all component catalogs to the "/tmp/foo/component-catalogs" directory.
Note that you must specify the `--directory` option.
There are two flags that can be specified when exporting component catalogs:

- To include invalid component catalogs, use the `--include-invalid` flag.
- To empty the export directory first, use the `--clean` flag. Using the `--clean` flag in the above example will empty the "/tmp/foo/component-catalogs" directory before exporting the component catalogs.

To import component catalogs:
$ elyra-metadata import component-catalogs \
--directory "/tmp/foo"
The above example will import all valid component catalogs in the "/tmp/foo" directory (files present in any sub-directories will be ignored).
Note that you must specify the `--directory` option.
By default, metadata will not be imported if a component catalog instance with the same name already exists. The `--overwrite` flag can be used to override this default behavior and to replace any installed metadata with the newer file in the import directory.
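The import rules described above (only files directly in the directory are considered, and existing instances are skipped unless `--overwrite` is given) can be modeled as follows. This is a sketch of the documented behavior, not Elyra's implementation. `plan_import` and its arguments are hypothetical, it assumes instances are stored as `.json` files named after the instance (as in the listing output), and it does not check whether a file is a valid catalog.

```python
import tempfile
from pathlib import Path
from typing import Dict, List, Set


def plan_import(directory: Path, existing: Set[str], overwrite: bool = False) -> Dict[str, List[str]]:
    """Decide which instance files would be imported and which skipped."""
    plan: Dict[str, List[str]] = {"import": [], "skip": []}
    for path in sorted(directory.iterdir()):
        if not path.is_file() or path.suffix != ".json":
            continue  # sub-directories and other files are ignored
        name = path.stem
        if name in existing and not overwrite:
            plan["skip"].append(name)
        else:
            plan["import"].append(name)
    return plan


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "filter_components.json").write_text("{}", encoding="utf-8")
    (root / "data_load_components.json").write_text("{}", encoding="utf-8")
    (root / "nested").mkdir()
    (root / "nested" / "ignored.json").write_text("{}", encoding="utf-8")

    installed = {"filter_components"}
    default_plan = plan_import(root, installed)
    overwrite_plan = plan_import(root, installed, overwrite=True)

print(default_plan)
print(overwrite_plan)
```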
To remove a component catalog entry and its component definitions from the Visual Pipeline Editor palette:
$ elyra-metadata remove component-catalogs \
--name "filter_components"
Refer to section Configuration properties for parameter descriptions.
The Elyra 3.3 release renamed Component Registries to Component Catalogs and split the `component-registry` schema into three separate "component catalog" schemas based on the old schema's `location-type`. As a result, any user-defined component registry instances created prior to Elyra 3.3 will not be available unless migrated.

The Elyra 3.7 release, however, officially removes support for the `component-registries` schemaspace, including the ability to migrate component registry instances to component catalog instances. If you have upgraded to Elyra 3.7+ from Elyra 3.2 or earlier and would still like access to your previously-defined instances, you will first need to install a down-level release and migrate your instances using the instructions below. This migration is performed using the `elyra-metadata` CLI tool.
To determine the instances available to migrate, issue the following command:
$ elyra-metadata list component-registries
In this example, there are three user-defined instances.
Available metadata instances for component-registries (includes invalid):
Schema Instance Resource
------ -------- --------
component-registry airflow_components /Users/jovyan/Library/Jupyter/metadata/component-registries/airflow_components.json
component-registry aa_custom /Users/jovyan/Library/Jupyter/metadata/component-registries/aa_custom.json
component-registry myoperators /Users/jovyan/Library/Jupyter/metadata/component-registries/myoperators.json
You may find that some of these instances no longer apply. If there are any that do not apply to 3.3, they can be removed individually:
$ elyra-metadata remove component-registries --name aa_custom
Metadata instance 'aa_custom' removed from schemaspace 'component-registries'.
Note: Because the `component-registries` schemaspace has been deprecated, instances can be listed, removed, or migrated, but not created.
#### Migrating instances
Once the set of component registry instances to migrate has been determined, issue the following command to migrate the remaining instances:

```bash
$ elyra-metadata migrate component-registries
```

The migration should complete within seconds and produce output similar to the following:
[I 2021-11-15 11:05:48,012.012] Migrating 'component-registries' instance 'myoperators' to schema 'local-file-catalog' of schemaspace 'component-catalogs'...
[I 2021-11-15 11:05:48,042.042] Migrating 'component-registries' instance 'airflow_components' to schema 'url-catalog' of schemaspace 'component-catalogs'...
The following component-registries instances were migrated: ['myoperators', 'airflow_components']
Once migrated, these entries should appear in the set of component catalogs. This can be confirmed by listing the component-catalogs instances:
$ elyra-metadata list component-catalogs
Available metadata instances for component-catalogs (includes invalid):
Schema Instance Resource
------ -------- --------
local-file-catalog aa_custom /Users/jovyan/Library/Jupyter/metadata/component-catalogs/aa_custom.json
local-file-catalog myoperators /Users/jovyan/Library/Jupyter/metadata/component-catalogs/myoperators.json
url-catalog airflow_components /Users/jovyan/Library/Jupyter/metadata/component-catalogs/airflow_components.json
The component catalog entry properties are defined as follows. The string in the headings below, which is enclosed in parentheses, denotes the CLI option name.
A user-friendly name for the catalog entry. Note that the catalog entry name is not displayed in the palette. This property is required.
Example: data load components
The canonical name for this catalog entry. A value is generated from `Name` if no value is provided.
Example: data_load_components
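To illustrate how a canonical name relates to the user-friendly name, here is a minimal sketch that reproduces the examples in this document ("data load components" becomes `data_load_components`). The exact rules Elyra applies are not described here, so the normalization below (lowercase, runs of other characters replaced by a single underscore) is an assumption, and `derive_name` is a hypothetical helper.

```python
import re


def derive_name(display_name: str) -> str:
    """Approximate a canonical name from a user-friendly name (assumed rules)."""
    # Lowercase, then collapse every run of non-alphanumeric characters to "_".
    name = re.sub(r"[^a-z0-9]+", "_", display_name.strip().lower())
    return name.strip("_")


print(derive_name("data load components"))
print(derive_name("filter components"))
```

If the generated value matters to you, for example because scripts refer to it with `--name`, it is safer to provide the name explicitly than to rely on the generated one.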
A description for the catalog entry.
Example: Load data from external data sources
In the pipeline editor palette, components are grouped into categories to make them more easily accessible. If no category is provided, the components defined by this catalog entry are added to the palette under `no category`. Each category name must be 18 characters or fewer.
Examples (CLI):
['load data from db']
['train model','pytorch']
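The length limit is easy to check before a catalog entry is submitted. The following sketch validates a list of categories against the 18-character limit stated above. `validate_categories` is a hypothetical helper and does not reflect how Elyra itself reports violations.

```python
from typing import List, Sequence

MAX_CATEGORY_LENGTH = 18  # limit stated in this section


def validate_categories(categories: Sequence[str]) -> List[str]:
    """Return the categories unchanged, or raise if any name is too long."""
    too_long = [c for c in categories if len(c) > MAX_CATEGORY_LENGTH]
    if too_long:
        raise ValueError(
            f"categories longer than {MAX_CATEGORY_LENGTH} characters: {too_long}"
        )
    return list(categories)


print(validate_categories(["load data from db"]))
print(validate_categories(["train model", "pytorch"]))
```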
The runtime environment that supports the component(s). Valid values are the set of configured runtimes that appear in the dropdown (UI) or help-text (CLI). This property is required.
Examples:
APACHE_AIRFLOW
KUBEFLOW_PIPELINES
Elyra supports fetching components from the filesystem and the web using its built-in connectors.
The filesystem component catalog connector provides access to components that are stored in the filesystem where Elyra is running:

- `~` may be used to denote the user's home directory.
- Wildcards (such as `*` or `?`) are not supported.
- Specify a `base directory` and a relative file path to make pipelines portable across environments.

Examples (GUI):
/Users/patti/specs/load_data_from_public_source/http_operator.py
~patti/specs/filter_files/row_filter.yaml
Examples (CLI):
['/Users/patti/specs/load_data_from_public_source/http_operator.py']
['~patti/specs/filter_files/row_filter.yaml']
['/Users/patti/specs/comp1.yaml','/Users/patti/specs/comp2.yaml']
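The path rules above (home directory expansion, no wildcards, an optional base directory combined with a relative path) can be sketched as a small pre-check. This is an illustration under assumptions, not the connector's code: `resolve_component_path` is a hypothetical helper and POSIX-style paths are assumed.

```python
import posixpath


def resolve_component_path(file_path: str, base_directory: str = "") -> str:
    """Combine an optional base directory with a file path and expand `~`."""
    candidate = posixpath.join(base_directory, file_path) if base_directory else file_path
    if any(char in candidate for char in "*?"):
        raise ValueError(f"wildcards are not supported: {candidate}")
    # `~` is replaced with the user's home directory where it can be determined.
    return posixpath.expanduser(candidate)


# A relative path stays portable because only the base directory
# differs between environments.
resolved = resolve_component_path("filter_files/row_filter.yaml", "/Users/patti/specs")
print(resolved)
```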
The directory component catalog connector provides access to components that are stored in a filesystem directory:
- If `Path` is set to `/Users/patti/specs/load_from_database`, the connector searches the specified directory for components for the selected runtime type.
- `~` may be used to denote the user's home directory.

Examples (GUI):
/Users/patti/specs/load_from_database
~patti/specs/load_from_cloud_storage
Examples (CLI):
['/Users/patti/specs/load_from_database']
['~patti/specs/load_from_cloud_storage']
['/Users/patti/load_specs/','/Users/patti/cleanse_specs/']
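As a rough mental model of a directory search for a given runtime type, the sketch below lists candidate component files in a single directory. The mapping from runtime type to file extension (`.yaml` for Kubeflow Pipelines, `.py` for Apache Airflow) is an assumption based on the example file names in this document, and `find_component_files` is a hypothetical helper rather than the connector's actual logic.

```python
import tempfile
from pathlib import Path
from typing import List

# Assumed mapping, inferred from the example file names in this document.
ASSUMED_EXTENSIONS = {
    "KUBEFLOW_PIPELINES": {".yaml"},
    "APACHE_AIRFLOW": {".py"},
}


def find_component_files(directory: str, runtime_type: str) -> List[str]:
    """List file names in `directory` that look like components for the runtime."""
    extensions = ASSUMED_EXTENSIONS[runtime_type]
    root = Path(directory).expanduser()  # `~` denotes the user's home directory
    return sorted(
        path.name
        for path in root.iterdir()
        if path.is_file() and path.suffix in extensions
    )


with tempfile.TemporaryDirectory() as tmp:
    for filename in ("load_db.yaml", "http_operator.py", "README.md"):
        (Path(tmp) / filename).write_text("# placeholder\n", encoding="utf-8")
    kfp_files = find_component_files(tmp, "KUBEFLOW_PIPELINES")
    airflow_files = find_component_files(tmp, "APACHE_AIRFLOW")

print(kfp_files)
print(airflow_files)
```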
The URL component catalog connector provides access to components that are stored on the web:
- Components are retrieved using an HTTP `GET` request.

Examples (GUI):
https://raw.githubusercontent.com/elyra-ai/examples/master/component-catalog-connectors/kfp-example-components-connector/kfp_examples_connector/resources/filter_text_using_shell_and_grep.yaml
Examples (CLI):
['https://raw.githubusercontent.com/elyra-ai/examples/master/component-catalog-connectors/kfp-example-components-connector/kfp_examples_connector/resources/filter_text_using_shell_and_grep.yaml']
['<URL_1>','<URL_2>']
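To check whether a URL is suitable for a URL catalog, you can try the same kind of request yourself. The sketch below builds a plain HTTP `GET` request without credentials using the Python standard library. `build_component_request` and `fetch_component` are hypothetical helpers, the URL is a placeholder, and the 10-second timeout is an arbitrary choice.

```python
import urllib.parse
import urllib.request


def build_component_request(url: str) -> urllib.request.Request:
    """Create an anonymous GET request for a component file."""
    scheme = urllib.parse.urlparse(url).scheme
    if scheme not in ("http", "https"):
        raise ValueError(f"unsupported URL scheme: {scheme!r}")
    # No Authorization header is added, so the request is anonymous.
    return urllib.request.Request(url, method="GET")


def fetch_component(url: str, timeout: float = 10.0) -> str:
    """Download the component definition as text (requires network access)."""
    with urllib.request.urlopen(build_component_request(url), timeout=timeout) as response:
        return response.read().decode("utf-8")


request = build_component_request("https://example.com/mypath/my_component.yaml")
print(request.get_method(), request.full_url)
```

If `fetch_component` fails for a URL that works in your browser, the resource may require authentication and would therefore not be retrievable anonymously.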
The Apache Airflow package catalog connector provides access to operators that are stored in Apache Airflow built distributions:
- The package is retrieved using an HTTP `GET` request.

Examples:
Apache Airflow (v1.10.15):
https://files.pythonhosted.org/packages/f0/3a/f5ce74b2bdbbe59c925bb3398ec0781b66a64b8a23e2f6adc7ab9f1005d9/apache_airflow-1.10.15-py2.py3-none-any.whl
The Apache Airflow provider package catalog connector provides access to operators that are stored in Apache Airflow provider packages:
- The package is retrieved using an HTTP `GET` request.

Examples:
apache-airflow-providers-http (v2.0.2):
https://files.pythonhosted.org/packages/a1/08/91653e9f394cbefe356ac07db809be7e69cc89b094379ad91d6cef3d2bc9/apache_airflow_providers_http-2.0.2-py3-none-any.whl
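Package URLs like the ones above end in a wheel file name, which encodes the distribution name and version. The following sketch extracts those parts, which can be handy for confirming that a URL points at the intended package release. It relies on the standard wheel file naming convention (`name-version[-build]-python-abi-platform.whl`). `parse_wheel_url` is a hypothetical helper and the host in the example URL is a placeholder.

```python
import posixpath
import urllib.parse
from typing import NamedTuple


class WheelInfo(NamedTuple):
    distribution: str
    version: str
    python_tag: str


def parse_wheel_url(url: str) -> WheelInfo:
    """Extract distribution, version, and Python tag from a wheel URL."""
    filename = posixpath.basename(urllib.parse.urlparse(url).path)
    if not filename.endswith(".whl"):
        raise ValueError(f"not a wheel file: {filename!r}")
    parts = filename[: -len(".whl")].split("-")
    # Five fields, or six when an optional build tag is present.
    if len(parts) not in (5, 6):
        raise ValueError(f"unexpected wheel file name: {filename!r}")
    return WheelInfo(parts[0], parts[1], parts[-3])


info = parse_wheel_url(
    "https://example.com/packages/apache_airflow_providers_http-2.0.2-py3-none-any.whl"
)
print(info)
```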