Concepts
Below are the different concepts used within ML Pipelines:
Pipeline
Description of an ML workflow, including all of its components, how they come together in the form of a graph, and a list of the pipeline's parameters. This is the main shareable artifact; it is created and edited separately from the ML Pipelines UI, although the UI allows the user to upload and list pipelines.
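For illustration, such a graph might be expressed with the kfp Python SDK. This is a minimal sketch assuming the kfp v1 DSL; the image names, arguments, and parameter are hypothetical placeholders:

```python
# Minimal sketch of a two-step pipeline graph using the kfp v1 DSL.
# All image names and arguments are hypothetical placeholders.
import kfp.dsl as dsl

@dsl.pipeline(
    name='example-pipeline',
    description='Preprocess data, then train a model.',
)
def example_pipeline(learning_rate: float = 0.1):
    preprocess = dsl.ContainerOp(
        name='preprocess',
        image='gcr.io/my-project/preprocess:latest',  # hypothetical image
        arguments=['--output', '/tmp/data.csv'],
    )
    train = dsl.ContainerOp(
        name='train',
        image='gcr.io/my-project/train:latest',  # hypothetical image
        arguments=['--input', '/tmp/data.csv', '--learning-rate', learning_rate],
    )
    train.after(preprocess)  # an edge in the graph: train depends on preprocess
```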
Component
One building block in the pipeline template: self-contained user code that performs one step in the pipeline, such as preprocessing, transformation, or training. It must be packaged as a Docker image.
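The code inside a component is ordinary, self-contained program code. As a hypothetical sketch, a preprocessing component's entrypoint (baked into its Docker image, e.g. via `ENTRYPOINT ["python", "preprocess.py"]`) could look like:

```python
# Hypothetical entrypoint for a preprocessing component. The pipeline
# passes inputs and outputs as command-line arguments; the Docker image
# makes the step self-contained and reproducible.
import argparse

def main():
    parser = argparse.ArgumentParser()
    parser.add_argument('--input', required=True)
    parser.add_argument('--output', required=True)
    args = parser.parse_args()

    # Stand-in for real preprocessing logic.
    with open(args.input) as src, open(args.output, 'w') as dst:
        for line in src:
            dst.write(line.strip().lower() + '\n')

if __name__ == '__main__':
    main()
```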
Job
A copy of the pipeline with all fields (parameters) filled out, plus an optional trigger. Jobs are generated by the system when the user deploys the pipeline. A job with a recurring trigger runs periodically, and its trigger can be disabled or enabled from the UI.
Run
A single execution of a pipeline job. A job with certain trigger types will cause multiple runs to start. Runs comprise an immutable log of all experiments attempted by the user, and are designed to be self-contained to allow for reproducibility.
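As a sketch of this instantiation relationship, a one-off run of an uploaded pipeline could be started through the kfp client; the host, experiment name, package path, and parameter values below are hypothetical:

```python
import kfp

# Hypothetical API endpoint and pipeline package.
client = kfp.Client(host='http://localhost:8080')
experiment = client.create_experiment('my-experiment')

# Each call starts exactly one run; its parameters are recorded with it,
# which is what makes the run log self-contained and reproducible.
run = client.run_pipeline(
    experiment_id=experiment.id,
    job_name='example-run',
    pipeline_package_path='example_pipeline.tar.gz',
    params={'learning_rate': 0.05},
)
```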
Run trigger
The user selects one of several trigger types to tell the system when a job should schedule its runs (see the sketch after this list):
- Run right away: for starting a one-off run.
- Periodic: for interval-based scheduling of runs (e.g. every 2 hours, or every 50 minutes).
- Cron: for specifying cron semantics to schedule runs. The UI also lets the user enter a cron expression manually.
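As a sketch of how the periodic and cron triggers map onto the kfp client (the endpoint and names are hypothetical; exactly one of interval_second or cron_expression is given per job):

```python
import kfp

client = kfp.Client(host='http://localhost:8080')  # hypothetical endpoint
experiment = client.create_experiment('my-experiment')

# Periodic trigger: start a run every 2 hours.
client.create_recurring_run(
    experiment_id=experiment.id,
    job_name='periodic-job',
    pipeline_package_path='example_pipeline.tar.gz',
    interval_second=2 * 60 * 60,
)

# Cron trigger: start a run at the top of every hour
# (Kubeflow's scheduler accepts a six-field cron expression, seconds first).
client.create_recurring_run(
    experiment_id=experiment.id,
    job_name='cron-job',
    pipeline_package_path='example_pipeline.tar.gz',
    cron_expression='0 0 * * * *',
)
```

Each firing of a trigger produces a new run under the same job.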
Step
An execution of one of the components in the pipeline. The relationship of a step to its component is much like that of a job to its pipeline: an instantiation relationship. In a complex pipeline, components can execute multiple times in loops, or conditionally after resolving an if/else-like clause in the pipeline code.
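A sketch of such conditional execution in the kfp v1 DSL; the images and output values are hypothetical:

```python
import kfp.dsl as dsl

@dsl.pipeline(name='conditional-example')
def conditional_pipeline():
    # First step writes its result to a file that kfp exposes as an output.
    flip = dsl.ContainerOp(
        name='flip-coin',
        image='gcr.io/my-project/flip-coin:latest',  # hypothetical image
        file_outputs={'result': '/tmp/result.txt'},
    )
    # The branch resolves at run time: this step becomes part of the run
    # only when the condition holds; otherwise it never executes.
    with dsl.Condition(flip.outputs['result'] == 'heads'):
        dsl.ContainerOp(
            name='heads-step',
            image='gcr.io/my-project/heads:latest',  # hypothetical image
        )
```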
Output artifact
Artifacts are outputs emitted by the pipeline's steps, which the ML Pipelines UI understands and can render as rich visualizations. It is useful for pipeline components to include artifacts in order to provide for performance evaluation, quick decision making about the run, or comparison across different runs. Artifacts also make it possible to understand how the pipeline's different components work. They can range from a plain textual view of the data to rich interactive visualizations.
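In Kubeflow Pipelines, one way a step can emit such an artifact is by writing a UI metadata file to a well-known path inside its container. The sketch below describes a CSV to be rendered as a table; the GCS source path is a hypothetical placeholder:

```python
# Sketch: a step emits UI metadata describing a table artifact so the
# ML Pipelines UI can render it. The GCS source path is hypothetical.
import json

metadata = {
    'outputs': [{
        'type': 'table',
        'storage': 'gcs',
        'format': 'csv',
        'header': ['target', 'predicted', 'count'],
        'source': 'gs://my-bucket/confusion_matrix.csv',  # hypothetical path
    }]
}

# The UI picks this file up from the step's container after it completes.
with open('/mlpipeline-ui-metadata.json', 'w') as f:
    json.dump(metadata, f)
```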
The frontend is supported by a REST API server, whose interface is discussed separately. For user data stored in external services (e.g. GCS), the frontend will make requests directly to those services using their client libraries.
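For illustration, fetching user data directly from GCS with its client library looks roughly like the sketch below (Python is used here purely for illustration; the bucket and object names are hypothetical):

```python
# Sketch: reading user data straight from GCS via its client library,
# without going through the REST API server. Names are hypothetical.
from google.cloud import storage

client = storage.Client()
blob = client.bucket('my-pipelines-bucket').blob('runs/run-123/output.csv')
data = blob.download_as_bytes()
print(data[:200])  # first bytes of the artifact
```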