Output Viewers
The ML Pipelines UI has built-in support for several types of visualizations, which provide a rich experience for evaluating and comparing performance. Components can use them by writing a JSON file to their local filesystem at any point during execution. The file must be written to the root level and named /metadata.json. It contains an array of outputs, each of which describes the metadata for one output viewer (discussed below). Its structure looks like this:
    {
      "version": 1,
      "outputs": [
        {
          "type": "confusion_matrix",
          "format": "csv",
          "source": "dir1/matrix.csv",
          "schema": "dir1/schema.json",
          "predicted_col": "column1",
          "target_col": "column2"
        },
        {
          ...
        }
      ]
    }
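A minimal sketch of how a component might emit this file from Python. The helper name `write_ui_metadata` and its `path` parameter are hypothetical and exist only for illustration; the entry values simply mirror the example above.

```python
import json
from pathlib import Path


def write_ui_metadata(outputs, path="/metadata.json"):
    """Write the viewer metadata file and return the dict that was written."""
    metadata = {"version": 1, "outputs": list(outputs)}
    Path(path).write_text(json.dumps(metadata, indent=2))
    return metadata


# One entry per viewer; these values are copied from the example structure.
confusion_matrix_entry = {
    "type": "confusion_matrix",
    "format": "csv",
    "source": "dir1/matrix.csv",
    "schema": "dir1/schema.json",
    "predicted_col": "column1",
    "target_col": "column2",
}
```

Inside the component's container, calling `write_ui_metadata([confusion_matrix_entry])` would write to the default root-level path. The `path` argument is there so the helper can be exercised outside a container.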
If such a file is written to the component's container filesystem, ML Pipelines extracts it, and the UI uses it to generate the specified viewer(s). The metadata specifies where the artifact data should be loaded from; the UI then loads that data into memory and renders it. Because the data is held in memory, keep it small enough for the UI to handle, for example by running a sampling step before exporting it as an artifact.
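One possible sampling step is sketched below, using reservoir sampling so the full file never has to fit in memory. The function name `sample_csv`, the `max_rows` limit, and the assumption that the CSV has no header row are all illustrative choices, not part of ML Pipelines.

```python
import csv
import random


def sample_csv(src_path, dst_path, max_rows=1000, seed=0):
    """Copy at most max_rows randomly chosen rows from src_path to dst_path."""
    rng = random.Random(seed)
    reservoir = []
    with open(src_path, newline="") as src:
        for index, row in enumerate(csv.reader(src)):
            if index < max_rows:
                # Fill the reservoir with the first max_rows rows.
                reservoir.append(row)
            else:
                # Keep the new row with probability max_rows / (index + 1).
                slot = rng.randint(0, index)
                if slot < max_rows:
                    reservoir[slot] = row
    with open(dst_path, "w", newline="") as dst:
        csv.writer(dst).writerows(reservoir)
    return len(reservoir)
```

The sampled file, rather than the original, would then be the one referenced by the metadata's source field.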
These are the different metadata fields that can be specified:
| Field name | Description |
|---|---|
| format | Specifies the format of the artifact data. The default is 'csv'. NOTE: The only format currently supported is 'csv'. |
| header | A list of strings that are used as the header of the artifact data. |
| labels | A list of strings that are used to label artifact columns/rows. |
| predicted_col | Name of the predicted column. |
| schema | A list of {type, name} objects that specify the schema of the artifact data. |
| source | Full path to the data. This can contain wildcards '*', in which case the data is concatenated before it's displayed by the UI. |
| storage | Storage provider service name. The default is 'gcs'. |
| target_col | Name of the target column. |
| type | Name of the viewer, one of the ones below. |
Below are the different types of viewers supported, and the required metadata fields for each:
Confusion matrix
Metadata Fields:
- source
- labels
- schema
- format
Plots a confusion matrix visualization using the data from the given source path, and uses the schema to parse the data. The labels provide the names of the classes to be plotted on the x and y axes.
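As a rough sketch, a component could build the confusion matrix data and its metadata entry as follows. The page does not specify the CSV layout, so the (target, predicted, count) row format, the schema type names "CATEGORY" and "NUMBER", and the helper name `write_confusion_matrix` are all assumptions made for illustration. Class labels are assumed to be strings.

```python
import csv
from collections import Counter


def write_confusion_matrix(targets, predictions, csv_path):
    """Write (target, predicted, count) rows and return a metadata entry."""
    labels = sorted(set(targets) | set(predictions))
    counts = Counter(zip(targets, predictions))
    with open(csv_path, "w", newline="") as f:
        writer = csv.writer(f)
        # One row per (target, predicted) pair, including pairs with a zero count.
        for target in labels:
            for predicted in labels:
                writer.writerow([target, predicted, counts[(target, predicted)]])
    return {
        "type": "confusion_matrix",
        "format": "csv",
        "source": csv_path,
        # Assumed schema: one {type, name} object per CSV column.
        "schema": [
            {"name": "target", "type": "CATEGORY"},
            {"name": "predicted", "type": "CATEGORY"},
            {"name": "count", "type": "NUMBER"},
        ],
        "labels": labels,
    }
```

The returned dictionary is meant to be appended to the outputs array of the metadata file.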
ROC curve
Metadata Fields:
- source
- format
- schema
Plots an ROC curve using the data from the given source path. It assumes the schema includes three columns with the following names: "fpr", "tpr", and "thresholds". Hovering over the ROC curve shows the threshold value used for the fpr and tpr values closest to the cursor.
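The three columns the ROC viewer expects can be computed without any extra libraries. The sketch below is one straightforward way to do it: it treats label 1 as positive and uses each distinct score as a threshold. The viewer type string "roc", the "NUMBER" schema type, and the helper names are assumptions, since this page names only the columns.

```python
import csv


def roc_points(labels, scores):
    """Return (fpr, tpr, threshold) triples, one per distinct score, highest first."""
    positives = sum(1 for y in labels if y == 1)
    negatives = len(labels) - positives
    if positives == 0 or negatives == 0:
        raise ValueError("need at least one positive and one negative label")
    points = []
    for threshold in sorted(set(scores), reverse=True):
        # A sample is predicted positive when its score reaches the threshold.
        tp = sum(1 for y, s in zip(labels, scores) if s >= threshold and y == 1)
        fp = sum(1 for y, s in zip(labels, scores) if s >= threshold and y != 1)
        points.append((fp / negatives, tp / positives, threshold))
    return points


def write_roc(labels, scores, csv_path):
    """Write fpr, tpr, thresholds rows to csv_path and return a metadata entry."""
    with open(csv_path, "w", newline="") as f:
        csv.writer(f).writerows(roc_points(labels, scores))
    return {
        "type": "roc",  # assumed viewer name
        "format": "csv",
        "source": csv_path,
        "schema": [
            {"name": "fpr", "type": "NUMBER"},
            {"name": "tpr", "type": "NUMBER"},
            {"name": "thresholds", "type": "NUMBER"},
        ],
    }
```

This version rescans the data once per threshold, which is fine for sampled data but would be worth replacing with a single sorted sweep for large inputs.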
Table
Metadata Fields:
- source
- header
- format
Builds an HTML table out of the data at the given source path, where the header field specifies what shows up in the first row of the table. The table supports pagination.
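A small sketch of producing data for the table viewer. It assumes that, because the header comes from the metadata, the CSV itself contains data rows only. The viewer type string "table" and the helper name `write_table` are likewise assumptions.

```python
import csv


def write_table(rows, header, csv_path):
    """Write headerless rows to csv_path and return a table metadata entry."""
    for row in rows:
        if len(row) != len(header):
            raise ValueError("every row needs one value per header column")
    with open(csv_path, "w", newline="") as f:
        csv.writer(f).writerows(rows)
    return {
        "type": "table",  # assumed viewer name
        "format": "csv",
        "source": csv_path,
        "header": list(header),
    }
```

For example, `write_table([["a", 1], ["b", 2]], ["name", "score"], "table.csv")` would write two data rows and return an entry whose header has two columns.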
Tensorboard
Metadata Fields:
- source
Adds a "Start Tensorboard" button to the output page. Clicking this button starts a Tensorboard Pod in the Kubernetes cluster and switches the button to "Open Tensorboard". Clicking the button again opens the Tensorboard interface in a new tab, pointing it to the logdir data specified in the source field.
Note that Tensorboard instances are not fully managed by the ML Pipelines UI. The "Start Tensorboard" button is only a convenience feature to avoid interrupting the user's workflow when looking at pipeline Runs. The user is responsible for recycling or deleting those Pods separately, using their Kubernetes management tools.
Web app
Metadata Fields:
- source
To give the user more flexibility in rendering custom output, this viewer supports specifying an HTML file that is created by the component and rendered in the outputs page as is. This file must be self-contained, with no references to other files in the filesystem. It can still have absolute references to files on the web, however. Content running inside this web app is isolated in an iframe and cannot communicate with the ML Pipelines UI.
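One way to keep the HTML file self-contained is to inline everything it needs: CSS goes in a style element and images are embedded as base64 data URIs. The sketch below illustrates that idea. The function names, the page layout, and the viewer type string "web-app" are assumptions for illustration only.

```python
import base64
import html


def build_self_contained_page(title, rows, png_bytes=None):
    """Return an HTML document with inline CSS and an optionally embedded image."""
    body_rows = "".join(
        "<tr>" + "".join(f"<td>{html.escape(str(cell))}</td>" for cell in row) + "</tr>"
        for row in rows
    )
    image = ""
    if png_bytes is not None:
        # Embed the image as a data URI instead of referencing a file on disk.
        encoded = base64.b64encode(png_bytes).decode("ascii")
        image = f'<img alt="plot" src="data:image/png;base64,{encoded}">'
    return (
        "<!DOCTYPE html><html><head>"
        f"<title>{html.escape(title)}</title>"
        "<style>td { border: 1px solid #ccc; padding: 4px; }</style>"
        "</head><body>"
        f"<h1>{html.escape(title)}</h1><table>{body_rows}</table>{image}"
        "</body></html>"
    )


def web_app_entry(html_path):
    """Return a metadata entry pointing at the generated HTML file."""
    return {"type": "web-app", "source": html_path}  # type name is assumed
```

The component would write the returned string to a file and pass that file's path to `web_app_entry`.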