
ML + Dataset BoM spec clarifications and feedback #229

@willarmiros


I work at Protect AI. We are building tooling that lets ML teams generate AI/ML BOMs programmatically, and we are actively looking into offering CycloneDX-compliant BOMs for this purpose. We have some questions and feedback on the changes introduced by #209 in the upcoming v1.5 spec.

It is unclear how to associate ComponentData with Dataset files

In the new Data component, there is a ComponentDataContents field that holds a single URL. In our experience, however, a single logical dataset can be composed of several distinct files, each with its own URL, name, version, hash, etc. One way to represent this would be to nest several File subcomponents within the Data component, since the files are just pieces that make up the Dataset. Another way is to use dependencies to show that a Dataset component depends on one or more File components. However, both of these approaches would bypass the ComponentDataContents field entirely, so we are wondering:

  1. Which approach better leverages the CycloneDX spec?
  2. If one of the suggested approaches is preferred, what purpose does ComponentDataContents serve?
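
To make the two options concrete, here is a rough Python sketch that builds both shapes as plain dictionaries. It assumes the usual CycloneDX JSON field names (`components`, `dependencies`, `bom-ref`, `hashes`, `externalReferences`) and the `data` component type proposed for v1.5. The dataset name, file names, URLs, and file contents are made up for illustration.

```python
import hashlib
import json


def file_component(name, url, payload):
    """Describe one dataset file as a CycloneDX-style `file` component."""
    return {
        "type": "file",
        "bom-ref": f"file:{name}",
        "name": name,
        # Hash of the (placeholder) file contents.
        "hashes": [{"alg": "SHA-256", "content": hashlib.sha256(payload).hexdigest()}],
        "externalReferences": [{"type": "distribution", "url": url}],
    }


# Hypothetical files that together make up one logical dataset.
files = [
    file_component("train.csv", "https://example.com/datasets/reviews/train.csv", b"train"),
    file_component("test.csv", "https://example.com/datasets/reviews/test.csv", b"test"),
]

# Approach 1: File subcomponents nested inside the Data component.
nested_dataset = {
    "type": "data",  # assumed component type from the v1.5 proposal
    "bom-ref": "dataset:reviews",
    "name": "reviews",
    "components": files,
}

# Approach 2: sibling components linked through the dependency graph.
flat_components = [
    {"type": "data", "bom-ref": "dataset:reviews", "name": "reviews"},
    *files,
]
dependencies = [
    {"ref": "dataset:reviews", "dependsOn": [f["bom-ref"] for f in files]},
]

print(json.dumps(nested_dataset, indent=2))
print(json.dumps({"components": flat_components, "dependencies": dependencies}, indent=2))
```

Neither shape touches ComponentDataContents, which is the gap the questions above are about.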

Are there concerns with using a purl for model locations?

It is common practice to serialize a trained model and store it as a file in a model registry or in cloud storage. We plan to use a purl to locate these models, because the purl-spec already has some support for them, added in package-url/purl-spec#201. For additional model registries such as Kubeflow and SageMaker, do we anticipate that further purl-spec updates will be needed? Alternatively, we could come up with a custom schema that is not defined in the purl-spec.
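
As an illustration of what we have in mind, below is a simplified Python sketch that assembles a purl for a registered model. It is not a full purl-spec implementation (no namespace or subpath handling), and it assumes `mlflow` is one of the types the purl-spec defines. The model names, versions, and registry URL are made up, and the `sagemaker` type is purely hypothetical: it stands in for the kind of custom type we would need if the purl-spec is not extended.

```python
from urllib.parse import quote


def model_purl(purl_type, name, version, qualifiers=None):
    """Build a minimal purl of the form pkg:type/name@version?key=value."""
    purl = f"pkg:{purl_type}/{quote(name, safe='')}@{quote(version, safe='')}"
    if qualifiers:
        # Qualifiers are emitted in sorted key order with percent-encoded values.
        pairs = [f"{key}={quote(value, safe='')}" for key, value in sorted(qualifiers.items())]
        purl += "?" + "&".join(pairs)
    return purl


# A model stored in an MLflow registry (hypothetical name and registry URL).
mlflow_purl = model_purl(
    "mlflow", "creditfraud", "3",
    {"repository_url": "https://mlflow.example.com"},
)

# Hypothetical type that the purl-spec does not define.
sagemaker_purl = model_purl("sagemaker", "churn-model", "7")

print(mlflow_purl)
print(sagemaker_purl)
```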

Include hyperparameters in ML Model component

Hyperparameters are key to reproducing a given ML model. The Model component should capture this data, especially since changing hyperparameters can significantly change the behavior of the model. The SPDX SBOM specification for AI models includes a hyperparameters entry.
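
Until there is a dedicated field, one possible workaround is to carry hyperparameters in the generic name/value `properties` list on a component. The Python sketch below shows that idea only. The `hyperparameter:` prefix, the model name, and the values are hypothetical and not part of any spec, and the `machine-learning-model` component type is assumed from the v1.5 proposal.

```python
def hyperparameter_properties(hyperparameters, prefix="hyperparameter:"):
    """Flatten a hyperparameter dict into CycloneDX-style name/value properties."""
    # Property values are strings, so every value is stringified.
    return [
        {"name": f"{prefix}{key}", "value": str(value)}
        for key, value in sorted(hyperparameters.items())
    ]


# Hypothetical model component with made-up hyperparameters.
model_component = {
    "type": "machine-learning-model",  # assumed component type from the v1.5 proposal
    "name": "sentiment-classifier",
    "version": "1.0.0",
    "properties": hyperparameter_properties(
        {"learning_rate": 0.001, "batch_size": 32, "epochs": 10}
    ),
}

for prop in model_component["properties"]:
    print(prop["name"], "=", prop["value"])
```

A first-class hyperparameters entry would avoid the string flattening and the ad hoc naming this workaround relies on.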

cc @iamfaisalkhan @badarahmed for visibility
