Performance bug: artifact-version delete does global only_unused depagination per delete #4684

@strickvl

Description

Summary

Deleting artifact versions through ZenML's high-level client API appears to be pathologically slow when many unused artifact versions exist in a project.

The hot path seems to be Client._delete_artifact_version(...), which validates deletability using:

if artifact_version not in depaginate(
    self.list_artifact_versions,
    only_unused=True,
):
    raise ValueError(...)

This means each individual deletion re-lists the full global set of unused artifact versions before performing the actual delete.

Why this is a problem

Instead of:

  • fetch target
  • validate target
  • delete target

The current behavior is effectively:

  • fetch target by ID
  • list all unused artifact versions across the project
  • check membership
  • delete target

...and that happens once per deleted version.

This scales very poorly when many unused artifact versions exist.
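The linear blow-up can be sketched with a trivial cost model (the helper name and numbers are illustrative, not from the ZenML codebase; the pages-per-scan figure is taken from the measurements in the next section):

```python
# Rough cost model for the current delete path: every delete re-runs a
# full only_unused=True scan, so list requests grow linearly with deletes.

def total_list_requests(num_deletes: int, pages_per_scan: int) -> int:
    """One full paginated scan (pages_per_scan GETs) per delete."""
    return num_deletes * pages_per_scan

print(total_list_requests(1, 176))   # 176 list GETs for a single delete
print(total_list_requests(2, 176))   # 352, in line with the ~351 observed
print(total_list_requests(10, 176))  # 1760 list GETs for ten deletes
```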

Concrete evidence

In our REST-backed setup, deleting a single old artifact version took about 41.3s, while the actual DELETE request itself took about 0.1s.

Observed request pattern for 1 deletion:

  • GET /api/v1/artifact_versions: 176
  • GET /api/v1/artifact_versions/<id>: 1
  • DELETE /api/v1/artifact_versions/<id>: 1

For 2 deletions, the cost roughly doubled:

  • wall time: ~83.7s
  • GET /api/v1/artifact_versions: 351
  • actual DELETE calls: 2

So the behavior looks like “one full only_unused=True scan per delete”.

Relevant code path

In zenml/src/zenml/client.py:

def _delete_artifact_version(self, artifact_version):
    if artifact_version not in depaginate(
        self.list_artifact_versions,
        only_unused=True,
    ):
        raise ValueError(...)
    self.zen_store.delete_artifact_version(artifact_version.id)

And depaginate(...) fetches every page.
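For reference, a minimal sketch of what a depaginate helper of this shape does (the Page class and fake backend are stand-ins, not ZenML's actual types):

```python
from dataclasses import dataclass
from typing import Any, Callable, List


@dataclass
class Page:
    items: List[Any]
    total_pages: int


def depaginate(list_fn: Callable[..., Page], **kwargs: Any) -> List[Any]:
    """Fetch every page of a paginated list call: one request per page."""
    items: List[Any] = []
    page = 1
    while True:
        result = list_fn(page=page, **kwargs)
        items.extend(result.items)
        if page >= result.total_pages:
            break
        page += 1
    return items


# Fake backend with three pages of two items each:
_pages = [["a", "b"], ["c", "d"], ["e", "f"]]

def fake_list(page: int = 1, **kwargs: Any) -> Page:
    return Page(items=_pages[page - 1], total_pages=len(_pages))

print(depaginate(fake_list))  # ['a', 'b', 'c', 'd', 'e', 'f']
```

With ~176 pages of unused versions, this loop alone accounts for the observed request volume each time the membership check runs.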

Expected behavior

Deletion should validate the specific target artifact version much more narrowly.

For example, instead of depaginating the entire only_unused=True result set, ZenML could use a targeted query such as:

list_artifact_versions(id=<target_id>, only_unused=True, size=1)

or another backend-side existence check for just that artifact version.

That would preserve correctness while avoiding the repeated global scan.
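A sketch of what that narrower check could look like, assuming the list endpoint supports filtering by ID (the client stub below is hypothetical, used only to make the example runnable):

```python
from dataclasses import dataclass


@dataclass
class PageResult:
    total: int


class FakeClient:
    """Stand-in for a client whose list endpoint supports an id filter."""

    def __init__(self, unused_ids):
        self._unused = set(unused_ids)

    def list_artifact_versions(self, id=None, only_unused=False, size=1):
        # Backend-side filter: at most one matching record is counted.
        hit = only_unused and id in self._unused
        return PageResult(total=1 if hit else 0)


def is_deletable(client, target_id) -> bool:
    """Targeted existence check: one request instead of a full scan."""
    page = client.list_artifact_versions(id=target_id, only_unused=True, size=1)
    return page.total > 0


client = FakeClient(unused_ids={"v1", "v2"})
print(is_deletable(client, "v1"))  # True  -> unused, safe to delete
print(is_deletable(client, "v9"))  # False -> still in use (or missing)
```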

Workaround

We implemented a Kitaru-side workaround by:

  • doing a per-key preflight query for unused versions
  • verifying the target IDs
  • then calling the lower-level store delete directly

That reduced one deletion in the same environment from ~41.3s to ~0.64s.

So there seems to be a large performance win available upstream in ZenML's high-level delete path itself.
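The workaround amounts to something like the following sketch; all fake classes are stand-ins from our wrapper, not ZenML's API, and only `zen_store.delete_artifact_version` corresponds to the low-level call already shown in the code path above:

```python
from dataclasses import dataclass


@dataclass
class PageResult:
    total: int


class FakeStore:
    def __init__(self):
        self.deleted = []

    def delete_artifact_version(self, vid):
        self.deleted.append(vid)


class FakeClient:
    def __init__(self, unused_ids):
        self._unused = set(unused_ids)
        self.zen_store = FakeStore()

    def list_artifact_versions(self, id=None, only_unused=False, size=1):
        hit = only_unused and id in self._unused
        return PageResult(total=1 if hit else 0)


def delete_versions_directly(client, target_ids):
    """Per-target preflight, then direct store deletes (no global scan)."""
    # Preflight: one targeted query per ID instead of a full depagination.
    for vid in target_ids:
        if client.list_artifact_versions(id=vid, only_unused=True, size=1).total == 0:
            raise ValueError(f"artifact version {vid} is in use or missing")
    # Then bypass the slow high-level path via the lower-level store API.
    for vid in target_ids:
        client.zen_store.delete_artifact_version(vid)


client = FakeClient(unused_ids={"v1", "v2"})
delete_versions_directly(client, ["v1", "v2"])
print(client.zen_store.deleted)  # ['v1', 'v2']
```

This trades one full scan per delete for one targeted request per delete, which is where the ~41.3s to ~0.64s improvement comes from.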

Suggested fix direction

Smallest likely fix:

  • replace the global depaginate(... only_unused=True) membership check with a targeted query for the specific artifact version ID

Longer-term:

  • a batch delete API for artifact versions could help as well

Metadata

Labels

bug (Something isn't working) · core-team (Issues that are being handled by the core team) · planned (Planned for the short term)