Expose ProcessingPhase in DataVolume Status #4021

@brandboat

Is your feature request related to a problem? Please describe:

Currently, the ProcessingPhase information (defined in importer/data-processor.go) is only available inside the importer/uploadserver pod and is not exposed to users. This creates a visibility gap, especially during long-running operations like image conversion.

A common scenario is when downloading a QCOW2 image via HTTP:

  1. The progress reaches ~99% (download complete) and stays there for a while
  2. Behind the scenes, QEMU conversion is happening, which can take considerable time (especially for large images)
  3. Users have no visibility into this conversion phase unless they check the logs inside the importer pod

This lack of visibility makes it difficult for users to understand what's happening with their DataVolume and whether the system is still working or stuck.

Describe the solution you'd like:

Add a new field populatorPhase to the DataVolume status that reflects the current processing phase of the importer/uploadserver pod.

The proposed implementation includes:

  1. Adding a metric to track the current processing phase in the importer/uploadserver pod
  2. Updating an annotation on the PVC with the current phase information
  3. Exposing this phase information as datavolume.status.populatorPhase
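
Steps 2 and 3 above could look roughly like the following. This is only a sketch to illustrate the data flow, not the actual CDI code: the annotation key, the `PVC` and `DataVolumeStatus` structs, and both helper functions are hypothetical stand-ins for the real Kubernetes/CDI types.

```go
package main

import "fmt"

// annPopulatorPhase is a hypothetical annotation key chosen for this sketch;
// the proposal does not name the annotation.
const annPopulatorPhase = "cdi.kubevirt.io/storage.populator.phase"

// PVC and DataVolumeStatus are simplified stand-ins for the real API types.
type PVC struct {
	Annotations map[string]string
}

type DataVolumeStatus struct {
	PopulatorPhase string
}

// setPhaseAnnotation records the current phase on the PVC (step 2).
// It reports whether the annotation changed, so a caller can skip
// a needless update when the phase is the same as before.
func setPhaseAnnotation(pvc *PVC, phase string) bool {
	if pvc.Annotations == nil {
		pvc.Annotations = map[string]string{}
	}
	if pvc.Annotations[annPopulatorPhase] == phase {
		return false
	}
	pvc.Annotations[annPopulatorPhase] = phase
	return true
}

// syncPopulatorPhase copies the annotation value into the DataVolume
// status (step 3). The status is left untouched if the annotation is absent.
func syncPopulatorPhase(pvc *PVC, status *DataVolumeStatus) {
	if phase, ok := pvc.Annotations[annPopulatorPhase]; ok {
		status.PopulatorPhase = phase
	}
}

func main() {
	pvc := &PVC{}
	status := &DataVolumeStatus{}

	setPhaseAnnotation(pvc, "Convert")
	syncPopulatorPhase(pvc, status)

	fmt.Println(status.PopulatorPhase) // prints "Convert"
}
```

Going through the PVC annotation keeps the DataVolume controller as the only writer of the DataVolume status, which is why the sketch splits the work into two helpers.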

This would allow users to see exactly what phase their import/upload is in.

Example use case:

$ kubectl get datavolume my-dv -o jsonpath='{.status.populatorPhase}'
Convert

Describe alternatives you've considered:

  • Reading the processing phase from the importer/uploadserver pod logs. This requires additional kubectl commands and is not user-friendly.
  • Emitting Kubernetes events from the importer/uploadserver pod on each phase transition. Events would give more precise timing than polling: the proposed solution reads the Prometheus metrics endpoint, which is polled every 2 seconds, so a phase shorter than the polling interval may never be observed. However, the codebase currently has no convention for emitting events from importer/uploadserver pods, so despite this limitation the metrics-based approach is more consistent with the existing architecture.

Additional context:
This issue was raised in #3537, which mentions that while conversion typically doesn't take as long as the download/upload process, it can still take a considerable amount of time for large images. Exposing this phase information would greatly improve user experience and observability.
