Skip to content

Comments

Support multi-document compiled YAML (pipeline + platform spec)#100

Open
sduvvuri1603 wants to merge 3 commits intokubeflow:mainfrom
sduvvuri1603:feature/multi-doc-kfp-compilation
Open

Support multi-document compiled YAML (pipeline + platform spec)#100
sduvvuri1603 wants to merge 3 commits intokubeflow:mainfrom
sduvvuri1603:feature/multi-doc-kfp-compilation

Conversation

@sduvvuri1603
Copy link
Contributor

What was going wrong

When the KFP compiler produced a multi-document YAML file (pipeline spec + platform spec), we only read the first document with yaml.safe_load(f). Base images that lived only in the second document were never passed to extract_base_images(), so validation either missed those images or failed later when something expected them.

What we changed

Therefore, Raising this PR to switch to yaml.safe_load_all(f) and, when there are multiple documents, merge deploymentSpec, root, and components from all of them into a single dict. That merged dict is what we pass to extract_base_images(), so validation now sees images from both the pipeline spec and the platform spec. Only the read/merge logic in kfp_compilation.py was changed; the compiler write path and validate_base_images.py are unchanged.

@google-oss-prow
Copy link

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please assign hbelmiro for approval. For more information see the Kubernetes Code Review Process.

The full list of commands accepted by this bot can be found here.

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

- Read compiled output with yaml.safe_load_all so multi-doc does not fail.
- Merge deploymentSpec/root/components from all docs into one dict so
  base image extraction sees images from both pipeline and platform spec.
- Add _merge_ir_docs in kfp_compilation and unit tests for merge behavior.
- No change to compiler write path or to validate_base_images.py.

Co-authored-by: Cursor <cursoragent@cursor.com>
Signed-off-by: sduvvuri1603 <sduvvuri@redhat.com>
Co-authored-by: Cursor <cursoragent@cursor.com>
@sduvvuri1603 sduvvuri1603 force-pushed the feature/multi-doc-kfp-compilation branch from a8faa24 to 009967f Compare February 19, 2026 13:20
Signed-off-by: sduvvuri1603 <sduvvuri@redhat.com>
Co-authored-by: Cursor <cursoragent@cursor.com>
@sduvvuri1603 sduvvuri1603 marked this pull request as ready for review February 19, 2026 13:44
Copilot AI review requested due to automatic review settings February 19, 2026 13:44
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Updates KFP compilation parsing to support multi-document compiled YAML (pipeline spec + platform spec) by loading all YAML documents and merging the relevant IR sections before downstream base-image validation.

Changes:

  • Switch YAML parsing from yaml.safe_load to yaml.safe_load_all in compile_and_get_yaml().
  • Add _merge_ir_docs() to merge deploymentSpec.executors, root.dag.tasks, and components across documents.
  • Add unit tests covering _merge_ir_docs() behavior and integration with extract_base_images().

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.

File Description
scripts/lib/kfp_compilation.py Loads all YAML documents and merges IR sections for downstream image extraction.
scripts/lib/tests/test_kfp_compilation.py Adds unit tests validating multi-doc merge behavior and compatibility with extract_base_images().

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines +17 to +43
def _merge_ir_docs(docs: list[dict[str, Any]]) -> dict[str, Any]:
"""Merge multiple IR YAML docs so deploymentSpec, root, components are combined."""
if not docs:
return {}
if len(docs) == 1:
return docs[0]

def _get(d: Any, path: list[str]) -> dict[str, Any]:
for k in path:
d = (d.get(k) or {}) if isinstance(d, dict) else {}
return d if isinstance(d, dict) else {}

def _ensure_and_merge(out: dict[str, Any], path: list[str], src: dict[str, Any]) -> None:
cur: Any = out
for k in path[:-1]:
cur = cur.setdefault(k, {}) if isinstance(cur, dict) else {}
if not path or not isinstance(cur, dict):
return
key = path[-1]
target = cur.get(key)
if not isinstance(target, dict):
cur[key] = {}
target = cur[key]
target.update(src)

out: dict[str, Any] = dict(docs[0])
paths = [["deploymentSpec", "executors"], ["root", "dag", "tasks"], ["components"]]
Copy link

Copilot AI Feb 19, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

_merge_ir_docs() is annotated as taking list[dict[str, Any]], but the implementation already anticipates non-dict docs (and the new tests pass a string). Also, out = dict(docs[0]) will raise TypeError if the first YAML document is None/non-mapping (safe_load_all can yield None for empty/null docs). Consider changing the parameter type to list[Any] (or Sequence[Any]) and filtering to dict docs up front (return {} if none) so the merge is robust regardless of document ordering/content.

Copilot uses AI. Check for mistakes.
Comment on lines 95 to 101
with open(output_path) as f:
return yaml.safe_load(f)
docs = list(yaml.safe_load_all(f))
if not docs:
return {}
if len(docs) == 1:
return docs[0]
return _merge_ir_docs(docs)
Copy link

Copilot AI Feb 19, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

compile_and_get_yaml() is declared to return dict[str, Any], but when there is exactly one YAML document it returns docs[0] without validating it is a dict. If the compiled YAML ever contains an empty/null document (safe_load_all yields None) or a non-mapping top-level, this will propagate a non-dict to callers and will cause extract_base_images() to raise (it requires a dict). Consider filtering/validating the loaded documents and returning {} (or raising a clearer error) when no dict document is present.

Copilot uses AI. Check for mistakes.
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Single-document YAML is already guaranteed to be a dict because we only keep dict documents here:
docs = [d for d in raw if isinstance(d, dict)]
So when there’s one document we return docs[0], which is always a dict.

Added a clear ValueError for the case where there is no dict document at all: " Expected at least one pipeline/component spec.” instead of returning {}."

Signed-off-by: sduvvuri1603 <sduvvuri@redhat.com>
Co-authored-by: Cursor <cursoragent@cursor.com>
return {}
if len(docs) == 1:
return docs[0]
return _merge_ir_docs(docs)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need to combine the docs? Do we need platform spec somewhere? Can we just return pipeline spec dict?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The issue was multi-doc compiled YAML causing us to miss base images when we only read the first document.

So my idea was to merge so that extract_base_images() sees both pipeline and platform spec, since images can live in either doc ( incorporate both for correct validation)

Pls correct me if you think pipeline spec alone would be enough :)

compiler_class().compile(func, output_path)
with open(output_path) as f:
return yaml.safe_load(f)
docs = list(yaml.safe_load_all(f))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we don't need to cast it to a list, we can just use safe_load_all as it is and check its instance type

Also,

  • Throw an exception in case docs is nil or empty
  • Return docs if instance is dict
  • Else check which index of docs is pipeline spec (by checking which doc contains key pipelineInfo

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants