[YAML] - Jinja % include example #35914

Merged
damccorm merged 18 commits into apache:master from derrickaw:jinja_example
Aug 22, 2025

@derrickaw
Collaborator

#35909


Thank you for your contribution! Follow this checklist to help us incorporate your contribution quickly and easily:

  • Mention the appropriate issue in your description (for example: addresses #123), if applicable. This will automatically add a link to the pull request in the issue. If you would like the issue to automatically close on merging the pull request, comment fixes #<ISSUE NUMBER> instead.
  • Update CHANGES.md with noteworthy changes.
  • If this contribution is large, please file an Apache Individual Contributor License Agreement.

See the Contributor Guide for more tips on how to make the review process smoother.

To check the build health, please visit https://github.com/apache/beam/blob/master/.test-infra/BUILD_STATUS.md

GitHub Actions Tests Status (on master branch)

Build python source distribution and wheels
Python tests
Java tests
Go tests

See CI.md for more information about GitHub Actions CI or the workflows README to see a list of phrases to trigger workflows.

@derrickaw
Collaborator Author

Run Python PreCommit 3.10

@derrickaw
Collaborator Author

Run Python PreCommit 3.11

@derrickaw
Collaborator Author

Run Python PreCommit 3.11

@derrickaw marked this pull request as ready for review August 20, 2025 16:18
@derrickaw
Collaborator Author

assign set of reviewers

@github-actions
Contributor

Assigning reviewers:

R: @liferoad for label python.

Note: If you would like to opt out of this review, comment assign to next reviewer.

Available commands:

  • stop reviewer notifications - opt out of the automated review tooling
  • remind me after tests pass - tag the comment author after tests pass
  • waiting on author - shift the attention set back to the author (any comment or push by the author will return the attention set to the reviewers)

The PR bot will only process comments in the main thread (not review comments).

@liferoad
Contributor

Love the Jinja idea for include. :)
We can add more explicit examples here later if you want: https://beam.apache.org/documentation/sdks/yaml/#jinja-templatization

@derrickaw changed the title from "[YAML] - Jinja example" to "[YAML] - Jinja % include example" Aug 21, 2025
@derrickaw
Collaborator Author

Run Python_Transforms PreCommit 3.10

@codecov
codecov bot commented Aug 21, 2025

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 56.67%. Comparing base (f59a45e) to head (ffcda88).
⚠️ Report is 14 commits behind head on master.

Additional details and impacted files
@@            Coverage Diff            @@
##             master   #35914   +/-   ##
=========================================
  Coverage     56.67%   56.67%           
  Complexity     3380     3380           
=========================================
  Files          1219     1219           
  Lines        184561   184565    +4     
  Branches       3507     3507           
=========================================
+ Hits         104595   104605   +10     
+ Misses        76639    76633    -6     
  Partials       3327     3327           
Flag Coverage Δ
python 80.82% <100.00%> (+<0.01%) ⬆️

@derrickaw
Collaborator Author

Run Python PreCommit 3.12

Contributor

@damccorm left a comment

Thanks for putting this together!

- [wordCount.yaml](#TODO: pending)

% include:
- [wordCount.yaml](https://github.com/apache/beam/blob/master/sdks/python/apache_beam/yaml/examples/transforms/jinja/include/wordCount.yaml)

Contributor

Should this be wordCountInclude.yaml?

Contributor

It would also be good to give an example of invoking this (and how you would pass in appropriate parameters)

Collaborator Author

@derrickaw Aug 21, 2025

  1. I had it here before, but I moved it to the other Readme. Will update name.
  2. Similar pattern that Charles used on some of his ML work with Readme usage more in the applicable folder and not in the main one since multiple folders will be under Jinja.

Contributor

> Similar pattern that Charles used on some of his ML work with Readme usage more in the applicable folder and not in the main one since multiple folders will be under Jinja.

I missed that Readme entirely when I did my review. SGTM

Contributor

Maybe we could just link to that folder or the Readme here then? I worry that it will be easy to miss if you just follow the link (basically how I reviewed this)

Collaborator Author

Will do. Thanks

inputs from the user through `% include`, `% import`, and inheritance
directives.

% import:

Contributor

Should these be code blocks? Also, it looks like some pieces are still missing; we should probably not include them for now.

Collaborator Author

It was more of a heading and not usable code. Will try to rethink this.

Collaborator Author

Updated, thanks

- name: Read from GCS
  type: ReadFromText
  config:
{% include 'apache_beam/yaml/examples/transforms/jinja/include/submodules/readFromText.yaml' %}

Contributor

Rather than just including the config, could we maybe map in whole transforms for simple ones like this?

Since this is an example, we should also explain what we're doing in comments

Contributor

Practically, I don't think it would be common to include just the config like this, especially if it is very simple (and also parameterized)

Contributor

I'd also recommend using more descriptive file names. For example:

readFromTextTransform.yaml (assuming we put the whole transform there)

or

mapToFieldsCountConfig.yaml (assuming we keep that one as config since it is more complicated)
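
To make the suggestion above concrete, here is a minimal sketch of what a whole-transform include might look like. The file name `readFromTextTransform.yaml` is the one proposed above; its location under `submodules/`, its contents, and the `input_path` variable are assumptions for illustration, not the merged example. Jinja pastes included text verbatim and does not re-indent it, so the included file is assumed to carry its own indentation.

```yaml
# Main pipeline file (illustrative layout).
pipeline:
  type: chain
  transforms:
# The whole transform comes from the included file, which must already
# be indented so that it sits under the transforms key.
{% include 'apache_beam/yaml/examples/transforms/jinja/include/submodules/readFromTextTransform.yaml' %}
    # ... remaining transforms elided ...
```

Assumed contents of `readFromTextTransform.yaml` (hypothetical):

```yaml
    - name: Read from GCS
      type: ReadFromText
      config:
        path: "{{ input_path }}"   # hypothetical Jinja variable
```

A value for `input_path` would be supplied when the pipeline is launched, for example through the `--jinja_variables` option of `apache_beam.yaml.main`.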

Collaborator Author

@derrickaw Aug 21, 2025

  1. We generally don't want to put the whole transform here because it limits us from changing names or inputs. Wanted to give an example of how to use. How about a mix? Some transforms, some configs only?
  2. Ok on better names.

Contributor

> We generally don't want to put the whole transform here because it limits us from changing names or inputs.

I don't follow - could you give an example of something we'd want to change but wouldn't be able to?

> How about a mix? Some transforms, some configs only?

This SGTM - I think the more complicated configs make sense, the one liners seem less useful

Collaborator Author

> I don't follow - could you give an example of something we'd want to change but wouldn't be able to?

Maybe a moot point now with both options, but if we have the whole transform in the included YAML, we can't change the name, input paths, etc. unless we pass even more through the Jinja variables. So the option I went with was to give an example where a user could control or orient the pipeline graph from the main YAML file and then include whatever configs they needed from a library. When we push everything to a library, we lose that control, but maybe that is fine.
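
The trade-off described here can be sketched as follows, assuming a config-only include. The transform's name, type, and position in the graph stay in the main file, and only the body of `config:` comes from the library file. The include path is the one from the diff under review; the file contents and the `input_path` variable are hypothetical.

```yaml
# Main pipeline file: it still owns the shape of the graph.
pipeline:
  type: chain
  transforms:
    # Name and type can be changed here without touching the library file.
    - name: Read from GCS
      type: ReadFromText
      config:
{% include 'apache_beam/yaml/examples/transforms/jinja/include/submodules/readFromText.yaml' %}
```

Assumed contents of the included file (hypothetical), indented to sit under `config:` because Jinja inserts it verbatim:

```yaml
        path: "{{ input_path }}"
```

Moving `name` and `type` into the included file would shorten the main file, but renaming the step would then need one more Jinja variable per field.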

Collaborator Author

Updated, thanks.

@liferoad
Contributor

/gemini review

@liferoad
Contributor

I am trying to test how well Gemini knows our YAML. The review comments can be skipped. :)

Contributor

@gemini-code-assist bot left a comment

Code Review

This pull request introduces an example for using Jinja's % include directive in Beam YAML pipelines. The changes include new example files, corresponding tests, and documentation updates. While the core functionality is a valuable addition, there are several issues that need to be addressed. These include broken commands and incorrect links in the documentation, fragile test logic, and potential blockers in the test data, such as a reference to a missing file. Addressing these points will improve the quality and usability of the new example.

@derrickaw
Collaborator Author

/gemini review

Contributor

@gemini-code-assist bot left a comment

Code Review

This pull request introduces a new example demonstrating the use of Jinja's % include directive in Beam YAML pipelines. The changes include the new example files, corresponding tests, and updates to documentation. My review focuses on improving code clarity, fixing a bug in the documentation, and enhancing the maintainability of the new test code and YAML examples. Overall, this is a great addition that showcases a powerful feature of Beam YAML.

@derrickaw
Collaborator Author

Run Prism_Python PreCommit 3.12

@derrickaw
Collaborator Author

Run Python_Integration PreCommit 3.9

@derrickaw
Collaborator Author

Run Python_Transforms PreCommit 3.10

@derrickaw
Collaborator Author

Run Python_ML PreCommit 3.10

@derrickaw requested review from damccorm and liferoad August 22, 2025 13:45

Contributor

@damccorm left a comment

This LGTM. There are failing precommits that look to me like flakes; I'm going to rerun them and see if they pass before merging.

@damccorm merged commit 90f8c50 into apache:master Aug 22, 2025
130 of 142 checks passed
@derrickaw deleted the jinja_example branch August 22, 2025 15:35