Add QualityFlow: AI-powered test generation pipeline project #242
Conversation
@strickvl merging this in because I need it now, but please continue your review; I will fix it in another branch.
strickvl left a comment:
I think the agentic test generation bit needs a closer look, but otherwise the rest of my comments are smaller stuff...
Use the junit.xml file instead of parsing the stdout output
We already have the junit.xml files, so we should use them. Makes the parsing less brittle.
We can default to parsing the XML first and then fall back to the stdout parsing when that fails.
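A minimal sketch of that fallback order, assuming pytest was run with `--junitxml=junit.xml`; `parse_stdout_summary()` is a hypothetical stand-in for the existing stdout parser:

```python
# Minimal sketch: prefer the junit.xml report, fall back to stdout parsing.
import xml.etree.ElementTree as ET
from pathlib import Path


def parse_test_results(junit_xml: Path, stdout: str) -> dict:
    try:
        root = ET.parse(junit_xml).getroot()
        # pytest emits either <testsuite> as the root or nested under <testsuites>.
        suite = root if root.tag == "testsuite" else root.find("testsuite")
        return {
            "tests": int(suite.get("tests", 0)),
            "failures": int(suite.get("failures", 0)),
            "errors": int(suite.get("errors", 0)),
            "skipped": int(suite.get("skipped", 0)),
        }
    except (OSError, ET.ParseError, AttributeError):
        # XML missing or malformed -> fall back to the existing stdout parsing.
        return parse_stdout_summary(stdout)
```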
Duplication in evaluate_coverage.py (currently unused)
I think the logic in evaluate_coverage.py is duplicated inside the report.py step.
So it's probably best to reintroduce the evaluate_coverage step into the pipeline and remove the duplicated logic from the report step?
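A rough sketch of what the restored wiring could look like; the step names, signatures, and module paths are guesses based on the file names mentioned in this review, not copied from the project:

```python
from zenml import pipeline

from steps.analyze_code import analyze_code
from steps.evaluate_coverage import evaluate_coverage
from steps.fetch_source import fetch_source
from steps.gen_tests_agent import gen_tests_agent
from steps.report import report
from steps.run_tests import run_tests


@pipeline
def qualityflow_pipeline():
    workspace = fetch_source()
    selected_files = analyze_code(workspace)
    tests = gen_tests_agent(workspace, selected_files)
    results = run_tests(workspace, tests)
    coverage = evaluate_coverage(results)  # coverage math lives here again
    report(results, coverage)              # report only formats/aggregates
```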
README stuff
The list of pipeline steps is maybe a bit confusing. Maybe split the test execution into two separate points, and also add an 'evaluation' one (if we restore the step as per the comment above)?
Unimplemented features / things just needing a note in the README or something
- The CHANGED_FILES strategy in analyze_code.py is a stub that falls back to selecting all files. This should be made clear in the documentation to avoid confusion.
- The LLM cost estimation in steps/gen_tests_agent.py uses hardcoded price values. These will quickly go out of date. It would be better to add a comment indicating that these are estimates and link to the official pricing pages for OpenAI and Anthropic (something like the sketch below).
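For example, along these lines; the numbers here are placeholder estimates, not authoritative prices:

```python
# Illustrative only: the real dict in steps/gen_tests_agent.py may differ.
# NOTE: per-1M-token prices below are rough estimates and go stale quickly;
# always check the official pricing pages before trusting cost numbers:
#   https://openai.com/api/pricing/
#   https://www.anthropic.com/pricing
PRICE_PER_1M_TOKENS_USD = {
    "gpt-4o": {"input": 2.50, "output": 10.00},             # estimate
    "claude-3-5-sonnet": {"input": 3.00, "output": 15.00},  # estimate
}


def estimate_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    prices = PRICE_PER_1M_TOKENS_USD[model]
    return (input_tokens * prices["input"] + output_tokens * prices["output"]) / 1_000_000
```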
Dependencies
- requirements.txt is missing hypothesis
- also openai + anthropic are listed as optional, but I think the code will fail when they're not installed (i.e. the code doesn't handle their absence fully); see the guarded-import sketch below
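One common pattern for handling this, as a sketch; the function and provider names are assumptions, not the project's actual API:

```python
# Guarded imports so a missing optional provider fails with a clear message
# instead of an ImportError deep inside a step.
try:
    import openai
except ImportError:
    openai = None

try:
    import anthropic
except ImportError:
    anthropic = None


def get_llm_client(provider: str):
    if provider == "openai":
        if openai is None:
            raise RuntimeError("provider='openai' requires: pip install openai")
        return openai.OpenAI()
    if provider == "anthropic":
        if anthropic is None:
            raise RuntimeError("provider='anthropic' requires: pip install anthropic")
        return anthropic.Anthropic()
    raise ValueError(f"Unknown provider: {provider!r}")
```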
Test generation
I think this is probably the bit which needs the most work for it to be a bit more serious?
Both fake and baseline tests generate trivial code that doesn't actually test the source.
- Fake tests: Always pass with assertions like self.assertTrue(True)
- Baseline tests: Generate skeleton methods with just pass statements
Even when the LLM calls succeed for the agentic test generation, the current prompt templates include a commented-out import line, so the generated tests may still not import or exercise the target module unless the model decides to do so, which can leave coverage low or misleading. For contrast, a minimal non-trivial test is sketched below.
All of this makes coverage comparisons somewhat meaningless. Additionally, coverage is currently collected for the entire workspace path, which (if I'm not mistaken) can include the generated tests themselves.
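Module and function names here are purely illustrative; the point is that a generated baseline test has to import the module under test and assert on its real behaviour, rather than asserting a constant truth:

```python
import unittest

from target_package import calculator  # the import must not be commented out


class TestCalculator(unittest.TestCase):
    def test_add_returns_sum(self):
        # Exercises real code instead of self.assertTrue(True).
        self.assertEqual(calculator.add(2, 3), 5)


if __name__ == "__main__":
    unittest.main()
```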
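One way to scope it, as a sketch; the paths and the use of pytest-cov flags here are assumptions about how run_tests.py invokes pytest:

```python
# Measure the package under test, not the whole workspace, so the generated
# tests don't count toward their own coverage numbers.
import subprocess

subprocess.run(
    [
        "pytest",
        "generated_tests/",
        "--cov=src/target_package",       # code under test only
        "--cov-report=xml:coverage.xml",
        "--junitxml=junit.xml",
    ],
    check=False,  # test failures are reported in the results, not raised here
)
```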
Resource cleanup
Temporary directories aren't always cleaned up properly on errors:
- fetch_source.py creates temp dirs, but cleanup only happens on some error paths
- run_tests.py has a finally block but uses ignore_errors=True, so cleanup failures are silently swallowed
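A sketch of more defensive cleanup for fetch_source.py; the exact structure is assumed and `clone_repo()` is a hypothetical stand-in for whatever actually populates the workspace:

```python
import shutil
import tempfile


def fetch_source(repo_url: str) -> str:
    workdir = tempfile.mkdtemp(prefix="qualityflow_")
    try:
        clone_repo(repo_url, workdir)
        return workdir
    except Exception:
        # Clean up on every error path, then re-raise so the step still fails.
        shutil.rmtree(workdir, ignore_errors=True)
        raise
```

For run_tests.py, logging a warning when rmtree fails would arguably be better than silently passing ignore_errors=True.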
On the Dockerfile note, the text was changed from:

> If your project only requires Python dependencies listed in `requirements.txt`, **do not include a Dockerfile**. The projects backend will automatically build your project using the generic Dockerfile available at:
> [https://github.com/zenml-io/zenml-projects-backend/blob/main/.docker/project.Dockerfile](https://github.com/zenml-io/zenml-projects-backend/blob/main/.docker/project.Dockerfile)

to:

> If your project only requires Python dependencies listed in `requirements.txt`, **do not include a Dockerfile**. The projects backend will automatically build your project using the generic Dockerfile available at the zenml-projects-backend repo.
Suggested change:

> If your project only requires Python dependencies listed in `requirements.txt`, **do not include a Dockerfile**. The projects backend will automatically build your project using the generic Dockerfile available at the [zenml-projects-backend](https://github.com/zenml-io/zenml-projects-backend) repo.
Summary
QualityFlow Project Summary
QualityFlow is a ZenML-powered MLOps pipeline that demonstrates AI-driven test generation for Python codebases.
What it does:
Key Features:
Perfect for:
Get started in 3 steps:
A solid example of production-ready MLOps with practical business value: automated test generation at scale.
Checklist
Related Issues
Please link to any relevant issues or discussions.