Merged
99 commits
31b2aba
Add AI-powered Shiny test generation and evaluation
karangattu Jul 24, 2025
a701f93
Update workflow to create PR for testing docs changes
karangattu Jul 24, 2025
a12962d
Add API key checks for Anthropic and OpenAI providers
karangattu Jul 24, 2025
88d46de
Refactor test file generation and validation logic
karangattu Jul 24, 2025
03233b3
Update imports and test exclusions for Shiny evaluation
karangattu Jul 24, 2025
a83ba1c
Remove add_test command and update test generation
karangattu Jul 25, 2025
5c83e7a
Update testing docs workflows and validation
karangattu Jul 25, 2025
a037fb2
Move evaluation test files to new directory
karangattu Jul 25, 2025
8be0a01
Update references from evaluation to inspect-ai apps
karangattu Jul 25, 2025
b96688a
Refactor testing framework to shiny/pytest/generate
karangattu Jul 25, 2025
9cf8715
Merge branch 'main' into integrate-test-generator
karangattu Jul 25, 2025
118a874
Move utility scripts to tests/inspect-ai/utils
karangattu Jul 25, 2025
e4ca7e3
Add Makefile targets for updating testing docs
karangattu Jul 25, 2025
45f7fb8
Update repomix install check and expand doc testing API
karangattu Jul 25, 2025
822637a
Update testing guidelines to skip icons and plots
karangattu Jul 25, 2025
c896498
Add new test dependencies to pyproject.toml
karangattu Jul 25, 2025
63e6a81
Add workflow for validating test generation prompts
karangattu Jul 25, 2025
8f91d19
Update dependency installation in CI workflow
karangattu Jul 25, 2025
8f9a6a0
Add py-shiny setup step to workflow
karangattu Jul 25, 2025
40174fc
Update test generation workflow dependencies and comments
karangattu Jul 25, 2025
a6919e2
Update CI workflow for test prompt validation
karangattu Jul 25, 2025
14a9dc8
Switch PR comment to sticky-pull-request-comment action
karangattu Jul 25, 2025
23ed1f6
Optimize CI workflow with caching and env vars
karangattu Jul 25, 2025
5adfca2
Update inspect-ai installation in workflow and dependencies
karangattu Jul 25, 2025
aabb772
Update YAML quoting and cache key in workflow
karangattu Jul 25, 2025
13dfdb5
Add AI-powered test generator for Shiny apps
karangattu Jul 25, 2025
e746054
Rename workflow files from 'validate' to 'verify'
karangattu Jul 27, 2025
655adff
Update testing prompt with new argument and formatting rules
karangattu Jul 28, 2025
c88e822
Clarify keyword-only args rule in test prompt
karangattu Jul 28, 2025
d418745
Improve test generation workflow reliability and reporting
karangattu Jul 28, 2025
ce26dd4
Update Playwright cache key to exclude uv.lock
karangattu Jul 28, 2025
3e76867
Add caching for uv dependencies in CI workflow
karangattu Jul 28, 2025
7535834
Improve test workflow reliability and timeout handling
karangattu Jul 28, 2025
ea096ec
Remove parallel test execution from workflow
karangattu Jul 28, 2025
038a350
Refactor CI test evaluation and update Python version
karangattu Aug 5, 2025
97b32ba
Merge branch 'main' into integrate-test-generator
karangattu Aug 5, 2025
e973f82
Remove caching steps from CI workflow
karangattu Aug 5, 2025
9ac4d9d
Improve relative app path handling in test generation
karangattu Aug 8, 2025
fd725ee
Remove duplicate dependency and clean up docstrings
karangattu Aug 14, 2025
686369b
Merge branch 'main' into integrate-test-generator
schloerke Aug 14, 2025
4e2e881
Refactor test generation setup and update dependencies
karangattu Aug 14, 2025
210b978
Merge branch 'integrate-test-generator' of https://github.com/posit-d…
karangattu Aug 14, 2025
ad13625
Refactor pytest test generation to _generate package
karangattu Aug 14, 2025
856dc32
Refactor PR comment generation to Python script
karangattu Aug 14, 2025
2b64047
Refactor result and summary file checks in workflow
karangattu Aug 14, 2025
958aa67
Add type annotations and improve type safety in scripts
karangattu Aug 15, 2025
0d3eeb1
remove exit 1
karangattu Aug 15, 2025
9edfd2c
Refactor evaluation script to use inspect_ai imports
karangattu Aug 15, 2025
013d0b1
Update type hint for prepare_comment argument
karangattu Aug 15, 2025
cd19207
Update model references and add test examples
karangattu Aug 15, 2025
86ce21e
Add Playwright browser installation to workflow
karangattu Aug 15, 2025
7230fd8
Clarify plot testing and scope in testing prompt
karangattu Aug 15, 2025
caa79c3
Improve relative path handling for create_app_fixture
karangattu Aug 15, 2025
6a6a9a1
Add navset menu test example to prompts
karangattu Aug 20, 2025
5c66ef7
Update navset section heading in test prompt
karangattu Aug 20, 2025
c98f332
Merge branch 'main' into integrate-test-generator
karangattu Aug 21, 2025
f3788be
Add instruction to use documented parameter names in prompts
karangattu Aug 21, 2025
0accf10
Merge branch 'integrate-test-generator' of https://github.com/posit-d…
karangattu Aug 21, 2025
4c05510
Remove extra instruction from system prompt generation
karangattu Aug 21, 2025
46599a3
Expand Playwright controller documentation
karangattu Aug 21, 2025
7d420e7
Add configurable timeouts and parallelism to test script
karangattu Aug 21, 2025
50739f9
Add LLM token usage reporting and improve test script
karangattu Aug 21, 2025
68bdfc2
Remove single test function enforcement in instructions
karangattu Aug 21, 2025
503e248
Increase cost display precision to 4 decimal places
karangattu Aug 21, 2025
8fd6e28
Add OpenAI model pricing and detailed cost calculation
karangattu Aug 21, 2025
d89793b
Update prompts and fix typo in evaluation script
karangattu Aug 21, 2025
fb96b6a
Update model references and remove deprecated models
karangattu Aug 21, 2025
8d7904f
Update model to gpt-5-mini in shiny_test_evaluation
karangattu Aug 21, 2025
1ca24c1
Enhance evaluation to use actual component IDs from app code
karangattu Aug 21, 2025
bc440c5
Refactor component ID extraction and clean up evaluation script
karangattu Aug 21, 2025
f196107
Fix whitespace and reformat evaluation instructions
karangattu Aug 21, 2025
a03ae11
Include partial grades in pass rate calculation
karangattu Aug 21, 2025
df65e84
Merge branch 'main' into integrate-test-generator
karangattu Aug 22, 2025
4d44a1c
Average test results across multiple attempts
karangattu Aug 22, 2025
3b1c908
Enhance PR comment with averaged test results
karangattu Aug 22, 2025
06b3706
Add debug logging to test evaluation scripts
karangattu Aug 22, 2025
f028254
Remove pytest results from prepare_comment script
karangattu Aug 22, 2025
69d2ff4
Remove unused code for pytest and combined results
karangattu Aug 22, 2025
55ebb89
Refactor dotenv loading and logging setup in test generation
karangattu Aug 29, 2025
d7a06ab
Merge branch 'main' into integrate-test-generator
karangattu Aug 29, 2025
3ef36f3
Merge branch 'main' into integrate-test-generator
schloerke Sep 5, 2025
a5bd55c
Update comment
schloerke Sep 5, 2025
f8f888e
Make method internal
schloerke Sep 5, 2025
8a354ac
Fix outdated openai requirement
schloerke Sep 5, 2025
f17f6c2
Remove a layer of folder nesting
schloerke Sep 5, 2025
56e9419
Use a variable for the results folder. Ignore the new folder
schloerke Sep 5, 2025
7cda751
Add make commands for running and installing inspect-ai tests
schloerke Sep 5, 2025
1828320
Allow multiple PRs to work at the same time. But cancel within the sa…
schloerke Sep 5, 2025
b4fc642
Update for latest inspect-ai xml output
schloerke Sep 5, 2025
de5f74b
Make pytest path relative to current working directory
schloerke Sep 5, 2025
247d6d3
Use shiny setup helper. Test on workflow files updates
schloerke Sep 5, 2025
73e8627
typo
schloerke Sep 5, 2025
6eed4b2
Update verify-test-generation-prompts.yaml
schloerke Sep 5, 2025
b95211f
Update GHA names
schloerke Sep 5, 2025
03f2ed3
`make update-testing-docs`
schloerke Sep 5, 2025
ff2af1b
Update verify-testing-docs-on-change.yml
schloerke Sep 5, 2025
a1e7273
diagnostics
schloerke Sep 5, 2025
4808812
Update verify-testing-docs-on-change.yml
schloerke Sep 5, 2025
c0687f3
Reverse logic?
schloerke Sep 5, 2025
97 changes: 97 additions & 0 deletions .github/workflows/verify-test-generation-prompts.yaml
@@ -0,0 +1,97 @@
name: Verify test generation prompts

on:
  pull_request:
    paths:
      - ".github/workflows/verify-test-generation-prompts.yaml"
      - "shiny/pytest/_generate/**"
  workflow_dispatch:

concurrency:
  group: "prompt-test-generation-${{ github.event.pull_request.number || 'dispatch' }}"
  cancel-in-progress: true

env:
  PYTHON_VERSION: "3.13"
  ATTEMPTS: 3
  PYTHONUNBUFFERED: 1

jobs:
  verify-test-generation-prompts:
    runs-on: ubuntu-latest
    timeout-minutes: 30

    steps:
      - name: Checkout repository
        uses: actions/checkout@v4
        with:
          fetch-depth: 0

      - name: Set up Python
        uses: actions/setup-python@v5
        with:
          python-version: ${{ env.PYTHON_VERSION }}

      - name: Setup py-shiny
        id: install
        uses: ./.github/py-shiny/setup

      - name: Install Test Generator Dependencies
        run: |
          make ci-install-ai-deps

      - name: Run Evaluation and Tests 3 Times
        env:
          OPENAI_API_KEY: ${{ secrets.OPENAI_API_KEY }}
          ANTHROPIC_API_KEY: ${{ secrets.ANTHROPIC_API_KEY }}
          PYTHONUNBUFFERED: 1
        timeout-minutes: 25
        run: |
          make run-test-ai-evaluation

      - name: Upload test results
        if: always()
        uses: actions/upload-artifact@v4
        with:
          name: test-results-${{ github.run_id }}
          path: |
            test-results-inspect-ai/
          retention-days: 7

      - name: Process Results
        timeout-minutes: 2
        run: |
          # Results are already averaged by the bash script, just verify they exist
          if [ ! -f "test-results-inspect-ai/summary.json" ]; then
            echo "No averaged summary found at test-results-inspect-ai/summary.json"
            ls -la test-results-inspect-ai/
            exit 1
          else
            echo "Using averaged results from all attempts"
            cat test-results-inspect-ai/summary.json
          fi

      - name: Check Quality Gate
        timeout-minutes: 2
        run: |
          if [ ! -f "test-results-inspect-ai/summary.json" ]; then
            echo "Summary file not found at test-results-inspect-ai/summary.json"
            ls -la test-results-inspect-ai/
            exit 1
          else
            echo "Found summary file, checking quality gate..."
            python tests/inspect-ai/utils/scripts/quality_gate.py test-results-inspect-ai/
          fi

      - name: Prepare Comment Body
        if: github.event_name == 'pull_request'
        timeout-minutes: 1
        run: |
          python tests/inspect-ai/scripts/prepare_comment.py test-results-inspect-ai/summary.json

      - name: Comment PR Results
        if: github.event_name == 'pull_request'
        uses: marocchino/sticky-pull-request-comment@v2
        with:
          header: inspect-ai-results
          path: comment_body.txt
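
The final steps of this workflow only confirm that `test-results-inspect-ai/summary.json` exists and then pass the results directory to `quality_gate.py`, whose contents are not shown in this diff. As a rough sketch of what such a gate might do, the Python below reads the summary and compares a pass rate against a threshold. The field names (`total`, `complete`, `partial`), the 0.8 threshold, and the function names are all assumptions made for illustration; they are not taken from the actual script.

```python
import json
from pathlib import Path

# Hypothetical threshold; the real quality gate may use a different value.
DEFAULT_THRESHOLD = 0.8


def pass_rate(summary: dict) -> float:
    """Fraction of graded samples counted as passing.

    Assumes hypothetical keys "total", "complete", and "partial";
    partial grades are counted toward the pass rate.
    """
    total = summary.get("total", 0)
    if total <= 0:
        return 0.0
    passed = summary.get("complete", 0) + summary.get("partial", 0)
    return passed / total


def check_quality_gate(results_dir: Path, threshold: float = DEFAULT_THRESHOLD) -> bool:
    """Return True when the summary in results_dir meets the threshold."""
    summary_path = Path(results_dir) / "summary.json"
    if not summary_path.is_file():
        raise FileNotFoundError(f"Summary file not found at {summary_path}")
    summary = json.loads(summary_path.read_text(encoding="utf-8"))
    return pass_rate(summary) >= threshold
```

In a CI step, a script built along these lines would typically exit non-zero when `check_quality_gate` returns `False`, which is what makes the job fail.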
93 changes: 93 additions & 0 deletions .github/workflows/verify-testing-docs-on-change.yml
@@ -0,0 +1,93 @@
name: Verify testing documentation for changes

on:
  pull_request:
    paths:
      - ".github/workflows/verify-testing-docs-on-change.yml"
      - "docs/_quartodoc-testing.yml"
      - "shiny/playwright/controller/**"

permissions:
  contents: write
  pull-requests: write

jobs:
  verify-testing-docs:
    runs-on: ubuntu-latest
    if: github.event_name == 'pull_request'

    steps:
      - name: Checkout repository
        uses: actions/checkout@v4
        with:
          fetch-depth: 0

      - name: Setup py-shiny
        id: install
        uses: ./.github/py-shiny/setup

      - name: Install dependencies
        run: |
          make ci-install-docs

      - name: Update testing docs and check for changes
        id: check-docs-changes
        run: |
          # Store the current state of the documentation file
          cp shiny/pytest/_generate/_data/testing-documentation.json testing-documentation-before.json

          # Run the make command to update testing docs
          make update-testing-docs

          if [[ ! -f testing-documentation-before.json || ! -f shiny/pytest/_generate/_data/testing-documentation.json ]]; then
            echo "One or both documentation files are missing."
            exit 1
          fi

          # Check if the documentation file has changed
          # (`diff -q` exits 0 when the files are identical, so the test is negated)
          if ! diff -q testing-documentation-before.json shiny/pytest/_generate/_data/testing-documentation.json > /dev/null 2>&1; then
            echo "docs_changed=true" >> $GITHUB_OUTPUT
            echo "The generated documentation is out of sync with the current controller changes."
            printf '\n\n'
            diff -q testing-documentation-before.json shiny/pytest/_generate/_data/testing-documentation.json || true
            printf '\n\n'
          else
            echo "docs_changed=false" >> $GITHUB_OUTPUT
            echo "Documentation file is up to date"
          fi

      - name: Comment on PR about testing docs update
        if: steps.check-docs-changes.outputs.docs_changed == 'true'
        uses: marocchino/sticky-pull-request-comment@v2
        with:
          header: testing-docs-update
          message: |
            🚨 **Testing Documentation Out of Sync**

            We detected changes in the `shiny/playwright/controller` directory that affect the testing documentation used by the `shiny add test` command.

            **The generated documentation is out of sync with your controller changes. Please run:**

            ```bash
            make update-testing-docs
            ```

            **Then commit the updated `shiny/pytest/_generate/_data/testing-documentation.json` file.**

            <details><summary>Additional details</summary>

            The updated documentation file ensures that the AI test generator has access to the latest controller API documentation.

            </details>

            ❌ **This check will fail until the documentation is updated and committed.**

            ---
            *This comment was automatically generated by the `verify-testing-docs-on-change.yml` workflow.*

      - name: Remove comment when no controller changes or docs are up to date
        if: steps.check-docs-changes.outputs.docs_changed == 'false'
        uses: marocchino/sticky-pull-request-comment@v2
        with:
          header: testing-docs-update
          delete: true
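
The check step in this workflow follows a snapshot, regenerate, compare pattern: copy the committed JSON aside, rebuild it, and report the docs as out of sync only when the two files differ. A minimal Python sketch of the comparison is given here for illustration; the function names are hypothetical and the workflow itself does this in bash.

```python
import filecmp
from pathlib import Path


def docs_changed(before: Path, after: Path) -> bool:
    """Return True when the regenerated file differs from the snapshot."""
    before, after = Path(before), Path(after)
    if not before.is_file() or not after.is_file():
        raise FileNotFoundError("One or both documentation files are missing.")
    # shallow=False compares file contents rather than os.stat() signatures.
    return not filecmp.cmp(before, after, shallow=False)


def github_output_line(changed: bool) -> str:
    """Format the step output in the name=value form appended to $GITHUB_OUTPUT."""
    return f"docs_changed={'true' if changed else 'false'}"
```

Note the polarity: identical files mean the committed documentation is up to date, so `docs_changed` is `False` in that case.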
7 changes: 7 additions & 0 deletions .gitignore
@@ -123,3 +123,10 @@ shiny_bookmarks/

# setuptools_scm
shiny/_version.py

# Other
tests/inspect-ai/apps/*/test_*.py
test-results.xml
results-inspect-ai/
test-results-inspect-ai/
tests/inspect-ai/scripts/test_metadata.json
2 changes: 2 additions & 0 deletions CHANGELOG.md
@@ -13,6 +13,8 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0

### New features

* Added AI-powered test generator for Shiny applications. Use `shiny add test` to automatically generate comprehensive Playwright tests for your apps using AI models from Anthropic or OpenAI. (#2041)

* `navset_card_*()` now has a `full_screen` option to support `card()`'s existing full-screen functionality. (#1451)

* Added `ui.insert_nav_panel()`, `ui.remove_nav_panel()`, and `ui.update_nav_panel()` to support dynamic navigation. (#90)
29 changes: 29 additions & 0 deletions Makefile
@@ -123,6 +123,35 @@ docs-quartodoc: FORCE
	@echo "-------- Making quartodoc docs --------"
	@cd docs && make quartodoc

install-repomix: install-npm FORCE ## Install repomix if not already installed
	@echo "-------- Installing repomix if needed --------"
	@if ! command -v repomix > /dev/null 2>&1; then \
		echo "Installing repomix..."; \
		npm install -g repomix; \
	else \
		echo "repomix is already installed"; \
	fi

update-testing-docs-repomix: install-repomix FORCE ## Generate repomix output for testing docs
	@echo "-------- Generating repomix output for testing docs --------"
	repomix docs/api/testing -o tests/inspect-ai/utils/scripts/repomix-output-testing.xml

update-testing-docs-process: FORCE ## Process repomix output to generate testing documentation JSON
	@echo "-------- Processing testing documentation --------"
	python tests/inspect-ai/utils/scripts/process_docs.py --input tests/inspect-ai/utils/scripts/repomix-output-testing.xml --output shiny/pytest/_generate/_data/testing-documentation.json
	@echo "-------- Cleaning up temporary files --------"
	rm -f tests/inspect-ai/utils/scripts/repomix-output-testing.xml

update-testing-docs: docs update-testing-docs-repomix update-testing-docs-process FORCE ## Update testing documentation (full pipeline)
	@echo "-------- Testing documentation update complete --------"

ci-install-ai-deps: FORCE
	uv pip install -e ".[dev,test,testgen]"
	$(MAKE) install-playwright

run-test-ai-evaluation: FORCE ## Run the AI evaluation script for tests
	@echo "-------- Running AI evaluation for tests --------"
	bash ./tests/inspect-ai/scripts/run-test-evaluation.sh

install-npm: FORCE
	$(if $(shell which npm), @echo -n, $(error Please install node.js and npm first. See https://nodejs.org/en/download/ for instructions.))
7 changes: 7 additions & 0 deletions pyproject.toml
@@ -124,6 +124,13 @@ doc = [
    "quartodoc>=0.8.1",
    "griffe>=1.3.2",
]
testgen = [
    "chatlas[anthropic,openai]",
    "openai>=1.104.1",
    "anthropic>=0.62.0",
    "inspect-ai>=0.3.129",
    "pytest-timeout",
]


[project.urls]
5 changes: 4 additions & 1 deletion pyrightconfig.json
@@ -10,7 +10,10 @@
     "docs",
     "tests/playwright/deploys/*/app.py",
     "shiny/templates",
-    "tests/playwright/ai_generated_apps",
+    "tests/playwright/ai_generated_apps/*/*/app*.py",
+    "tests/inspect-ai/apps/*/app*.py",
+    "shiny/pytest/_generate/_main.py",
+    "tests/inspect-ai/scripts/evaluation.py"
   ],
   "typeCheckingMode": "strict",
   "reportImportCycles": "none",
34 changes: 24 additions & 10 deletions shiny/_main.py
@@ -533,11 +533,10 @@ def add() -> None:
 @add.command(
     help="""Add a test file for a specified Shiny app.

-Add an empty test file for a specified app. You will be prompted with a destination
-folder. If you don't provide a destination folder, it will be added in the current
-working directory based on the app name.
+Generate a comprehensive test file for a specified app using AI. The generator
+will analyze your app code and create appropriate test cases with assertions.

-After creating the shiny app file, you can use `pytest` to run the tests:
+After creating the test file, you can use `pytest` to run the tests:

     pytest TEST_FILE
 """
@@ -546,22 +545,37 @@ def add() -> None:
     "--app",
     "-a",
     type=str,
-    help="Please provide the path to the app file for which you want to create a test file.",
+    help="Path to the app file for which you want to generate a test file.",
 )
 @click.option(
     "--test-file",
     "-t",
     type=str,
-    help="Please provide the name of the test file you want to create. The basename of the test file should start with `test_` and be unique across all test files.",
+    help="Path for the generated test file. If not provided, will be auto-generated.",
 )
+@click.option(
+    "--provider",
+    type=click.Choice(["anthropic", "openai"]),
+    default="anthropic",
+    help="AI provider to use for test generation.",
+)
+@click.option(
+    "--model",
+    type=str,
+    help="Specific model to use (optional). Examples: haiku3.5, sonnet, gpt-5, gpt-5-mini",
+)
 # Param for app.py, param for test_name
 def test(
-    app: Path | None,
-    test_file: Path | None,
+    app: str | None,
+    test_file: str | None,
+    provider: str,
+    model: str | None,
 ) -> None:
-    from ._main_add_test import add_test_file
+    from ._main_generate_test import generate_test_file

-    add_test_file(app_file=app, test_file=test_file)
+    generate_test_file(
+        app_file=app, output_file=test_file, provider=provider, model=model
+    )


 @main.command(
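
The new `--provider` option restricts input to `anthropic` or `openai` and defaults to `anthropic`, and the commit list mentions API key checks for both providers. The sketch below mirrors those options and shows one way a key check could work. It uses `argparse` from the standard library as a stand-in for click so that it runs without extra dependencies. The `API_KEY_ENV` mapping and `missing_api_key` helper are hypothetical names; the environment variable names come from the CI workflow's `env` block, and whether `generate_test_file` checks them this way is an assumption.

```python
import argparse
from typing import Mapping, Optional

# Environment variable assumed to hold each provider's key
# (names taken from the CI workflow's env block).
API_KEY_ENV = {
    "anthropic": "ANTHROPIC_API_KEY",
    "openai": "OPENAI_API_KEY",
}


def build_parser() -> argparse.ArgumentParser:
    """argparse stand-in for the click options on `shiny add test`."""
    parser = argparse.ArgumentParser(prog="shiny add test")
    parser.add_argument("--app", "-a", type=str, default=None)
    parser.add_argument("--test-file", "-t", type=str, default=None)
    parser.add_argument(
        "--provider", choices=sorted(API_KEY_ENV), default="anthropic"
    )
    parser.add_argument("--model", type=str, default=None)
    return parser


def missing_api_key(provider: str, environ: Mapping[str, str]) -> Optional[str]:
    """Return the name of the unset environment variable, or None if the key is set."""
    env_name = API_KEY_ENV[provider]
    return None if environ.get(env_name) else env_name
```

Passing the environment in as a mapping, rather than reading `os.environ` directly, keeps the check easy to test; a caller would pass `os.environ` in practice.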