Enhance evaluation to use actual component IDs from app code
Added extract_component_ids to parse Shiny app code and detect actual component and output IDs. Updated evaluation instructions and sample generation to ensure tests are only evaluated against components that exist in the app code, ignoring criteria for non-existent components. Improved grading instructions and made evaluation more robust to app-specific variations.
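The commit message names extract_component_ids, but its body is not shown in the diff excerpt here. As a rough sketch only, one way such a helper could work is a regex scan over Shiny for Python source: it collects IDs from `ui.input_*("id", ...)` and `ui.output_*("id")` calls, plus function names defined under `@render.*` decorators. The patterns and the returned dict shape are assumptions for illustration, not the code from this commit.

```python
import re

# Hypothetical patterns; the commit's real extract_component_ids may differ.
_INPUT_RE = re.compile(r"""\binput_\w+\(\s*(?:id\s*=\s*)?["']([^"']+)["']""")
_OUTPUT_RE = re.compile(r"""\boutput_\w+\(\s*(?:id\s*=\s*)?["']([^"']+)["']""")
# A function defined directly under @render.<kind> is treated as an output ID.
_RENDER_RE = re.compile(
    r"@render\.\w+(?:\([^)]*\))?\s*\n\s*(?:async\s+)?def\s+(\w+)\s*\("
)


def extract_component_ids(app_code: str) -> dict[str, list[str]]:
    """Return sorted input and output IDs found in Shiny for Python source."""
    inputs = set(_INPUT_RE.findall(app_code))
    outputs = set(_OUTPUT_RE.findall(app_code)) | set(_RENDER_RE.findall(app_code))
    return {"input": sorted(inputs), "output": sorted(outputs)}
```

A regex scan like this will miss IDs built dynamically (for example from f-strings or loops); parsing with `ast` would be more robust at the cost of more code.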
  - Whether the test creates an instance of the InputRadioButtons controller with id "radio_buttons"
  - Ensure that the radio buttons component is verified for its label, choices, and selected value.
  - Ensure that the test checks the radio buttons state changes and verifies the output text accordingly.
- - Whether the test creates an instance of the InputText controller with id "text_input"
- - Ensure that the text input component is verified for its label and initial value.
- - Ensure that the test checks the text input state changes and verifies the output text accordingly.
- - Whether the test creates an instance of the OutputText controller with id "action_button_value", "checkbox_value", "date_selector_value", "numeric_input_value", "radio_buttons_value", and "text_input_value"
+ - Whether the test creates an instance of the InputSwitch controller with id "switch"
+ - Ensure that the switch component is verified for its label and state.
+ - Ensure that the test checks the switch state changes and verifies the output text accordingly.
+ - Whether the test creates an instance of the OutputText controller with ids "action_button_value", "checkbox_value", "date_selector_value", "numeric_input_value", "radio_buttons_value", and "switch_value"
  - Ensure that the output text components are verified for their initial values and updated values based on user interactions.
- - Ensure that the Output Data Frame controller with id "data_table" is created and verified for its initial state.
+ - Whether the test creates an instance of the OutputDataFrame controller with id "data_grid"
+ - Ensure that the data grid component is verified for its initial state and updates correctly based on user interactions.
+
+ IMPORTANT: Only evaluate based on components and IDs that actually exist in the app code. The test should only test functionality that is actually present in the app.
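In this commit the rule "ignore criteria for components that don't exist" is passed to the grader as prompt text. Purely as an illustration of the same rule, the sketch below applies it mechanically: a criterion that quotes IDs is kept only if at least one quoted ID exists in the app. The helper name `split_criteria` and the quoted-ID heuristic are hypothetical and do not appear in the commit.

```python
import re

# Treat any double-quoted identifier in a criterion as a component ID.
_QUOTED_ID_RE = re.compile(r'"([A-Za-z_]\w*)"')


def split_criteria(
    criteria: list[str], existing_ids: set[str]
) -> tuple[list[str], list[str]]:
    """Split criteria into (applicable, ignored) based on the IDs they quote."""
    applicable, ignored = [], []
    for line in criteria:
        quoted = _QUOTED_ID_RE.findall(line)
        # Ignore a criterion only when it quotes IDs and none exist in the app.
        if quoted and not any(q in existing_ids for q in quoted):
            ignored.append(line)
        else:
            applicable.append(line)
    return applicable, ignored
```

Criteria that quote no ID at all (such as the generic "output text components" line) stay applicable, since there is nothing to check them against.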
+ 1. ONLY evaluate components that ACTUALLY EXIST in the app code - the detected IDs above show what's really in the app
+ 2. If a component mentioned in the criteria doesn't exist in the app code, IGNORE that part of the criteria completely
+ 3. If the app uses different IDs than what's in the criteria (e.g., "data_grid" instead of "data_table"), use the actual IDs from the app
+ 4. Check if the test code properly tests all the EXISTING components (creating controllers, verifying attributes, testing interactions, etc.)
+ 5. The test should receive a Complete grade if it adequately tests all components that actually exist in the app"""

  if app_specific_guidance:
-     target_answer = f"CORRECT: A test that meets all specified criteria.\n{app_specific_guidance.strip()}"
+     target_answer = f"CORRECT: A test that meets all specified criteria for components that actually exist in the app code.\n{app_specific_guidance.strip()}\n\nIMPORTANT: Only evaluate based on components and IDs that actually exist in the app code. Ignore criteria for components that don't exist."
  else:
-     target_answer = "CORRECT: A test that meets all specified criteria."
+     target_answer = "CORRECT: A test that meets all specified criteria for components that actually exist in the app code."
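For readers who want to run the new branch in isolation, the added lines can be wrapped in a small function. The strings are copied from the diff; the function name `build_target_answer` and its default argument are assumptions, since the enclosing function is not visible in this excerpt.

```python
def build_target_answer(app_specific_guidance: str = "") -> str:
    """Build the grader's target answer, mirroring the added lines of the diff."""
    base = (
        "CORRECT: A test that meets all specified criteria "
        "for components that actually exist in the app code."
    )
    if app_specific_guidance:
        # Append the stripped guidance, then the reminder to ignore missing components.
        return (
            f"{base}\n{app_specific_guidance.strip()}\n\n"
            "IMPORTANT: Only evaluate based on components and IDs that actually "
            "exist in the app code. Ignore criteria for components that don't exist."
        )
    return base
```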
- You are an expert evaluator for Shiny application testing. Your task is to evaluate test code quality based STRICTLY on the provided criteria.
+ You are an expert evaluator for Shiny application testing. Your task is to evaluate test code quality based ONLY on the provided app code and specific criteria.

  CRITICAL INSTRUCTIONS:
- 1. ONLY evaluate based on the specific criteria listed in the "criterion" section
- 2. DO NOT add your own criteria or suggestions beyond what is explicitly stated
- 3. DO NOT penalize for missing features that are not mentioned in the criteria
- 4. DO NOT suggest improvements unless they directly relate to the specified criteria
- 5. For non-Shiny frameworks (R Shiny, Streamlit, etc.), the test code should be empty - grade as Complete if empty
+ 1. FIRST, carefully analyze the app code to understand what components ACTUALLY exist in the app
+ 2. Extract a precise list of all component IDs present in the app code
+ 3. IGNORE any criteria that reference UI components or IDs that don't exist in the actual app code
+ 4. ONLY evaluate based on specific criteria that match components in the actual app
+ 5. DO NOT add your own criteria or suggestions beyond what is explicitly stated
+ 6. DO NOT penalize for missing features that are not mentioned in the criteria OR don't exist in the app
+ 7. For non-Shiny frameworks (R Shiny, Streamlit, etc.), the test code should be empty - grade as Complete if empty
+ 8. If test_code tests components that are actually in the app, it should get a 'C' grade even if it doesn't test components mentioned in the criteria that don't exist in the app

  EVALUATION PROCESS:
- - Read the specific criteria for this app
- - Check if the test code implements EXACTLY what is specified
- - Ignore any additional features or missing features not mentioned in the criteria
- - Base your grade solely on whether the specified requirements are met
+ - First carefully extract all component IDs from the app code (e.g., "action_button", "checkbox", etc.)
+ - Compare these IDs with those mentioned in the criteria
+ - ONLY evaluate criteria for components that actually exist in the app code
+ - COMPLETELY IGNORE criteria about components that don't exist in the app
+ - Grade based ONLY on how well the test code tests the components that actually exist
+
+ MOST IMPORTANT:
+ - If the app does not contain a component mentioned in the criteria, IGNORE that part of the criteria completely
+ - If the app uses a different ID than what's in the criteria (e.g., "data_grid" instead of "data_table"), use the actual ID from the app

  GRADING SCALE:
- - C (Complete): ALL specified criteria are met
- - P (Partial): MOST specified criteria are met, minor gaps in the specified requirements
- - I (Incomplete): MAJOR specified criteria are missing or incorrectly implemented
+ - C (Complete): ALL criteria for EXISTING components are met
+ - P (Partial): MOST criteria for EXISTING components are met, with minor gaps
+ - I (Incomplete): MAJOR criteria for EXISTING components are missing or incorrectly implemented

  Provide your evaluation in the following format:
  GRADE: [C/P/I]
- Explanation: [Brief explanation focusing ONLY on how well the specified criteria were met]
+ Explanation: [Brief explanation focusing ONLY on how well the specified criteria were met for EXISTING components]
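The prompt asks the grader to reply with a `GRADE: [C/P/I]` line followed by an explanation. How the evaluation harness reads that reply is not shown in this diff; the sketch below is one plausible way to pull the grade letter out of a response. The name `parse_grade` is hypothetical, and accepting the letter with or without brackets is an assumption about what the model might emit.

```python
import re
from typing import Optional

# Match a line such as "GRADE: C" or "GRADE: [P]" anywhere in the response.
_GRADE_RE = re.compile(r"^\s*GRADE:\s*\[?([CPI])\]?\s*$", re.MULTILINE)


def parse_grade(response: str) -> Optional[str]:
    """Return 'C', 'P', or 'I' from a grader response, or None if absent."""
    match = _GRADE_RE.search(response)
    return match.group(1) if match else None
```

Returning None rather than raising lets the caller decide whether a malformed response counts as Incomplete or should be retried.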