Commit 7a698d0

[apiview copilot] general language guidance review / judge (Azure#10374)
* experimenting with no guidelines / general language guidance
* add judge
* fix
* change name to general_review
* move exceptions to judge prompt only
* few more exceptions
* one more exception
1 parent bb5dd77 commit 7a698d0

12 files changed: +3337 -9 lines changed

packages/python-packages/apiview-copilot/cli.py

Lines changed: 28 additions & 8 deletions
@@ -49,6 +49,7 @@ def local_review(
     model: str = DEFAULT_MODEL,
     chunk_input: bool = False,
     use_rag: bool = False,
+    general_review: bool = False,
 ):
     """
     Generates a review using the locally installed code.
@@ -62,16 +63,30 @@ def local_review(
 
     with open(path, "r") as f:
         apiview = f.read()
-    review = rg.get_response(apiview)
-    output_path = os.path.join("scratch", "output", language)
-    os.makedirs(output_path, exist_ok=True)
-    output_file = os.path.join(output_path, f"{filename}.json")
 
-    with open(output_file, "w", encoding="utf-8") as f:
-        f.write(review.model_dump_json(indent=4))
+    if general_review is False:
+        review = rg.get_response(apiview)
+        output_path = os.path.join("scratch", "output", language)
+        os.makedirs(output_path, exist_ok=True)
+        output_file = os.path.join(output_path, f"{filename}.json")
 
-    print(f"Review written to {output_file}")
-    print(f"Found {len(review.violations)} violations.")
+        with open(output_file, "w", encoding="utf-8") as f:
+            f.write(review.model_dump_json(indent=4))
+
+        print(f"Review written to {output_file}")
+        print(f"Found {len(review.violations)} violations.")
+    else:
+        review = rg.get_general_review_response(apiview)
+
+        output_path = os.path.join("scratch", "output", language)
+        os.makedirs(output_path, exist_ok=True)
+
+        output_file = os.path.join(output_path, f"{filename}_general.json")
+        with open(output_file, "w", encoding="utf-8") as f:
+            f.write(review.model_dump_json(indent=4))
+
+        print(f"Review written to {output_file}")
+        print(f"Found {len(review.improvements)} improvements.")
 
 
 def create_test_case(
@@ -308,6 +323,11 @@ def load_arguments(self, command):
                 action="store_true",
                 help="Use RAG pattern to generate the review.",
             )
+            ac.argument(
+                "general_review",
+                action="store_true",
+                help="Run general review against general language guidance.",
+            )
         with ArgumentsContext(self, "eval create") as ac:
             ac.argument("language", type=str, help="The language for the test case.")
             ac.argument("test_case", type=str, help="The name of the test case")
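
The two branches added to `local_review` differ only in the generator method called, the output file suffix (`.json` versus `_general.json`), and the field that is counted. A minimal sketch of the path logic both branches share, assuming a hypothetical helper named `review_output_file` that is not part of this commit:

```python
import os


def review_output_file(language: str, filename: str, general_review: bool = False) -> str:
    """Return the JSON path a review would be written to.

    Hypothetical helper for illustration; the commit inlines this logic
    in both branches of local_review.
    """
    # General reviews get a "_general" suffix so they do not overwrite
    # the guideline-based review for the same input file.
    suffix = "_general" if general_review else ""
    return os.path.join("scratch", "output", language, f"{filename}{suffix}.json")


if __name__ == "__main__":
    print(review_output_file("python", "azure-contoso"))
    print(review_output_file("python", "azure-contoso", general_review=True))
```

Keeping the suffix decision in one place would let both review modes write side by side under `scratch/output/<language>/` without duplicating the directory handling.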
Lines changed: 45 additions & 0 deletions
@@ -0,0 +1,45 @@
+{
+    "type": "json_schema",
+    "json_schema": {
+        "name": "general_review_result",
+        "strict": true,
+        "schema": {
+            "additionalProperties": false,
+            "type": "object",
+            "properties": {
+                "status": {
+                    "type": "string",
+                    "description": "Success if the request has no improvements. Error if there are improvements."
+                },
+                "improvements": {
+                    "type": "array",
+                    "items": {
+                        "type": "object",
+                        "additionalProperties": false,
+                        "properties": {
+                            "line_no": {
+                                "type": "integer",
+                                "description": "Line number of the improvement."
+                            },
+                            "bad_code": {
+                                "type": "string",
+                                "description": "the original code that was bad, cited verbatim. Should contain a single line of code."
+                            },
+                            "suggestion": {
+                                "type": "string",
+                                "description": "the suggested code which fixes the bad code. If code is not feasible, a description is fine."
+                            },
+                            "comment": {
+                                "type": "string",
+                                "description": "a comment about the improvement."
+                            }
+                        },
+                        "required": ["line_no", "bad_code", "suggestion", "comment"]
+                    },
+                    "description": "list of improvements if any"
+                }
+            },
+            "required": ["status", "improvements"]
+        }
+    }
+}
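
The schema above requires exactly two top-level keys and four fields per improvement, with `additionalProperties` set to false at both levels. A minimal sketch of a shape check for a parsed response, using only the standard library; the function name `check_general_review` is hypothetical and the commit itself relies on the model's structured output rather than a check like this:

```python
# Field names and JSON types are taken from the schema; the checker is illustrative.
REQUIRED_IMPROVEMENT_FIELDS = {
    "line_no": int,
    "bad_code": str,
    "suggestion": str,
    "comment": str,
}


def check_general_review(payload: dict) -> list:
    """Return a list of problems; an empty list means the payload fits the schema's shape."""
    problems = []
    if set(payload) != {"status", "improvements"}:
        return ["top level must contain exactly 'status' and 'improvements'"]
    if not isinstance(payload["status"], str):
        problems.append("'status' must be a string")
    if not isinstance(payload["improvements"], list):
        problems.append("'improvements' must be an array")
        return problems
    for i, item in enumerate(payload["improvements"]):
        if not isinstance(item, dict) or set(item) != set(REQUIRED_IMPROVEMENT_FIELDS):
            problems.append(f"improvements[{i}] must contain exactly the four required fields")
            continue
        for field, expected in REQUIRED_IMPROVEMENT_FIELDS.items():
            value = item[field]
            # bool is a subclass of int in Python, but JSON Schema "integer" excludes booleans.
            if isinstance(value, bool) or not isinstance(value, expected):
                problems.append(f"improvements[{i}].{field} must be {expected.__name__}")
    return problems


if __name__ == "__main__":
    sample = {
        "status": "Error",
        "improvements": [
            {
                "line_no": 2,
                "bad_code": "def method1(self, arg1: str) -> None",
                "suggestion": "def get_something(self, name: str) -> None",
                "comment": "Method name should be more descriptive",
            }
        ],
    }
    print(check_general_review(sample))
```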
Lines changed: 88 additions & 0 deletions
@@ -0,0 +1,88 @@
+---
+name: Generate APIView Review Comments
+description: A no-guidelines prompt that evaluates API design qualities for better developer experience.
+authors:
+  - kristapratico
+version: 1.0.0
+model:
+  api: chat
+  configuration:
+    type: azure_openai
+    azure_endpoint: ${env:AZURE_OPENAI_ENDPOINT}
+    azure_deployment: o3-mini
+    api_version: 2025-01-01-preview
+  parameters:
+    stop: []
+    frequency_penalty: 0
+    presence_penalty: 0
+    max_completion_tokens: 20000
+    reasoning_effort: "high"
+    response_format: ${file:general_review_result_schema.json}
+sample:
+  language: python
+  apiview: |
+    ```python
+    1: class azure.contoso.ClassName:
+    2: def __init__(self, param1: str, param2: int)
+    3: def method1(self, arg1: str) -> None
+    ```
+---
+system: |
+  You are an expert API designer with deep knowledge of Python and its ecosystem. You will analyze client library API surfaces to evaluate their developer experience and design qualities. Your goal is to ensure APIs are delightful to use, appropriately complex, and follow Python's philosophy of being explicit, simple and beautiful.
+
+  # EVALUATION CRITERIA
+
+  ## Delight
+  - Does the API feel natural and intuitive to use?
+  - Are method names descriptive and self-documenting?
+  - Do parameters have clear purposes from their names?
+  - Is functionality discoverable through good naming?
+  - Are common operations easy and straightforward?
+
+  ## Complexity
+  - Is the API more complex than necessary for its purpose?
+  - Are there too many parameters or methods?
+  - Could operations be simplified or combined?
+  - Is the hierarchy of classes clear and logical?
+  - Are abstractions at the right level?
+
+  ## Nomenclature
+  - Are naming conventions consistent across the API?
+  - Do similar operations use similar naming patterns?
+  - Are abbreviations used consistently?
+  - Do names accurately reflect their purpose?
+  - Are standard Python naming conventions followed?
+  - Are names of reasonable length?
+
+  ## Pythonic Design
+  - Does it follow "The Zen of Python" principles?
+  - Are Python idioms used appropriately?
+  - Does it leverage Python's strengths (e.g. duck typing, iterators)?
+  - Does it avoid un-Pythonic patterns from other languages?
+  - Is it consistent with the Python standard library style?
+  - Does the API use proper type hints and follow best static typing practices for Python?
+
+  # RULES
+  - Focus on qualitative aspects rather than strict guidelines
+  - Each line of the APIView is prepended with a line number and colon
+  - Make concrete suggestions for improvements
+  - Consider developer experience and ergonomics
+  - Look for opportunities to make the API more Pythonic
+  - Consider both experienced and novice Python developers
+  - APIView shows a high-level {{language}} pseudocode summary, not implementations
+  - Only comment on improvements that can be made, not just general observations
+
+user:
+  Evaluate the following APIView for developer experience and Pythonic design.
+  ```{{language}}
+  {{apiview}}
+  ```
+assistant: |
+  Based on the evaluation criteria, analyze how delightful and Pythonic this API is. Consider:
+
+  1. Whether method and parameter names are clear and intuitive
+  2. If the complexity level is appropriate
+  3. If naming is consistent and follows conventions
+  4. How well it aligns with Python idioms and principles
+
+  Provide specific suggestions for improvement where the API could be more delightful or Pythonic.
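
The rules in this prompt tell the model that each APIView line is prepended with a line number and a colon, which is what lets the `line_no` field in the result point back at the input. The diff shown here does not include the code that produces that numbering; a minimal sketch, assuming a hypothetical helper named `number_lines`:

```python
def number_lines(apiview: str) -> str:
    """Prefix each line with its 1-based line number and a colon.

    Hypothetical helper for illustration; indentation after the prefix
    is preserved so nesting in the pseudocode stays visible.
    """
    return "\n".join(
        f"{number}: {line}"
        for number, line in enumerate(apiview.splitlines(), start=1)
    )


if __name__ == "__main__":
    raw = "class azure.contoso.ClassName:\n    def method1(self, arg1: str) -> None"
    print(number_lines(raw))
```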
Lines changed: 128 additions & 0 deletions
@@ -0,0 +1,128 @@
+---
+name: Validate APIView Review Comments
+description: A judge prompt that validates API review comments against guidelines and exceptions to ensure consistency.
+authors:
+  - kristapratico
+version: 1.0.0
+model:
+  api: chat
+  configuration:
+    type: azure_openai
+    azure_endpoint: ${env:AZURE_OPENAI_ENDPOINT}
+    azure_deployment: o3-mini
+    api_version: 2025-01-01-preview
+  parameters:
+    stop: []
+    frequency_penalty: 0
+    presence_penalty: 0
+    max_completion_tokens: 20000
+    reasoning_effort: "high"
+    response_format: ${file:general_review_result_schema.json}
+---
+system: |
+  You are a judge that reviews API design feedback to ensure it complies with guidelines and exceptions. Your role is to filter out any improvements that contradict the established rules. You will receive:
+  1. The original APIView content
+  2. The initial review results
+  3. The guidelines and exceptions that must be followed
+
+  # EXCEPTIONS
+
+  You must remove any improvements that:
+  1. Comment on the `send_request` method
+  2. Suggest changes to class inheritance patterns
+  3. Comment on `implements ContextManager` pseudocode
+  4. Comment on ellipsis (...) usage in optional parameters
+  5. Comment on __init__ overloads in model classes or MutableMapping inheritance
+  6. Suggest adding docstrings
+  7. Suggest using pydantic or dataclasses for models
+  8. Comment on async list method naming
+  9. Comment on indentation or namespace declaration
+  10.Suggest consolidating multiple overloads
+  11.Suggest providing convenience methods directly on the client
+  12.Comment on non-standard use of TypedDict syntax
+  13.Comment about ivar being non-standard use
+  14.Comment about use of distributed_trace/async decorators
+
+  # OUTPUT REQUIREMENTS
+
+  - Review each improvement in the initial review
+  - Remove any improvements that violate the validation rules
+  - Keep valid improvements that enhance API design
+  - Maintain the same output schema as the initial review
+  - Set status to "Success" if no improvements remain, "Error" if valid improvements exist
+
+user: |
+  Please validate the following review against our guidelines and exceptions:
+
+  Original APIView:
+  ```{{language}}
+  {{apiview}}
+  ```
+
+  Guidelines:
+  ```json
+  {{guidelines}}
+  ```
+
+  Initial Review Results:
+  ```json
+  {{review_results}}
+  ```
+
+assistant: |
+  I will analyze each improvement in the review results and:
+
+  1. Check if it violates any of the validation rules
+  2. Remove improvements that contradict guidelines
+  3. Keep valid improvements that enhance the API
+  4. Return a filtered set of improvements in the same schema
+
+  For each improvement I will:
+  - Verify it doesn't comment on excluded aspects
+  - Ensure it focuses on API design quality
+  - Validate it improves developer experience
+  - Confirm it aligns with Python best practices
+
+  The response will maintain the schema structure with:
+  - A status field ("Success" or "Error")
+  - A filtered list of improvements
+  - Each improvement containing line_no, bad_code, suggestion, and comment
+
+sample:
+  - description: "Should filter out docstring suggestions"
+    language: python
+    apiview: |
+      ```python
+      1: class azure.contoso.ClassName:
+      2: def method1(self, arg1: str) -> None
+      ```
+    review_results: |
+      {
+          "status": "Error",
+          "improvements": [
+              {
+                  "line_no": 2,
+                  "bad_code": "def method1(self, arg1: str) -> None",
+                  "suggestion": "Add docstring explaining method purpose",
+                  "comment": "Methods should have descriptive docstrings"
+              },
+              {
+                  "line_no": 2,
+                  "bad_code": "def method1(self, arg1: str) -> None",
+                  "suggestion": "def get_something(self, name: str) -> None",
+                  "comment": "Method name should be more descriptive"
+              }
+          ]
+      }
+    expected_results: |
+      {
+          "status": "Error",
+          "improvements": [
+              {
+                  "line_no": 2,
+                  "bad_code": "def method1(self, arg1: str) -> None",
+                  "suggestion": "def get_something(self, name: str) -> None",
+                  "comment": "Method name should be more descriptive"
+              }
+          ]
+      }
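
The sample at the end of the judge prompt pins down the expected input/output contract: excluded improvements are dropped, the rest pass through unchanged, and `status` is recomputed from what remains. In the commit that filtering is done by the model. The sketch below is only a hand-written stand-in that reproduces the sample for exception 6 ("Suggest adding docstrings") with a crude keyword match; the names `is_docstring_suggestion` and `apply_judge_result` are hypothetical:

```python
def is_docstring_suggestion(improvement: dict) -> bool:
    """Keyword stand-in for exception 6 only; the real judge is an LLM."""
    text = f"{improvement['suggestion']} {improvement['comment']}".lower()
    return "docstring" in text


def apply_judge_result(review: dict, is_excluded=is_docstring_suggestion) -> dict:
    """Drop excluded improvements and recompute status from what remains."""
    kept = [imp for imp in review["improvements"] if not is_excluded(imp)]
    # Per the prompt: "Success" if no improvements remain, "Error" otherwise.
    return {"status": "Error" if kept else "Success", "improvements": kept}


if __name__ == "__main__":
    review_results = {
        "status": "Error",
        "improvements": [
            {
                "line_no": 2,
                "bad_code": "def method1(self, arg1: str) -> None",
                "suggestion": "Add docstring explaining method purpose",
                "comment": "Methods should have descriptive docstrings",
            },
            {
                "line_no": 2,
                "bad_code": "def method1(self, arg1: str) -> None",
                "suggestion": "def get_something(self, name: str) -> None",
                "comment": "Method name should be more descriptive",
            },
        ],
    }
    print(apply_judge_result(review_results))
```

A deterministic check like this could serve as a cheap regression test for the judge's output on known samples, but it cannot replace the model for exceptions that need judgment, such as inheritance patterns or overload consolidation.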