Commit 5df059a

add case study (zilliztech#184)
Signed-off-by: ChengZi <[email protected]>
1 parent 2efe1e9 commit 5df059a

File tree

13 files changed

+1260
-1
lines changed

.gitignore

Lines changed: 1 addition & 0 deletions
@@ -56,6 +56,7 @@ Thumbs.db

 __pycache__/
 *.log
+!evaluation/case_study/**/*.log

 .claude/*
 CLAUDE.md

evaluation/README.md

Lines changed: 5 additions & 1 deletion
@@ -103,4 +103,8 @@ The evaluation framework is designed to be reproducible and can be easily extend

 ![MCP Efficiency Analysis](../assets/mcp_efficiency_analysis_chart.png)

-*The chart above shows the dramatic efficiency improvements achieved by Claude Context MCP while maintaining equivalent retrieval quality. Token usage and tool calls are significantly reduced with no loss in F1-score performance.*
+*The chart above shows the dramatic efficiency improvements achieved by Claude Context MCP. Token usage and tool calls are significantly reduced.*
+
+## Case Study
+
+For a detailed analysis of why grep-only approaches have limitations and how semantic search addresses them, please refer to our **[Case Study](./case_study/)**, which provides in-depth comparisons and analysis of the experiment results.

evaluation/case_study/README.md

Lines changed: 31 additions & 0 deletions
@@ -0,0 +1,31 @@
# Case Study

This directory contains case analyses that compare the combined method (grep + Claude Context semantic search) with the traditional grep-only method.

These cases are selected from Princeton NLP's [SWE-bench_Verified](https://openai.com/index/introducing-swe-bench-verified/) dataset. The results and logs are generated by the [run_evaluation.py](../run_evaluation.py) script. For more details, please refer to the [evaluation README.md](../README.md) file.

- 📁 [django_14170](./django_14170/): Query optimization in YearLookup breaks filtering by "__iso_year"
- 📁 [pydata_xarray_6938](./pydata_xarray_6938/): `.swap_dims()` can modify original object

Each case study includes:
- **Original Issue**: The GitHub issue description and requirements
- **Problem Analysis**: Technical breakdown of the bug and expected solution
- **Method Comparison**: Detailed comparison of both approaches
- **Conversation Logs**: The interaction records showing how the LLM agent calls the tools and generates the final answer
- **Results**: Performance metrics and outcome analysis

## Key Results
Compared with the traditional grep-only method, the combined method (grep + Claude Context semantic search) is more efficient and more accurate in these cases.

## Why Grep Fails

1. **Information Overload** - Generates hundreds of irrelevant matches
2. **No Semantic Understanding** - Only literal text matching
3. **Context Loss** - Can't understand code relationships
4. **Inefficient Navigation** - Needs extra tool calls to sift through the noise

## How Grep + Semantic Search Wins

1. **Intelligent Filtering** - Automatically ranks by relevance
2. **Conceptual Understanding** - Grasps code meaning and relationships
3. **Efficient Navigation** - Direct targeting of relevant sections
Lines changed: 128 additions & 0 deletions
@@ -0,0 +1,128 @@
# Django 14170: YearLookup ISO Year Bug

A comparison showing how the combined method (grep + semantic search) outperforms the grep-only approach on a complex Django ORM bug.

<details>
<summary><strong>📋 Original GitHub Issue</strong></summary>

## Query optimization in YearLookup breaks filtering by "__iso_year"

The optimization to use BETWEEN instead of the EXTRACT operation in YearLookup is also registered for the "__iso_year" lookup, which breaks the functionality provided by ExtractIsoYear when used via the lookup.

**Problem**: When using `__iso_year` filters, the `YearLookup` class applies the standard BETWEEN optimization, which works for calendar years but fails for ISO week-numbering years.

**Example**:
```python
# This should use EXTRACT('isoyear' FROM ...) but incorrectly uses BETWEEN
DTModel.objects.filter(start_date__iso_year=2020)
# Generates: WHERE "start_date" BETWEEN 2020-01-01 AND 2020-12-31
# Should be: WHERE EXTRACT('isoyear' FROM "start_date") = 2020
```

**Files**: `django/db/models/lookups.py`, `django/db/backends/base/operations.py`

</details>
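
To make the mismatch concrete, here is a minimal standalone sketch (plain Python, no Django) of why calendar-year BETWEEN bounds disagree with ISO week-numbering years. The helper names `calendar_year_bounds` and `in_between` are hypothetical and exist only for this illustration.

```python
from datetime import date
from typing import Tuple


def calendar_year_bounds(year: int) -> Tuple[date, date]:
    """Bounds equivalent to BETWEEN <year>-01-01 AND <year>-12-31."""
    return date(year, 1, 1), date(year, 12, 31)


def in_between(d: date, bounds: Tuple[date, date]) -> bool:
    """Inclusive range check, mirroring SQL BETWEEN semantics."""
    start, end = bounds
    return start <= d <= end


bounds_2020 = calendar_year_bounds(2020)
samples = [date(2019, 12, 30), date(2020, 6, 15), date(2021, 1, 1)]

for d in samples:
    iso_year = d.isocalendar()[0]  # ISO week-numbering year of the date
    print(d, "iso_year:", iso_year, "BETWEEN-match:", in_between(d, bounds_2020))
```

Both 2019-12-30 and 2021-01-01 belong to ISO year 2020, yet neither falls inside the calendar-year range, so a BETWEEN-based filter would drop them.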

## Results

| Metric | Both Methods | Grep Method | Improvement |
|--------|-------------|-------------|-------------|
| **Token Usage** | 9,036 | 130,819 | **93% less** |
| **Tool Calls** | 3 | 8 | **62.5% fewer** |
| **Success Rate** | 50% hit (1 of 2 files) | 0% hit (0 of 2 files) | **Much better** |
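
The improvement figures follow directly from the raw counts. A small sketch of the arithmetic, using the token and tool-call totals reported in the result files (the function name `percent_reduction` is an illustrative assumption, not part of the evaluation scripts):

```python
def percent_reduction(baseline: float, improved: float) -> float:
    """Relative reduction of `improved` versus `baseline`, in percent."""
    return (baseline - improved) / baseline * 100.0


# Totals taken from grep_result.json (baseline) and both_result.json (improved).
token_reduction = percent_reduction(130_819, 9_036)
call_reduction = percent_reduction(8, 3)

print(f"{token_reduction:.1f}% fewer tokens")
print(f"{call_reduction:.1f}% fewer tool calls")
```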

## How Each Method Found the Problem

### 🎯 Both Methods

```log
🔧 search_code('YearLookup')
⚙️ Found 10 results for query: "YearLookup" in codebase
  1. Code snippet (python) [repo__django__django]
     Location: django/db/models/lookups.py:568-577
     Context: YearExact class with get_bound_params method
  2. Code snippet (python) [repo__django__django]
     Location: django/db/models/lookups.py:538-569
     Context: YearLookup base class with year_lookup_bounds method

🔧 edit_file(django/db/models/lookups.py)
⚙️ Arguments: {'file_path': 'django/db/models/lookups.py',
   'old_string': 'def get_direct_rhs_sql(self, connection, rhs):\n return connection.operators[self.lookup_name] % rhs',
   'new_string': "def get_direct_rhs_sql(self, connection, rhs):\n if self.lookup_name == '__iso_year':\n return 'EXTRACT(ISOYEAR FROM %s)' % rhs\n return connection.operators[self.lookup_name] % rhs"}
   Result: Successfully modified file

🔧 edit_file(django/db/models/lookups.py)
⚙️ Arguments: {'file_path': 'django/db/models/lookups.py',
   'old_string': 'def year_lookup_bounds(self, connection, year):',
   'new_string': "def year_lookup_bounds(self, connection, year):\n if self.lookup_name == '__iso_year':\n return (year - 1, year)"}
   Result: Successfully modified file
```

**Key Success**: Semantic search for "YearLookup" immediately surfaced the `YearLookup` and `YearExact` classes in `django/db/models/lookups.py`, one of the two files that needed modification.

### 🔍 Grep Method

```log
🔧 directory_tree()
⚙️ Arguments: {'path': 'repo__django__django'}
   Result: Retrieved 3000+ lines of directory structure (~50k tokens)
   Problem: Massive information overload, no direct relevance

🔧 search_text('ExtractIsoYear')
⚙️ Arguments: {'pattern': 'ExtractIsoYear', 'path': 'repo__django__django'}
   Result: Found 21 matches across multiple files:
   - django/db/models/functions/__init__.py:5 (import statement)
   - django/db/models/functions/__init__.py:31 (export list)
   - django/db/models/functions/datetime.py:93 (class ExtractIsoYear)
   - django/db/models/functions/datetime.py:150 (DateField.register_lookup)
   Problem: Most were irrelevant imports and registrations

🔧 edit_file('django/db/models/functions/datetime.py')
⚙️ Arguments: {'old_string': 'ExtractIsoYear.register_lookup(YearExact)', 'new_string': '# ExtractIsoYear.register_lookup(YearExact)'}
   Result: Successfully modified file

🔧 edit_file('django/db/models/functions/datetime.py')
⚙️ Arguments: {'old_string': 'ExtractIsoYear.register_lookup(YearGt)', 'new_string': '# ExtractIsoYear.register_lookup(YearGt)'}
   Result: Successfully modified file

🔧 edit_file('django/db/models/functions/datetime.py')
⚙️ Arguments: {'old_string': 'ExtractIsoYear.register_lookup(YearGte)', 'new_string': '# ExtractIsoYear.register_lookup(YearGte)'}
   Result: Successfully modified file

🔧 edit_file('django/db/models/functions/datetime.py')
⚙️ Arguments: {'old_string': 'ExtractIsoYear.register_lookup(YearLt)', 'new_string': '# ExtractIsoYear.register_lookup(YearLt)'}
   Result: Successfully modified file

🔧 edit_file('django/db/models/functions/datetime.py')
⚙️ Arguments: {'old_string': 'ExtractIsoYear.register_lookup(YearLte)', 'new_string': '# ExtractIsoYear.register_lookup(YearLte)'}
   Result: Successfully modified file

🔧 edit_file('tests/db_functions/datetime/test_extract_trunc.py')
⚙️ Arguments: {'old_string': "self.assertEqual(DTModel.objects.filter(start_datetime__iso_year=ExtractIsoYear('start_datetime')).count(), 2)", 'new_string': "# self.assertEqual(DTModel.objects.filter(start_datetime__iso_year=ExtractIsoYear('start_datetime')).count(), 2)"}
   Result: Successfully modified file
```

**Key Failure**: Text search focused on the wrong components (ExtractIsoYear registrations) instead of the actual optimization logic (YearLookup class).

## Why Grep + Semantic Search Won

1. **Conceptual Understanding**: Recognized "YearLookup" as the optimization mechanism causing the issue
2. **Direct Navigation**: Immediately found the relevant classes without noise
3. **Root Cause Focus**: Identified the boundary calculation logic that needed ISO year awareness
4. **Efficient Execution**: 3 targeted operations vs 8 scattered attempts

## Why Grep Failed

1. **Information Overload**: Wasted ~50k tokens on irrelevant directory structure
2. **Surface-Level Matching**: Focused on "ExtractIsoYear" strings instead of understanding the optimization conflict
3. **Wrong Solution**: Applied superficial fixes (commenting out registrations and a test assertion) instead of addressing the core logic
4. **No Context**: Couldn't understand the relationship between YearLookup optimization and ISO year boundaries

The semantic approach understood that the issue was about **optimization logic**, not just **ISO year functionality**, which led it to edit `django/db/models/lookups.py`, one of the two files that needed to change.
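
To illustrate what "ISO year awareness" in a boundary calculation could look like, here is a minimal sketch in plain Python. It is not the patch produced by either agent and is not claimed to match the upstream Django fix; `iso_year_bounds` is a hypothetical helper, and `date.fromisocalendar` requires Python 3.8 or later.

```python
from datetime import date, timedelta
from typing import Tuple


def iso_year_bounds(iso_year: int) -> Tuple[date, date]:
    """Inclusive first and last calendar dates of an ISO week-numbering year."""
    # Monday of ISO week 1 is the first day of the ISO year.
    start = date.fromisocalendar(iso_year, 1, 1)
    # The day before the next ISO year starts is the last day of this one.
    end = date.fromisocalendar(iso_year + 1, 1, 1) - timedelta(days=1)
    return start, end


start, end = iso_year_bounds(2020)
print(start, end)
```

With bounds computed this way, a range comparison covers exactly the dates whose ISO year is 2020, including the days at the edges that a calendar-year range misses.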

## Files

- [`both_conversation.log`](./both_conversation.log) - Both methods interaction log
- [`grep_conversation.log`](./grep_conversation.log) - Grep method interaction log
- [`both_result.json`](./both_result.json) - Both methods performance metrics
- [`grep_result.json`](./grep_result.json) - Grep method performance metrics
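
The tool-call counts in the result files can be cross-checked against the conversation logs. The sketch below assumes each call appears on a line of the form `🔧 Tool Call: '<name>'`, as in the logs of this case study; the function name `count_tool_calls` and the inline sample are illustrative only.

```python
import re
from collections import Counter

# Matches "Tool Call: '<name>'" but not "Tool Response: '<name>'".
TOOL_CALL_RE = re.compile(r"Tool Call: '([^']+)'")


def count_tool_calls(log_text: str) -> Counter:
    """Tally tool calls by tool name from a conversation log's text."""
    return Counter(TOOL_CALL_RE.findall(log_text))


# Abbreviated sample in the same line format as both_conversation.log.
sample_log = """\
🔧 Tool Call: 'search_code'
⚙️ Tool Response: 'search_code'
🔧 Tool Call: 'edit'
🔧 Tool Call: 'edit'
⚙️ Tool Response: 'edit'
⚙️ Tool Response: 'edit'
"""

counts = count_tool_calls(sample_log)
print(dict(counts), "total:", sum(counts.values()))
```

To run it on a real log, read the file with `open(path, encoding="utf-8").read()` and pass the text to `count_tool_calls`.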
Lines changed: 102 additions & 0 deletions
@@ -0,0 +1,102 @@
📝 Conversation Summary:
==================================================
👤 User: The codebase is at /data2/zhangchen_workspace/claude-context/evaluation/repos/repo__django__django.

Issue:
<issue>
Query optimization in YearLookup breaks filtering by "__iso_year"
Description

(last modified by Florian Demmer)

The optimization to use BETWEEN instead of the EXTRACT operation in YearLookup is also registered for the "__iso_year" lookup, which breaks the functionality provided by ExtractIsoYear when used via the lookup.
This has unfortunately been broken ever since ExtractIsoYear was introduced in Django 2.2 via #28649 and wasn't easy to track down since ExtractIsoYear when used by itself eg. in an annotation works perfectly fine. Just when using the lookup in a filter, the optimization is used (even when explicitly using an annotation):
# annotation works
>>> qs = DTModel.objects.annotate(extracted=ExtractIsoYear('start_date')).only('id')
>>> print(qs.query)
SELECT "db_functions_dtmodel"."id", EXTRACT('isoyear' FROM "db_functions_dtmodel"."start_date") AS "extracted" FROM "db_functions_dtmodel"
# explicit annotation used in filter does not use "extracted" and adds BETWEEN
>>> print(qs.filter(extracted=2020).query)
SELECT "db_functions_dtmodel"."id", EXTRACT('isoyear' FROM "db_functions_dtmodel"."start_date") AS "extracted" FROM "db_functions_dtmodel" WHERE "db_functions_dtmodel"."start_date" BETWEEN 2020-01-01 AND 2020-12-31
# implicit lookup uses BETWEEN
>>> print(DTModel.objects.filter(start_date__iso_year=2020).only('id').query)
SELECT "db_functions_dtmodel"."id" FROM "db_functions_dtmodel" WHERE "db_functions_dtmodel"."start_date" BETWEEN 2020-01-01 AND 2020-12-31
This results in the wrong data being returned by filters using iso_year.
This PR fixes the behaviour, reverts the invalid changes to the tests and extends one test to catch this problem: https://github.com/django/django/pull/14157

</issue>

Your task is to identify and edit the files that need to be modified to resolve the issue.
Focus on making the necessary changes to completely address the problem.
Use the available tools step by step to accomplish this goal. The primary objective is to edit the existing code files. No validation or testing is required.

==================================================
🤖 LLM:
==================================================
🔧 Tool Call: 'search_code'
ID: call_koXBMzfQDy99VVUFTHcr1PYr
Arguments: {'path': '/data2/zhangchen_workspace/claude-context/evaluation/repos/repo__django__django', 'query': 'YearLookup', 'limit': 10}
==================================================
⚙️ Tool Response: 'search_code'
Call ID: call_koXBMzfQDy99VVUFTHcr1PYr
Result: Found 10 results for query: "YearLookup" in codebase '/data2/zhangchen_workspace/claude-context/evaluation/repos/repo__django__django'

1. Code snippet (python) [repo__django__django]
Location: django/db/models/lookups.py:568-577
Rank: 1
Context:
```python
def get_bound_params(self, start, finish):
raise NotImplementedError(
'subclasses of YearLookup must provide a get_bound_params() method'
)
class YearExact(YearLookup, Exact):
def get_direct_rhs_sql(self, connection, rhs):
return 'BETWEEN %s AND %s'

def get_bound_params(self, start, finish):
return (start, finish)
```

2. Code snippet (python) [repo__django__django]
Location: django/db/models/lookups.py:538-569
Rank: 2
Context:
```python
class IRegex(Regex):
lookup_name = 'iregex'
class YearLookup(Lookup):
def year_lookup_bounds(self, connection, year):
output_field = self.lhs.lhs.output_field
if isinstance(output_field, DateTimeField):
... 147 more lines
==================================================
🤖 LLM:
==================================================
🔧 Tool Call: 'edit'
ID: call_nMGMuZZ1I5M4cBw4QOhsZAMS
Arguments: {'file_path': '/data2/zhangchen_workspace/claude-context/evaluation/repos/repo__django__django/django/db/models/lookups.py', 'old_string': 'def get_direct_rhs_sql(self, connection, rhs):\n return connection.operators[self.lookup_name] % rhs', 'new_string': "def get_direct_rhs_sql(self, connection, rhs):\n if self.lookup_name == '__iso_year':\n # add specific logic for __iso_year to ensure proper filtering\n return 'EXTRACT(ISOYEAR FROM %s)' % rhs\n return connection.operators[self.lookup_name] % rhs"}
==================================================
🔧 Tool Call: 'edit'
ID: call_AdxEZLmU60CCo1UVm9OTrgOo
Arguments: {'file_path': '/data2/zhangchen_workspace/claude-context/evaluation/repos/repo__django__django/django/db/models/lookups.py', 'old_string': 'def year_lookup_bounds(self, connection, year):', 'new_string': "def year_lookup_bounds(self, connection, year):\n # Ensure correct bounds for __iso_year\n if self.lookup_name == '__iso_year':\n return (year - 1, year) # adjust for ISO year perspective"}
==================================================
⚙️ Tool Response: 'edit'
Call ID: call_nMGMuZZ1I5M4cBw4QOhsZAMS
Result: Successfully modified file: /data2/zhangchen_workspace/claude-context/evaluation/repos/repo__django__django/django/db/models/lookups.py
==================================================
⚙️ Tool Response: 'edit'
Call ID: call_AdxEZLmU60CCo1UVm9OTrgOo
Result: Successfully modified file: /data2/zhangchen_workspace/claude-context/evaluation/repos/repo__django__django/django/db/models/lookups.py
==================================================
🤖 LLM: The necessary modifications have been made to resolve the issue with the `YearLookup` and its handling of the `__iso_year` filter.

### Changes Made:
1. **In the `get_direct_rhs_sql` method of `YearLookup`:**
- Added specific logic for handling the `__iso_year` lookup to ensure proper filtering.

2. **In the `year_lookup_bounds` method of `YearLookup`:**
- Adjusted the method to return the correct bounds for the `__iso_year` to align with ISO year perspective.

These changes should help restore the expected behavior when using the `__iso_year` lookup in filters. If you require further modifications or additional actions, please let me know!
==================================================
Lines changed: 27 additions & 0 deletions
@@ -0,0 +1,27 @@
{
  "instance_id": "django__django-14170",
  "hits": [
    "django/db/models/lookups.py"
  ],
  "oracles": [
    "django/db/models/lookups.py",
    "django/db/backends/base/operations.py"
  ],
  "token_usage": {
    "input_tokens": 8582,
    "output_tokens": 454,
    "total_tokens": 9036,
    "max_single_turn_tokens": 3829
  },
  "tool_stats": {
    "tool_call_counts": {
      "search_code": 1,
      "edit": 2
    },
    "total_tool_calls": 3
  },
  "retrieval_types": [
    "cc",
    "grep"
  ]
}

evaluation/case_study/django_14170/grep_conversation.log

Lines changed: 156 additions & 0 deletions
Large diffs are not rendered by default.
Lines changed: 28 additions & 0 deletions
@@ -0,0 +1,28 @@
{
  "instance_id": "django__django-14170",
  "hits": [
    "django/db/models/functions/datetime.py",
    "tests/db_functions/datetime/test_extract_trunc.py"
  ],
  "oracles": [
    "django/db/backends/base/operations.py",
    "django/db/models/lookups.py"
  ],
  "token_usage": {
    "input_tokens": 130064,
    "output_tokens": 755,
    "total_tokens": 130819,
    "max_single_turn_tokens": 26750
  },
  "tool_stats": {
    "tool_call_counts": {
      "directory_tree": 1,
      "search_text": 1,
      "edit": 6
    },
    "total_tool_calls": 8
  },
  "retrieval_types": [
    "grep"
  ]
}
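
The "50% hit" and "0% hit" figures in the case study appear to correspond to the share of oracle files that each run edited. A minimal sketch of that calculation, using the `hits` and `oracles` lists from the two result files; the formula is an assumption for illustration, and `run_evaluation.py` may compute its metrics differently.

```python
from typing import Iterable


def hit_rate(hits: Iterable[str], oracles: Iterable[str]) -> float:
    """Fraction of oracle files that appear among the edited files."""
    oracle_set = set(oracles)
    if not oracle_set:
        return 0.0
    return len(oracle_set & set(hits)) / len(oracle_set)


oracles = [
    "django/db/models/lookups.py",
    "django/db/backends/base/operations.py",
]
both_hits = ["django/db/models/lookups.py"]
grep_hits = [
    "django/db/models/functions/datetime.py",
    "tests/db_functions/datetime/test_extract_trunc.py",
]

print("both:", hit_rate(both_hits, oracles))
print("grep:", hit_rate(grep_hits, oracles))
```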
