-
Notifications
You must be signed in to change notification settings - Fork 22
saga4/fix duplication suggestion issue #284
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Closed
Closed
Changes from all commits
Commits
Show all changes
17 commits
Select commit
Hold shift + click to select a range
c75bbf6
check large diffs with black, and skipp formatting in such case (afte…
mohammedahmed18 5cd13ad
new line
mohammedahmed18 1522227
better log messages
mohammedahmed18 d3ca1cb
remove unnecessary check
mohammedahmed18 dcb084a
new line
mohammedahmed18 689a2d9
remove unused comment
mohammedahmed18 44c0f85
the max lines for formatting changes to 100
mohammedahmed18 73ef518
refactoring
mohammedahmed18 a5343fd
refactoring and improvements
mohammedahmed18 395855d
added black as dev dependency
mohammedahmed18 822d6cc
made some refactor changes that codeflash suggested
mohammedahmed18 ce15022
remove unused function
mohammedahmed18 d2a8711
formatting & using internal black dep
mohammedahmed18 f46b368
fix black import issue
mohammedahmed18 6504cc4
handle formatting files with no formatting issues
mohammedahmed18 d924b31
fix_duplication_suggestion_issue
Saga4 620ca38
Merge branch 'main' into saga4/fix_duplication_suggestion_issue
Saga4 File filter
Filter by extension
Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
There are no files selected for viewing
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,47 @@ | ||
| import os | ||
|
|
||
| class UnformattedExampleClass(object): | ||
| def __init__( | ||
| self, | ||
| name, | ||
| age= None, | ||
| email= None, | ||
| phone=None, | ||
| address=None, | ||
| city=None, | ||
| state=None, | ||
| zip_code=None, | ||
| ): | ||
| self.name = name | ||
| self.age = age | ||
| self.email = email | ||
| self.phone = phone | ||
| self. address = address | ||
| self.city = city | ||
| self.state = state | ||
| self.zip_code = zip_code | ||
| self.data = {"name": name, "age": age, "email": email} | ||
|
|
||
| def get_info(self): | ||
| return f"Name: {self.name}, Age: {self.age}" | ||
|
|
||
| def update_data(self, **kwargs): | ||
| for key, value in kwargs.items(): | ||
| if hasattr(self, key): | ||
| setattr(self, key, value) | ||
| self.data.update(kwargs) | ||
|
|
||
|
|
||
| def process_data( | ||
| data_list, filter_func=None, transform_func=None, sort_key=None, reverse=False | ||
| ): | ||
| if not data_list: | ||
| return [] | ||
| if filter_func: | ||
| data_list = [ item for item in data_list if filter_func(item)] | ||
| if transform_func: | ||
| data_list = [transform_func(item) for item in data_list] | ||
| if sort_key: | ||
| data_list = sorted(data_list, key=sort_key, reverse=reverse) | ||
| return data_list | ||
|
|
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,147 @@ | ||
| import os,sys,json,datetime,math,random;import requests;from collections import defaultdict,OrderedDict | ||
| from typing import List,Dict,Optional,Union,Tuple,Any;import numpy as np;import pandas as pd | ||
|
|
||
| # This is a poorly formatted Python file with many style violations | ||
|
|
||
| class UnformattedExampleClass( object ): | ||
| def __init__(self,name,age=None,email=None,phone=None,address=None,city=None,state=None,zip_code=None): | ||
| self.name=name;self.age=age;self.email=email;self.phone=phone | ||
| self.address=address;self.city=city;self.state=state;self.zip_code=zip_code | ||
| self.data={"name":name,"age":age,"email":email} | ||
|
|
||
| def get_info(self ): | ||
| return f"Name: {self.name}, Age: {self.age}" | ||
|
|
||
| def update_data(self,**kwargs): | ||
| for key,value in kwargs.items(): | ||
| if hasattr(self,key):setattr(self,key,value) | ||
| self.data.update(kwargs) | ||
|
|
||
| def process_data(data_list,filter_func=None,transform_func=None,sort_key=None,reverse=False): | ||
| if not data_list:return[] | ||
| if filter_func:data_list=[item for item in data_list if filter_func(item)] | ||
| if transform_func:data_list=[transform_func(item)for item in data_list] | ||
| if sort_key:data_list=sorted(data_list,key=sort_key,reverse=reverse) | ||
| return data_list | ||
|
|
||
| def calculate_statistics(numbers): | ||
| if not numbers:return None | ||
| mean=sum(numbers)/len(numbers); median=sorted(numbers)[len(numbers)//2] | ||
| variance=sum((x-mean)**2 for x in numbers)/len(numbers);std_dev=math.sqrt(variance) | ||
| return {"mean":mean,"median":median,"variance":variance,"std_dev":std_dev,"min":min(numbers),"max":max(numbers)} | ||
|
|
||
| def complex_nested_function(x,y,z): | ||
| def inner_function_1(a,b): | ||
| def deeply_nested(c,d): | ||
| return c*d+a*b | ||
| return deeply_nested(a+1,b-1)+deeply_nested(a-1,b+1) | ||
| def inner_function_2 (a,b,c): | ||
| result=[] | ||
| for i in range(a): | ||
| for j in range(b): | ||
| for k in range(c): | ||
| if i*j*k>0:result.append(i*j*k) | ||
| elif i+j+k==0:result.append(-1) | ||
| else :result.append(0) | ||
| return result | ||
| return inner_function_1(x,y)+sum(inner_function_2(x,y,z)) | ||
|
|
||
| # Long lines and poor dictionary formatting | ||
| user_data={"users":[{"id":1,"name":"John Doe","email":"[email protected]","preferences":{"theme":"dark","notifications":True,"language":"en"},"metadata":{"created_at":"2023-01-01","last_login":"2024-01-01","login_count":150}},{"id":2,"name":"Jane Smith","email":"[email protected]","preferences":{"theme":"light","notifications":False,"language":"es"},"metadata":{"created_at":"2023-02-15","last_login":"2024-01-15","login_count":89}}]} | ||
|
|
||
| # Poor list formatting and string concatenation | ||
| long_list_of_items=['item_1','item_2','item_3','item_4','item_5','item_6','item_7','item_8','item_9','item_10','item_11','item_12','item_13','item_14','item_15','item_16','item_17','item_18','item_19','item_20'] | ||
|
|
||
| def generate_report(data,include_stats=True,include_charts=False,format_type='json',output_file=None): | ||
| if not data:raise ValueError("Data cannot be empty") | ||
| report={'timestamp':datetime.datetime.now().isoformat(),'data_count':len(data),'summary':{}} | ||
|
|
||
| # Bad formatting in loops and conditionals | ||
| for i,item in enumerate(data): | ||
| if isinstance(item,dict): | ||
| for key,value in item.items(): | ||
| if key not in report['summary']:report['summary'][key]=[] | ||
| report['summary'][key].append(value) | ||
| elif isinstance(item,(int,float)): | ||
| if 'numbers' not in report['summary']:report['summary']['numbers']=[] | ||
| report['summary']['numbers'].append(item) | ||
| else: | ||
| if 'other' not in report['summary']:report['summary']['other']=[] | ||
| report['summary']['other'].append(str(item)) | ||
|
|
||
| if include_stats and 'numbers' in report['summary']: | ||
| numbers=report['summary']['numbers'] | ||
| report['statistics']=calculate_statistics(numbers) | ||
|
|
||
| # Long conditional chain with poor formatting | ||
| if format_type=='json':result=json.dumps(report,indent=None,separators=(',',':')) | ||
| elif format_type=='pretty_json':result=json.dumps(report,indent=2) | ||
| elif format_type=='string':result=str(report) | ||
| else:result=report | ||
|
|
||
| if output_file: | ||
| with open(output_file,'w')as f:f.write(result if isinstance(result,str)else json.dumps(result)) | ||
|
|
||
| return result | ||
|
|
||
| class DataProcessor ( UnformattedExampleClass ) : | ||
| def __init__(self,data_source,config=None,debug=False): | ||
| super().__init__("DataProcessor") | ||
| self.data_source=data_source;self.config=config or{};self.debug=debug | ||
| self.processed_data=[];self.errors=[];self.warnings=[] | ||
|
|
||
| def load_data ( self ) : | ||
| try: | ||
| if isinstance(self.data_source,str): | ||
| if self.data_source.endswith('.json'): | ||
| with open(self.data_source,'r')as f:data=json.load(f) | ||
| elif self.data_source.endswith('.csv'):data=pd.read_csv(self.data_source).to_dict('records') | ||
| else:raise ValueError(f"Unsupported file type: {self.data_source}") | ||
| elif isinstance(self.data_source,list):data=self.data_source | ||
| else:data=[self.data_source] | ||
| return data | ||
| except Exception as e: | ||
| self.errors.append(str(e));return[] | ||
|
|
||
| def validate_data(self,data): | ||
| valid_items=[];invalid_items=[] | ||
| for item in data: | ||
| if isinstance(item,dict)and'id'in item and'name'in item:valid_items.append(item) | ||
| else:invalid_items.append(item) | ||
| if invalid_items:self.warnings.append(f"Found {len(invalid_items)} invalid items") | ||
| return valid_items | ||
|
|
||
| def process(self): | ||
| data=self.load_data() | ||
| if not data:return{"success":False,"error":"No data loaded"} | ||
|
|
||
| validated_data=self.validate_data(data) | ||
| processed_result=process_data(validated_data, | ||
| filter_func=lambda x:x.get('active',True), | ||
| transform_func=lambda x:{**x,'processed_at':datetime.datetime.now().isoformat()}, | ||
| sort_key=lambda x:x.get('name','')) | ||
|
|
||
| self.processed_data=processed_result | ||
| return{"success":True,"count":len(processed_result),"data":processed_result} | ||
| if __name__=="__main__": | ||
| sample_data=[{"id":1,"name":"Alice","active":True},{"id":2,"name":"Bob","active":False},{"id":3,"name":"Charlie","active":True}] | ||
|
|
||
| processor=DataProcessor(sample_data,config={"debug":True}) | ||
| result=processor.process() | ||
|
|
||
| if result["success"]: | ||
| print(f"Successfully processed {result['count']} items") | ||
| for item in result["data"][:3]:print(f"- {item['name']} (ID: {item['id']})") | ||
| else:print(f"Processing failed: {result.get('error','Unknown error')}") | ||
|
|
||
| # Generate report with poor formatting | ||
| report=generate_report(sample_data,include_stats=True,format_type='pretty_json') | ||
| print("Generated report:",report[:100]+"..."if len(report)>100 else report) | ||
|
|
||
| # Complex calculation with poor spacing | ||
| numbers=[random.randint(1,100)for _ in range(50)] | ||
| stats=calculate_statistics(numbers) | ||
| complex_result=complex_nested_function(5,3,2) | ||
|
|
||
| print(f"Statistics: mean={stats['mean']:.2f}, std_dev={stats['std_dev']:.2f}") | ||
| print(f"Complex calculation result: {complex_result}") |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,71 @@ | ||
| import os, sys, json, datetime, math, random | ||
| import requests | ||
| from collections import defaultdict, OrderedDict | ||
| from typing import List, Dict, Optional, Union, Tuple, Any | ||
| import numpy as np | ||
| import pandas as pd | ||
|
|
||
| # This is a poorly formatted Python file with many style violations | ||
|
|
||
|
|
||
| class UnformattedExampleClass(object): | ||
| def __init__( | ||
| self, | ||
| name, | ||
| age=None, | ||
| email=None, | ||
| phone=None, | ||
| address=None, | ||
| city=None, | ||
| state=None, | ||
| zip_code=None, | ||
| ): | ||
| self.name = name | ||
| self.age = age | ||
| self.email = email | ||
| self.phone = phone | ||
| self.address = address | ||
| self.city = city | ||
| self.state = state | ||
| self.zip_code = zip_code | ||
| self.data = {"name": name, "age": age, "email": email} | ||
|
|
||
| def get_info(self): | ||
| return f"Name: {self.name}, Age: {self.age}" | ||
|
|
||
| def update_data(self, **kwargs): | ||
| for key, value in kwargs.items(): | ||
| if hasattr(self, key): | ||
| setattr(self, key, value) | ||
| self.data.update(kwargs) | ||
|
|
||
|
|
||
| def process_data( | ||
| data_list, filter_func=None, transform_func=None, sort_key=None, reverse=False | ||
| ): | ||
| if not data_list: | ||
| return [] | ||
| if filter_func: | ||
| data_list = [item for item in data_list if filter_func(item)] | ||
| if transform_func: | ||
| data_list = [transform_func(item) for item in data_list] | ||
| if sort_key: | ||
| data_list = sorted(data_list, key=sort_key, reverse=reverse) | ||
| return data_list | ||
|
|
||
|
|
||
| def calculate_statistics(numbers): | ||
| if not numbers: | ||
| return None | ||
| mean = sum(numbers) / len(numbers) | ||
| median = sorted(numbers)[len(numbers) // 2] | ||
| variance = sum((x - mean) ** 2 for x in numbers) / len(numbers) | ||
| std_dev = math.sqrt(variance) | ||
| return { | ||
| "mean": mean, | ||
| "median": median, | ||
| "variance": variance, | ||
| "std_dev": std_dev, | ||
| "min": min(numbers), | ||
| "max": max(numbers), | ||
| } |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Oops, something went wrong.
Oops, something went wrong.
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
⚡️Codeflash found 16% (0.16x) speedup for
get_diff_lines_countincodeflash/code_utils/formatter.py⏱️ Runtime :
2.65 milliseconds→2.29 milliseconds(best of263runs)📝 Explanation and details
Here's a faster version of your program. The main bottleneck is the list comprehension calling a nested function for every line, leading to relatively high overhead. Avoiding the inner function, only iterating once, and not building an intermediate list (since we only need the count) can significantly improve performance.
You do not need to build the list of lines; simply scan and count qualifying lines in a single pass.
Optimized version.
Changes made:
is_diff_linelogic for better performance (function call overhead avoided).count) instead of a list, so memory use and processing are reduced.This is as fast and memory-efficient as this logic gets in idiomatic Python.
✅ Correctness verification report:
🌀 Generated Regression Tests Details
To test or edit this optimization locally
git merge codeflash/optimize-pr284-2025-06-04T20.10.23