perf: Optimize the export of conversation logs #4230

Conversation
Adding the "do-not-merge/release-note-label-needed" label because no release-note block was detected, please follow our release note process to remove it. DetailsInstructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. |
|
[APPROVALNOTIFIER] This PR is NOT APPROVED This pull-request has been approved by: The full list of commands accepted by this bot can be found here. DetailsNeeds approval from an approver in each of these files:Approvers can indicate their approval by writing |
current_page = current_page + 1
output = BytesIO()
workbook.save(output)
output.seek(0)
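For reference, a minimal sketch of how a BytesIO buffer like the one above is typically returned as a download in Django; the helper name and filename are assumptions for illustration, not part of this PR:

```python
from io import BytesIO

from django.http import FileResponse
from openpyxl import Workbook


def workbook_download(workbook: Workbook, filename: str = "chat_log.xlsx") -> FileResponse:
    """Serialize a workbook to memory and return it as an attachment."""
    output = BytesIO()
    workbook.save(output)
    output.seek(0)  # rewind so the response reads from the start
    # FileResponse streams the buffer and sets Content-Disposition for us
    return FileResponse(output, as_attachment=True, filename=filename)
```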
In this review of the code update:

The main changes replace native_page_search with native_page_handler, which paginates more smoothly. The function now also resets values when converting strings, to avoid potential illegal characters, and normalizes time formats (a sketch of that cleanup follows this comment).

To make it even better and cleaner:

- Add comments throughout to explain each step of the data processing, to improve readability.
- Simplify some parts, such as the dynamic setting of table names, by using conditional logic within get_file_content.
- Ensure that all imports and module usages are consistent across the script.
- Consider separating validation into its own method, or at least make sure error handling is clear throughout.

Overall, these updates should enhance maintainability and reduce potential bugs while improving performance.
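A minimal sketch of the illegal-character cleanup and time normalization described above, assuming openpyxl is the Excel writer; the helper name to_cell_value and the exact formatting rules are illustrative, not taken from this PR:

```python
from datetime import datetime

from openpyxl.cell.cell import ILLEGAL_CHARACTERS_RE


def to_cell_value(value):
    """Coerce a raw row value into something safe to write into an xlsx cell."""
    if value is None:
        return ""
    # Normalize datetimes to one consistent text format
    if isinstance(value, datetime):
        return value.strftime("%Y-%m-%d %H:%M:%S")
    # Strip control characters that openpyxl rejects as illegal
    if isinstance(value, str):
        return ILLEGAL_CHARACTERS_RE.sub("", value)
    return value
```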
def get_field_replace_dict(queryset: QuerySet):
    """
    Get the fields that need replacing; by default "xxx.xxx" needs to be replaced with "xxx"."xxx"
The provided code looks mostly correct, but a few improvements can be made:

Improvements

- Use exec() safely: Using exec(sql) to create dynamic models is generally safe when executed in a controlled environment where permissions are managed properly, but without proper validation it can introduce security risks. To mitigate this, consider alternative approaches such as dynamically constructing classes or functions.
- SQL generation optimization: The generate_sql_by_query function is complex and may benefit from optimization. Consider simplifying the logic or using an existing library such as SQLAlchemy for SQL generation, which helps maintain readability and reduce errors (see the sketch after this list).
- Error handling: Add error handling around database operations and other critical sections of the code. This makes debugging easier and ensures better fault tolerance.
- Comments for complex logic: Some parts of the code have complex logic; adding comments clarifies how specific functions work and improves understanding for future developers.
- Documentation: Ensure that important parts of the code, such as method docstrings, clearly describe what they do and their parameters.
- Database connection alias: Replace the hardcoded database alias 'default' with a more descriptive name, or pass it as an argument, depending on your application architecture.
- Primary key value processing: The get_primary_value function should handle different data types correctly, ensuring consistency across various datasets.
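For the SQL-generation point above, a minimal sketch of what the SQLAlchemy Core alternative could look like; the table and column names here are assumptions for illustration:

```python
from sqlalchemy import column, select, table

# Lightweight table description; no ORM models or metadata required
chat_log = table("application_chat", column("id"), column("abstract"), column("create_time"))


def page_sql(last_id, page_size: int) -> str:
    """Build one keyset-pagination page as SQL text."""
    stmt = select(chat_log.c.id, chat_log.c.abstract, chat_log.c.create_time)
    if last_id is not None:
        # Resume after the last primary key instead of using OFFSET
        stmt = stmt.where(chat_log.c.id > last_id)
    stmt = stmt.order_by(chat_log.c.id).limit(page_size)
    # literal_binds inlines parameters so the SQL can be inspected;
    # production code would execute stmt with bound parameters instead
    return str(stmt.compile(compile_kwargs={"literal_binds": True}))
```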
Here's a revised version with some of these optimizations applied:
from typing import Any, Callable, Dict, Iterator, List, Optional, Tuple, Union
import hashlib
import logging

from django.core.paginator import Paginator
from django.db import models
from django.db.models import QuerySet
from rest_framework.response import Response  # assuming Django REST framework is in use

from common.db.sql_execute import select_one, select_list, update_execute
from .result import Page

# Cache of generated dynamic models
_model_cache = {}
def get_dynamics_model(attr: dict, table_name='dynamics'):
    """
    Get a dynamic Django model, creating it on first use.
    """
    # Build a cache key from the table name and attributes
    cache_key = hashlib.md5(f"{table_name}_{str(sorted(attr.items()))}".encode()).hexdigest()
    # If the model already exists, return the cached one
    if cache_key in _model_cache:
        return _model_cache[cache_key]
    attributes = {
        "__module__": "knowledge.models",
        "Meta": type("Meta", (), {'db_table': table_name}),
        **attr
    }
    # Use a unique class name to avoid collisions
    class_name = f"Dynamics_{cache_key[:8]}"
    model_class = type(class_name, (models.Model,), attributes)
    # Cache the generated model
    _model_cache[cache_key] = model_class
    return model_class
def native_page_search(current_page: int, page_size: int, queryset: QuerySet | Dict[str, QuerySet]):
    """
    Run a paginated search and return a Page object.
    (count_objects and paginate_queryset are placeholder helpers; see the note below.)
    """
    total_results = count_objects(queryset)
    paginated_queryset = paginate_queryset(queryset, current_page, page_size)
    return Page(total_results, list(paginated_queryset), current_page, page_size)
def native_page_handler(page_size: int,
                        queryset: QuerySet | Dict[str, QuerySet],
                        select_string: str,
                        field_replace_dict=None,
                        with_table_name=False,
                        primary_key='id',
                        get_primary_value=lambda x: getattr(x, 'pk', None),
                        primary_queryset: str = None,
                        ):
    """
    Custom pagination handler: stream rows with keyset pagination,
    resuming each page after the last primary key already yielded.
    """
    if isinstance(queryset, Dict):
        if primary_queryset not in queryset:
            raise ValueError('Missing primary_queryset')
        querysets_to_process = [queryset[primary_queryset]]
    else:
        querysets_to_process = [queryset]
    for qs in querysets_to_process:
        last_processed_id = None
        while True:
            # Filter past the last processed primary key instead of using OFFSET
            if last_processed_id is not None:
                qs_with_condition = qs.filter(**{f'{primary_key}__gt': last_processed_id})
            else:
                qs_with_condition = qs
            sql, params = convert_queryset_to_sql(qs_with_condition, select_string,
                                                  field_replace_dict, with_table_name)
            page_sql = wrap_in_pagination(sql, params, page_size)
            results = execute_query(page_sql, params)
            if not results:
                break  # no rows left for this queryset
            for row in results:
                yield row
                last_processed_id = get_primary_value(row)
def native_page_search_paginated(current_page: int, page_size: int,
                                 queryset: Union[QuerySet, Dict],
                                 handler: Callable[[int, List, Optional[int]], Iterator[Any]]):
    paginator = Paginator(queryset, per_page=page_size)
    try:
        response_data = paginator.page(current_page).object_list
        meta = {'total_pages': paginator.num_pages, 'current_page_number': current_page}
        result_rows = list(handler(page_size, response_data, current_page))
        final_content = [convert_row_to_response(row) for row in result_rows]
        status_code = 200
    except Exception as e:
        logging.exception(e)
        final_content = [{'error': str(e)}]
        meta = {'total_pages': 0, 'current_page_number': 0}
        status_code = 500
    response_payload = {"meta": meta, "content": final_content}
    return Response(data=response_payload, status=status_code)
def get_field_replace_dict(queryset: QuerySet):
    """
    Get the fields that need replacing; by default "xxx.xxx" needs to be replaced with "xxx"
    """
    def extract_last_value(item: Tuple[List[Tuple[str, Any]]]) -> tuple:
        ...

This revision includes placeholder implementations of helper functions (execute_query, convert_queryset_to_sql, wrap_in_pagination, convert_row_to_response, and so on) and adds basic exception handling in the paginated search path. You'll need to fill in the actual implementations based on your project's needs.
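For context, a minimal usage sketch showing how the streaming handler above could feed the Excel export; the column names and the assumption that execute_query yields dict rows are illustrative only:

```python
from io import BytesIO

from openpyxl import Workbook


def export_chat_logs(queryset) -> BytesIO:
    """Stream rows through native_page_handler into one worksheet."""
    workbook = Workbook()
    sheet = workbook.active
    sheet.append(["id", "abstract", "create_time"])  # header row (assumed columns)
    for row in native_page_handler(page_size=500, queryset=queryset, select_string="*"):
        # Rows are written as they arrive, so memory use stays flat
        # (assumes execute_query returns dict-like rows)
        sheet.append([row.get("id"), row.get("abstract"), row.get("create_time")])
    output = BytesIO()
    workbook.save(output)
    output.seek(0)
    return output
```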