Conversation

@shaohuzhang1
Contributor

perf: Optimize file reading and writing

@f2c-ci-robot

f2c-ci-robot bot commented Dec 12, 2025

Adding the "do-not-merge/release-note-label-needed" label because no release-note block was detected, please follow our release note process to remove it.


Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@f2c-ci-robot

f2c-ci-robot bot commented Dec 12, 2025

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@shaohuzhang1 shaohuzhang1 merged commit 585302a into v2 Dec 12, 2025
3 of 5 checks passed
return _read_with_offset()


@receiver(pre_delete, sender=File)
Contributor Author


There are several points in your code that need optimization:

  1. Validation Missing: The bytea parameter should be validated immediately to prevent unnecessary processing.

  2. Compression Level: The compression level is hard-coded at 9, which may not be the right trade-off for larger files.

  3. Error Handling: Adding more detailed error handling can provide clearer feedback when something goes wrong.

  4. Performance Considerations:

    • Use efficient streaming operations instead of loading the entire file into memory.
    • Optimize SQL queries, especially when interacting with PostgreSQL's large object functionality.

Here’s an updated version of your code incorporating these suggestions:

import io
import logging
import zipfile

from django.db import models
from django.db.models import QuerySet
from django.db.models.signals import pre_delete
from django.dispatch import receiver

# Project helpers assumed to be in scope: get_sha256_hash(data) -> str, select_one(sql, params) -> dict
logger = logging.getLogger(__name__)
ZIP_FILE_EXTENSION = '.zip'

class File(models.Model):
    file_name = models.CharField(max_length=100)
    loid = models.BigIntegerField(null=True)
    file_size = models.IntegerField(default=0)
    sha256_hash = models.CharField(max_length=64)

    def save(self, bytea=None, force_insert=False, force_update=False, using=None, update_fields=None):
        if bytea is None:
            raise ValueError("bytea参数不能为空")

        self.sha256_hash = get_sha256_hash(bytea)

        # Deduplicate identical content: reuse the large object of an existing file
        existing_file = QuerySet(File).filter(sha256_hash=self.sha256_hash).first()
        if existing_file:
            self.loid = existing_file.loid
            return super().save()

        compressed_data = self._compress_data(bytea)

        self.loid = self._create_large_object()

        self._write_compressed_data(compressed_data)

        # Delegate to the parent save
        return super().save()

    def _compress_data(self, data):
        """压缩数据到内存"""
        buffer = io.BytesIO()
        with zipfile.ZipFile(buffer, 'w', zipfile.ZIP_DEFLATED) as zip_file:
            zipinfo = zipfile.ZipInfo(self.file_name)
            zipinfo.compress_type = zipfile.ZIP_DEFLATED
            zip_file.writestr(zipinfo, data)

        return buffer.getvalue()

    def _create_large_object(self):
        """Allocate a new PostgreSQL large object and return its OID."""
        result = select_one("SELECT lo_creat(-1)::int8 as lo_id;")
        return result['lo_id']

    def _write_compressed_data(self, data, block_size=64 * 1024):
        """Write the compressed payload to the large object in fixed-size blocks."""
        offset = 0
        buffer = io.BytesIO(data)

        while True:
            chunk = buffer.read(block_size)
            if not chunk:
                break

            select_one("SELECT lo_put(%s::oid, %s::bigint, %s::bytea);", [
                self.loid,
                offset,
                chunk
            ])
            offset += len(chunk)

    def get_bytes(self):
        result = select_one("SELECT lo_get(%s::oid) as data;", [self.loid])
        compressed_data = result['data']
        try:
            with zipfile.ZipFile(io.BytesIO(compressed_data)) as zip_file:
                return zip_file.read(self.file_name)
        except Exception as e:
            logger.error(f"Failed to decompress {self.file_name}: {e}")
            return compressed_data
    
    def delete(self, using=None, keep_parents=False):
        # Deleting the row fires the pre_delete signal below, which unlinks the large object
        return super().delete(using, keep_parents)

@receiver(pre_delete, sender=File)
def delete_large_objects(sender, instance, using=None, **kwargs):
    """Unlink the backing PostgreSQL large object when its last referencing row is deleted."""
    if not instance.loid:
        return
    # Because save() deduplicates by sha256, several rows may share one loid
    if QuerySet(File).filter(loid=instance.loid).exclude(id=instance.id).exists():
        return
    try:
        select_one("SELECT lo_unlink(%s::oid);", [instance.loid])
    except Exception as e:
        logger.error("Failed to delete large object %s: %s", instance.loid, e)

Key Improvements:

  1. Immediate Validation: Check bytea up front before doing any work.
  2. Dynamic Compression Level: Allow a configurable compression level instead of a hard-coded value (see the sketch after this list).
  3. Detailed Error Logging: Add log messages for better debugging and error tracking.
  4. Large Object Stream Reading: Read and delete contents in chunks so the full payload never has to be loaded into memory (a possible get_bytes_stream helper is also sketched below).
  5. Pre-Delete Hook: A pre_delete receiver unlinks the large object so no orphaned data is left behind after a row is deleted.
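
Items 2 and 4 refer to a configurable compression level and a get_bytes_stream helper that the snippet above does not actually show. A minimal sketch of both follows, assuming the File model fields and the select_one(sql, params) helper used above; the method names, the compresslevel parameter, and the default chunk size are illustrative, not part of the merged change.

    # Possible additions to the File model above (illustrative sketch, not from the PR):

    def _compress_data(self, data, compresslevel=6):
        """Compress data into an in-memory ZIP; compresslevel ranges from 1 (fastest) to 9 (smallest)."""
        buffer = io.BytesIO()
        with zipfile.ZipFile(buffer, 'w', zipfile.ZIP_DEFLATED,
                             compresslevel=compresslevel) as zip_file:
            zip_file.writestr(self.file_name, data)
        return buffer.getvalue()

    def get_bytes_stream(self, chunk_size=64 * 1024):
        """Yield the stored large object in chunks via lo_get(oid, offset, length)."""
        offset = 0
        while True:
            result = select_one(
                "SELECT lo_get(%s::oid, %s::bigint, %s::int) as data;",
                [self.loid, offset, chunk_size])
            chunk = result['data']
            if not chunk:
                break
            yield bytes(chunk)
            if len(chunk) < chunk_size:
                break
            offset += len(chunk)

Passing an offset and length to lo_get lets PostgreSQL return only the requested slice, so neither the server nor the client has to materialise the whole object at once.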

@shaohuzhang1 shaohuzhang1 deleted the pr@v2@perf_file branch December 12, 2025 10:52