Configuration file changes #186

Dallas98 · 2025-12-22T01:28:22Z

This pull request introduces several improvements and optimizations to the generation_service.py module, focusing on processing efficiency, resource control, and enhanced data extraction. The most significant changes include increasing concurrency limits, introducing randomization to QA generation, extracting image URLs from documents, and adjusting batch sizes for chunk processing.

Performance and Resource Management:

Increased the concurrency limit for question processing from 10 to 20 and the batch size for chunk processing from 20 to 100, allowing more tasks to be processed in parallel and improving throughput. [1] [2]
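The PR page does not include the diff, so the following is only a minimal sketch of how a concurrency cap of 20 and a chunk batch size of 100 are commonly applied with asyncio. The names QUESTION_CONCURRENCY, CHUNK_BATCH_SIZE, batched, and run_bounded are hypothetical, and the assumption that generation_service.py is asyncio-based is not confirmed by the source.

```python
import asyncio
from typing import Awaitable, Callable, Iterator, List, Sequence, TypeVar

T = TypeVar("T")
R = TypeVar("R")

# Hypothetical constants mirroring the values described in the PR.
QUESTION_CONCURRENCY = 20  # previously 10
CHUNK_BATCH_SIZE = 100     # previously 20


def batched(items: Sequence[T], size: int) -> Iterator[Sequence[T]]:
    """Yield consecutive slices of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


async def run_bounded(
    items: Sequence[T],
    worker: Callable[[T], Awaitable[R]],
    limit: int = QUESTION_CONCURRENCY,
) -> List[R]:
    """Run `worker` on every item with at most `limit` calls in flight."""
    semaphore = asyncio.Semaphore(limit)

    async def guarded(item: T) -> R:
        # Each call waits here until one of the `limit` slots is free.
        async with semaphore:
            return await worker(item)

    # gather returns results in the same order as the input items.
    return await asyncio.gather(*(guarded(item) for item in items))
```

Raising the semaphore limit increases how many requests are in flight at once, so whatever backend `worker` calls has to tolerate the extra load; a larger batch size mainly reduces the number of batches and their per-batch overhead.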

Feature Enhancements:

Added a new function, extract_img_urls, that extracts image URLs from document content using a regular expression, and integrated this extraction into the question processing workflow to store found image URLs in the data object. [1] [2]
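The regular expression used by extract_img_urls is not shown on this page. As a minimal sketch, assuming documents contain Markdown image syntax `![alt](url)` and HTML `<img src="url">` tags, extraction could look like this; the pattern, the de-duplication, the attach_img_urls helper, and the "img_urls" field name are all hypothetical.

```python
import re
from typing import Any, Dict, List

# Assumed pattern (not the one in generation_service.py):
#   group 1: Markdown image  ![alt](url "optional title")
#   group 2: HTML image      <img ... src="url" ...>
_IMG_PATTERN = re.compile(
    r"!\[[^\]]*\]\(([^)\s]+)[^)]*\)"
    r"""|<img\b[^>]*?\bsrc\s*=\s*["']([^"']+)["']""",
    re.IGNORECASE,
)


def extract_img_urls(content: str) -> List[str]:
    """Return image URLs in document order, dropping repeats."""
    urls: List[str] = []
    for match in _IMG_PATTERN.finditer(content):
        # Exactly one of the two groups participates in each match.
        url = match.group(1) or match.group(2)
        if url not in urls:
            urls.append(url)
    return urls


def attach_img_urls(data: Dict[str, Any], content: str) -> Dict[str, Any]:
    """Store extracted URLs on the data object (hypothetical field name)."""
    data["img_urls"] = extract_img_urls(content)
    return data
```

Scanning with a single alternation via `finditer` keeps Markdown and HTML matches in the order they appear in the document, which two separate `findall` passes would not.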

Quality Control and Randomization:

Introduced a randomization step in the QA generation process: for each chunk, QA generation is now probabilistically skipped based on the temperature parameter in question_cfg, which can help diversify output and control resource usage.
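The description does not say exactly how temperature maps to the skip decision. A minimal sketch, assuming temperature is read from question_cfg and used directly as the probability of keeping a chunk (clamped to [0, 1], with a missing value keeping every chunk), could look like this; should_generate_qa and select_chunks are hypothetical names.

```python
import random
from typing import Any, List, Mapping, Optional, Sequence, TypeVar

T = TypeVar("T")


def should_generate_qa(
    question_cfg: Mapping[str, Any], rng: Optional[random.Random] = None
) -> bool:
    """Decide whether to run QA generation for one chunk.

    Assumption: temperature is treated as the keep probability.
    """
    if rng is None:
        rng = random.Random()
    temperature = float(question_cfg.get("temperature", 1.0))
    keep_probability = min(max(temperature, 0.0), 1.0)
    # random() is in [0.0, 1.0): p = 1.0 always keeps, p = 0.0 always skips.
    return rng.random() < keep_probability


def select_chunks(
    chunks: Sequence[T], question_cfg: Mapping[str, Any], seed: Optional[int] = None
) -> List[T]:
    """Keep the chunks that pass the random check; a seed makes runs repeatable."""
    rng = random.Random(seed)
    return [chunk for chunk in chunks if should_generate_qa(question_cfg, rng)]
```

Passing an explicit `random.Random` instance keeps the sampling reproducible in tests without touching the global random state.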


Dallas98 added 3 commits December 19, 2025 17:34

feat(generation_service): add image URL extraction and random QA generation logic

98e4b0f

fix(generation_service): increase batch size from 20 to 100 for improved chunk processing

0437930

fix(generation_service): increase batch size from 20 to 100 for improved chunk processing

6adde56

Dallas98 merged commit 8fc4455 into main Dec 22, 2025
2 checks passed

Dallas98 deleted the dev branch December 24, 2025 04:10

Dallas98 commented Dec 22, 2025

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

配置文件更改 #186

配置文件更改 #186

Uh oh!

Conversation

Dallas98 commented Dec 22, 2025

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants