fix: reduce memory usage in export_csv function. #1469
Merged
Summary:
Closes #898
Changed the way we obtain the feeds from the DB. Reduced the batch size from 500 to 50. Since this will likely take longer to execute, increased the timeout from 10 minutes to 1 hour (though 1 hour may be overkill).
Modified how the joins are performed when querying the DB so that queries use less memory.
Note: This PR also affects the API calls. All tests pass, but extra attention should be given to possible side effects.
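A minimal sketch of the batched fetch loop described above (paging through results 50 rows at a time instead of 500). The `fetch_page(offset, limit)` callable is hypothetical and stands in for the real DB query; it is not part of this PR:

```python
from typing import Callable, Iterator, List, TypeVar

T = TypeVar("T")

def iter_in_batches(fetch_page: Callable[[int, int], List[T]],
                    batch_size: int = 50) -> Iterator[List[T]]:
    """Yield pages of results until the data source is exhausted.

    fetch_page(offset, limit) is a hypothetical stand-in for the
    real DB query; batch_size=50 mirrors the value this PR uses.
    """
    offset = 0
    while True:
        page = fetch_page(offset, batch_size)
        if not page:
            return  # empty page means we've consumed everything
        yield page
        offset += len(page)

# Usage with a fake in-memory data source standing in for the DB:
rows = list(range(120))
pages = list(iter_in_batches(lambda off, lim: rows[off:off + lim]))
```

With 120 rows and a batch size of 50 this yields pages of 50, 50, and 20 rows, keeping only one page in memory at a time.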
AI summary
This pull request makes significant improvements to how related data is loaded and streamed from the database, with a focus on memory efficiency and correctness for feed export and query operations. The main changes involve switching from `joinedload` to `selectinload` for loading collections, improving batch processing and streaming, and ensuring active feeds are prioritized in exports. These changes help prevent memory issues, reduce row multiplication, and make exported data more accurate.

Database Query and Loading Strategy Improvements:

- Switched from `joinedload` to `selectinload` for loading collections in queries (e.g., `Gtfsfeed.latest_dataset`, `Gtfsdataset.validation_reports`, `Validationreport.features`, and related entities). This change prevents row multiplication and excessive memory usage when streaming large result sets.
- Updated `get_all_gtfs_feeds` and `get_all_gtfs_rt_feeds` to use smaller batch sizes, stream results with `stream_results=True`, and clear the SQLAlchemy session cache between batches to avoid memory leaks. Added logging for batch progress.
- Uses `with_loader_criteria` for correctness.

Export Functionality and Data Accuracy:

- Added a `track_metrics` decorator to the main CSV export function to record time, memory, and CPU usage during export operations, supporting better performance monitoring.

General Codebase Cleanups:
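The `track_metrics` decorator itself is project-specific; a minimal stdlib-only sketch of the same idea (recording wall time, CPU time, and peak memory around a call) could look like this. The `export_csv` function below is a toy stand-in, not the real export code:

```python
import functools
import logging
import time
import tracemalloc

logging.basicConfig(level=logging.INFO)

def track_metrics(func):
    """Hypothetical sketch: log wall time, CPU time, and peak memory."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        tracemalloc.start()
        wall_start = time.perf_counter()
        cpu_start = time.process_time()
        try:
            return func(*args, **kwargs)
        finally:
            _, peak = tracemalloc.get_traced_memory()
            tracemalloc.stop()
            logging.info(
                "%s: wall=%.3fs cpu=%.3fs peak_mem=%.1fKiB",
                func.__name__,
                time.perf_counter() - wall_start,
                time.process_time() - cpu_start,
                peak / 1024,
            )
    return wrapper

@track_metrics
def export_csv(rows):
    # Toy stand-in for the real export: join each row into a CSV line.
    return "\n".join(",".join(map(str, r)) for r in rows)

csv_text = export_csv([(1, "a"), (2, "b")])
```

Using `finally` ensures the metrics are logged even if the export raises.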
These changes collectively make feed queries and exports more robust, memory-efficient, and accurate.
Please make sure these boxes are checked before submitting your pull request - thanks!
- [ ] Run `./scripts/api-tests.sh` to make sure you didn't break anything