@LTan-101104 LTan-101104 commented Jul 30, 2025

This PR changes preprocess to add handling logic for missing columns, and changes load_preprocessed_jobs_dataframe_from_duckdb to check for a dates_back option in the custom query.

For all columns that are necessary for the calculations in preprocess and for EfficiencyAnalysis's job metrics calculation (GPU-related metrics), this enforces that the dataframe has those columns. Otherwise, it raises a warning if the dataframe is missing some columns.
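
As a rough illustration of the behaviour described above, here is a minimal sketch under one possible reading: columns needed for calculations are enforced with an error, and other missing columns only produce a warning. The helper `check_columns` and every column name in it are hypothetical and are not taken from this PR's diff.

```python
import warnings

# Hypothetical column names, chosen only for illustration.
REQUIRED_COLUMNS = {"GPUs", "GPUMemUsage", "Elapsed"}
OPTIONAL_COLUMNS = {"Account", "Partition"}


def check_columns(columns, required=REQUIRED_COLUMNS, optional=OPTIONAL_COLUMNS):
    """Raise if a required column is absent; warn if an optional one is.

    Returns the sorted list of missing optional columns.
    """
    present = set(columns)

    # Columns needed for calculations must exist, so fail loudly.
    missing_required = sorted(set(required) - present)
    if missing_required:
        raise KeyError(f"Missing required columns: {missing_required}")

    # Other columns are only reported; processing can continue.
    missing_optional = sorted(set(optional) - present)
    if missing_optional:
        warnings.warn(f"Missing optional columns: {missing_optional}", UserWarning)
    return missing_optional
```

With a pandas dataframe, this could be called as `check_columns(df.columns)` before any metric is computed.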

load_preprocessed_jobs_dataframe_from_duckdb has new parameters that add the dates_back option to the SQL query. It also accepts the parameters of preprocess, so users can use it as a complete wrapper around DatabaseConnection and preprocess_data.
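
To make the dates_back idea concrete, the sketch below builds a parameterized query that keeps only jobs from the last N days. It assumes that dates_back is a number of days, that the table is called `Jobs`, and that the filter column is `StartTime`. The helper `build_jobs_query` is hypothetical and is not the function added in this PR.

```python
from datetime import datetime, timedelta


def build_jobs_query(table_name="Jobs", dates_back=None, now=None):
    """Return (sql, params) selecting jobs, optionally limited to recent days.

    `table_name`, the `StartTime` column, and this helper itself are
    assumptions made for illustration, not the project's actual API.
    """
    # Table names cannot be bound as parameters, so validate before formatting.
    if not table_name.isidentifier():
        raise ValueError(f"Invalid table name: {table_name!r}")

    sql = f"SELECT * FROM {table_name}"
    params = []
    if dates_back is not None:
        if dates_back < 0:
            raise ValueError("dates_back must be non-negative")
        # Keep only jobs that started within the last `dates_back` days.
        cutoff = (now or datetime.now()) - timedelta(days=dates_back)
        sql += " WHERE StartTime >= ?"
        params.append(cutoff)
    return sql, params
```

The returned pair could then be passed to a database connection that supports `?` placeholders, and the resulting dataframe handed to the preprocessing step.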

…tialize test for function, make change to how custom_query is used in function
…tering by status, GPU, and elapsed time for load_jobs_dataframe_from_duckdb, add associated test
… TODO and temporarily finalize handling of missing columns in preprocess
…r allocated_vram vram_constraints calculation when deal with empty dataframe in preprocess
…jobs_dataframe_from_duckdb, finalize test for load_jobs_duck_db
…nd add comments explaining missing columns logic
@Espiobest

Changes look good to me

@Espiobest Espiobest left a comment

I just had 2 small comments with the tests. Otherwise the implementation looks good to me!

@bpachev bpachev merged commit 80d42bd into main Sep 3, 2025
6 checks passed
@bpachev bpachev deleted the fix/db_connection_custom_query branch September 3, 2025 20:17

5 participants