Feature/a 100 analysis #8

naryasomayaj · 2025-06-30T18:18:25Z

Here is the analysis for A100 GPU metrics. Currently, it generates metrics and plots for the top 3 requested GPUs.

…ccuracies for a100 gpus

… 2080-ti and 1080-ti and generating metrics and plots

…hat have a vram_constraint efficiency over 100%

Espiobest · 2025-08-13T20:00:00Z

src/analysis/efficiency_analysis.py

        db = DatabaseConnection(str(db_path))

-        jobs_df = db.fetch_all_jobs(table_name=table_name)
+        jobs_df = db.fetch_all_jobs(table_name=table_name) if query is None else db.fetch_query(query=query)


This looks good, but ideally we would wait for Tan's PR to be merged and use those functions. If we don't have time for that, we can stick with this.

Espiobest · 2025-08-13T20:12:40Z

src/analysis/efficiency_analysis.py

+
+                gpu_jobs["job_hours"].sum(),  # Total GPU Hours
+                 # Mean Weighted VRAM Efficiency
+                (gpu_jobs["alloc_vram_efficiency"] * gpu_jobs["job_hours"]).sum() / gpu_jobs["job_hours"].sum(), 


We should probably use VRAM hours here, since the other weighted job metrics use that, unless there's a reason why job_hours would work better here.
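To make the difference concrete, here is a minimal sketch of the same weighted mean computed with each weight column. The `alloc_vram_efficiency` and `job_hours` names come from the diff; the `vram_hours` column is assumed to already exist on the frame, the `weighted_mean` helper is hypothetical, and all values are made up for illustration.

```python
import pandas as pd

# Toy data: column names follow the diff, values are invented for illustration.
gpu_jobs = pd.DataFrame({
    "alloc_vram_efficiency": [0.2, 0.5, 0.9],
    "job_hours": [10.0, 1.0, 1.0],
    "vram_hours": [80.0, 40.0, 80.0],  # assumed to be precomputed elsewhere
})


def weighted_mean(values: pd.Series, weights: pd.Series) -> float:
    """Return sum(values * weights) / sum(weights)."""
    return float((values * weights).sum() / weights.sum())


# Same metric, two different weightings.
by_job_hours = weighted_mean(gpu_jobs["alloc_vram_efficiency"], gpu_jobs["job_hours"])
by_vram_hours = weighted_mean(gpu_jobs["alloc_vram_efficiency"], gpu_jobs["vram_hours"])
print(by_job_hours, by_vram_hours)
```

On this toy data the two weightings give noticeably different answers, so whichever column is chosen should match the other weighted metrics it is reported next to.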

Espiobest · 2025-08-13T20:12:53Z

src/analysis/efficiency_analysis.py

+                 # Mean Weighted VRAM Efficiency
+                (gpu_jobs["alloc_vram_efficiency"] * gpu_jobs["job_hours"]).sum() / gpu_jobs["job_hours"].sum(), 
+                # Median Weighted VRAM Efficiency
+                (gpu_jobs["alloc_vram_efficiency"] * gpu_jobs["job_hours"]).median() / gpu_jobs["job_hours"].median()  


Same here, should we use vram_hours instead?
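Separately from the choice of weights: dividing the median of the products by the median of the weights is generally not a weighted median. A rough sketch of one common definition (the smallest value at which the cumulative weight reaches half of the total) is below. The `weighted_median` helper is hypothetical and not part of the repository, and the data is made up.

```python
import numpy as np
import pandas as pd


def weighted_median(values: pd.Series, weights: pd.Series) -> float:
    """Smallest value whose cumulative weight reaches half of the total weight."""
    vals = values.to_numpy()
    wts = weights.to_numpy()
    order = np.argsort(vals)              # sort jobs by efficiency
    cum_weights = np.cumsum(wts[order])   # running total of the weights
    cutoff = 0.5 * cum_weights[-1]
    # First position where the running total is >= half of the total weight.
    idx = int(np.searchsorted(cum_weights, cutoff))
    return float(vals[order][idx])


# Toy data: column names follow the diff, values are invented for illustration.
gpu_jobs = pd.DataFrame({
    "alloc_vram_efficiency": [0.2, 0.5, 0.9],
    "job_hours": [10.0, 1.0, 1.0],
    "vram_hours": [80.0, 40.0, 80.0],  # assumed to be precomputed elsewhere
})

median_by_vram_hours = weighted_median(gpu_jobs["alloc_vram_efficiency"], gpu_jobs["vram_hours"])
median_by_job_hours = weighted_median(gpu_jobs["alloc_vram_efficiency"], gpu_jobs["job_hours"])

# The expression from the diff, for comparison.
ratio_of_medians = (
    (gpu_jobs["alloc_vram_efficiency"] * gpu_jobs["job_hours"]).median()
    / gpu_jobs["job_hours"].median()
)
print(median_by_vram_hours, median_by_job_hours, ratio_of_medians)
```

With these toy numbers the job-hours weighted median is 0.2 (the long job dominates), while the ratio-of-medians expression gives 0.9, so the two are not interchangeable.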

Espiobest · 2025-08-13T20:37:26Z

src/analysis/efficiency_analysis.py

+        job_metrics_by_gpu_type = self.compare_job_metrics_by_gpu_type()
+
+        # Create a DataFrame to hold the GPU utilization patterns
+        gpu_utilization_patterns = pd.DataFrame({


Is this just transposing the other df? It looks like we're creating rows out of the column values. Correct me if I'm wrong, but could transposing the df do the same thing more simply? Try it and see.
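A minimal sketch of what the transpose would look like, assuming `compare_job_metrics_by_gpu_type()` returns a frame with one column per GPU type and one row per metric. That layout, the metric labels, and the values below are assumptions for illustration; the real frame may be shaped differently.

```python
import pandas as pd

# Hypothetical stand-in for the output of compare_job_metrics_by_gpu_type():
# one column per GPU type, one row per metric. Values are made up.
job_metrics_by_gpu_type = pd.DataFrame(
    {"a100": [120.0, 0.35], "2080-ti": [80.0, 0.50]},
    index=["Total GPU Hours", "Mean Weighted VRAM Efficiency"],
)

# .T swaps rows and columns: one row per GPU type, one column per metric.
gpu_utilization_patterns = job_metrics_by_gpu_type.T
gpu_utilization_patterns.index.name = "gpu_type"
print(gpu_utilization_patterns)
```

If the hand-built DataFrame in the diff only rearranges existing values like this, `.T` (optionally followed by a column rename) would replace it in one line.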

Espiobest · 2025-08-13T20:39:42Z

notebooks/A100_Analysis.ipynb

In this file, could we remove the older Efficiency Analysis cells (at least the plots that are irrelevant to A100s)? If we aren't doing anything with users, we don't need to call the user metric functions and show their output here. I had to scroll quite a bit to find the actual A100 plots. It would be better to simplify this notebook, since we will later have another notebook that contains the Efficiency Analysis + time plots + ROC + A100 analysis for certain groups. This notebook should focus only on A100s, for easy reference to those functions.

naryasomayaj added 4 commits June 25, 2025 16:46

committing changes for a100 analysis generating plot of all request a…

703379b

…ccuracies for a100 gpus

committing changes to a-100 analysis

ae94ad4

Merge branch 'feature/1a-zero-vram' into feature/a-100-analysis

4baadaa

committing changes for analysis of a100 gpus compared to top 3 users,…

a487bb6

… 2080-ti and 1080-ti and generating metrics and plots

naryasomayaj requested a review from MisterArdavan June 30, 2025 18:18

committing changes for a100 analysis resolving all ruff checks

f6b6a33

MisterArdavan marked this pull request as draft June 30, 2025 21:39

naryasomayaj added 23 commits July 8, 2025 10:35

committing changes for a100 analysis

46b2514

resolve merge conflicts

9fefe76

Merge branch 'main' into feature/a-100-analysis

0da5e62

merged main into branch

ace5dd5

committing changes for a100 analysis

a9892ac

resolved ruff issues on a-100-analysis branch

b3690b7

resolved ruff issues on a-100-analysis branch

97eca33

committing changes for analysis of a100 gpus compared to top 3 users,…

1b7d9f6

… 2080-ti and 1080-ti and generating metrics and plots

committing changes for a100 analysis resolving all ruff checks

4979449

committing changes for a100 analysis

dd291b9

merged main into branch

bd0a418

committing changes for a100 analysis

56efc92

committing changes toa100 analysis with refactored EfficiencyAnalysis

0fbf2ce

working with updated efficiency analysis

afa331d

committing changes for preprocess

d7bbd9d

committing changes for a100 analysis

fe42dbd

committing changes for a100 on the new dataset

3eb7559

committing changes to a100 analysis

271bbc7

committing changes to a100 just to switch branch

6d4eaf6

Readd dev-requirement file

be387c5

Up to date efficiency Analysis notebook

bcc49ed

merge remote-tracking branch 'origin/main' into feature/a-100-analysis

bff782b

committing changes to a-100 analysis notebook

36c30dc

naryasomayaj added 10 commits July 31, 2025 14:40

created A100 .ipynb with same structure as EfficiencyAnalysis

53d4e91

A100_analysis.ipynb

0ee8bf9

committing a100 formatting changes

662785a

committing edits for a100 format

4c433a2

Merge branch 'main' into feature/a-100-analysis

25bf316

passing all pytests in a100

891c120

resolving all ruff checks for a100

4c2b94a

resolving all ruff checks for a100

0d8279c

pyproject.toml

e9453de

resolving mypy errors

6c22061

naryasomayaj marked this pull request as ready for review August 5, 2025 14:46

naryasomayaj requested review from LTan-101104 and MisterArdavan and removed request for LTan-101104 and MisterArdavan August 5, 2025 14:46

naryasomayaj added 5 commits August 5, 2025 23:44

observing vram constraint efficiency categories, looking into users t…

0440cbb

…hat have a vram_constraint efficiency over 100%

vram efficiency > 1 and looking at GPU request types

62a5be3

vram efficiency > 1 and looking at GPU request types

fafd927

vram efficiency > 1 and looking at GPU request types

15a6f8c

committing changes for a100 visualizations

c7d52b4

Espiobest requested changes Aug 13, 2025

naryasomayaj added 2 commits August 16, 2025 15:28

Merge branch 'main' into feature/a-100-analysis

54a9c41

polishing a100 notebook

dffc6dd

MisterArdavan marked this pull request as draft August 20, 2025 18:48

bpachev closed this Sep 3, 2025


4 participants
