Feature/a 100 analysis #8
…ccuracies for a100 gpus
… 2080-ti and 1080-ti and generating metrics and plots
… 2080-ti and 1080-ti and generating metrics and plots
…hat have a vram_constraint efficiency over 100%
    db = DatabaseConnection(str(db_path))

    - jobs_df = db.fetch_all_jobs(table_name=table_name)
    + jobs_df = db.fetch_all_jobs(table_name=table_name) if query is None else db.fetch_query(query=query)
This looks good, but ideally, we wait for Tan's PR to be merged and use those functions. If we don't have time to do that though, then we can stick to using this.
    gpu_jobs["job_hours"].sum(),  # Total GPU Hours
    # Mean Weighted VRAM Efficiency
    (gpu_jobs["alloc_vram_efficiency"] * gpu_jobs["job_hours"]).sum() / gpu_jobs["job_hours"].sum(),
We should probably use VRAM hours here, since the other weighted job metrics use that, unless there's a reason why job_hours would work better.
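To make the difference concrete, here is a minimal sketch of the same weighted mean computed with each weight. The sample values are made up, and the `vram_hours` column is an assumption based on the name used in this review; the helper `weighted_mean` is hypothetical and not part of the PR.

```python
import pandas as pd

# Hypothetical sample data: column names follow the diff, values are invented.
gpu_jobs = pd.DataFrame({
    "alloc_vram_efficiency": [0.50, 0.80, 0.20],
    "job_hours": [10.0, 1.0, 4.0],
    "vram_hours": [400.0, 80.0, 40.0],  # assumed column, named in the review
})


def weighted_mean(values: pd.Series, weights: pd.Series) -> float:
    """Return sum(values * weights) / sum(weights)."""
    return float((values * weights).sum() / weights.sum())


# Same efficiency column, two different weights.
by_job_hours = weighted_mean(gpu_jobs["alloc_vram_efficiency"], gpu_jobs["job_hours"])
by_vram_hours = weighted_mean(gpu_jobs["alloc_vram_efficiency"], gpu_jobs["vram_hours"])

print(f"weighted by job_hours:  {by_job_hours:.4f}")
print(f"weighted by vram_hours: {by_vram_hours:.4f}")
```

The two results differ whenever jobs with a lot of allocated VRAM have different efficiency from short or small jobs, so the choice of weight should match the other weighted metrics.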
    # Mean Weighted VRAM Efficiency
    (gpu_jobs["alloc_vram_efficiency"] * gpu_jobs["job_hours"]).sum() / gpu_jobs["job_hours"].sum(),
    # Median Weighted VRAM Efficiency
    (gpu_jobs["alloc_vram_efficiency"] * gpu_jobs["job_hours"]).median() / gpu_jobs["job_hours"].median()
Same here, should we use vram_hours instead?
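A related point that may be worth checking: dividing the median of the products by the median of the weights is not the usual definition of a weighted median. The sketch below shows one common definition (the smallest value at which the cumulative weight reaches half of the total) next to the expression from the diff. The helper `weighted_median`, the sample values, and the `vram_hours` column are assumptions for illustration, not code from this PR.

```python
import numpy as np
import pandas as pd


def weighted_median(values: pd.Series, weights: pd.Series) -> float:
    """Smallest value whose cumulative weight reaches half the total weight."""
    order = np.argsort(values.to_numpy())
    sorted_values = values.to_numpy()[order]
    cumulative = np.cumsum(weights.to_numpy()[order])
    # First position where the running weight is at least half of the total.
    idx = int(np.searchsorted(cumulative, 0.5 * cumulative[-1]))
    return float(sorted_values[idx])


# Hypothetical sample data; vram_hours is an assumed column name.
gpu_jobs = pd.DataFrame({
    "alloc_vram_efficiency": [0.50, 0.80, 0.20],
    "vram_hours": [400.0, 80.0, 40.0],
})

proper = weighted_median(gpu_jobs["alloc_vram_efficiency"], gpu_jobs["vram_hours"])

# The form used in the diff, with vram_hours substituted as the weight.
ratio_of_medians = float(
    (gpu_jobs["alloc_vram_efficiency"] * gpu_jobs["vram_hours"]).median()
    / gpu_jobs["vram_hours"].median()
)

print(proper, ratio_of_medians)
```

On this sample the two expressions give different answers, so if a true weighted median is intended, the metric may need more than a change of weight column.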
    job_metrics_by_gpu_type = self.compare_job_metrics_by_gpu_type()

    # Create a DataFrame to hold the GPU utilization patterns
    gpu_utilization_patterns = pd.DataFrame({
Is this just transposing the other df? It looks like we're creating rows out of the column values. Correct me if I'm wrong, but could transposing the df do the same thing more simply? Try it and see.
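A quick way to check this is to compare the hand-built frame against `.T`. The sketch below uses a hypothetical stand-in for the output of `compare_job_metrics_by_gpu_type()`, assuming metrics as rows and GPU types as columns; the real layout and values may differ.

```python
import pandas as pd

# Hypothetical stand-in for compare_job_metrics_by_gpu_type(); values are invented.
job_metrics_by_gpu_type = pd.DataFrame(
    {"a100": [120.0, 0.35], "2080-ti": [80.0, 0.52], "1080-ti": [60.0, 0.61]},
    index=["Total GPU Hours", "Mean Weighted VRAM Efficiency"],
)

# Manual construction: one column per metric, one row per GPU type.
manual = pd.DataFrame({
    metric: job_metrics_by_gpu_type.loc[metric]
    for metric in job_metrics_by_gpu_type.index
})

# The same reshaping in a single call.
transposed = job_metrics_by_gpu_type.T

print(manual.equals(transposed))
```

If the comparison holds for the real frame, the dictionary construction can be replaced by the transpose, optionally followed by a column rename.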
In this file, could we remove the older Efficiency Analysis cells (at least the plots that are irrelevant to A100s)? If we aren't doing anything with users, we don't need to call the user metric functions and show their output here. I had to scroll quite a bit to find the actual A100 plots. It's better to simplify this notebook, since we will also have another notebook later that contains the Efficiency analysis + time plots + ROC + A100 analysis for certain groups. This notebook should focus only on A100s, for easy reference to those functions.
Here is the analysis for A100 GPU metrics. Currently, this generates metrics and plots for the top 3 requested GPUs.