Feature/high cpu mem analysis #22
…s4cg-job-analytics into feature/high-cpu-mem-analysis
…m of nodes and total gpu_count of nodes for each job
…ic in the notebook
…g ratio of cores to ratio of gpus requested
…ncyAnalysis and its subclasses
…eature/high-cpu-mem-analysis
src/analysis/efficiency_analysis.py
Outdated
| def sort_and_filter_records_with_metrics( | ||
| self, | ||
| metrics_df_name_enum: MetricsDataFrameNameEnum, | ||
| metrics_df_name_enum: MetricsDFNameEnumT, |
Do we need to add this to every method? I'm not sure why anyone would pass in anything other than jobs here, and the same goes for the other methods. I think the earlier implementation was fine. This parameter would be worth having if the function behaved differently depending on the type of DF passed in, but since it only runs on jobs, we should just use the jobs DF (raise an error and calculate the metrics if it doesn't work). Merging this would also break the reports and any other pieces such as frequency analysis, a100, and ROC, unless they're all changed. In my opinion it isn't feasible to do this right now, and it isn't needed unless the function handles different DFs.
| with open(self.local_path, "w") as f: | ||
| json.dump(remote_info, f, indent=2) | ||
| print(f"Fetched and saved {self.local_path.name} from remote URL.") | ||
| if os.getenv("OUTPUT_MODE") == "VERBOSE": |
How do we change this so that it is printed? Do we have to manually set that env variable? If so, we should add something in the documentation to specify that.
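To make the question concrete, here is a minimal sketch of how the check in the diff could be turned on, assuming nothing else in the project sets `OUTPUT_MODE`. The helper names `is_verbose` and `verbose_print` are hypothetical and only illustrate the pattern; they are not part of this PR.

```python
import os


def is_verbose() -> bool:
    """Mirror the check from the diff: verbose only when OUTPUT_MODE == "VERBOSE"."""
    return os.getenv("OUTPUT_MODE") == "VERBOSE"


def verbose_print(message: str) -> bool:
    """Print the message only in verbose mode; return whether it was printed."""
    if is_verbose():
        print(message)
        return True
    return False


# Enable verbose output for the current process before the fetch code runs.
os.environ["OUTPUT_MODE"] = "VERBOSE"
verbose_print("Fetched and saved the file from remote URL.")
```

In a POSIX shell the same effect comes from prefixing the command with `OUTPUT_MODE=VERBOSE`, or from exporting the variable once per session. Either way the variable has to be set manually, so it seems worth a line in the documentation.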
Implement the ResourceHoarding class, which enables analysis of jobs and users that make resources inaccessible to others by requesting a disproportionate amount of CPU cores or RAM.
Also adds a demo notebook to show the functionality of the class.
Changes were made to other files when necessary for implementation of this feature.
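As a rough illustration of what "disproportionate" could mean here, the sketch below compares the share of a node's CPU cores a job requests with the share of the node's GPUs it requests. This is not the ResourceHoarding implementation from this PR: the `Job` fields, the function names, and the threshold of 1.0 are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class Job:
    # All field names are hypothetical; the real jobs DF may use different columns.
    job_id: int
    cores_requested: int
    gpus_requested: int
    node_cores: int  # total CPU cores on the node(s) the job runs on
    node_gpus: int   # total GPUs on the node(s) the job runs on


def core_to_gpu_share_ratio(job: Job) -> float:
    """Ratio of the requested core share to the requested GPU share."""
    if job.gpus_requested <= 0 or job.node_gpus <= 0 or job.node_cores <= 0:
        raise ValueError("This sketch only handles jobs with positive GPU and core counts.")
    core_share = job.cores_requested / job.node_cores
    gpu_share = job.gpus_requested / job.node_gpus
    return core_share / gpu_share


def flag_hoarding(jobs: list[Job], threshold: float = 1.0) -> list[int]:
    """Return IDs of jobs whose core share exceeds their GPU share by more than the threshold."""
    return [j.job_id for j in jobs if core_to_gpu_share_ratio(j) > threshold]


jobs = [
    Job(job_id=1, cores_requested=32, gpus_requested=1, node_cores=64, node_gpus=4),
    Job(job_id=2, cores_requested=16, gpus_requested=1, node_cores=64, node_gpus=4),
]
print(flag_hoarding(jobs))  # job 1 requests half the cores but a quarter of the GPUs
```

A ratio above 1 means the job takes a larger fraction of the node's cores than of its GPUs, which can leave the remaining GPUs unusable by other jobs. The same idea would apply to RAM by swapping the core counts for memory amounts.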