Energy Draw Tracker
Track and monitor the energy draw of model training and model inference experiments on GPUs and CPUs.
Summary
The carbon footprint caused by the energy consumption of GPUs and CPUs during model training and model inference can be reduced if that consumption is properly tracked and measures are taken to lower it. This tool monitors and logs GPU/CPU energy usage during model training and model inference.
Work Phases.
Non-Coding.
- Planning
- Documentation
- Prototype Release
- Testing
Implementation.
API
- Build an API for tracking with Nvidia GPU devices. Use the nvidia-smi command's features.
- Build Callbacks
- Build Python API plugin
References :
* power draw callback
* GpuStat
- Build an API for tracking with CPU (Intel/Mac).
- Build Callbacks
- Build an API plugin for use with Python.
References :
* PyRAPL
* EnergyUsage
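As a rough illustration of the GPU and CPU tracking items above, the sketch below polls a power reading and passes it to a callback. It is only a minimal sketch, not the project's API: the function names (`parse_power_draw`, `read_gpu_power_watts`, `rapl_delta_joules`, `track`) are hypothetical. The `nvidia-smi` query flags are the tool's standard ones. The CPU helper assumes readings taken from the Linux powercap files `energy_uj` and `max_energy_range_uj` (for example under `/sys/class/powercap/intel-rapl:0/`), which is an assumption about the target platform and does not cover Mac processors.

```python
import subprocess
import time
from typing import Callable, List

# Standard nvidia-smi query: one power reading (in watts) per GPU, one per line.
NVIDIA_SMI_CMD = [
    "nvidia-smi",
    "--query-gpu=power.draw",
    "--format=csv,noheader,nounits",
]


def parse_power_draw(output: str) -> List[float]:
    """Parse nvidia-smi CSV output into a list of power readings in watts."""
    readings = []
    for line in output.strip().splitlines():
        value = line.strip()
        # GPUs that cannot report power print a bracketed placeholder
        # such as "[N/A]"; those lines are skipped.
        if not value or value.startswith("["):
            continue
        readings.append(float(value))
    return readings


def read_gpu_power_watts() -> List[float]:
    """Run nvidia-smi once and return the current power draw of each GPU."""
    result = subprocess.run(
        NVIDIA_SMI_CMD, capture_output=True, text=True, check=True
    )
    return parse_power_draw(result.stdout)


def rapl_delta_joules(start_uj: int, end_uj: int, max_range_uj: int) -> float:
    """Energy in joules between two RAPL counter readings (microjoules).

    The counter wraps around at max_range_uj, so a negative difference
    means one wraparound happened between the two readings.
    """
    delta = end_uj - start_uj
    if delta < 0:
        delta += max_range_uj
    return delta / 1_000_000


def track(
    sampler: Callable[[], List[float]],
    callback: Callable[[float, List[float]], None],
    samples: int,
    interval_s: float = 1.0,
) -> None:
    """Call sampler() `samples` times and hand each reading to callback."""
    for i in range(samples):
        callback(time.time(), sampler())
        if i < samples - 1:
            time.sleep(interval_s)
```

On a machine with an Nvidia driver installed, `track(read_gpu_power_watts, lambda ts, watts: print(ts, watts), samples=5)` would print five readings one second apart. Keeping the sampler and the callback separate means the same loop could serve a CPU sampler or a logging callback.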
Docker
- Write a Dockerfile and upload the image to Docker Hub.
Distributed Run
- Track scripts running on distributed machines.
- Add support for energy tracking for hyperparameter tuning using Jako
Logging
- Write an API to log energy output to a CSV file. Columns include Timestamp, Start Time, End Time, Energy in Wh (watt-hours), Device Type, Processor Type.
- Write an API to log output to a PostgreSQL database. Columns include Timestamp, Start Time, End Time, Energy in Wh (watt-hours), Device Type, Processor Type.
- Add Hasura API to manage the postgres database.
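The CSV logging item above could look roughly like the following. This is a sketch under assumptions, not the planned API: the snake_case column names, `energy_wh`, and `write_energy_row` are hypothetical, and the energy figure is obtained by trapezoidal integration of power samples, which is one reasonable choice rather than a confirmed design.

```python
import csv
import io
from datetime import datetime, timezone
from typing import Dict, Sequence, TextIO

# Hypothetical column names mirroring the columns listed in the plan.
COLUMNS = [
    "timestamp",
    "start_time",
    "end_time",
    "energy_wh",
    "device_type",
    "processor_type",
]


def energy_wh(timestamps_s: Sequence[float], power_w: Sequence[float]) -> float:
    """Integrate power samples (watts) over time (seconds) into watt-hours."""
    if len(timestamps_s) != len(power_w):
        raise ValueError("timestamps and power samples must have equal length")
    joules = 0.0
    for i in range(1, len(timestamps_s)):
        dt = timestamps_s[i] - timestamps_s[i - 1]
        # Trapezoidal rule: average power over the interval times its length.
        joules += 0.5 * (power_w[i] + power_w[i - 1]) * dt
    return joules / 3600.0  # 1 Wh = 3600 J


def _iso(ts: float) -> str:
    """Format a Unix timestamp as an ISO 8601 string in UTC."""
    return datetime.fromtimestamp(ts, tz=timezone.utc).isoformat()


def write_energy_row(
    stream: TextIO,
    timestamps_s: Sequence[float],
    power_w: Sequence[float],
    device_type: str,
    processor_type: str,
    write_header: bool = False,
) -> Dict[str, object]:
    """Append one summary row for a tracked run to an open CSV stream."""
    writer = csv.DictWriter(stream, fieldnames=COLUMNS)
    if write_header:
        writer.writeheader()
    row = {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "start_time": _iso(timestamps_s[0]),
        "end_time": _iso(timestamps_s[-1]),
        "energy_wh": round(energy_wh(timestamps_s, power_w), 6),
        "device_type": device_type,
        "processor_type": processor_type,
    }
    writer.writerow(row)
    return row


if __name__ == "__main__":
    # A constant 100 W draw sampled over one hour.
    buffer = io.StringIO()
    write_energy_row(buffer, [0, 1800, 3600], [100, 100, 100],
                     "GPU", "Nvidia", write_header=True)
    print(buffer.getvalue())
```

Writing to any text stream rather than a fixed path keeps the function easy to test and lets the caller decide where the log lives.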
Visualisation
- Write APIs for visualising with Plotly/Dash and/or Metabase. Use the logged output from the CSV files or the database as the data source.
Documentation.
Write End User documentation, as well as Developer documentation.
- End User Documentation:
- Introduction
- Mission
- Summary
- Frequently Asked Questions
- Getting Started
- Installation
- Quickstart
- Examples
- Logging
- Output logging
- Visualisation
- Developer Documentation:
- Tracker
- API reference for tracking with GPU
- API reference for tracking with CPU:
- API reference for tracking with Intel-based processors
- API reference for tracking with M-series Mac processors
- Logging
- API reference for logging to CSV files
- API reference for logging to databases
- API reference for visualisation tools
Testing
All testing can use the Bitcoin price prediction example.
- For model training:
- Run the API for tracking with GPU and log the monitored output to a CSV file.
- CPU Tracking:
- Run the API for tracking with Intel-based processors and log the monitored output to a CSV file.
- Run the API for tracking with M-series Mac processors and log the monitored output to a CSV file.
- For model inference:
- Run the API for tracking with GPU and log the monitored output to a CSV file.
- CPU Tracking:
- Run the API for tracking with Intel-based processors and log the monitored output to a CSV file.
- Run the API for tracking with M-series Mac processors and log the monitored output to a CSV file.
- For hyperparameter tuning (using Talos):
- Run the API for tracking with GPU and log the monitored output to a CSV file.
- CPU Tracking:
- Run the API for tracking with Intel-based processors and log the monitored output to a CSV file.
- Run the API for tracking with M-series Mac processors and log the monitored output to a CSV file.
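Each test run above ends with a CSV log, so a small check on that log could serve as the pass/fail criterion. The sketch below is one possible check, not part of the plan: `validate_energy_log` and the snake_case column names are hypothetical, and ISO 8601 timestamps are assumed.

```python
import csv
import io
from datetime import datetime
from typing import List, TextIO

# Hypothetical column names mirroring the columns listed under Logging.
EXPECTED_COLUMNS = [
    "timestamp",
    "start_time",
    "end_time",
    "energy_wh",
    "device_type",
    "processor_type",
]


def validate_energy_log(stream: TextIO) -> List[str]:
    """Return a list of problems found in an energy log; empty means valid."""
    reader = csv.DictReader(stream)
    header = reader.fieldnames or []
    missing = [name for name in EXPECTED_COLUMNS if name not in header]
    if missing:
        return ["missing columns: " + ", ".join(missing)]

    problems = []
    row_count = 0
    for row_count, row in enumerate(reader, start=1):
        # Energy must be a non-negative number.
        try:
            energy = float(row["energy_wh"])
        except (TypeError, ValueError):
            problems.append(f"row {row_count}: energy_wh is not numeric")
            continue
        if energy < 0:
            problems.append(f"row {row_count}: energy_wh is negative")

        # The run must not end before it starts.
        try:
            start = datetime.fromisoformat(row["start_time"])
            end = datetime.fromisoformat(row["end_time"])
            if end < start:
                problems.append(f"row {row_count}: end_time precedes start_time")
        except (TypeError, ValueError):
            problems.append(f"row {row_count}: invalid start_time or end_time")

    if row_count == 0:
        problems.append("log contains no rows")
    return problems


if __name__ == "__main__":
    sample = io.StringIO(
        "timestamp,start_time,end_time,energy_wh,device_type,processor_type\n"
        "2024-01-01T01:00:00,2024-01-01T00:00:00,2024-01-01T01:00:00,100.0,GPU,Nvidia\n"
    )
    print(validate_energy_log(sample))
```

The same check could be run after the training, inference, and hyperparameter tuning scenarios, since all three write to the same log format.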