Energy Draw Tracker #1

@abhijithneilabraham

Energy Draw Tracker

Track and monitor the energy draw of model training and model inference experiments on GPUs and CPUs.

Summary

The carbon footprint of the energy that GPUs and CPUs consume during model training and model inference can be reduced if that consumption is properly tracked and measures are taken to lower it. This tool will monitor and log GPU/CPU energy usage for model training and model inference.

Work Phases

Non-Coding

  • Planning
  • Documentation
  • Prototype Release
  • Testing

Implementation

API

  • Build an API for tracking energy draw on Nvidia GPU devices, using the features of the nvidia-smi command.
    • Build Callbacks
    • Build Python API plugin

References:
* power draw callback
* GpuStat
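
As a rough illustration of the nvidia-smi approach, the sketch below reads the current power draw of each GPU in watts. The `--query-gpu=power.draw` and `--format=csv,noheader,nounits` options are existing nvidia-smi options; the function names `parse_power_draw` and `read_gpu_power_draw` are hypothetical and not part of any planned API.

```python
import subprocess
from typing import List


def parse_power_draw(output: str) -> List[float]:
    """Parse nvidia-smi CSV output (one GPU per line) into watts."""
    readings = []
    for line in output.splitlines():
        value = line.strip()
        if not value:
            continue
        try:
            readings.append(float(value))
        except ValueError:
            # Some GPUs report a non-numeric placeholder such as "[N/A]".
            readings.append(float("nan"))
    return readings


def read_gpu_power_draw() -> List[float]:
    """Query the instantaneous power draw of every visible Nvidia GPU."""
    result = subprocess.run(
        ["nvidia-smi", "--query-gpu=power.draw", "--format=csv,noheader,nounits"],
        capture_output=True,
        text=True,
        check=True,  # raise if nvidia-smi exits with an error
    )
    return parse_power_draw(result.stdout)
```

Keeping the parsing separate from the subprocess call means the parser can be tested on machines without an Nvidia GPU.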

  • Build an API for tracking energy draw on CPUs (Intel/Mac).
    • Build Callbacks
    • Build Python API plugin

References:
* PyRAPL
* EnergyUsage
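
For Intel CPUs on Linux, one possible starting point is the RAPL energy counter exposed through the powercap sysfs interface. The sketch below is a minimal example under that assumption: the domain path may differ per machine, reading it may require elevated permissions, and it does not cover Mac processors. The function names are hypothetical.

```python
from pathlib import Path

# Assumed location of the package-0 RAPL domain; may differ per machine.
RAPL_DOMAIN = Path("/sys/class/powercap/intel-rapl:0")


def read_counter_uj(domain: Path = RAPL_DOMAIN, name: str = "energy_uj") -> int:
    """Read a RAPL counter file (values are in microjoules)."""
    return int((domain / name).read_text().strip())


def energy_delta_joules(start_uj: int, end_uj: int, max_range_uj: int) -> float:
    """Energy consumed between two counter readings, in joules.

    Assumes the counter wrapped around at most once between the readings.
    """
    delta = end_uj - start_uj
    if delta < 0:
        # The counter wrapped past max_energy_range_uj and restarted.
        delta += max_range_uj
    return delta / 1_000_000
```

A tracker could read `energy_uj` before and after a workload, read `max_energy_range_uj` once from the same directory, and pass the three values to `energy_delta_joules`.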

Docker

  • Write a Dockerfile and upload the image to Docker Hub.

Distributed Run

  • Track scripts running on distributed machines.
    • Add support for energy tracking for hyperparameter tuning using Jako

Logging

  • Write an API to log energy output to a CSV file. Columns include timestamp, start time, end time, energy in watt-hours (Wh), device type, and processor type.
  • Write an API to log output to a PostgreSQL database. Columns include timestamp, start time, end time, energy in watt-hours (Wh), device type, and processor type.
  • Add Hasura API to manage the postgres database.
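
The CSV logging step could look roughly like the sketch below. It assumes power is sampled as (seconds, watts) pairs and converted to watt-hours with the trapezoidal rule; the column identifiers and function names are hypothetical, chosen only to mirror the columns listed above.

```python
import csv
from datetime import datetime, timezone

# Hypothetical column identifiers mirroring the columns listed above.
FIELDNAMES = [
    "timestamp", "start_time", "end_time",
    "energy_wh", "device_type", "processor_type",
]


def energy_wh(samples):
    """Integrate (seconds, watts) samples with the trapezoidal rule.

    Returns the energy in watt-hours (1 Wh = 3600 J).
    """
    joules = 0.0
    for (t0, p0), (t1, p1) in zip(samples, samples[1:]):
        joules += (p0 + p1) / 2.0 * (t1 - t0)
    return joules / 3600.0


def write_energy_row(stream, start_time, end_time, energy,
                     device_type, processor_type, write_header=False):
    """Append one measurement row to an open text stream in CSV format."""
    writer = csv.DictWriter(stream, fieldnames=FIELDNAMES)
    if write_header:
        writer.writeheader()
    writer.writerow({
        "timestamp": datetime.now(timezone.utc).isoformat(),  # time of logging
        "start_time": start_time.isoformat(),
        "end_time": end_time.isoformat(),
        "energy_wh": energy,
        "device_type": device_type,
        "processor_type": processor_type,
    })
```

Writing to any text stream rather than a fixed path keeps the function usable with files opened in append mode as well as in-memory buffers for tests.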

Visualisation

  • Write APIs for visualising with Plotly/Dash and/or Metabase. Use the logged output from the CSV files or the database as the data source.

Documentation

Write end-user documentation as well as developer documentation.

  • End User Documentation:

    • Introduction
      • Mission
      • Summary
      • Frequently Asked Questions
    • Getting Started
      • Installation
      • Quickstart
      • Examples
    • Logging
      • Output logging
      • Visualisation
  • Developer Documentation

    • Tracker
      • API reference for tracking with GPU
      • API reference for tracking with CPU
        • API reference for tracking with Intel-based processors
        • API reference for tracking with M-series Mac processors
    • Logging
      • API reference for logging to csv files
      • API reference for logging to databases
      • API reference for visualisation tools

Testing

All testing can use the Bitcoin price prediction example.

  • For model training:

    • Run the API for tracking with GPU and log the monitored output to a CSV file.
    • CPU tracking:
      • Run the API for tracking with Intel-based processors and log the monitored output to a CSV file.
      • Run the API for tracking with M-series Mac processors and log the monitored output to a CSV file.
  • For model inference:

    • Run the API for tracking with GPU and log the monitored output to a CSV file.
    • CPU tracking:
      • Run the API for tracking with Intel-based processors and log the monitored output to a CSV file.
      • Run the API for tracking with M-series Mac processors and log the monitored output to a CSV file.
  • For hyperparameter tuning (using Talos):

    • Run the API for tracking with GPU and log the monitored output to a CSV file.
    • CPU tracking:
      • Run the API for tracking with Intel-based processors and log the monitored output to a CSV file.
      • Run the API for tracking with M-series Mac processors and log the monitored output to a CSV file.
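
Each test run above wraps a workload (training, inference, or tuning) with a tracker. A minimal sketch of such a wrapper is shown below, assuming a background thread that polls a power-reading callable at a fixed interval. The class name `PowerSampler` and its parameters are hypothetical; the actual tracker API is still to be designed.

```python
import threading
import time


class PowerSampler:
    """Context manager that polls a power-reading callable in the background.

    `read_watts` is any zero-argument callable returning watts, for example
    a GPU or CPU reader. Samples are stored as (monotonic seconds, watts).
    """

    def __init__(self, read_watts, interval_s=1.0):
        self.read_watts = read_watts
        self.interval_s = interval_s
        self.samples = []
        self._stop = threading.Event()
        self._thread = threading.Thread(target=self._run, daemon=True)

    def _sample(self):
        self.samples.append((time.monotonic(), self.read_watts()))

    def _run(self):
        # Event.wait returns False on timeout and True once stop is set.
        while not self._stop.wait(self.interval_s):
            self._sample()

    def __enter__(self):
        self._sample()  # one reading at the start of the workload
        self._thread.start()
        return self

    def __exit__(self, exc_type, exc, tb):
        self._stop.set()
        self._thread.join()
        self._sample()  # one reading at the end of the workload
        return False  # do not suppress exceptions from the workload
```

A test could run `with PowerSampler(reader) as sampler:` around the Bitcoin price prediction training call and then pass `sampler.samples` to the logging step.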
