Releases: ml-energy/zeus

Zeus v0.15.0 & Zeus Daemon v0.4.0

25 Feb 06:25

Zeus daemon and corresponding Zeus client improvements

Zeus daemon is now more generic with three API groups that can be selectively enabled at Zeus daemon startup time:

  • gpu-read: GPU power and energy measurements, no privilege needed
  • gpu-control: control of GPU frequency, power limit, etc., privilege needed
  • cpu-read: CPU power and energy measurements, privilege needed

Plus, the Zeus daemon now supports JWT-based auth. You can generate a JWT scoped to a subset of API groups and set ZEUSD_TOKEN to call APIs in the allowed API groups.
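
To make the scoping idea concrete, here is a minimal sketch of what a token restricted to a subset of API groups could look like. The `scopes` claim name, the HS256 algorithm, and the demo secret are assumptions for illustration, not the daemon's documented token format; real tokens should come from whatever tooling the Zeus daemon ships. Only the ZEUSD_TOKEN variable name and the group names come from the notes above.

```python
import base64
import hashlib
import hmac
import json
import os


def _b64url(data: bytes) -> str:
    """Base64url-encode without padding, as JWTs require."""
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def _b64url_decode(text: str) -> bytes:
    """Decode base64url text, restoring the padding that JWTs strip."""
    return base64.urlsafe_b64decode(text + "=" * (-len(text) % 4))


def make_scoped_token(secret: bytes, groups: list[str]) -> str:
    """Build an HS256-signed JWT whose payload lists the allowed API groups.

    The `scopes` claim name is hypothetical.
    """
    header = _b64url(json.dumps({"alg": "HS256", "typ": "JWT"}).encode())
    payload = _b64url(json.dumps({"scopes": groups}).encode())
    signing_input = f"{header}.{payload}".encode("ascii")
    signature = hmac.new(secret, signing_input, hashlib.sha256).digest()
    return f"{header}.{payload}.{_b64url(signature)}"


def allowed_groups(token: str, secret: bytes) -> list[str]:
    """Verify the signature and return the API groups the token grants."""
    header, payload, signature = token.split(".")
    signing_input = f"{header}.{payload}".encode("ascii")
    expected = hmac.new(secret, signing_input, hashlib.sha256).digest()
    if not hmac.compare_digest(expected, _b64url_decode(signature)):
        raise ValueError("invalid token signature")
    return json.loads(_b64url_decode(payload))["scopes"]


token = make_scoped_token(b"demo-secret", ["gpu-read", "cpu-read"])
os.environ["ZEUSD_TOKEN"] = token  # variable name taken from the release notes
```

A token built this way grants gpu-read and cpu-read but not gpu-control, which is the kind of least-privilege split the three API groups make possible.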

What's Changed

Full Changelog: zeus-v0.14.0...zeus-v0.15.0

Zeus v0.14.0 & Zeus Daemon v0.3.0

19 Feb 03:32

Distributed Power Streaming

ZeusMonitor and PowerMonitor (well, every monitor we have) are local to a single machine. However, as ML workloads scale out, we frequently need multi-node power & energy monitoring and measurement.

We extended our Zeus daemon to stream real-time power measurements to subscribed clients via SSE (Server-Sent Events). The PowerStreamingClient in the Zeus Python library can subscribe to multiple Zeus daemons across multiple nodes and aggregate power samples into a single stream. It doesn't have to be PowerStreamingClient; something as simple as curl -N http://node:port/gpu/stream_power works too.
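
Because SSE is a plain-text protocol (`data:` lines, with a blank line ending each event), a client needs very little code to consume such a stream. The sketch below parses SSE lines into JSON payloads. The payload fields (`gpu`, `power_w`) are hypothetical; the daemon's real event schema may differ.

```python
import json
from typing import Iterable, Iterator


def iter_sse_data(lines: Iterable[str]) -> Iterator[dict]:
    """Yield the JSON payload of each Server-Sent Event in a line stream."""
    buffer: list[str] = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line.startswith(":"):
            continue  # comment line, often used as a keep-alive
        if line == "":
            if buffer:  # a blank line terminates the current event
                yield json.loads("\n".join(buffer))
                buffer = []
            continue
        field, _, value = line.partition(":")
        if field == "data":
            # The SSE format strips one optional leading space from the value.
            buffer.append(value[1:] if value.startswith(" ") else value)


# Hypothetical payload shape, for illustration only.
sample = [
    'data: {"gpu": 0, "power_w": 251.3}\n',
    "\n",
    ": keep-alive\n",
    'data: {"gpu": 1, "power_w": 248.9}\n',
    "\n",
]
events = list(iter_sse_data(sample))
```

In practice the lines would come from a streaming HTTP response rather than a list, and a multi-node client would merge the events from several such iterators.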

What's Changed

Full Changelog: zeus-v0.13.1...zeus-v0.14.0

Zeus v0.13.1

13 Nov 23:58

Better AMD GPU support

Got access to AMD MI210, MI250X, and MI300X, so I smoothed out edge cases. Quick follow-up release from v0.13.0 for AMD GPU users.

What's Changed

Full Changelog: zeus-v0.13.0...zeus-v0.13.1

Zeus v0.13.0

13 Nov 23:55

Breaking Changes

The low-level device APIs are now all snake_case instead of camelCase. It had to be done. It was an old mistake that came from following how pynvml names its methods.
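
For callers migrating across this change, the rename is mostly mechanical. Here is a small sketch of a camelCase-to-snake_case helper that could drive a search-and-replace. The identifiers in the example are illustrative only and are not confirmed Zeus method names, so check the real names against the API reference.

```python
import re


def camel_to_snake(name: str) -> str:
    """Convert a camelCase identifier to snake_case."""
    # Insert an underscore before each uppercase letter that follows a
    # lowercase letter or digit, then lowercase the whole name.
    return re.sub(r"(?<=[a-z0-9])([A-Z])", r"_\1", name).lower()


# Illustrative identifiers, not taken from the Zeus codebase.
renamed = {old: camel_to_snake(old) for old in ["getPowerUsage", "setPowerLimit"]}
```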

What's New

Various monitor usability improvements. Zeus now also follows logging best practices.

What's Changed

Full Changelog: zeus-v0.12.3...zeus-v0.13.0

Zeus v0.12.3

19 Oct 05:05

New Features

CuPy synchronization support

It's not just deep learning our users are measuring energy for. There are other CUDA-based Python libraries (e.g., cuDF) in use as well. Now, ZeusMonitor allows cupy as another mechanism for CPU-GPU synchronization at the boundary of measurement windows.

Temperature monitor

Temperature is a metric that also has a lot to do with power. It's a nice-to-have addition.

What's Changed

Full Changelog: zeus-v0.12.2...zeus-v0.12.3

Zeus v0.12.2

30 Sep 02:23

This is a maintenance release focused on security.

What's Changed

Full Changelog: zeus-v0.12.1...zeus-v0.12.2

Zeus v0.12.1

26 Jul 22:10

Change Highlights

New PowerMonitor

Power measurement over time was not a first-class feature, but now it is. The new PowerMonitor allows you to measure (1) GPU 1s windowed average power, (2) GPU instantaneous power, and (3) GPU memory windowed average power -- if supported by your GPU model -- over time, and export deduplicated power samples into a list of timestamps and power measurements.
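
To illustrate what deduplicated samples look like and how they can be used downstream, here is a rough sketch. It is not PowerMonitor's implementation: the `(timestamp, power)` tuple layout and the hold-until-next-change integration are assumptions made for the example.

```python
def deduplicate(samples):
    """Drop consecutive samples whose power reading did not change."""
    deduped = []
    for timestamp, power in samples:
        if not deduped or deduped[-1][1] != power:
            deduped.append((timestamp, power))
    return deduped


def energy_joules(samples, end_time):
    """Integrate power (W) over time (s), holding each reading until the next."""
    total = 0.0
    following = samples[1:] + [(end_time, 0.0)]
    for (start, power), (stop, _) in zip(samples, following):
        total += power * (stop - start)
    return total


# Made-up readings: polling faster than the counter updates yields repeats.
raw = [(0.0, 100.0), (1.0, 100.0), (2.0, 200.0), (2.5, 200.0)]
deduped = deduplicate(raw)
energy = energy_joules(deduped, end_time=3.0)
```

Holding each reading until the next change means the deduplicated list integrates to the same energy as the raw one, so dropping the repeats loses no information.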

Grace Hopper support

Zeus now supports measurements on Grace Hopper platforms. When you use the same Zeus APIs, it'll give you back the whole module's power and energy consumption (i.e., including the Grace CPU and the Hopper GPU). Support is still early stage, so please let us know if you bump into any rough edges.

uv

We're using uv in CI and local dev flow, and now uv.lock is in our codebase as well. Notably, uv has cut our CI time to literally half of what it used to be!

What's Changed

Full Changelog: zeus-v0.12.0...zeus-v0.12.1

Zeus v0.12.0

17 May 06:11

Change Highlights

New SoC device measurement support!

We have a new device abstraction in zeus.device.soc. Measurements can be accessed from the soc field in ZeusMonitor measurement objects.

Apple Silicon

Zeus now provides energy measurement on Apple Silicon chips with component breakdowns like CPU, GPU, DRAM, and ANE (specifics depend on the underlying chip). This is done via a new child project called zeus-apple-silicon. Check out details in our documentation.

NVIDIA Jetson Platform

NVIDIA Jetson is an embedded platform for AI workloads. Zeus now supports energy measurement on Jetson platforms by reading from their on-board power monitors. Check out details in our documentation.

Electricity price tracking

Via integration with the OpenEI API, Zeus now allows electricity price tracking with the EnergyCostMonitor class. Its API is essentially the same as ZeusMonitor (i.e., measurement windows).
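
The arithmetic behind price tracking is simple: convert each window's energy to kWh and multiply by the price in effect. This sketch shows that calculation only, with made-up prices. It does not use EnergyCostMonitor or the OpenEI API, and the function name is hypothetical.

```python
JOULES_PER_KWH = 3.6e6


def total_cost(windows):
    """Sum cost over (energy in joules, price per kWh) pairs."""
    return sum(energy / JOULES_PER_KWH * price for energy, price in windows)


# Two measurement windows under different (made-up) electricity prices:
# 1 kWh at 0.10 per kWh, then 2 kWh at 0.25 per kWh.
windows = [(3.6e6, 0.10), (7.2e6, 0.25)]
cost = total_cost(windows)
```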

What's Changed

Full Changelog: zeus-v0.11.0...zeus-v0.12.0

Zeus Daemon v0.2.0

03 Feb 07:18

Change Highlights

CPU and DRAM energy measurements

Zeus daemon now also supports CPU and DRAM energy measurements with RAPL, which requires root privileges even just for measurement. Zeus daemon has also been integrated into the Zeus Python library, so as long as you have the daemon deployed and you set the ZEUSD_SOCK_PATH environment variable, you'll be all set!
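
For background, RAPL exposes cumulative energy counters that wrap around once they reach a maximum value, so measuring a window means taking a wraparound-aware difference of two reads. Here is a minimal sketch of that difference. It is not the daemon's code, the values are made up, and it assumes the counter wrapped at most once between reads.

```python
def rapl_delta_uj(start_uj: int, end_uj: int, max_range_uj: int) -> int:
    """Energy consumed between two counter reads, in microjoules.

    Assumes the counter wrapped at most once between the two reads.
    """
    if end_uj >= start_uj:
        return end_uj - start_uj
    # The counter wrapped: count up to the maximum, then from zero to end.
    return (max_range_uj - start_uj) + end_uj


no_wrap = rapl_delta_uj(start_uj=100, end_uj=350, max_range_uj=1000)
wrapped = rapl_delta_uj(start_uj=900, end_uj=100, max_range_uj=1000)
```

A long measurement window therefore needs the counter read often enough that it cannot wrap twice unnoticed.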

What's Changed

  • [Feat] Implement CPU and DRAM monitoring for zeusd by @wbjin in #137
  • Incorporate Zeusd for CPU and DRAM monitoring in ZeusMonitor by @michahn01 in #150
  • Trace GPU ID in Zeusd GPU routes by @jaywonchung in #152

Zeus v0.11.0

03 Feb 07:10

Change Highlights

Renamed to zeus!

Until now we used zeus-ml because the name zeus was taken on PyPI, but now we're finally able to move to zeus:

pip install zeus

Prometheus Metrics

Zeus power and energy measurements can now be exported as Prometheus metrics! We currently support three metrics:

  • Energy consumption of a fixed code range (Histogram)
  • Power draw over time (Gauge)
  • Cumulative energy consumption over time (Counter)

We wrote up a detailed metric monitoring guide and integration examples.
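
For readers new to Prometheus, the sketch below renders a Gauge and a Counter in the Prometheus text exposition format, which is what a scraper ultimately reads. The metric and label names are hypothetical and the rendering is hand-rolled for illustration; Zeus's actual metric names and export path are described in the guide.

```python
def render_metric(name, metric_type, help_text, samples):
    """Render one metric family in the Prometheus text exposition format.

    `samples` maps a tuple of (label, value) pairs to the sample value.
    """
    lines = [f"# HELP {name} {help_text}", f"# TYPE {name} {metric_type}"]
    for labels, value in samples.items():
        label_text = ",".join(f'{key}="{val}"' for key, val in labels)
        lines.append(f"{name}{{{label_text}}} {value}")
    return "\n".join(lines) + "\n"


# Hypothetical metric and label names, for illustration only.
gauge_text = render_metric(
    "gpu_power_watts", "gauge", "Instantaneous GPU power draw.",
    {(("gpu", "0"),): 251.3},
)
counter_text = render_metric(
    "gpu_energy_joules_total", "counter", "Cumulative GPU energy.",
    {(("gpu", "0"),): 1200.5},
)
```

A Gauge can go up and down, which suits power draw, while a Counter only increases, which suits cumulative energy.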

AMD GPU enhancements

We created ROCm AMDSMI Python bindings (GitHub, PyPI) and integrated them with Zeus. Before this, users had to cd into their ROCm installation's AMDSMI distribution directory and run pip install, which isn't very convenient.

Our bindings are unofficial & community-maintained. But AMDSMI maintainers did take a look (ROCm/amdsmi#8).

Carbon Emission Estimations

The new zeus.monitor.carbon.CarbonEmissionMonitor takes in a carbon intensity provider (e.g., from ElectricityMaps) and provides an estimate for operational carbon emissions. The window-based API is essentially the same as ZeusMonitor.
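
The estimate itself reduces to energy multiplied by carbon intensity. The sketch below shows that calculation with a made-up intensity value. It does not call CarbonEmissionMonitor or any carbon intensity provider, and the function name is hypothetical.

```python
JOULES_PER_KWH = 3.6e6


def carbon_emission_g(energy_joules: float, intensity_g_per_kwh: float) -> float:
    """Operational emissions in gCO2eq for a given energy and grid intensity."""
    return energy_joules / JOULES_PER_KWH * intensity_g_per_kwh


# 0.5 kWh consumed on a grid emitting a (made-up) 400 gCO2eq per kWh.
emission = carbon_emission_g(1.8e6, 400.0)
```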

Full Changelog
