
Commit 54ac269

Added CarbonDB Docs; Changed ordering of other menu items (#131)
1 parent 0ed7580 commit 54ac269

6 files changed, +107 -4 lines changed

content/en/docs/carbondb/_index.md

Lines changed: 9 additions & 0 deletions
@@ -0,0 +1,9 @@
---
title: "CarbonDB"
description: "CarbonDB is a time-series database for storing and retrieving carbon and energy metrics"
date: 2022-06-17T08:49:15+00:00
weight: 700
sidebar:
  collapsed: true
---
Lines changed: 94 additions & 0 deletions
@@ -0,0 +1,94 @@
---
title: "Overview"
description: "CarbonDB is a time-series database for storing and retrieving carbon and energy metrics"
date: 2026-01-01T08:49:15+01:00
weight: 1
---
CarbonDB is a component of the Green Metrics Tool Suite that gives you a real-time overview of your organization's total IT carbon emissions.

Technically, it is a time-series database for storing and retrieving carbon and energy metrics.
## Data

CarbonDB can consume values in two ways:

- directly, by ingesting them via the API endpoint (`/vX/carbondb/add`), or
- by importing them from components of the Green Metrics Tool Suite like
  * `PowerHOG`
  * `ScenarioRunner`
  * `Eco CI`
## Architecture and Design

CarbonDB consists of the following components:

* **API**: A FastAPI application that provides a RESTful API for adding data to and retrieving data from CarbonDB.
* **Cron Jobs**: A set of cron jobs that are responsible for compressing and normalizing the data in CarbonDB.
* **Database**: A PostgreSQL database that stores the data and is co-integrated with the normal GMT cluster database.
For ingestion and querying, CarbonDB provides several parameters that you can set to cluster and tag your data:

- `type`: The type of the data (e.g., `machine.desktop`, `website`).
- `project`: The project that the data belongs to.
- `machine`: The machine that the data was collected from.
- `source`: The source of the data (e.g., `Power HOG`, `Eco CI`).
- `tags`: A list of tags that are associated with the data.
All of these parameters can be set when sending data to the API, when querying, and also in other products like *Eco CI* when you import data from GMT-internal sources. An example data point is sketched below.
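For illustration, a tagged data point could look like the following minimal Python sketch. Only the parameter names come from the list above; the values and the exact payload shape are assumptions:

```python
# Hypothetical tagged CarbonDB data point; values are made up for illustration.
data_point = {
    "type": "machine.desktop",      # what kind of entity produced the data
    "project": "internal-website",  # hypothetical project name
    "machine": "build-server-01",   # hypothetical machine identifier
    "source": "Eco CI",             # which component handed in the value
    "tags": ["ci", "production"],   # free-form labels for later filtering
}
```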
### Merge Window

To make reports and data output from CarbonDB useful, certain views need to become final at some point.

By default, CarbonDB has a merge window of 30 days, which means data is considered immutable after 30 days.
Up until this point you can still send "old" data to CarbonDB, for example energy data from 3 days ago.

After the merge window has passed and the timestamp of the data is too old, CarbonDB will block direct inserts via the API. The importer will likewise only consider data that is less than 30 days old.
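As a rough illustration of this rule, here is a minimal Python sketch of the acceptance check, assuming the default 30-day window (the actual server-side logic may differ):

```python
from datetime import datetime, timedelta, timezone

MERGE_WINDOW = timedelta(days=30)  # CarbonDB's default merge window

def is_accepted(timestamp: datetime) -> bool:
    """Return True if a data point with this timestamp is still mutable,
    i.e. less than 30 days old and thus accepted for direct inserts."""
    return datetime.now(timezone.utc) - timestamp < MERGE_WINDOW

# Energy data from 3 days ago is still accepted ...
print(is_accepted(datetime.now(timezone.utc) - timedelta(days=3)))   # True
# ... while data from 45 days ago falls outside the merge window.
print(is_accepted(datetime.now(timezone.utc) - timedelta(days=45)))  # False
```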
## API

The easiest way to get data into CarbonDB is to send it directly to the API.

The CarbonDB API provides the following endpoints:

* **/v2/carbondb/add**: This endpoint is used to add new energy data to CarbonDB. The data is sent as a JSON object that contains the following core fields:
    * `time`: The timestamp of the data
    * `energy_uj`: The energy consumption in microjoules
    * `carbon_intensity_g`: The carbon intensity in grams
    * `ip`: The IP that handed in the data (in case you are behind a NAT, VPN, etc.)

    If the `ip` field is not supplied explicitly, the endpoint backfills it directly with the connecting IP. Carbon intensity will not be backfilled automatically; for this, a cron job must be set up (see further down on this page).

* **/v2/carbondb**: This endpoint is used to retrieve data from CarbonDB. The data can be filtered by various parameters, such as `start_date`, `end_date`, `tags_include`, `tags_exclude`, etc. A sketch of both calls follows after this list.

You can find the always up-to-date documentation in the [self-documenting API]({{< relref "/docs/api" >}}).
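To make the two endpoints concrete, here is a minimal sketch using Python and the `requests` library. The host URL, the exact payload shape, and the filter value formats are assumptions; only the field and parameter names come from the docs above:

```python
import requests

API_URL = "https://gmt.example.com"  # hypothetical host; use your own GMT instance

# Add a new energy data point. The field names come from the docs above;
# the exact payload shape is an assumption.
payload = {
    "time": "2026-01-01T12:00:00Z",  # timestamp of the measurement
    "energy_uj": 123456,             # energy consumption in microjoules
    "carbon_intensity_g": 350,       # carbon intensity in grams
    # "ip" can be omitted: the endpoint backfills it with the connecting IP
}
resp = requests.post(f"{API_URL}/v2/carbondb/add", json=payload, timeout=10)
resp.raise_for_status()

# Retrieve data, filtered by date range and tags. The parameter names come
# from the docs above; the exact value format is an assumption.
params = {
    "start_date": "2026-01-01",
    "end_date": "2026-01-31",
    "tags_include": "ci,production",
}
resp = requests.get(f"{API_URL}/v2/carbondb", params=params, timeout=10)
print(resp.json())
```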
### Sending data example

We provide a minimal working example of a Linux agent that uses [Cloud Energy](https://github.com/green-coding-solutions/cloud-energy) to continuously send the carbon emissions of a VM in your infrastructure to CarbonDB.

👉 [CarbonDB Agent Linux](https://github.com/green-coding-solutions/carbondb-agent)
## Data Import

CarbonDB can import data from other GMT components like *Eco CI*, *ScenarioRunner* and *PowerHOG*.

To do so, the cron jobs must be set up.

Effectively, the data will then be copied over, de-duplicated and checked for consistency.

The *tags* and *projects* you set manually are retained. Some other fields like *source* and *type* are set automatically by the importer.
## Cron Jobs

Cron jobs are all placed in the `/cron` directory of GMT and are deactivated by default.

The following cron jobs are used to maintain the data in CarbonDB (a sample schedule is sketched after the list):

- **carbondb_copy_over_and_remove_duplicates.py**: This cron job copies data from other sources (e.g., `hog_simplified_measurements`, `ci_measurements`) into the `carbondb_data_raw` table. It also backfills missing carbon intensity data.
- **carbondb_compress.py**: This cron job compresses the raw data in `carbondb_data_raw` into daily sums. It also normalizes the data by transforming text fields into integers and storing them in separate tables.
- **backfill_geo.py**: This cron job backfills geo information for IPs. This is needed to backfill carbon intensity, which works on geo coordinates.
    - The cron job only backfills IP data up to 30 days old. After that it considers the information to be outdated.
    - Backfilling is done through three independent, fail-over Geo-IP providers: ipinfo.io, ipapi.co and ip-api.com. The implementation requires no API key but may exhaust the free limit if you run a very large instance. So far this has never happened ... contact us if it happens to you :)
- **backfill_carbon_intensity.py**: This cron job takes the existing geo coordinates of the data and matches them to current carbon intensity data.
    - The cron job works with live carbon intensity API endpoints from Electricity Maps. This means it will only backfill data up to 30 minutes old, so as not to serve out-of-date values. You must schedule the cron job to run at least every 15 minutes.
    - It requires the `electricity_maps_token` to be set in the `config.yml`.
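For scheduling, a crontab sketch could look like the following. The installation path and all intervals except the 15-minute requirement for `backfill_carbon_intensity.py` are assumptions; adapt them to your instance:

```
# Hypothetical schedule; adjust the path to your GMT installation.
# Import and de-duplicate data from other GMT components (hourly).
0 * * * *    python3 /opt/green-metrics-tool/cron/carbondb_copy_over_and_remove_duplicates.py
# Compress raw data into daily sums (once per night).
30 1 * * *   python3 /opt/green-metrics-tool/cron/carbondb_compress.py
# Backfill geo information for IPs (hourly).
15 * * * *   python3 /opt/green-metrics-tool/cron/backfill_geo.py
# Backfill carbon intensity: must run at least every 15 minutes (see above).
*/15 * * * * python3 /opt/green-metrics-tool/cron/backfill_carbon_intensity.py
```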

content/en/docs/cluster/_index.md

Lines changed: 1 addition & 1 deletion
@@ -2,7 +2,7 @@
 title: "Cluster "
 description: "Install and maintain a GMT cluster installation"
 date: 2024-10-25T08:49:15+00:00
-weight: 1000
+weight: 1100
 sidebar:
   collapsed: true
 ---

content/en/docs/contributing/_index.md

Lines changed: 1 addition & 1 deletion
@@ -2,7 +2,7 @@
 title: "Contributing"
 description: "Contributing to the Green Metrics Tool or to Example Applications"
 date: 2022-06-15T08:49:15+00:00
-weight: 800
+weight: 900
 sidebar:
   collapsed: true
 ---

content/en/docs/declarations/_index.md

Lines changed: 1 addition & 1 deletion
@@ -2,7 +2,7 @@
 title: "Declarations"
 description: "Official declarations for government bodies, certifications and enterprises"
 date: 2024-09-30T01:49:15+00:00
-weight: 700
+weight: 800
 sidebar:
   collapsed: true
 ---

content/en/docs/help/_index.md

Lines changed: 1 addition & 1 deletion
@@ -2,7 +2,7 @@
 title: "Help"
 description: "Green Metrics Tool Help"
 date: 2022-06-20T01:49:15+00:00
-weight: 900
+weight: 1000
 sidebar:
   collapsed: true
 ---
