Throughout this course you've been applying the IDOMI process — Impacts, Dependencies, Observations, Methodology, Implementation — to a single server component. In practice, measuring the SCI of a real application involves many components, each with different data sources and pipelines.
This module covers how to scale IDOMI to a full investigation. We'll use the Green Software Foundation website (greensoftware.foundation) as a worked example — a simple static site served on Netlify that, as you'll see, involves a surprising amount of nuance.
An investigation has three phases:
- Scoping — Define the boundary, components, grouping, and functional unit
- IDOMI per component — Apply the familiar process to each component in the tree
- Execution — Write, run, visualise, and share your manifest
This module covers the first two phases. The wrap-up module covers execution.
Before you can apply IDOMI, you need to decide what you're measuring. This scoping work is new — in the walkthrough modules, the scope was given to you (one server). In a real investigation, you have to define it yourself.
The application boundary is all the individual pieces you want to observe in order to capture a representative SCI score for your application. What carbon-emitting processes have to happen to enable someone to use your app?
There is some judgement required to define the boundary. This should certainly include the energy used to run your app and the embodied carbon of the hardware used to run it. The SCI specification suggests the following infrastructure should be in scope:
The calculation of SCI shall include all supporting infrastructure and systems that significantly contribute to the software's operation:
- compute resources
- storage
- networking equipment
- memory
- monitoring
- idle machines
- logging
- scanning
- build and deploy pipelines
- testing
- training ML models
- operations
- backup
- resources to support redundancy
- resources to support failover
- end user devices
- IoT devices
- edge devices
This isn't necessarily comprehensive, but it gives a good sense of what's expected. Other factors might be appropriate in some circumstances, and several of these suggestions might be irrelevant for your application.
Here's an example inventory for the GSF website:
- Github storage (storing website source code on Github servers)
- Netlify builds (creating build artefacts from source code using Netlify)
- Netlify static site storage (storing the static site data on a Netlify server)
- Cache storage across content delivery network (caching static site data at several nodes across a CDN)
- Data transferred over network (transferring site data from server to user)
- End users viewing site in browser (energy required to display site in the user's browser)
- Embodied carbon of Github server (for storing and serving source code)
- Embodied carbon of static site servers, incl CDN (for storing and serving static site)
- Embodied carbon of end user devices (for viewing content)
The individual components identified within the application boundary can be grouped under common parents. This matters because when you aggregate impacts up your tree, you get an aggregated value for each parent node, so designing your grouping well makes it easier to gain insights into which parts of your stack are emitting the most carbon.
Let's explore the grouping for the GSF website:
- Development
  - Github storage (storing website source code on Github servers)
- Servers
  - Netlify builds
  - Storing static site data at origin server
  - Embodied carbon for web server
  - Embodied carbon for CDN
  - Embodied carbon for Github server
- User-devices
  - End user operational carbon
  - Embodied carbon for end user devices
- Networking
  - Networking energy to serve static site over the wire
The specific hierarchy should reflect the classes of activity you want to break your total carbon emissions into, because this gives you the best insight into where to focus mitigation strategies. For different applications, the right hierarchy might be different — maybe you don't have much development activity, but you do want to divide 10,000 servers by region to see which geographic location you should focus efforts on. There's no fixed rule; it's about deciding what surfaces information you can action, and what makes sense when you aggregate information up the tree from child to parent.
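To make the aggregation concrete, here is a minimal sketch of summing carbon up a grouping tree. The component names and gCO2e values are illustrative placeholders, not real GSF figures; the point is that a well-designed hierarchy gives you a meaningful per-parent total for free.

```python
# Sketch: aggregating per-component carbon (gCO2e) up a grouping tree.
# All names and numbers below are illustrative, not real GSF data.
tree = {
    "development": {"github-storage": 1.2},
    "servers": {
        "netlify-builds": 0.8,
        "origin-storage": 0.5,
        "embodied-web-server": 2.1,
    },
    "user-devices": {"end-user-operational": 6.4, "embodied-end-user": 3.0},
    "networking": {"network-energy": 1.9},
}

def aggregate(node):
    """Recursively sum leaf carbon values up to the enclosing parent."""
    if isinstance(node, dict):
        return sum(aggregate(child) for child in node.values())
    return node

per_parent = {name: aggregate(children) for name, children in tree.items()}
total = sum(per_parent.values())
```

With this structure, `per_parent` immediately answers "which part of my stack emits the most?", which is exactly the insight the grouping is designed to surface.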
Determining the appropriate functional unit involves answering two questions:
- What is a sensible functional unit to express the SCI for this application?
- Is data for this functional unit available?
For the GSF website, it made sense to express carbon emissions as a mass per page visit. This data was available from Google Analytics at a daily time resolution.
A sensible functional unit is one that lets you demonstrate variations in the carbon efficiency of your application even as it scales. For a website, visits make sense because the site exists to serve visitors, and total energy usage scales with the number of visits. An increase in total carbon emissions caused by more visitors therefore won't appear as backwards progress; likewise, normalising to visits protects against misreading a drop in total emissions caused by fewer visitors as an improvement in carbon efficiency.
The other factor to consider is comparability with other similar systems. Two websites might have very different technology stacks, scales, deployment details and purposes, but their carbon intensity can both be measured in mass of carbon per visit, making it possible to compare the efficiency of one against another.
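A tiny numeric sketch makes the normalisation argument concrete. The visit and carbon figures below are invented for illustration, and we assume total carbon scales linearly with traffic:

```python
# Sketch: normalising to a functional unit (visits) keeps the score
# stable when traffic scales. Numbers are illustrative only.
def sci_per_visit(total_carbon_g, visits):
    return total_carbon_g / visits

# Month 1: 100,000 visits emitting 50,000 gCO2e in total.
month1 = sci_per_visit(50_000, 100_000)   # 0.5 gCO2e/visit
# Month 2: traffic doubles, and so does total carbon.
month2 = sci_per_visit(100_000, 200_000)  # still 0.5 gCO2e/visit
```

Total emissions doubled, but the SCI score is unchanged: the site got bigger, not less efficient.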
With the scope defined, you now apply the IDOMI process you've been practising throughout this course — but to each component in your tree.
For an SCI investigation, the target impact is the same for every component: carbon (which feeds into the SCI calculation). The dependency tree for each component will follow the familiar pattern:
☑️ sci
- ☑️ operational-carbon
  - ☑️ operational-energy
  - ☑️ carbon-intensity
- ☑️ embodied-carbon
- ☑️ functional-unit
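This tree mirrors the SCI equation from the specification, SCI = ((E × I) + M) / R, where E is operational energy, I is carbon intensity, M is embodied carbon, and R is the functional unit. A minimal sketch, with illustrative values:

```python
# Sketch of the SCI equation: SCI = ((E * I) + M) / R
# E = operational energy (kWh), I = carbon intensity (gCO2e/kWh),
# M = embodied carbon (gCO2e), R = functional unit (e.g. visits).
def sci(operational_energy_kwh, carbon_intensity, embodied_carbon_g, functional_unit):
    operational_carbon = operational_energy_kwh * carbon_intensity
    return (operational_carbon + embodied_carbon_g) / functional_unit

# Illustrative values: 2 kWh at 400 gCO2e/kWh plus 200 gCO2e embodied,
# over 1,000 visits.
score = sci(2, 400, 200, 1_000)  # 1.0 gCO2e/visit
```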
However, the specifics of how you reach each of those dependencies will vary from component to component.
For each component, you need to audit what data is available and build out the dependency tree with concrete observations. This is the D and O of IDOMI.
Observations can be categorised into several types:
- Direct measurements
  - Power / energy
  - Proxies: cpu-utilisation, memory-utilisation, data transfer, cost
- Indirect measurements
  - Manufacturer data sheets
  - Analogue systems
  - Digital twins
- Heuristics and generalisations
  - Coefficients gathered from literature
  - Regional/global averages
  - Qualitative estimates
  - Educated guesses
Ideally, you would always directly measure power consumption for each individual component, but that is rarely possible, especially in the cloud. Instead, you audit what data is available that can be translated into the data you want using some model. Typically, the smaller the pipeline of operations required to link the observation to the energy value you actually want, the better (although this isn't always true, as simple coefficients are easy to implement but may be inaccurate).
For example, for a virtual machine running on Azure, you can check the Azure Portal and find a dashboard that shows metrics such as CPU utilisation, memory utilisation, network traffic, etc. This data is also available via the Azure Monitor API, meaning you can retrieve it programmatically rather than manually extracting it from the dashboard.
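A CPU-utilisation proxy like this still needs a model to become energy. A common approach, used by Cloud Carbon Footprint, is a power curve relating utilisation to a fraction of the processor's TDP. The sketch below uses the commonly cited curve points from the Teads article; treat the TDP value and duration as assumptions you would audit per server.

```python
# Sketch: converting a CPU-utilisation observation into energy via a
# power curve (fraction of TDP drawn at a given utilisation).
# Curve points are the commonly cited values from the Teads article.
import bisect

CURVE = [(0, 0.12), (10, 0.32), (50, 0.75), (100, 1.02)]  # (% util, TDP ratio)

def tdp_ratio(cpu_util):
    """Linearly interpolate the TDP ratio for a utilisation percentage."""
    xs = [x for x, _ in CURVE]
    i = bisect.bisect_left(xs, cpu_util)
    if xs[i] == cpu_util:
        return CURVE[i][1]
    (x0, y0), (x1, y1) = CURVE[i - 1], CURVE[i]
    return y0 + (y1 - y0) * (cpu_util - x0) / (x1 - x0)

def energy_kwh(cpu_util, tdp_watts, duration_hours):
    """Estimated energy for a period at a given average utilisation."""
    return tdp_ratio(cpu_util) * tdp_watts * duration_hours / 1000

# e.g. 50% utilisation on a 100 W processor for 1 hour -> 0.075 kWh
```

This is exactly the kind of "short pipeline" trade-off mentioned above: a single interpolation step, easy to implement, but only as accurate as the assumed curve and TDP.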
Here's what this looks like for one component of the GSF website:
- Servers
  - Netlify builds
    - Direct measurements:
      - Number of builds per day from Netlify dashboard
    - Indirect measurements:
      - CPU util and memory util during build, inferred from running the build process on a local machine and gathering metrics using `top`
    - Heuristics and generalisations:
      - Coefficient for memory → energy from CCF
      - Power curve relating CPU util to TDP factor from Teads article
      - Processor TDP from GSF data based on assumed server specs used for Netlify build
This is the M and I of IDOMI — connecting each component's available observations to SCI by choosing the right plugins and building a pipeline.
In some cases, new plugins might be needed, but many investigations can be performed using the standard library of builtins alone.
Take the networking component as an example. We have observations of the data transferred over the network when a user loads the website, and we rely on a coefficient published by CCF (0.000392 kWh/GB) to convert that value into energy.
Working through IDOMI for this component:
- Impact: carbon (from networking)
- Dependencies: energy → carbon (via carbon-intensity)
- Observation: data transferred in GB
- Methodology: coefficient to convert GB → kWh, then multiply by carbon intensity
- Implementation: Coefficient → Multiply → Sci
We can sketch out the pipeline's inputs and outputs:

```yaml
Coefficient:
  Inputs:
    data-transferred: GB
    coefficient: 0.000392 kWh/GB
  Outputs:
    energy: kWh

Multiply:
  Inputs:
    energy: kWh
    carbon-intensity: gCO2e/kWh
  Outputs:
    carbon: gCO2e

Sci:
  Inputs:
    carbon: gCO2e
    site-visits: visits
  Outputs:
    sci: gCO2e/visit
```

You would repeat this IDOMI process for every component in your tree. Some components will have similar pipelines (e.g. multiple servers might share the same CPU → energy → carbon chain you built in the walkthrough). Others will be quite different (e.g. embodied carbon for end-user devices might use a simple coefficient rather than the SciEmbodied plugin).
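To check your pipeline design before writing a manifest, it can help to run the chain as plain functions. The sketch below implements the Coefficient → Multiply → Sci chain for the networking component; the CCF coefficient is from the text, while the carbon-intensity and visit figures are illustrative. In Impact Framework these steps would be plugins, not hand-written functions.

```python
# Sketch of the Coefficient -> Multiply -> Sci pipeline as plain functions.
# Coefficient is the CCF networking figure; other numbers are illustrative.
NETWORK_COEFFICIENT = 0.000392  # kWh per GB transferred (CCF)

def coefficient(data_transferred_gb):
    return data_transferred_gb * NETWORK_COEFFICIENT      # -> energy (kWh)

def multiply(energy_kwh, carbon_intensity):
    return energy_kwh * carbon_intensity                  # -> carbon (gCO2e)

def sci(carbon_g, site_visits):
    return carbon_g / site_visits                         # -> gCO2e/visit

# Illustrative run: 10 GB transferred, 400 gCO2e/kWh grid, 1,000 visits.
energy = coefficient(10)
carbon = multiply(energy, 400)
score = sci(carbon, 1_000)
```

Each function's output unit matches the next function's input unit, which is the same property you verify when wiring plugins together in the manifest.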
- An investigation starts with scoping: defining the boundary, components, grouping, and functional unit.
- You then apply IDOMI per component — the same process you've been practising, but repeated for each component in your tree.
- Different components will have different observation types and pipelines, but the process for developing each one is the same.
- The IDOMI process you learned in modules 3–10 scales naturally to complex, multi-component applications.