Commit ffe0c25

Add logfire.instrument_system_metrics() (#373)
1 parent 5d9a16f commit ffe0c25

22 files changed, 541 lines added and 192 removed

docs/guides/onboarding_checklist/add_metrics.md

Lines changed: 11 additions & 18 deletions
@@ -1,6 +1,16 @@
 **Pydantic Logfire** can be used to collect metrics from your application and send them to a metrics backend.
 
-Let's see how to create, and use metrics in your application.
+Metrics are a great way to record numerical values where you want to see an aggregation of the data (e.g. over time),
+rather than the individual values.
+
+## System Metrics
+
+The easiest way to start using metrics is to enable system metrics.
+See the [System Metrics][system-metrics] documentation to learn more.
+
+## Manual Metrics
+
+Let's see how to create and use custom metrics in your application.
 
 ```py
 import logfire
@@ -13,11 +23,6 @@ def send_message():
     messages_sent.add(1)
 ```
 
-## Metric Types
-
-Metrics are a great way to record number values where you want to see an aggregation of the data (e.g. over time),
-rather than the individual values.
-
 ### Counter
 
 The Counter metric is particularly useful when you want to measure the frequency or occurrence of a certain
@@ -250,18 +255,6 @@ logfire.metric_up_down_counter_callback(
 
 You can read more about the Up-Down Counter metric in the [OpenTelemetry documentation][up-down-counter-callback-metric].
 
-## System Metrics
-
-By default, **Logfire** does not collect system metrics.
-
-To enable metrics, you need just need install the `logfire[system-metrics]` extra:
-
-{{ install_logfire(extras=['system-metrics']) }}
-
-**Logfire** will automatically collect system metrics if the `logfire[system-metrics]` extra is installed.
-
-To know more about which system metrics are collected, check the [System Metrics][system-metrics] documentation.
-
 [counter-metric]: https://opentelemetry.io/docs/specs/otel/metrics/api/#counter
 [histogram-metric]: https://opentelemetry.io/docs/specs/otel/metrics/api/#histogram
 [up-down-counter-metric]: https://opentelemetry.io/docs/specs/otel/metrics/api/#updowncounter

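The new intro text says metrics are for seeing an aggregation of values (e.g. over time) rather than each individual value. The sketch below illustrates that idea in plain Python only; it is not Logfire's or OpenTelemetry's API. The function name `aggregate_counter`, the `(timestamp, amount)` event shape and the 60-second window are assumptions chosen for illustration. It shows what a series of increments such as `messages_sent.add(1)` conceptually reduces to: one total per time window.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def aggregate_counter(
    events: Iterable[Tuple[float, int]], window_seconds: int = 60
) -> Dict[int, int]:
    """Sum counter increments into fixed time windows.

    Hypothetical helper: each event is (unix_timestamp, amount). The result
    maps the start of each window to the total added during that window.
    """
    totals: Dict[int, int] = defaultdict(int)
    for timestamp, amount in events:
        # Round the timestamp down to the start of its window.
        window_start = int(timestamp // window_seconds) * window_seconds
        totals[window_start] += amount
    return dict(totals)


# Four individual increments collapse into two data points, one per minute.
print(aggregate_counter([(0, 1), (30, 1), (61, 1), (119, 1)]))
```

A metrics backend keeps only the per-window totals, which is why metrics stay cheap even when the underlying event rate is high.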
docs/guides/web_ui/dashboards.md

Lines changed: 11 additions & 5 deletions
@@ -19,14 +19,20 @@ This dashboard offers a high-level view of your web services' well-being. It lik
 * **Percent of 5XX Requests:** Percentage of requests that resulted in server errors (status codes in the 500 range).
 * **Log Type Ratio**: Breakdown of the different log types generated by your web service (e.g., info, warning, error).
 
-## System Metrics
+## Basic System Metrics
 
-This dashboard focuses on system resource utilization, potentially including:
+This dashboard shows essential system resource utilization metrics. It comes in two variants:
+
+- **Basic System Metrics (Logfire):** Uses the data exported by [`logfire.instrument_system_metrics()`](../../integrations/system_metrics.md).
+- **Basic System Metrics (OpenTelemetry):** Uses data exported by any OpenTelemetry-based instrumentation following the standard semantic conventions.
+
+Both variants include the following metrics:
 
-* **CPU Usage:** Percentage of processing power utilized by the system.
-* **Memory Usage:** Amount of memory currently in use by the system.
 * **Number of Processes:** Total number of running processes on the system.
-* **Swap Usage:** Amount of swap space currently in use by the system.
+* **System CPU usage %:** Percentage of total available processing power utilized by the whole system, i.e. the average across all CPU cores.
+* **Process CPU usage %:** CPU used by a single process, where e.g. using 2 CPU cores to full capacity would result in a value of 200%.
+* **Memory Usage %:** Percentage of memory currently in use by the system.
+* **Swap Usage %:** Percentage of swap space currently in use by the system.
 
 ## Custom Dashboards
 

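Judging by the system metrics documentation added in this same commit, the dashboard percentages relate to the raw exported values simply: the CPU metrics are exported as fractions (`cpu_percent() / 100`), the `available` mode of `system.memory.utilization` is a 0 to 1 fraction whose complement is the fraction used, and `system.swap.utilization` with `used` is already the used fraction. The sketch below models those conversions, assuming the dashboard just multiplies by 100; the function names are hypothetical and the dashboard's actual queries are not part of this diff.

```python
def fraction_to_percent(fraction: float) -> float:
    """Convert an exported fraction (1.0 == one full unit) to a percentage."""
    return fraction * 100


def memory_used_percent(available_fraction: float) -> float:
    """'available' memory utilization is the fraction still free for processes,
    so the fraction in use is its complement."""
    return (1 - available_fraction) * 100


# Process CPU: 1.0 means one full core, so two fully busy cores give 200%.
process_cpu = fraction_to_percent(2.0)
# System CPU: 1.0 means all cores, so this stays within 0 to 100%.
system_cpu = fraction_to_percent(0.5)
# 25% of memory still available means 75% in use.
memory = memory_used_percent(0.25)
# Swap utilization ('used') converts directly.
swap = fraction_to_percent(0.125)
print(process_cpu, system_cpu, memory, swap)
```

This is why the process CPU panel can exceed 100% while the system CPU panel cannot.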
Lines changed: 80 additions & 24 deletions
@@ -1,28 +1,84 @@
-By default, **Logfire** does not collect system metrics.
+The [`logfire.instrument_system_metrics()`][logfire.Logfire.instrument_system_metrics] method can be used to collect system metrics with **Logfire**, such as CPU and memory usage.
 
-To enable metrics, you need to install the `logfire[system-metrics]` extra:
+## Installation
+
+Install `logfire` with the `system-metrics` extra:
 
 {{ install_logfire(extras=['system-metrics']) }}
 
-### Available Metrics
-
-Logfire collects the following system metrics:
-
-* `system.cpu.time`: CPU time spent in different modes.
-* `system.cpu.utilization`: CPU utilization in different modes.
-* `system.memory.usage`: Memory usage.
-* `system.memory.utilization`: Memory utilization in different modes.
-* `system.swap.usage`: Swap usage.
-* `system.swap.utilization`: Swap utilization
-* `system.disk.io`: Disk I/O operations (read/write).
-* `system.disk.operations`: Disk operations (read/write).
-* `system.disk.time`: Disk time (read/write).
-* `system.network.dropped.packets`: Dropped packets (transmit/receive).
-* `system.network.packets`: Packets (transmit/receive).
-* `system.network.errors`: Network errors (transmit/receive).
-* `system.network.io`: Network I/O (transmit/receive).
-* `system.network.connections`: Network connections (family/type).
-* `system.thread_count`: Thread count.
-* `process.runtime.memory`: Process memory usage.
-* `process.runtime.cpu.time`: Process CPU time.
-* `process.runtime.gc_count`: Process garbage collection count.
+## Usage
+
+```py
+import logfire
+
+logfire.configure()
+
+logfire.instrument_system_metrics()
+```
+
+Then in your project, click on 'Dashboards' in the top bar, click 'New Dashboard', and select 'Basic System Metrics (Logfire)' from the dropdown.
+
+## Configuration
+
+By default, `instrument_system_metrics` collects only the metrics it needs to display the 'Basic System Metrics (Logfire)' dashboard. You can choose exactly which metrics to collect and how much data to collect about each metric. The default is equivalent to this:
+
+```py
+logfire.instrument_system_metrics({
+    'process.runtime.cpu.utilization': None, # (1)!
+    'system.cpu.simple_utilization': None, # (2)!
+    'system.memory.utilization': ['available'], # (3)!
+    'system.swap.utilization': ['used'], # (4)!
+})
+```
+
+1. `process.runtime.cpu.utilization` will lead to exporting a metric that is actually named `process.runtime.cpython.cpu.utilization` or a similar name depending on the Python implementation used. The `None` value means that there are no fields to configure for this metric. The value of this metric is `[psutil.Process().cpu_percent()](https://psutil.readthedocs.io/en/latest/#psutil.Process.cpu_percent) / 100`, i.e. the fraction of CPU time used by this process, where 1 means using 100% of a single CPU core. The value can be greater than 1 if the process uses multiple cores.
+2. The `None` value means that there are no fields to configure for this metric. The value of this metric is `[psutil.cpu_percent()](https://psutil.readthedocs.io/en/latest/#psutil.cpu_percent) / 100`, i.e. the fraction of CPU time used by the whole system, where 1 means using 100% of all CPU cores.
+3. The value here is a list of 'modes' of memory. The full list can be seen in the [`psutil` documentation](https://psutil.readthedocs.io/en/latest/#psutil.virtual_memory). `available` is "the memory that can be given instantly to processes without the system going into swap. This is calculated by summing different memory metrics that vary depending on the platform. It is supposed to be used to monitor actual memory usage in a cross platform fashion." The value of the metric is a number between 0 and 1, and subtracting the value from 1 gives the fraction of memory used.
+4. This is the fraction of available swap used. The value is a number between 0 and 1.
+
+To collect lots of detailed data about all available metrics, use `logfire.instrument_system_metrics(base='full')`.
+
+!!! warning
+    The amount of data collected by `base='full'` can be expensive, especially if you have many servers,
+    and this is easy to forget about. If you enable this, be sure to monitor your usage and costs.
+
+    The most expensive metrics are `system.cpu.utilization/time` which collect data for each core and each mode,
+    and `system.disk.*` which collect data for each disk device. The exact number depends on the machine hardware,
+    but this can result in hundreds of data points per minute from each instrumented host.
+
+`logfire.instrument_system_metrics(base='full')` is equivalent to:
+
+```py
+logfire.instrument_system_metrics({
+    'system.cpu.simple_utilization': None,
+    'system.cpu.time': ['idle', 'user', 'system', 'irq', 'softirq', 'nice', 'iowait', 'steal', 'interrupt', 'dpc'],
+    'system.cpu.utilization': ['idle', 'user', 'system', 'irq', 'softirq', 'nice', 'iowait', 'steal', 'interrupt', 'dpc'],
+    'system.memory.usage': ['available', 'used', 'free', 'active', 'inactive', 'buffers', 'cached', 'shared', 'wired', 'slab', 'total'],
+    'system.memory.utilization': ['available', 'used', 'free', 'active', 'inactive', 'buffers', 'cached', 'shared', 'wired', 'slab'],
+    'system.swap.usage': ['used', 'free'],
+    'system.swap.utilization': ['used'],
+    'system.disk.io': ['read', 'write'],
+    'system.disk.operations': ['read', 'write'],
+    'system.disk.time': ['read', 'write'],
+    'system.network.dropped.packets': ['transmit', 'receive'],
+    'system.network.packets': ['transmit', 'receive'],
+    'system.network.errors': ['transmit', 'receive'],
+    'system.network.io': ['transmit', 'receive'],
+    'system.thread_count': None,
+    'process.runtime.memory': ['rss', 'vms'],
+    'process.runtime.cpu.time': ['user', 'system'],
+    'process.runtime.gc_count': None,
+    'process.runtime.thread_count': None,
+    'process.runtime.cpu.utilization': None,
+    'process.runtime.context_switches': ['involuntary', 'voluntary'],
+    'process.open_file_descriptor.count': None,
+})
+```
+
+Each key here is a metric name. The values have different meanings for different metrics. For example, for `system.cpu.utilization`, the value is a list of CPU modes. So there will be a separate row for each CPU core saying what percentage of time it spent idle, another row for the time spent waiting for IO, etc. There are no fields to configure for `system.thread_count`, so the value is `None`.
+
+For convenient customizability, the first dict argument is merged with the base. For example, if you want to collect disk read operations (but not writes) you can write:
+
+- `logfire.instrument_system_metrics({'system.disk.operations': ['read']})` to collect that data in addition to the basic defaults.
+- `logfire.instrument_system_metrics({'system.disk.operations': ['read']}, base='full')` to collect detailed data about all metrics, excluding disk write operations.
+- `logfire.instrument_system_metrics({'system.disk.operations': ['read']}, base=None)` to collect only disk read operations and nothing else.

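The merge rule described just above (the first dict argument is merged over the chosen base, so a key you pass replaces that metric's entry in the base) can be modeled with an ordinary dict update. This is an illustrative model, not Logfire's implementation: `resolve_config`, `PRESETS` and the `'basic'` preset name are assumptions, and `FULL_ABRIDGED` copies only a handful of entries from the `base='full'` listing. It reproduces the three `system.disk.operations` examples.

```python
from typing import Dict, List, Optional

# Metric name -> list of fields/modes, or None when there is nothing to configure.
Config = Dict[str, Optional[List[str]]]

# Default preset, copied from the "equivalent to this" example in the docs.
BASIC: Config = {
    'process.runtime.cpu.utilization': None,
    'system.cpu.simple_utilization': None,
    'system.memory.utilization': ['available'],
    'system.swap.utilization': ['used'],
}

# Abridged 'full' preset: only a few entries from the base='full' listing.
FULL_ABRIDGED: Config = {
    'system.cpu.simple_utilization': None,
    'system.swap.utilization': ['used'],
    'system.disk.io': ['read', 'write'],
    'system.disk.operations': ['read', 'write'],
    'system.thread_count': None,
    'process.runtime.cpu.utilization': None,
}

# Hypothetical preset registry; naming the default 'basic' is an assumption.
PRESETS: Dict[str, Config] = {'basic': BASIC, 'full': FULL_ABRIDGED}


def resolve_config(config: Optional[Config] = None, base: Optional[str] = 'basic') -> Config:
    """Return the base preset with `config` merged on top.

    A key in `config` replaces the base's value for that metric entirely;
    base=None means only the metrics in `config` are collected.
    """
    merged: Config = dict(PRESETS[base]) if base is not None else {}
    merged.update(config or {})
    return merged


extra: Config = {'system.disk.operations': ['read']}
print(resolve_config(extra))               # basic defaults plus disk reads
print(resolve_config(extra, base='full'))  # full preset, disk writes dropped
print(resolve_config(extra, base=None))    # only disk reads
```

Because the override replaces the whole entry rather than unioning the lists, passing `['read']` with `base='full'` drops `'write'`, matching the second example in the docs.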
logfire-api/logfire_api/__init__.py

Lines changed: 3 additions & 0 deletions
@@ -123,6 +123,8 @@ def instrument_openai(self, *args, **kwargs) -> ContextManager[None]:
 
 def instrument_aiohttp_client(self, *args, **kwargs) -> None: ...
 
+def instrument_system_metrics(self, *args, **kwargs) -> None: ...
+
 def shutdown(self, *args, **kwargs) -> None: ...
 
 DEFAULT_LOGFIRE_INSTANCE = Logfire()
@@ -158,6 +160,7 @@ def shutdown(self, *args, **kwargs) -> None: ...
 instrument_redis = DEFAULT_LOGFIRE_INSTANCE.instrument_redis
 instrument_pymongo = DEFAULT_LOGFIRE_INSTANCE.instrument_pymongo
 instrument_mysql = DEFAULT_LOGFIRE_INSTANCE.instrument_mysql
+instrument_system_metrics = DEFAULT_LOGFIRE_INSTANCE.instrument_system_metrics
 shutdown = DEFAULT_LOGFIRE_INSTANCE.shutdown
 
 def no_auto_trace(x):

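The change to this file follows the existing pattern: the stub `Logfire` class gains a method whose body is just `...`, and a module-level name is bound to that method on `DEFAULT_LOGFIRE_INSTANCE`. The sketch below reproduces the pattern in isolation, assuming the stub's purpose is to make calls harmless no-ops when the full `logfire` package is not in use; `NoOpLogfire` is a hypothetical name, not the class defined in `logfire_api`.

```python
from typing import Any


class NoOpLogfire:
    """Hypothetical stand-in: every method accepts any arguments and does nothing."""

    def instrument_mysql(self, *args: Any, **kwargs: Any) -> None: ...

    def instrument_system_metrics(self, *args: Any, **kwargs: Any) -> None: ...

    def shutdown(self, *args: Any, **kwargs: Any) -> None: ...


DEFAULT_LOGFIRE_INSTANCE = NoOpLogfire()

# Module-level aliases are bound methods of the default instance, so calling
# instrument_system_metrics(...) is the same as calling
# DEFAULT_LOGFIRE_INSTANCE.instrument_system_metrics(...).
instrument_mysql = DEFAULT_LOGFIRE_INSTANCE.instrument_mysql
instrument_system_metrics = DEFAULT_LOGFIRE_INSTANCE.instrument_system_metrics
shutdown = DEFAULT_LOGFIRE_INSTANCE.shutdown

# Any call signature is accepted and silently ignored.
instrument_system_metrics({'system.disk.operations': ['read']}, base='full')
shutdown()
```

Because `*args, **kwargs` accept everything, a stub like this does not need updating when the real method's parameters change; the trade-off is that it performs no argument checking.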
logfire-api/logfire_api/__init__.pyi

Lines changed: 2 additions & 1 deletion
@@ -11,7 +11,7 @@ from .integrations.logging import LogfireLoggingHandler as LogfireLoggingHandler
 from .integrations.structlog import LogfireProcessor as StructlogProcessor
 from .version import VERSION as VERSION
 
-__all__ = ['Logfire', 'LogfireSpan', 'LevelName', 'ConsoleOptions', 'PydanticPlugin', 'configure', 'span', 'instrument', 'log', 'trace', 'debug', 'notice', 'info', 'warn', 'error', 'exception', 'fatal', 'force_flush', 'log_slow_async_callbacks', 'install_auto_tracing', 'instrument_fastapi', 'instrument_openai', 'instrument_anthropic', 'instrument_asyncpg', 'instrument_httpx', 'instrument_celery', 'instrument_requests', 'instrument_psycopg', 'instrument_django', 'instrument_flask', 'instrument_starlette', 'instrument_aiohttp_client', 'instrument_sqlalchemy', 'instrument_redis', 'instrument_pymongo', 'instrument_mysql', 'AutoTraceModule', 'with_tags', 'with_settings', 'shutdown', 'load_spans_from_file', 'no_auto_trace', 'METRICS_PREFERRED_TEMPORALITY', 'ScrubMatch', 'ScrubbingOptions', 'VERSION', 'suppress_instrumentation', 'StructlogProcessor', 'LogfireLoggingHandler', 'TailSamplingOptions']
+__all__ = ['Logfire', 'LogfireSpan', 'LevelName', 'ConsoleOptions', 'PydanticPlugin', 'configure', 'span', 'instrument', 'log', 'trace', 'debug', 'notice', 'info', 'warn', 'error', 'exception', 'fatal', 'force_flush', 'log_slow_async_callbacks', 'install_auto_tracing', 'instrument_fastapi', 'instrument_openai', 'instrument_anthropic', 'instrument_asyncpg', 'instrument_httpx', 'instrument_celery', 'instrument_requests', 'instrument_psycopg', 'instrument_django', 'instrument_flask', 'instrument_starlette', 'instrument_aiohttp_client', 'instrument_sqlalchemy', 'instrument_redis', 'instrument_pymongo', 'instrument_mysql', 'instrument_system_metrics', 'AutoTraceModule', 'with_tags', 'with_settings', 'shutdown', 'load_spans_from_file', 'no_auto_trace', 'METRICS_PREFERRED_TEMPORALITY', 'ScrubMatch', 'ScrubbingOptions', 'VERSION', 'suppress_instrumentation', 'StructlogProcessor', 'LogfireLoggingHandler', 'TailSamplingOptions']
 
 DEFAULT_LOGFIRE_INSTANCE = Logfire()
 span = DEFAULT_LOGFIRE_INSTANCE.span
@@ -35,6 +35,7 @@ instrument_sqlalchemy = DEFAULT_LOGFIRE_INSTANCE.instrument_sqlalchemy
 instrument_redis = DEFAULT_LOGFIRE_INSTANCE.instrument_redis
 instrument_pymongo = DEFAULT_LOGFIRE_INSTANCE.instrument_pymongo
 instrument_mysql = DEFAULT_LOGFIRE_INSTANCE.instrument_mysql
+instrument_system_metrics = DEFAULT_LOGFIRE_INSTANCE.instrument_system_metrics
 shutdown = DEFAULT_LOGFIRE_INSTANCE.shutdown
 with_tags = DEFAULT_LOGFIRE_INSTANCE.with_tags
 with_settings = DEFAULT_LOGFIRE_INSTANCE.with_settings

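This stub change keeps two lists in step: the name added to `__all__` and the matching `instrument_system_metrics = DEFAULT_LOGFIRE_INSTANCE.instrument_system_metrics` alias. A small check can catch an export that was listed but never defined. The helper below is hypothetical (not part of the repository) and works on any imported module object that defines `__all__`; it cannot inspect a `.pyi` file directly, since stub files are not imported at runtime.

```python
import types
from typing import List


def missing_exports(module: types.ModuleType) -> List[str]:
    """Return the names in module.__all__ that the module does not define."""
    exported = getattr(module, '__all__', [])
    return [name for name in exported if not hasattr(module, name)]


# Build a throwaway module where one exported name was forgotten.
fake = types.ModuleType('fake_api')
fake.__all__ = ['instrument_mysql', 'instrument_system_metrics']
fake.instrument_mysql = lambda *args, **kwargs: None
print(missing_exports(fake))
```

Run as part of a test suite against the runtime module, a check like this would flag the case where `__all__` is updated but the alias line is forgotten.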