@thomasywang
Contributor

Summary:
Stack context:

Our tracing subscriber has 3 layers:
- File logging
- Scuba
- Sqlite (usually off)

Although the actual Scuba logging is done in a background thread and we use a non-blocking file writer, a good chunk of work still happens inline for events and spans. The solution is to create a `UnifiedLayer` that sends everything to a background worker, which then delivers all traces to each `Exporter` to handle.

In this diff, we will create an initial `UnifiedLayer` and incrementally move each existing layer into an `Exporter`.
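
The design described above can be sketched roughly as follows. This is a minimal, std-only illustration and not the actual implementation: it does not use the `tracing` crate, and `TraceEvent`, `CollectingExporter`, `record`, `shutdown`, and `demo` are hypothetical names chosen for the example. Only `UnifiedLayer` and `Exporter` come from the summary, and their real shapes may differ.

```rust
use std::sync::mpsc::{self, Sender};
use std::sync::{Arc, Mutex};
use std::thread::{self, JoinHandle};

/// Hypothetical trace record; a real layer would carry span and event data.
#[derive(Clone, Debug, PartialEq)]
enum TraceEvent {
    SpanEnter { name: String },
    SpanExit { name: String },
    Event { message: String },
}

/// Assumed exporter interface: one implementation per sink (file, Scuba, ...).
trait Exporter: Send {
    fn export(&mut self, event: &TraceEvent);
}

/// Stand-in for the layer: the hot path only pushes onto a channel.
struct UnifiedLayer {
    tx: Option<Sender<TraceEvent>>,
    worker: Option<JoinHandle<()>>,
}

impl UnifiedLayer {
    fn new(mut exporters: Vec<Box<dyn Exporter>>) -> Self {
        let (tx, rx) = mpsc::channel::<TraceEvent>();
        let worker = thread::spawn(move || {
            // The loop ends once every sender has been dropped.
            for event in rx {
                for exporter in exporters.iter_mut() {
                    exporter.export(&event);
                }
            }
        });
        Self { tx: Some(tx), worker: Some(worker) }
    }

    /// Cheap for the caller: no formatting or I/O happens here.
    fn record(&self, event: TraceEvent) {
        if let Some(tx) = &self.tx {
            let _ = tx.send(event);
        }
    }

    /// Close the channel and wait for the worker to drain it.
    fn shutdown(mut self) {
        drop(self.tx.take());
        if let Some(worker) = self.worker.take() {
            let _ = worker.join();
        }
    }
}

/// Test exporter that stores everything it receives.
struct CollectingExporter {
    seen: Arc<Mutex<Vec<TraceEvent>>>,
}

impl Exporter for CollectingExporter {
    fn export(&mut self, event: &TraceEvent) {
        self.seen.lock().unwrap().push(event.clone());
    }
}

/// Sends three events through the layer and returns what each exporter saw.
fn demo() -> (Vec<TraceEvent>, Vec<TraceEvent>) {
    let first = Arc::new(Mutex::new(Vec::new()));
    let second = Arc::new(Mutex::new(Vec::new()));
    let exporters: Vec<Box<dyn Exporter>> = vec![
        Box::new(CollectingExporter { seen: Arc::clone(&first) }),
        Box::new(CollectingExporter { seen: Arc::clone(&second) }),
    ];
    let layer = UnifiedLayer::new(exporters);
    layer.record(TraceEvent::SpanEnter { name: "step".to_string() });
    layer.record(TraceEvent::Event { message: "hello".to_string() });
    layer.record(TraceEvent::SpanExit { name: "step".to_string() });
    layer.shutdown();
    let a = first.lock().unwrap().clone();
    let b = second.lock().unwrap().clone();
    (a, b)
}
```

In this sketch every exporter sees the same events in the same order, because a single worker drains a single channel and fans each event out in turn.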

To test correctness, we will run both the old and the unified implementations of telemetry initialization on a variety of workloads and ensure that both produce the same results.

In this diff we will create an `Exporter` meant to replace `otel::tracing_layer()` (which is really just an alias for Scuba). We log to two different Scuba tables: `monarch_tracing` and `monarch_executions`. We will test correctness by injecting a mock Scuba client that simply appends all samples it intends to log, and ensure that both the old and the unified implementations produce the same samples.
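
The mock-client test strategy described above could look roughly like the sketch below. It is std-only and every name in it (`MockScubaClient`, `Sample`, `to_sample`, `run_old`, `run_unified`, `compare`) is hypothetical; the real client, sample type, and test harness are not shown in this PR description. It assumes samples are deterministic; fields such as timestamps would presumably need to be normalized before comparing.

```rust
use std::collections::BTreeMap;
use std::sync::mpsc;
use std::sync::{Arc, Mutex};
use std::thread;

/// Hypothetical sample shape: column name to value.
type Sample = BTreeMap<String, String>;

/// Mock client that records (table, sample) pairs instead of logging them.
#[derive(Clone, Default)]
struct MockScubaClient {
    logged: Arc<Mutex<Vec<(String, Sample)>>>,
}

impl MockScubaClient {
    fn log(&self, table: &str, sample: Sample) {
        self.logged.lock().unwrap().push((table.to_string(), sample));
    }

    fn snapshot(&self) -> Vec<(String, Sample)> {
        self.logged.lock().unwrap().clone()
    }
}

/// Builds the sample that either implementation would log for a span.
fn to_sample(span_name: &str) -> Sample {
    let mut sample = Sample::new();
    sample.insert("name".to_string(), span_name.to_string());
    sample
}

/// Stand-in for the old path: build and log each sample inline.
fn run_old(client: &MockScubaClient, spans: &[&str]) {
    for name in spans {
        client.log("monarch_tracing", to_sample(name));
    }
}

/// Stand-in for the unified path: a background worker does the logging.
fn run_unified(client: &MockScubaClient, spans: &[&str]) {
    let (tx, rx) = mpsc::channel::<String>();
    let worker_client = client.clone();
    let worker = thread::spawn(move || {
        for name in rx {
            worker_client.log("monarch_tracing", to_sample(&name));
        }
    });
    for name in spans {
        tx.send(name.to_string()).unwrap();
    }
    // Closing the channel lets the worker finish; join before comparing.
    drop(tx);
    worker.join().unwrap();
}

/// Runs the same workload through both paths and returns what each logged.
fn compare(spans: &[&str]) -> (Vec<(String, Sample)>, Vec<(String, Sample)>) {
    let old = MockScubaClient::default();
    let unified = MockScubaClient::default();
    run_old(&old, spans);
    run_unified(&unified, spans);
    (old.snapshot(), unified.snapshot())
}
```

A test would then assert that the two returned vectors are equal for each workload, which checks both the contents and the order of the samples.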

Differential Revision: D87363772

@meta-cla meta-cla bot added the CLA Signed label Nov 19, 2025
@meta-codesync

meta-codesync bot commented Nov 19, 2025

@thomasywang has exported this pull request. If you are a Meta employee, you can view the originating Diff in D87363772.

thomasywang added a commit to thomasywang/monarch-1 that referenced this pull request Nov 19, 2025
@thomasywang thomasywang force-pushed the export-D87363772 branch 2 times, most recently from 99f057a to 414b7fc Compare November 19, 2025 16:27
thomasywang added a commit to thomasywang/monarch-1 that referenced this pull request Nov 19, 2025
thomasywang added a commit to thomasywang/monarch-1 that referenced this pull request Nov 19, 2025
thomasywang and others added 6 commits November 24, 2025 08:26
Summary: We disallow time-related methods to ensure that we use `hyperactor::clock`, but telemetry needs an exception because using `hyperactor::clock` there would create a circular dependency.

Differential Revision: D87664116
Summary: Each process logs to monarch_executions only once, at the beginning of the execution, so there is no need to add a Scuba client that logs to this table to our tracing subscriber.

Differential Revision: D87664117
Differential Revision: D87363773
Differential Revision: D87363775
Differential Revision: D87363774
Summary:
Pull Request resolved: meta-pytorch#1931

Reviewed By: mariusae

Differential Revision: D87363772
Labels

CLA Signed, fb-exported, meta-exported
