@QuinnWilton QuinnWilton commented Feb 10, 2026

The span ETS table is created without {heir, Pid, Data}. When the
otel_span_ets process crashes, the table is destroyed and all in-flight
spans are silently lost. The process restarts and creates a fresh empty
table, but spans that were active between crash and restart are gone.
The try/catch in storage_insert/1 prevents cascading badarg errors but
masks the data loss.

Use the supervisor (otel_span_sup) as the heir. If otel_span_ets
crashes, ownership of the table transfers atomically to the supervisor
instead of the table being deleted. The restarted otel_span_ets sees via
ets:info/2 that the table already exists, skips ets:new/2, and resumes
operating on the preserved data. The table is public, so all read/write
operations continue to work regardless of which process owns it.
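
For reviewers unfamiliar with the heir option, here is a minimal,
self-contained sketch of the pattern described above. It is not the code
in this PR: the module name heir_demo, the table name heir_demo_spans, and
the helper ensure_table/1 are hypothetical, and the calling process stands
in for the supervisor so the example runs without a supervision tree.

```erlang
-module(heir_demo).
-export([run/0]).

%% Hypothetical table name for this sketch; not the name the library uses.
-define(TAB, heir_demo_spans).

%% Create the table only if it does not exist yet. ets:info/2 returns
%% 'undefined' when no table with that name exists.
ensure_table(HeirPid) ->
    case ets:info(?TAB, name) of
        undefined ->
            ets:new(?TAB, [named_table, public,
                           {heir, HeirPid, undefined}]);
        _ ->
            ?TAB
    end.

%% Stand-in for the table-owning worker process.
owner(HeirPid, Parent) ->
    ensure_table(HeirPid),
    Parent ! {ready, self()},
    receive crash -> exit(simulated_crash) end.

run() ->
    Heir = self(), %% plays the supervisor's role in this sketch
    Owner1 = spawn(fun() -> owner(Heir, Heir) end),
    receive {ready, Owner1} -> ok end,
    true = ets:insert(?TAB, {span_1, in_flight}),

    %% Crash the owner. The heir inherits the table and is notified.
    Owner1 ! crash,
    receive {'ETS-TRANSFER', _Tab, Owner1, undefined} -> ok end,
    Heir = ets:info(?TAB, owner),

    %% The "restarted" owner finds the table and skips ets:new/2.
    Owner2 = spawn(fun() -> owner(Heir, Heir) end),
    receive {ready, Owner2} -> ok end,

    %% The in-flight entry survived the crash.
    [{span_1, in_flight}] = ets:lookup(?TAB, span_1),

    %% Clean up so the sketch can be run again.
    Owner2 ! crash,
    true = ets:delete(?TAB),
    ok.
```

Calling heir_demo:run() from a shell exercises the crash, the transfer,
and the restart path. In this sketch the heir keeps ownership after the
restart, which matches the description above: because the table is public,
the restarted worker can keep reading and writing without owning it.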

@QuinnWilton QuinnWilton requested a review from a team as a code owner February 10, 2026 04:25
linux-foundation-easycla bot commented Feb 10, 2026

CLA Signed

The committers listed above are authorized under a signed CLA.

  • ✅ login: QuinnWilton / name: Quinn Wilton (5a7956a)

@QuinnWilton QuinnWilton changed the title from "Preserve span table across otel_span_ets crashes + enable read_concurrency" to "Preserve span table across otel_span_ets crashes" Feb 10, 2026

@QuinnWilton (Author)

I originally had another commit in here that enabled read_concurrency on the table, but after running some benchmarks it wasn't a clear win, so I removed that change to keep things simple.