Commit f84179f: "remove redundancies"
1 parent bae037d

1 file changed: +17 −63 lines

develop-docs/sdk/miscellaneous/telemetry-buffer.mdx
@@ -34,7 +34,7 @@ Introduce a `Buffer` layer between the `Client` and the `Transport`. This `Buffe
 ┌────────────────────────────────────────────────────────────────────────────┐
 │                                   Buffer                                   │
-│  Add(item) · Flush(timeout) · Close(timeout)
+│  Add(item) · Flush(timeout) · Close(timeout)                               │
 │                                                                            │
 │  ┌──────────────────────┐  ┌──────────────────────┐  ┌──────────────────┐  │
 │  │     Error Store      │  │    Check-in Store    │  │    Log Store     │  │
@@ -71,8 +71,8 @@ Introduce a `Buffer` layer between the `Client` and the `Transport`. This `Buffe
 ### Priorities
 - CRITICAL: Error, Feedback.
 - HIGH: Session, CheckIn.
-- MEDIUM: Log, ClientReport, Span.
-- LOW: Transaction, Profile, ProfileChunk.
+- MEDIUM: Transaction, ClientReport, Span.
+- LOW: Log, Profile, ProfileChunk.
 - LOWEST: Replay.

 Configurable via weights.
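As a rough illustration of weight-based scheduling, the sketch below allocates a per-pass drain quota to each priority in proportion to its weight, using the default weights from the Scheduler Options (CRITICAL=5 through LOWEST=1). The `drainQuota` helper and type names are hypothetical, not part of the sentry-go API.

```go
package main

import "fmt"

// Priority levels for telemetry categories.
type Priority int

const (
	Lowest Priority = iota + 1
	Low
	Medium
	High
	Critical
)

// weights mirrors the Scheduler Options defaults: CRITICAL=5 ... LOWEST=1.
var weights = map[Priority]int{
	Critical: 5, High: 4, Medium: 3, Low: 2, Lowest: 1,
}

// drainQuota returns how many items a store at priority p may send in one
// scheduler pass of `budget` total items, proportional to its weight
// (hypothetical helper for illustration only).
func drainQuota(p Priority, budget int) int {
	total := 0
	for _, w := range weights {
		total += w
	}
	return budget * weights[p] / total
}

func main() {
	// With a budget of 15 items per pass, CRITICAL gets 5x the share of LOWEST.
	fmt.Println(drainQuota(Critical, 15)) // 5
	fmt.Println(drainQuota(Lowest, 15))   // 1
}
```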
@@ -90,8 +90,8 @@ Each telemetry category maintains a store interface; a fixed-size circular array
 - **Batching configuration**:
   - `batchSize`: Number of items to combine into a single batch (1 for errors, transactions, and monitors; 100 for logs).
   - `timeout`: Maximum time to wait before sending a partial batch (5 seconds for logs).
-- **Bucketed Storage Support**: The storage interface should satisfy both bucketed and single-item implementations, allowing sending spans per trace id.
-- **Observability**: Each store tracks offered, accepted, and dropped item counts for client reports.
+- **Bucketed Storage Support**: The storage interface should satisfy both bucketed and single-item implementations, allowing spans to be sent per trace id (required for Span First).
+- **Observability**: Each store tracks dropped item counts for client reports.

 ##### Single-item ring buffer (default)
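The default single-item store is a fixed-size circular array. A minimal Go sketch of such a ring buffer with `drop_oldest` overflow follows; the names are illustrative, not the sentry-go implementation.

```go
package main

import "fmt"

// RingBuffer is a minimal fixed-capacity circular buffer illustrating the
// default single-item store (illustrative sketch, not the sentry-go code).
type RingBuffer[T any] struct {
	items []T
	head  int // index of the oldest item
	size  int
}

func NewRingBuffer[T any](capacity int) *RingBuffer[T] {
	return &RingBuffer[T]{items: make([]T, capacity)}
}

// Add stores an item; when full it overwrites the oldest (drop_oldest)
// and reports whether an existing item was dropped.
func (r *RingBuffer[T]) Add(item T) (dropped bool) {
	if r.size == len(r.items) {
		r.items[r.head] = item
		r.head = (r.head + 1) % len(r.items)
		return true
	}
	r.items[(r.head+r.size)%len(r.items)] = item
	r.size++
	return false
}

// Drain removes and returns up to n of the oldest items in FIFO order.
func (r *RingBuffer[T]) Drain(n int) []T {
	if n > r.size {
		n = r.size
	}
	out := make([]T, 0, n)
	for i := 0; i < n; i++ {
		out = append(out, r.items[r.head])
		r.head = (r.head + 1) % len(r.items)
		r.size--
	}
	return out
}

func main() {
	rb := NewRingBuffer[int](3)
	rb.Add(1)
	rb.Add(2)
	rb.Add(3)
	dropped := rb.Add(4)              // buffer full: the oldest item (1) is overwritten
	fmt.Println(dropped, rb.Drain(3)) // true [2 3 4]
}
```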

@@ -104,15 +104,20 @@ Each telemetry category maintains a store interface; a fixed-size circular array

 ##### Bucketed-by-trace storage (spans)

-- Purpose: keep spans from the same trace together and flush them as a unit to avoid partial-trace delivery under pressure.
-- Grouping: a new bucket is created per trace id; a map (`traceIndex`) provides O(1) lookup. Items without a trace id are accepted but grouped without an index.
-- Capacity model: two limits are enforced: overall `itemCapacity` and a derived `bucketCapacity ~= capacity/10` (minimum 10). Additionally, a `perBucketItemLimit` (100) prevents a single trace from monopolizing storage.
-- Readiness: when total buffered items reach `batchSize` or `timeout` elapses, the entire oldest bucket is flushed to preserve trace coherence.
-- Overflow behavior:
-  - `drop_oldest`: evict the oldest bucket (dropping all its items) and invoke the dropped callback for each (`buffer_full_drop_oldest_bucket`). Preferred for spans to drop an entire trace.
+- **Purpose**: keep spans from the same trace together and flush them as a unit to avoid partial-trace delivery under pressure. This addresses a gap in standard implementations, where dropping individual spans can create incomplete traces.
+- **Grouping**: a new bucket is created per trace id; a map (`traceIndex`) provides O(1) lookup.
+- **Capacity model**: two limits are enforced: an overall `itemCapacity` and a derived `bucketCapacity ~= capacity/10` (minimum 10).
+- **Readiness**: when total buffered items reach `batchSize` or `timeout` elapses, the entire oldest bucket is flushed to preserve trace coherence.
+- **Overflow behavior**:
+  - `drop_oldest`: evict the oldest bucket (dropping all its items). Preferred for spans, since it drops an entire trace at once.
   - `drop_newest`: reject the incoming item (`buffer_full_drop_newest`).
 - Lifecycle: empty buckets are removed and their trace ids are purged from the index; `MarkFlushed()` updates `lastFlushTime`.

+##### Trace Consistency Trade-offs
+
+A small subset of cases can still produce partial traces: either an old trace bucket was dropped and a new span for the same trace arrived later, or an incoming span of a buffered trace was rejected.
+In most cases the preferred overflow behavior is `drop_oldest`, since it produces the fewest incomplete traces across these two scenarios.
+
 Stores are mapped to [DataCategories](https://github.com/getsentry/relay/blob/master/relay-base-schema/src/data_category.rs), which determine their scheduling priority and rate limits.
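The bucketed-by-trace behavior (per-trace buckets, a `traceIndex` map for O(1) lookup, and whole-bucket eviction under `drop_oldest`) could be sketched as follows. This is a simplified illustration with hypothetical names, not the sentry-go implementation.

```go
package main

import "fmt"

// bucket groups spans that share a trace id.
type bucket struct {
	traceID string
	items   []string
}

// BucketedStore is an illustrative sketch of bucketed-by-trace storage:
// buckets keep insertion order; traceIndex gives O(1) lookup by trace id.
type BucketedStore struct {
	buckets      []*bucket
	traceIndex   map[string]*bucket
	itemCapacity int
	itemCount    int
}

func NewBucketedStore(itemCapacity int) *BucketedStore {
	return &BucketedStore{traceIndex: map[string]*bucket{}, itemCapacity: itemCapacity}
}

// Add places a span in its trace's bucket, evicting the oldest whole
// bucket when itemCapacity would be exceeded (the drop_oldest policy).
// It returns the trace id of the evicted bucket, if any.
func (s *BucketedStore) Add(traceID, span string) (droppedTrace string) {
	if s.itemCount == s.itemCapacity {
		oldest := s.buckets[0]
		s.buckets = s.buckets[1:]
		delete(s.traceIndex, oldest.traceID)
		s.itemCount -= len(oldest.items)
		droppedTrace = oldest.traceID
	}
	b, ok := s.traceIndex[traceID]
	if !ok {
		b = &bucket{traceID: traceID}
		s.buckets = append(s.buckets, b)
		s.traceIndex[traceID] = b
	}
	b.items = append(b.items, span)
	s.itemCount++
	return droppedTrace
}

// FlushOldest removes and returns the oldest bucket as one unit, so a
// trace's spans are always delivered together.
func (s *BucketedStore) FlushOldest() []string {
	if len(s.buckets) == 0 {
		return nil
	}
	b := s.buckets[0]
	s.buckets = s.buckets[1:]
	delete(s.traceIndex, b.traceID)
	s.itemCount -= len(b.items)
	return b.items
}

func main() {
	st := NewBucketedStore(3)
	st.Add("trace-a", "span-a1")
	st.Add("trace-a", "span-a2")
	st.Add("trace-b", "span-b1")
	dropped := st.Add("trace-b", "span-b2") // over capacity: all of trace-a is evicted
	fmt.Println(dropped, st.FlushOldest())  // trace-a [span-b1 span-b2]
}
```

Note how eviction removes an entire trace rather than a single span, which is the trade-off the design prefers under pressure.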
#### Scheduler
@@ -135,18 +140,14 @@ The transport layer handles HTTP communication with Sentry's ingestion endpoints
 ### Configuration

 #### Buffer Options
-- **Capacity**: 100 items for errors, logs, and monitors; 1000 for transactions.
+- **Capacity**: 100 items for errors and check-ins, 10*BATCH_SIZE for logs, 1000 for transactions.
 - **Overflow policy**: `drop_oldest`.
 - **Batch size**: 1 for errors and monitors (immediate send), 100 for logs.
 - **Batch timeout**: 5 seconds for logs.

 #### Scheduler Options
 - **Priority weights**: CRITICAL=5, HIGH=4, MEDIUM=3, LOW=2, LOWEST=1.

-#### Transport Options
-- **Queue size**: 1000 envelopes for AsyncTransport.
-- **HTTP timeout**: 30 seconds.
-
 ### Implementation Example (Go)

 The `sentry-go` SDK provides a reference implementation of this architecture:
@@ -177,17 +178,6 @@ type Storage[T any] interface {
 	// Category/Priority
 	Category() ratelimit.Category
 	Priority() ratelimit.Priority
-
-	// Metrics
-	OfferedCount() int64
-	DroppedCount() int64
-	AcceptedCount() int64
-	DropRate() float64
-	GetMetrics() BufferMetrics
-
-	// Configuration
-	SetDroppedCallback(callback func(item T, reason string))
-	Clear()
 }
@@ -298,39 +288,3 @@ func (b *Buffer) Flush(timeout time.Time) {
 	transport.flush(timeout)
 }
 ```
-
-### Batching Policies
-
-Different telemetry types use batching strategies optimized for their characteristics:
-
-- **Errors**: Single-item envelopes for immediate delivery (latency-sensitive).
-- **Monitors**: Single-item envelopes to maintain check-in timing accuracy.
-- **Logs**: Batches of up to 100 items or a 5-second timeout, whichever comes first (volume-optimized).
-- **Transactions**: Single-item envelopes (trace-aware batching is a future enhancement).
-
-#### Batch Processing Details
-
-For high-volume telemetry like logs, the buffer uses time- and count-based batching:
-
-**Timeout-based flushing**:
-- When the first item enters an empty log buffer, a timeout starts (5 seconds).
-- When the timeout expires, all buffered log items are sent regardless of batch size.
-- The timeout resets after each flush.
-
-**Count-based flushing**:
-- When the number of buffered log items reaches the batch size (100), they are sent immediately.
-
-**Ordering and lifecycle**:
-- Filtering and sampling happen before buffering to avoid wasting buffer space.
-- Rate limiting is checked before dispatch; if limited, items remain buffered.
-- Items are batched into a single envelope with multiple entries of the same type (logs).
-
-### Observability
-
-The buffer system exposes metrics to help you understand telemetry flow and identify issues:
-
-- **Per-category counters**: Items offered, sent successfully, and dropped.
-- **Drop reasons**: Distinguish between buffer overflow and rate limit drops.
-- **Buffer utilization**: Current size vs. capacity for each category.
-
-These metrics enable dashboards that visualize why events are being dropped, helping you tune buffer sizes or identify rate limiting issues.
