Span Buffer Multiprocess Enhancement with Health Monitoring #6
base: span-flusher-stable
Conversation
PR Compliance Guide 🔍
Below is a summary of compliance checks for this PR:

Compliance status legend:
🟢 - Fully Compliant
🟡 - Partially Compliant
🔴 - Not Compliant
⚪ - Requires Further Human Verification
🏷️ - Compliance label

PR Code Suggestions ✨
Explore these optional code suggestions:

@coderabbitai review

✅ Actions performed: Review triggered.
Walkthrough
The changes implement multi-process support for Sentry's span flusher consumer. A new flusher_processes setting controls how many flusher processes are created and how shards are distributed across them.

Changes
Sequence Diagram

```mermaid
sequenceDiagram
    participant CLI as CLI Config
    participant Factory as ProcessSpansStrategyFactory
    participant Flusher as SpanFlusher
    participant PM as Process Manager
    participant P1 as Process 1<br/>(Shards 0,2)
    participant P2 as Process 2<br/>(Shards 1,3)
    participant Kafka as Kafka

    CLI->>Factory: flusher_processes=2
    Factory->>Flusher: max_processes=2
    Flusher->>PM: _create_processes(shards=[0,1,2,3])
    PM->>P1: main(shards=[0,2])
    PM->>P2: main(shards=[1,3])

    rect rgb(200, 220, 255)
        note over P1,P2: Parallel Processing
        P1->>P1: buffer spans for shards 0,2<br/>health & backpressure tracking
        P2->>P2: buffer spans for shards 1,3<br/>health & backpressure tracking
    end

    P1->>Kafka: flush segments (shards 0,2)
    P2->>Kafka: flush segments (shards 1,3)

    rect rgb(200, 255, 200)
        note over Flusher: Health Checks
        Flusher->>P1: _ensure_processes_alive
        Flusher->>P2: _ensure_processes_alive
    end
```
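The shard assignment shown in the diagram follows a round-robin pattern. As a rough illustration of that idea (this is a standalone sketch, not the actual SpanFlusher code), the process-to-shards mapping could be computed like this:

```python
from collections import defaultdict


def distribute_shards(shards: list[int], max_processes: int) -> dict[int, list[int]]:
    """Assign shards to processes round-robin, capped at max_processes workers."""
    num_processes = min(max_processes, len(shards))
    assignments: dict[int, list[int]] = defaultdict(list)
    for i, shard in enumerate(shards):
        assignments[i % num_processes].append(shard)
    return dict(assignments)


# With four shards and flusher_processes=2, this reproduces the diagram above:
# distribute_shards([0, 1, 2, 3], 2) == {0: [0, 2], 1: [1, 3]}
```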
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~60 minutes
Pre-merge checks and finishing touches
❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
✨ Finishing touches
🧪 Generate unit tests (beta)
Actionable comments posted: 2
🧹 Nitpick comments (2)
src/sentry/spans/consumers/process/flusher.py (2)
127-132: Verify usage of _create_process_for_shard method.

This method is defined but not called anywhere in the provided code. The health check logic (line 259) uses _create_process_for_shards instead. Consider removing this method if it's unused, or document its intended future use.

```bash
#!/bin/bash
# Search for any calls to _create_process_for_shard
rg -nP '\b_create_process_for_shard\s*\(' --type=py
```
247-251: Consider extracting long exception message.

The exception message is quite long. Per static analysis (TRY003), consider defining the message format in the exception class or as a constant for maintainability. This is a minor style issue and can be deferred.
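For reference, the pattern TRY003 points toward is to move the message into the exception class so call sites stay short. A hypothetical sketch (the class name and fields are illustrative, not taken from this PR):

```python
class FlusherProcessFailed(Exception):
    """Raised when a flusher subprocess keeps dying (illustrative example only)."""

    def __init__(self, process_index: int, restarts: int) -> None:
        # The message format lives here, inside the exception class.
        super().__init__(
            f"flusher process {process_index} failed after {restarts} restarts"
        )


# Call sites then reduce to:
# raise FlusherProcessFailed(process_index, restarts)
```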
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (6)
- CLAUDE.md (1 hunks)
- src/sentry/consumers/__init__.py (1 hunks)
- src/sentry/spans/consumers/process/factory.py (3 hunks)
- src/sentry/spans/consumers/process/flusher.py (8 hunks)
- tests/sentry/spans/consumers/process/test_consumer.py (4 hunks)
- tests/sentry/spans/consumers/process/test_flusher.py (1 hunks)
🧰 Additional context used
🧬 Code graph analysis (2)
tests/sentry/spans/consumers/process/test_consumer.py (3)
- src/sentry/spans/consumers/process/flusher.py (2)
  - poll (215-216)
  - join (328-347)
- src/sentry/spans/consumers/process/factory.py (2)
  - ProcessSpansStrategyFactory (24-120)
  - create_with_partitions (58-116)
- tests/sentry/spans/consumers/process/test_flusher.py (1)
  - append (24-26)

src/sentry/spans/consumers/process/flusher.py (1)
- src/sentry/processing/backpressure/memory.py (1)
  - ServiceMemory (12-24)
🪛 Ruff (0.14.5)
tests/sentry/spans/consumers/process/test_consumer.py
106-106: Unused function argument: force
(ARG001)
src/sentry/spans/consumers/process/flusher.py
248-250: Avoid specifying long messages outside the exception class
(TRY003)
337-337: Loop control variable process_index not used within loop body
Rename unused process_index to _process_index
(B007)
🔇 Additional comments (8)
src/sentry/spans/consumers/process/flusher.py (2)
43-79: LGTM! Clean per-process state initialization.

The round-robin shard distribution and per-process state tracking are well-structured. Defaulting max_processes to the shard count ensures backward compatibility.
261-316: LGTM! Correct multi-process coordination.

The backpressure and memory checks properly aggregate across all processes. The logic correctly rejects messages if any process is under backpressure or if aggregate memory exceeds limits.
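As a rough sketch of that aggregation idea (names and thresholds are illustrative, not the actual flusher code): reject new messages if any process reports backpressure, or if the combined memory use of all per-process buffers exceeds the configured limit.

```python
from __future__ import annotations

import time


def should_apply_backpressure(
    process_backpressure_since: dict[int, float | None],
    process_memory_used: dict[int, int],
    max_memory_bytes: int,
) -> bool:
    """Illustrative aggregation across flusher processes (not the real implementation)."""
    any_backpressure = any(
        since is not None for since in process_backpressure_since.values()
    )
    total_memory = sum(process_memory_used.values())
    return any_backpressure or total_memory > max_memory_bytes


# Example: one process entered backpressure five seconds ago, so messages are rejected.
print(should_apply_backpressure({0: time.time() - 5, 1: None}, {0: 1024, 1: 2048}, 10_000_000))
```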
CLAUDE.md (1)

448-458: LGTM! Helpful type-checking guidance.

The added example clearly demonstrates the correct use of isinstance() over hasattr() for union type guards. This is valuable documentation for maintaining type safety.
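The guidance boils down to letting the type checker narrow union types. A minimal illustration with made-up types (not from CLAUDE.md itself):

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class BufferedSegment:
    spans: list[bytes]


@dataclass
class BufferedSpan:
    payload: bytes


def payload_size(item: BufferedSegment | BufferedSpan) -> int:
    # isinstance() narrows the union for mypy; hasattr(item, "spans") would not.
    if isinstance(item, BufferedSegment):
        return sum(len(span) for span in item.spans)
    return len(item.payload)
```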
tests/sentry/spans/consumers/process/test_flusher.py (1)

83-83: LGTM! Correct assertion for multiprocess backpressure.

The updated assertion properly checks if any process has entered backpressure state by iterating over the per-process dictionary values.
src/sentry/consumers/__init__.py (1)

430-438: LGTM! Clear CLI option with sensible default.

The --flusher-processes option is well-documented and defaults to 1 for backward compatibility. The help text clearly explains its purpose.
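Consumer flags in src/sentry/consumers/__init__.py are built from click options; as a hedged sketch of what the new flag could look like (the exact registration code and help text in that module may differ), a click option with this shape would give the described behavior:

```python
import click

# Hypothetical stand-alone declaration of the option described in the review.
flusher_processes_option = click.Option(
    ["--flusher-processes", "flusher_processes"],
    type=int,
    default=1,  # default of 1 keeps single-process behavior for backward compatibility
    help="Number of processes used by the span flusher to flush segments.",
)
```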
tests/sentry/spans/consumers/process/test_consumer.py (2)

1-1: LGTM! Necessary test adjustments for multiprocessing.

The addition of time.sleep(0.1) after advancing drift and transaction=True on the decorator ensures proper synchronization and database isolation for the multiprocess flusher behavior.

Also applies to: 12-12, 60-62
84-123: LGTM! Comprehensive test for process limiting.

The test correctly verifies that:
- Only 2 processes are created despite 4 shards
- All 4 shards are distributed across the 2 processes
- The flusher respects the max_processes limit

Note: The unused force parameter on line 106 is part of the commit callback signature and cannot be removed (false positive from static analysis).
src/sentry/spans/consumers/process/factory.py (1)

41-41: LGTM! Clean parameter threading.

The flusher_processes parameter is cleanly passed through from the CLI to SpanFlusher, enabling configurable process limits for the span flusher.

Also applies to: 52-52, 74-74
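Conceptually the threading is a single constructor argument carried along the chain. A simplified sketch (class bodies abbreviated and method names hypothetical, not the real factory code):

```python
from __future__ import annotations


class SpanFlusher:
    def __init__(self, shards: list[int], max_processes: int | None = None) -> None:
        # Default to one process per shard when no explicit limit is configured.
        self.max_processes = max_processes or len(shards)
        self.shards = shards


class ProcessSpansStrategyFactory:
    def __init__(self, flusher_processes: int | None = None) -> None:
        self.flusher_processes = flusher_processes

    def build_flusher(self, shards: list[int]) -> SpanFlusher:
        # The CLI value flows through unchanged: --flusher-processes -> factory -> flusher.
        return SpanFlusher(shards=shards, max_processes=self.flusher_processes)
```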
```diff
 with metrics.timer("spans.buffer.flusher.produce", tags={"shard": shard_tag}):
     for flushed_segment in flushed_segments.values():
         if not flushed_segment.spans:
             continue

         spans = [span.payload for span in flushed_segment.spans]
         kafka_payload = KafkaPayload(None, orjson.dumps({"spans": spans}), [])
-        metrics.timing("spans.buffer.segment_size_bytes", len(kafka_payload.value))
+        metrics.timing(
+            "spans.buffer.segment_size_bytes",
+            len(kafka_payload.value),
+            tags={"shard": shard_tag},
+        )
         produce(kafka_payload)

-with metrics.timer("spans.buffer.flusher.wait_produce"):
+with metrics.timer("spans.buffer.flusher.wait_produce", tags={"shards": shard_tag}):
```
Fix inconsistent metric tag names.

Line 185 uses tags={"shard": shard_tag} while line 199 uses tags={"shards": shard_tag}. The inconsistency (singular vs. plural) makes querying metrics more difficult. Use the same tag name for both.

Apply this diff to standardize on "shard":

```diff
-with metrics.timer("spans.buffer.flusher.wait_produce", tags={"shards": shard_tag}):
+with metrics.timer("spans.buffer.flusher.wait_produce", tags={"shard": shard_tag}):
```

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
```python
with metrics.timer("spans.buffer.flusher.produce", tags={"shard": shard_tag}):
    for flushed_segment in flushed_segments.values():
        if not flushed_segment.spans:
            continue

        spans = [span.payload for span in flushed_segment.spans]
        kafka_payload = KafkaPayload(None, orjson.dumps({"spans": spans}), [])
        metrics.timing(
            "spans.buffer.segment_size_bytes",
            len(kafka_payload.value),
            tags={"shard": shard_tag},
        )
        produce(kafka_payload)

with metrics.timer("spans.buffer.flusher.wait_produce", tags={"shard": shard_tag}):
```
🤖 Prompt for AI Agents

In src/sentry/spans/consumers/process/flusher.py around lines 185 to 199, the metric tag names are inconsistent: one timer uses tags={"shard": shard_tag} while the later timer uses tags={"shards": shard_tag}; change the later timer to use tags={"shard": shard_tag} so both timers use the same "shard" tag name, and run tests or a quick local check to ensure metrics emit with the unified tag.
```python
for process_index, process in self.processes.items():
    if deadline is not None:
        remaining_time = deadline - time.time()
        if remaining_time <= 0:
            break

    while process.is_alive() and (deadline is None or deadline > time.time()):
        time.sleep(0.1)
```
🛠️ Refactor suggestion | 🟠 Major

Remove unused loop variable.

The process_index variable is not used within the loop body. Either rename it to _process_index to indicate it's intentionally unused, or iterate over self.processes.values() directly.

Apply this diff to iterate over values only:

```diff
 # Wait for all processes to finish
-for process_index, process in self.processes.items():
+for process in self.processes.values():
     if deadline is not None:
         remaining_time = deadline - time.time()
         if remaining_time <= 0:
```

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
```python
for process in self.processes.values():
    if deadline is not None:
        remaining_time = deadline - time.time()
        if remaining_time <= 0:
            break

    while process.is_alive() and (deadline is None or deadline > time.time()):
        time.sleep(0.1)
```
🧰 Tools
🪛 Ruff (0.14.5)

337-337: Loop control variable process_index not used within loop body. Rename unused process_index to _process_index (B007)
🤖 Prompt for AI Agents

In src/sentry/spans/consumers/process/flusher.py around lines 337 to 344, the for-loop currently unpacks an unused process_index variable; change the iteration to loop over self.processes.values() (or rename process_index to _process_index) so the unused variable is removed, leaving the loop body to operate on process only and keeping semantics unchanged.
User description

PR #6

PR Type
Enhancement

Description
- Implement multiprocess span flusher with per-shard process management
- Add configurable flusher_processes parameter to limit concurrent processes
- Distribute shards across multiple processes for parallel processing
- Add shard-level health monitoring and metrics tagging
- Update tests to verify process limiting and shard distribution
Diagram Walkthrough
File Walkthrough
__init__.py: Add flusher processes CLI option (src/sentry/consumers/__init__.py)
- --flusher-processes CLI option added to span consumer configuration

factory.py: Pass flusher processes config to SpanFlusher (src/sentry/spans/consumers/process/factory.py)
- flusher_processes parameter added to factory initialization
- max_processes parameter passed to SpanFlusher during creation
flusher.py: Implement multiprocess flusher with shard distribution (src/sentry/spans/consumers/process/flusher.py)
- Process-to-shards mapping
- max_processes limit
- Restart counters
- ServiceMemory import for memory tracking across multiple buffers
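The health-monitoring side of these changes can be pictured as: check each child process, and if one has died, recreate it on the same shard set while counting restarts. A hedged sketch of that loop (function and argument names are illustrative, not the actual flusher implementation):

```python
import multiprocessing


def ensure_processes_alive(
    processes: dict[int, multiprocessing.Process],
    process_shards: dict[int, list[int]],
    restart_counts: dict[int, int],
    create_process_for_shards,
    max_restarts: int = 10,
) -> None:
    """Illustrative health check: restart dead flusher processes on their original shards."""
    for index, process in processes.items():
        if process.is_alive():
            continue
        if restart_counts[index] >= max_restarts:
            raise RuntimeError(f"flusher process {index} exceeded {max_restarts} restarts")
        restart_counts[index] += 1
        # Recreate the worker with the same shard assignment it had before.
        processes[index] = create_process_for_shards(process_shards[index])
```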
test_consumer.py: Add multiprocess flusher tests and transaction handling (tests/sentry/spans/consumers/process/test_consumer.py)
- time import for sleep operations in tests
- @pytest.mark.django_db uses transaction=True for proper transaction handling
- test_flusher_processes_limit verifies process limiting and shard distribution
test_flusher.py: Update backpressure test for multiprocess (tests/sentry/spans/consumers/process/test_flusher.py)
- Backpressure assertion iterates over per-process dictionary values
- backpressure_since changed to dictionary-based process_backpressure_since
CLAUDE.md: Add union type checking best practices (CLAUDE.md)
- Use isinstance() for type unions
- Avoid hasattr() for type checking

Summary by CodeRabbit
Release Notes
New Features
- --flusher-processes CLI option to configure the number of span processing workers (default: 1)

Bug Fixes
Documentation