-
-
Notifications
You must be signed in to change notification settings - Fork 1.6k
Python tracing docs refresh #13125
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Python tracing docs refresh #13125
Changes from 14 commits
426c39e
302e288
2f2680d
dd4ad91
0397fcf
3c601a2
2cdca6b
6d4e3bf
d6e2490
4a3d732
084a25e
a192aec
9770d5a
f69c5ea
d808dd9
c430801
2f791b2
03f5d6a
137dbb8
5117a30
44289dd
c8dba3b
92a9a3d
407c560
0a89965
b131bbf
fef0ae1
61dd75e
1a9da90
dd5d44a
8b00344
ab9ebf7
6de450f
e9bb6af
e39d413
d8e046c
7e5c6e4
d09e2b4
1519f54
8f17aa2
a2e0e4d
75033d4
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
| Original file line number | Diff line number | Diff line change | ||||||||||||||||||||||||||||||||||||||||||||||
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| @@ -0,0 +1,288 @@ | ||||||||||||||||||||||||||||||||||||||||||||||||
| --- | ||||||||||||||||||||||||||||||||||||||||||||||||
szokeasaurusrex marked this conversation as resolved.
Show resolved
Hide resolved
|
||||||||||||||||||||||||||||||||||||||||||||||||
| title: Configure Sampling | ||||||||||||||||||||||||||||||||||||||||||||||||
| description: "Learn how to configure sampling in your app." | ||||||||||||||||||||||||||||||||||||||||||||||||
| sidebar_order: 30 | ||||||||||||||||||||||||||||||||||||||||||||||||
| --- | ||||||||||||||||||||||||||||||||||||||||||||||||
|
|
||||||||||||||||||||||||||||||||||||||||||||||||
| Sentry's tracing functionality helps you monitor application performance by capturing distributed traces, attaching attributes, and adding span performance metrics across your application. However, capturing traces for every transaction can generate significant volumes of data. Sampling allows you to control the amount of spans that are sent to Sentry from your application. | ||||||||||||||||||||||||||||||||||||||||||||||||
|
|
||||||||||||||||||||||||||||||||||||||||||||||||
| Effective sampling is key to getting the most value from Sentry's performance monitoring while minimizing overhead. The `traces_sampler` function gives you precise control over which transactions to record, allowing you to focus on the most important parts of your application. | ||||||||||||||||||||||||||||||||||||||||||||||||
|
||||||||||||||||||||||||||||||||||||||||||||||||
| Sentry's tracing functionality helps you monitor application performance by capturing distributed traces, attaching attributes, and adding span performance metrics across your application. However, capturing traces for every transaction can generate significant volumes of data. Sampling allows you to control the amount of spans that are sent to Sentry from your application. | |
| Effective sampling is key to getting the most value from Sentry's performance monitoring while minimizing overhead. The `traces_sampler` function gives you precise control over which transactions to record, allowing you to focus on the most important parts of your application. | |
| If you find that Sentry's tracing functionality is generating too much data – for example, if you notice your spans quota is being quickly exhausted – you can opt to sample your traces. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fair enough. I like how to the point it is about exhausting your quota!
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| The Python SDK provides two main options for controlling the sampling rate: | |
| The Python SDK provides two ways to control how your traces are sampled. |
sfanahata marked this conversation as resolved.
Outdated
Show resolved
Hide resolved
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A couple things here:
- First of all, it is incorrect to call the
traces_sample_ratea percentage, since it is a value between 0 and 1, not 0% and 100%. - It would be more accurate to state that this controls the "probability" with which each transaction is sampled
- I would use "sampled" rather than "captured"
Something like the following would work:
| This option sets a fixed percentage of transactions to be captured: | |
| `traces_sample_rate` is a floating-point value between `0.0` and `1.0`, inclusive, which controls the probability with which each transaction will be sampled: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great suggestion, and thanks for clarifying the functionality.
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
[nit] I would reformulate as follows to make this more technically accurate and specific
| With `traces_sample_rate` set to `0.25`, approximately 25% of transactions will be recorded and sent to Sentry. This provides an even cross-section of transactions regardless of where in your app they occur. | |
| With `traces_sample_rate` set to `0.25`, each transaction in your application is randomly sampled with a probability of `0.25`, so you can expect that one in every four transactions will be sent to Sentry. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is much clearer. Thanks!
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Current wording may be confusing. traces_sampler is a function that the user has to define and provide, saying they can "use the" function makes it sound like we provide a function called traces_sampler that users can use (we don't)
| For more granular control, you can use the `traces_sampler` function. This approach allows you to: | |
| For more granular control, you can provide a `traces_sampler` function. This approach allows you to: |
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
[nit] I think we might be including too many examples here, for a page which should be an introductory page. I would include at most one simple example on this page. We can create and link to a separate page to list more examples for users who are interested
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm going to leave the examples in, but I collapsed them under an expandable section.
sfanahata marked this conversation as resolved.
Show resolved
Hide resolved
sfanahata marked this conversation as resolved.
Show resolved
Hide resolved
sentrivana marked this conversation as resolved.
Outdated
Show resolved
Hide resolved
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Where is the hasRecentErrors coming from? If it's a custom attribute, this part is a bit confusing since it makes it seems like it's something the SDK sets.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is meant to be an example of a custom attribute to customize sampling decisions. Are you saying it reads like it's something the SDK sets because it's provided in the example, or is it the way the method is written that makes it confusing?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Basically if I was coming to this example as someone who hasn't used the SDK yet, I'd see no distinction between the data that the SDK puts in the sampling_context vs what I myself as a user need to put there explicitly. I'd expect some introductory paragraph that clarifies that not all the keys in this example will be there out of the box. See also #13125 (comment)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In this example it's also not 100% clear what's coming from the SDK and what the user needs to have set for it to appear in the sampling context (user.tier, hasRecentErrors) -- might lead to misunderstandings if folks expect this data to be there from the get go. Maybe we can clarify that some of these are custom attributes that need to be set by the user?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the feedback! I'll add more details in the sample comments that makes it clearer.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I also added a bullet at the top of the section to call out using custom attributes as a part of useing traces_sampler.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hey, have the changes already been applied? As is the examples are still mixing prepopulated keys with custom ones and it's still not clear which is which. If I was a new user I'd be confused as to why there's a parent_sampled in my sampling_context but no data.hasRecentErrors.
I feel like the current page https://docs.sentry.io/platforms/python/configuration/sampling/#sampling-context-data does a great job of making the distinction as well as making it clear how you can get custom data into the sampling_context. I'm not sure if this section stays as is after the reorganization -- if not, can we somehow port it to this page? And if it does, can we reference it at the start of this page when mentioning custom attributes?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for that example. I see what you mean now! The custom_sampling_context would be a better way to group the custom data together in the examples. I'll update them, and link to the sampling page.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I changed the examples to more explicitly use transaction.set_data and added a note about set_data vs custom_sampling_context, with a link to the sampling page.
sentrivana marked this conversation as resolved.
Outdated
Show resolved
Hide resolved
sentrivana marked this conversation as resolved.
Outdated
Show resolved
Hide resolved
sentrivana marked this conversation as resolved.
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd also make clear the attributes here have to be added and are not there by default (db_connections etc.).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I only picked db_connections randomly, the other attrs are also custom (duration_ms, memory_usage_mb, cpu_percent) -- so we should probably also mark them as such.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I added a line as a part of the section's intro that this outlined how to use custom attributes, but happy to add notes in the examples too.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
sfanahata marked this conversation as resolved.
Show resolved
Hide resolved
sentrivana marked this conversation as resolved.
Outdated
Show resolved
Hide resolved
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks like this was incorrect in the docs before, as well
| "parent_sampled": bool, # whether the parent transaction was sampled (if any) | |
| "parent_sampled": bool | None, # whether the parent transaction was sampled, `None` if no parent |
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd either remove the code comments from the snippet above or delete this section. We don't need to have the same information twice
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As I mentioned in a previous comment, our introductory examples should include the code to ensure inheritance in distributed tracing. Most people will probably only look at the first example (assuming that is the default recommendation), and that default recommendation should include the code for inheritance, since that is the best practice.
My opinion is we should completely delete this section.
| ## Inheritance in Distributed Tracing | |
| In distributed systems, trace information is propagated between services. You can implement inheritance logic like this: | |
| ```python | |
| def traces_sampler(sampling_context): | |
| # Examine provided context data | |
| if "transaction_context" in sampling_context: | |
| name = sampling_context["transaction_context"].get("name", "") | |
| # Apply specific rules first | |
| if "critical-path" in name: | |
| return 1.0 # Always sample | |
| # Inherit parent sampling decision if available | |
| if sampling_context.get("parent_sampled") is not None: | |
| return sampling_context["parent_sampled"] | |
| # Otherwise use a default rate | |
| return 0.1 | |
| ``` | |
| This approach ensures consistent sampling decisions across your entire distributed trace. All transactions in a given trace will share the same sampling decision, preventing broken or incomplete traces. |
We can perhaps replace this section with a section about overriding the parent sampling decision. However, such a section likely belongs on a more advanced page, not on the introductory page.
In short: the default best practice is to always inherit the parent sampling decision. The current structure makes it look like this is something which is "nice to have" for more advanced users. In fact, the opposite is true: novice users should inherit the parent sampling decision; overriding this decision is an advanced topic
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I left it for now, to explain how it works, but I also added the snippet from the above comment higher up in the docs under the explainer on traces_sampler
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Idk, I do still think having this snippet adds too much complexity to the page
what do you all think @antonpirker @sentrivana?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The section reads a bit confusing to me. The headline says it's about Inheritance in Distributed Tracing, but the example then explicitly ignores the parent sampling decision, so out of all examples on this page (which always start with: if there's a parent decision, just use that), this actually uses inheritance the least? Maybe we need a better headline/description for this section that's actually about when -- if ever -- it makes sense to ignore the parent sampling decision.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sounds like it actually makes the most sense to remove this section as an explicit call-out, and add a bit more description higher up the page with the call-out of the parent_sampled method. I'll give that a try.
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| 2. If `traces_sampler` is defined, its decision is used (can consider parent sampling) | |
| 2. If `traces_sampler` is defined, its decision is used. Although the `traces_sampler` can override the parent sampling decision, most users will want to ensure their `traces_sampler` respects the parent sampling decision. |
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| 3. If no `traces_sampler` but parent sampling is available, parent decision is used | |
| 3. If no `traces_sampler` is defined, but there is a parent sampling decision from an incoming distributed trace, we use the parent sampling decision |
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| 5. If none of the above are set, no transactions are sampled (0%) | |
| 5. If none of the above are set, no transactions are sampled. This is equivalent to setting `traces_sample_rate=0.0`. |
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Technically, the decision can also be propagated via other means (e.g. headers in task queue, environment variables). HTTP headers are only used when the downstream service is contacted over the network via HTTP request; although this might be the most common case, it is not the only one.
As this is an introductory page, we can omit the details of how the trace is propagated
| - This decision is propagated to all downstream services via HTTP headers | |
| - This decision is propagated to all downstream services |
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is really advanced stuff – not sure we should mention it at all on this page tbh
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For most users, the distributed tracing gets handled automatically by the SDK
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| The two key headers are: | |
| - `sentry-trace`: Contains trace ID, span ID, and sampling decision | |
| - `baggage`: Contains additional trace metadata including sample rate | |
| The Sentry Python SDK automatically attaches these headers to outgoing HTTP requests when using auto-instrumentation with libraries like `requests`, `urllib3`, or `httpx`. For other communication channels, you can manually propagate trace information: | |
| ```python | |
| # Extract trace data from the current scope | |
| trace_data = sentry_sdk.get_current_scope().get_trace_context() | |
| sentry_trace_header = trace_data.get("sentry-trace") | |
| baggage_header = trace_data.get("baggage") | |
| # Add to your custom request (example using a message queue) | |
| message = { | |
| "data": "Your data here", | |
| "metadata": { | |
| "sentry_trace": sentry_trace_header, | |
| "baggage": baggage_header | |
| } | |
| } | |
| queue.send(json.dumps(message)) | |
| ``` |
Just noticed there is a separate page about distributed tracing. This entire section should therefore be deleted from this page. If needed, we can move it to the distributed tracing page.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I cleaned this up to add a brief explainer about how sampling shows up in distributed tracing, and linked to that section.
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure this line is needed here
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agreed.
sfanahata marked this conversation as resolved.
Show resolved
Hide resolved
|
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -1,6 +1,6 @@ | ||
| --- | ||
| title: Custom Instrumentation | ||
|
||
| sidebar_order: 40 | ||
| sidebar_order: 10 | ||
| --- | ||
|
|
||
| <PlatformContent includePath="distributed-tracing/custom-instrumentation/" /> | ||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can probably remove this section entirely at this point.
1.17.0is a very old Python SDK version – I am sure we have other pages that describe how to upgrade the SDK