Commit 593e1c7

added documentation on how to reduce overhead (#153)
also, added documentation for the sample_rate config, and reduced the flush_interval setting to 10s closes #153
1 parent 769d0c3 commit 593e1c7

5 files changed: +97 -5 lines changed

docs/configuration.asciidoc

Lines changed: 16 additions & 3 deletions
Original file line numberDiff line numberDiff line change
@@ -346,7 +346,7 @@ If your service handles data like this, we advise to only enable this feature wi
346346

347347
|============
348348
| Environment | Django/Flask | Default
349-
| `ELASTIC_APM_FLUSH_INTERVAL` | `FLUSH_INTERVAL` | `60`
349+
| `ELASTIC_APM_FLUSH_INTERVAL` | `FLUSH_INTERVAL` | `10`
350350
|============
351351

352352
Interval with which transactions should be sent to the APM server, in seconds.
@@ -374,8 +374,8 @@ Setting an upper limit will prevent overloading the agent and the APM server wit
374374
==== `max_queue_size`
375375

376376
|============
377-
| Environment | Django/Flask | Default
378-
| `ELASTIC_APM_MAX_EVENT_QUEUE_LENGTH` | `MAX_QUEUE_SIZE` | `500`
377+
| Environment | Django/Flask | Default
378+
| `ELASTIC_APM_MAX_QUEUE_SIZE` | `MAX_QUEUE_SIZE` | `500`
379379
|============
380380

381381
Maximum queue length of transactions before sending transactions to the APM server.
@@ -405,6 +405,19 @@ For more information, see <<sanitizing-data, Sanitizing Data>>.
405405
WARNING: We recommend to always include the default set of validators if you customize this setting.
406406

407407

408+
[float]
409+
[[config-transaction-sample-rate]]
410+
==== `transaction_sample_rate`
411+
412+
|============
413+
| Environment | Django/Flask | Default
414+
| `ELASTIC_APM_TRANSACTION_SAMPLE_RATE` | `TRANSACTION_SAMPLE_RATE` | `1.0`
415+
|============
416+
417+
By default, the agent will sample every transaction (e.g. request to your service).
418+
To reduce overhead and storage requirements, you can set the sample rate to a value between `0.0` and `1.0`.
419+
We still record overall time and the result for unsampled transactions, but no context information, tags, or spans.
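As a rough illustration of the setting documented above, a Django project could lower the sample rate in its settings module. This is only a sketch: it assumes the agent is configured through the `ELASTIC_APM` settings dictionary, the key is taken from the table above, and `0.2` is an arbitrary example value rather than a recommendation.

```python
# settings.py (sketch): sample roughly 20% of transactions.
# 0.2 is an illustrative value; the documented default is 1.0.
ELASTIC_APM = {
    "TRANSACTION_SAMPLE_RATE": 0.2,
}

# The documented valid range is 0.0 to 1.0.
assert 0.0 <= ELASTIC_APM["TRANSACTION_SAMPLE_RATE"] <= 1.0
```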
420+
408421
[float]
409422
[[config-include-paths]]
410423
==== `include_paths`

docs/index.asciidoc

Lines changed: 1 addition & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -39,3 +39,4 @@ include::./sanitizing-data.asciidoc[Sanitizing Data]
3939
include::./run-tests-locally.asciidoc[Run Tests Locally]
4040

4141
include::./api.asciidoc[API documentation]
42+
include::./tuning.asciidoc[Tuning and Overhead considerations]

docs/tuning.asciidoc

Lines changed: 75 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,75 @@
1+
[[tuning-and-overhead]]
2+
== Tuning and Overhead considerations
3+
4+
Using an APM solution comes with certain trade-offs, and the Python agent for Elastic APM is no different.
5+
Instrumenting your code, measuring timings, recording context data etc. all need resources:
6+
7+
* CPU time
8+
* memory
9+
* bandwidth use
10+
* Elasticsearch storage
11+
12+
We invested and continue to invest a lot of effort to keep the overhead of using Elastic APM as low as possible.
13+
But because every deployment is different, there are some knobs you can turn to adapt it to your specific needs.
14+
15+
[float]
16+
[[tuning-sample-rate]]
17+
=== Transaction Sample Rate
18+
19+
The most straightforward way to reduce the overhead of the agent is to tell the agent to do less.
20+
If you set the <<config-transaction-sample-rate,`transaction_sample_rate`>> to a value below `1.0`,
21+
the agent will randomly sample only a subset of transactions.
22+
If a transaction is not sampled, the agent has to do a lot less work,
23+
as we only record the name of the transaction, the overall transaction time, and the result for unsampled transactions.
24+
25+
[options="header"]
26+
|============
27+
| Field | Sampled | Unsampled
28+
| Transaction name | yes | yes
29+
| Duration | yes | yes
30+
| Result | yes | yes
31+
| Context | yes | no
32+
| Tags | yes | no
33+
| Spans | yes | no
34+
|============
35+
36+
Reducing the sample rate to a fraction of all transactions can make a huge difference in all four of the mentioned resource types.
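The sampled/unsampled split in the table above can be sketched in a few lines of Python. This is not the agent's implementation: `record_transaction` and its parameters are hypothetical names, and the random draw is just one plausible way to make the sampling decision.

```python
import random


def record_transaction(name, duration, result, context, tags, spans,
                       sample_rate=1.0, rng=random.random):
    """Return the data kept for one transaction (hypothetical helper)."""
    # Name, duration and result are kept for every transaction.
    record = {"name": name, "duration": duration, "result": result}
    # rng() returns a float in [0.0, 1.0), so a rate of 1.0 samples
    # everything and a rate of 0.0 samples nothing.
    sampled = rng() < sample_rate
    record["sampled"] = sampled
    if sampled:
        # Only sampled transactions carry the expensive details.
        record.update(context=context, tags=tags, spans=spans)
    return record


args = ("GET /users", 0.12, "HTTP 2xx",
        {"user": "alice"}, {"tier": "web"}, ["SELECT ..."])
full = record_transaction(*args, sample_rate=1.0)
light = record_transaction(*args, sample_rate=0.0)
```

Here `full` contains context, tags and spans, while `light` keeps only the name, duration and result, which is where the savings in CPU time, memory, bandwidth and storage come from.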
37+
38+
[float]
39+
[[tuning-queue]]
40+
=== Transaction Queue
41+
42+
To reduce the load on the APM Server, the agent does not send every transaction up as it happens.
43+
Instead, it queues them up, and flushes the queue periodically, or when it reaches a maximum size, using a background thread.
44+
45+
While this reduces the load on the APM Server (and to a certain extent on the agent),
46+
holding on to the transaction data in a queue uses memory.
47+
If you notice that using the Python agent results in a large increase of memory use,
48+
you can use these settings:
49+
50+
* <<config-flush-interval,`flush_interval`>> to reduce the time between queue flushes
51+
* <<config-max-queue-size,`max_queue_size`>> to reduce the maximum size of the queue
52+
53+
The first setting, `flush_interval`, is helpful if you have a sustained high number of transactions.
54+
The second setting, `max_queue_size`, can help if you experience peaks of transactions
55+
(a large number of transactions in a short period of time).
56+
57+
Keep in mind that reducing the value of either setting will cause the agent to send more HTTP requests to the APM Server,
58+
potentially causing a higher load.
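The interplay of the two settings can be illustrated with a simplified queue. This is a sketch under stated assumptions, not the agent's code: `TransactionQueue`, `send` and `clock` are hypothetical names, and the sketch checks both limits synchronously when a transaction is added instead of using a background thread as the agent does. The defaults mirror the documented values of `10` seconds and `500` entries.

```python
import time


class TransactionQueue:
    """Simplified stand-in for the agent's queue (hypothetical)."""

    def __init__(self, send, flush_interval=10, max_queue_size=500,
                 clock=time.monotonic):
        self._send = send                      # callable that ships one batch
        self._flush_interval = flush_interval  # seconds between flushes
        self._max_queue_size = max_queue_size  # entries before a forced flush
        self._clock = clock
        self._items = []
        self._last_flush = clock()

    def add(self, transaction):
        self._items.append(transaction)
        too_full = len(self._items) >= self._max_queue_size
        too_old = self._clock() - self._last_flush >= self._flush_interval
        if too_full or too_old:
            self.flush()

    def flush(self):
        if self._items:
            self._send(list(self._items))  # one request per batch
            self._items.clear()
        self._last_flush = self._clock()


batches = []
# A frozen clock keeps the example deterministic: only the size limit fires.
queue = TransactionQueue(batches.append, max_queue_size=3, clock=lambda: 0.0)
for i in range(7):
    queue.add({"id": i})
```

With `max_queue_size=3`, seven transactions produce two batches of three and leave one transaction queued. A smaller limit holds less in memory but sends more batches, which is the trade-off described above.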
59+
60+
61+
[float]
62+
[[tuning-max-spans]]
63+
=== Spans per transaction
64+
65+
The average number of spans per transaction can influence how much time the agent spends in each transaction collecting contextual data for each span,
66+
and the storage space needed in Elasticsearch.
67+
In our experience, most usual transactions should have well below 100 spans.
68+
In some cases, however, the number of spans can explode:
69+
70+
* long-running transactions
71+
* unoptimized code, e.g. doing hundreds of SQL queries in a loop
72+
73+
To prevent such edge cases from overloading both the agent and the APM Server,
74+
the agent stops recording spans when a limit is reached.
75+
You can configure this limit by changing the <<config-transaction-max-spans,`transaction_max_spans`>> setting.
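A span limit of this kind could look roughly like the following. The `Transaction` class and its attributes are hypothetical, and counting the dropped spans is an assumption made for illustration; the source only states that recording stops once the limit is reached. The default of `500` matches the `transaction_max_spans` default in `elasticapm/conf/__init__.py`.

```python
class Transaction:
    """Hypothetical sketch of a per-transaction span limit."""

    def __init__(self, max_spans=500):
        self.max_spans = max_spans
        self.spans = []
        self.dropped_spans = 0  # assumption: we keep a count of skipped spans

    def add_span(self, name):
        if len(self.spans) >= self.max_spans:
            # Limit reached: skip the span instead of collecting its data.
            self.dropped_spans += 1
            return False
        self.spans.append(name)
        return True


# Simulate unoptimized code that issues many queries in a loop.
txn = Transaction(max_spans=3)
for i in range(10):
    txn.add_span(f"SELECT {i}")
```

After the loop, `txn.spans` holds the first three spans and the remaining seven were skipped, so the cost per transaction stays bounded however many queries the loop issues.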

elasticapm/conf/__init__.py

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -155,7 +155,7 @@ class Config(_ConfigBase):
155155
'elasticapm.processors.sanitize_http_request_querystring',
156156
'elasticapm.processors.sanitize_http_request_body',
157157
])
158-
flush_interval = _ConfigValue('FLUSH_INTERVAL', type=int, default=60)
158+
flush_interval = _ConfigValue('FLUSH_INTERVAL', type=int, default=10)
159159
transaction_sample_rate = _ConfigValue('TRANSACTION_SAMPLE_RATE', type=float, default=1.0)
160160
transaction_max_spans = _ConfigValue('TRANSACTION_MAX_SPANS', type=int, default=500)
161161
max_queue_size = _ConfigValue('MAX_QUEUE_SIZE', type=int, default=500)

tests/contrib/django/django_tests.py

Lines changed: 4 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -1030,7 +1030,10 @@ def test_perf_database_render_no_instrumentation(benchmark, django_elasticapm_cl
10301030

10311031

10321032
@pytest.mark.django_db
1033-
@pytest.mark.parametrize('django_elasticapm_client', [{'_wait_to_first_send': 100}], indirect=True)
1033+
@pytest.mark.parametrize('django_elasticapm_client', [{
1034+
'_wait_to_first_send': 100,
1035+
'flush_interval': 100
1036+
}], indirect=True)
10341037
def test_perf_transaction_with_collection(benchmark, django_elasticapm_client):
10351038
django_elasticapm_client.instrumentation_store.get_all()
10361039
with mock.patch("elasticapm.traces.TransactionsStore.should_collect") as should_collect:
