Commit 3373e06

carsonip, raultorrecilla, and colleenmcginnis authored
apm: Add known issue about HTTP 502 (#4956)
* apm: Add known issue about HTTP 502
* Change versions
* Update docs/en/observability/apm/known-issues.asciidoc

Co-authored-by: Colleen McGinnis <[email protected]>

---------

Co-authored-by: Raúl Torrecilla <[email protected]>
Co-authored-by: Colleen McGinnis <[email protected]>
1 parent 3ee64a9 commit 3373e06

1 file changed: +30 -8 lines changed

docs/en/observability/apm/known-issues.asciidoc

Lines changed: 30 additions & 8 deletions
@@ -21,6 +21,28 @@ _Versions: XX.XX.XX, YY.YY.YY, ZZ.ZZ.ZZ_
 // If applicable, link to fix
 ////

+[discrete]
+== APM occasionally returning HTTP 502 "backend connection closed" or "use of closed network connection"
+
+_Elastic Stack versions: >=8.0.0 and <8.18.8 or <8.19.5, >=9.0.0 and <9.0.8 or <9.1.5_
+_Environments: ECH, ECE_
+
+APM Server on ECH and ECE might sometimes return HTTP 502 with the error message "backend connection closed" or "use of closed network connection" for any request due to a rare race condition.
+When this happens to an intake request, Elastic APM agents will log an error but will not retry, leading to data loss.
+
+Note that there may be other causes of "backend connection closed" or "use of closed network connection", and the provided workaround and released bugfix will only resolve the case related to the mentioned race condition.
+
+*Workaround*
+
+To work around this issue:
+
+* Go to *Kibana* > *Fleet* > *Elastic Cloud agent policy*.
+* Next to *Elastic APM*, select the *...* icon, then *Edit Integration*.
+* Under *General*, select *Advanced options*, then change *Idle time before underlying connection is closed* to *200s*.
+* Select *Save Integration*.
+
+This bug will be fixed in 8.18.7, 8.19.4, 9.0.7, 9.1.4 for new deployments, and 8.18.8, 8.19.5, 9.0.8, 9.1.5, 9.2.0 for upgraded deployments.
+
 [discrete]
 == APM Integration might be unreachable after upgrading to 8.19.0 and 9.1.0
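The version ranges in the HTTP 502 entry above are dense, so the following is a minimal sketch of one possible reading of them, not an official compatibility check. The helper names (`parse`, `is_affected`) and both lookup tables are hypothetical; the patch numbers are taken from the "will be fixed in" sentence of the entry, and the assumption is that every 8.x and 9.x minor line without a listed fix remains affected until 9.2.0.

```python
# Hypothetical helper: encodes one reading of the known-issue version ranges.
# First fixed patch release per minor line, as listed in the entry.
FIXED_NEW = {(8, 18): 7, (8, 19): 4, (9, 0): 7, (9, 1): 4}       # new deployments
FIXED_UPGRADED = {(8, 18): 8, (8, 19): 5, (9, 0): 8, (9, 1): 5}  # upgraded deployments


def parse(version: str) -> tuple[int, int, int]:
    """Split a 'major.minor.patch' string into integers."""
    major, minor, patch = (int(part) for part in version.split("."))
    return major, minor, patch


def is_affected(version: str, new_deployment: bool = False) -> bool:
    """Return True if the version falls in the affected range (assumed reading)."""
    major, minor, patch = parse(version)
    # Versions before 8.0.0 and from 9.2.0 onwards are outside the listed range.
    if (major, minor) < (8, 0) or (major, minor) >= (9, 2):
        return False
    fixed = FIXED_NEW if new_deployment else FIXED_UPGRADED
    if (major, minor) in fixed:
        return patch < fixed[(major, minor)]
    # Assumption: minor lines without a listed fix (for example 8.0 to 8.17) stay affected.
    return True


if __name__ == "__main__":
    for candidate in ("8.17.3", "8.18.7", "8.18.8", "9.1.4", "9.2.0"):
        print(candidate, is_affected(candidate))
```

Passing `new_deployment=True` applies the earlier fix versions (8.18.7, 8.19.4, 9.0.7, 9.1.4) that the entry lists for new deployments.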

@@ -99,18 +121,18 @@ PUT _component_template/metrics-apm.internal@custom
 == `prefer_ilm` required in component templates to create custom lifecycle policies

 _Elastic Stack versions: 8.15.1+_
-
+
 // The conditions in which this issue occurs
 The issue occurs when creating a _new_ cluster using version 8.15.1+.
 The issue occurs for any APM data streams created in 8.15.1+.
 The issue does _not_ occur if a custom component template has been created in or before version 8.15.0.

 // Describe why it happens
-In 8.15.0, APM Server began using the https://github.com/elastic/elasticsearch/tree/main/x-pack/plugin/apm-data[apm-data plugin]
-to manage data streams, ingest pipelines, lifecycle policies, and more. In 8.15.1, a fix was introduced to address
-unmanaged indices in older clusters using default ILM policies. This fix added a fallback to the default ILM policy
-(if it exists) and set the `prefer_ilm` configuration to `false`. This setting impacts clusters where both ILM and
-data stream lifecycles (DSL) are in effect, such as when configuring custom ILM policies using `@custom` component
+In 8.15.0, APM Server began using the https://github.com/elastic/elasticsearch/tree/main/x-pack/plugin/apm-data[apm-data plugin]
+to manage data streams, ingest pipelines, lifecycle policies, and more. In 8.15.1, a fix was introduced to address
+unmanaged indices in older clusters using default ILM policies. This fix added a fallback to the default ILM policy
+(if it exists) and set the `prefer_ilm` configuration to `false`. This setting impacts clusters where both ILM and
+data stream lifecycles (DSL) are in effect, such as when configuring custom ILM policies using `@custom` component
 templates, under the conditions mentioned above.

 // How to fix it
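To make the `prefer_ilm` discussion in this hunk concrete, here is a minimal sketch that builds a component template body pointing at a custom ILM policy with `prefer_ilm` set to `true`. It assumes the setting lives under `index.lifecycle` in the template settings; the function name `custom_template_body` and the policy name `my-custom-ilm-policy` are hypothetical, and the template name is the one shown in the hunk header above. Treat the linked guide as the authoritative procedure.

```python
import json


def custom_template_body(policy_name: str) -> dict:
    """Build a component template body that selects a custom ILM policy
    and sets prefer_ilm to true so ILM takes precedence over DSL."""
    return {
        "template": {
            "settings": {
                "index": {
                    "lifecycle": {
                        "name": policy_name,  # hypothetical policy name supplied by the caller
                        "prefer_ilm": True,
                    }
                }
            }
        }
    }


if __name__ == "__main__":
    # Print a console-style request; nothing is sent to a cluster.
    print("PUT _component_template/metrics-apm.internal@custom")
    print(json.dumps(custom_template_body("my-custom-ilm-policy"), indent=2))
```

The sketch only prints the request, so it can be reviewed before being pasted into a console.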
@@ -122,7 +144,7 @@ to `true` by following the {observability-guide}/apm-ilm-how-to.html[updated gui

 _Elastic Stack versions: 8.15.0, 8.15.1, 8.15.2, 8.15.3_ +
 _Fixed in Elastic Stack version 8.15.4_
-
+
 // The conditions in which this issue occurs
 The issue only occurs when _upgrading_ the {stack} from 8.12.2 or lower directly to any 8.15.x version prior to 8.15.4.
 The issue does _not_ occur when creating a _new_ cluster using any 8.15.x version, or when upgrading
@@ -132,7 +154,7 @@ from 8.12.2 to 8.13.x or 8.14.x and then to 8.15.x.
 In APM Server versions prior to 8.13.0, an ingestion pipeline exists to perform a check on the version.
 The version check would fail any APM document produced with a different version of APM Server compared to the version of the installed APM’s ingest pipeline.
 In 8.13.0 the version check in the ingest pipeline was removed.
-Due to the combination of an internal change in how APM data management assets are set up from 8.15 onwards and a bug in Elasticsearch,
+Due to the combination of an internal change in how APM data management assets are set up from 8.15 onwards and a bug in Elasticsearch,
 related to https://github.com/elastic/elasticsearch/issues/112781[lazy rollover of data streams], the ingestion pipeline conducting the version check is not removed on upgrade and prevents the ingestion of data.

 // How to fix it
