apm: Add known issue about HTTP 502 (#4956)

carsonip · raultorrecilla · colleenmcginnis · web-flow · commit 3373e06985b6 · 2025-09-18T11:03:53.000+02:00
* apm: Add known issue about HTTP 502

* Change versions

* Update docs/en/observability/apm/known-issues.asciidoc

Co-authored-by: Colleen McGinnis <colleen.j.mcginnis@gmail.com>

---------

Co-authored-by: Raúl Torrecilla <raul.torrecilla@gmail.com>
Co-authored-by: Colleen McGinnis <colleen.j.mcginnis@gmail.com>
diff --git a/docs/en/observability/apm/known-issues.asciidoc b/docs/en/observability/apm/known-issues.asciidoc
@@ -21,6 +21,28 @@ _Versions: XX.XX.XX, YY.YY.YY, ZZ.ZZ.ZZ_
 // If applicable, link to fix
 ////
 
+[discrete]
+== APM occasionally returning HTTP 502 "backend connection closed" or "use of closed network connection"
+
+_Elastic Stack versions: >=8.0.0 and <8.18.8 or <8.19.5, >=9.0.0 and <9.0.8 or <9.1.5_ +
+_Environments: ECH, ECE_
+
+APM Server on ECH and ECE might sometimes return HTTP 502 with the error message "backend connection closed" or "use of closed network connection" for any request because of a rare race condition.
+When this happens to an intake request, Elastic APM agents will log an error but will not retry, leading to data loss.
+
+Note that "backend connection closed" and "use of closed network connection" errors can have other causes; the workaround and bug fix described here resolve only the case caused by this race condition.
+
+*Workaround*
+
+To work around this issue:
+
+* Go to *Kibana* > *Fleet* > *Elastic Cloud agent policy*.
+* Next to *Elastic APM*, select the *...* icon, then *Edit Integration*.
+* Under *General*, select *Advanced options*, then change *Idle time before underlying connection is closed* to *200s*.
+* Select *Save Integration*.
+
+This bug will be fixed in 8.18.7, 8.19.4, 9.0.7, and 9.1.4 for new deployments, and in 8.18.8, 8.19.5, 9.0.8, 9.1.5, and 9.2.0 for upgraded deployments.
+
 [discrete]
 == APM Integration might be unreachable after upgrading to 8.19.0 and 9.1.0
 
@@ -99,18 +121,18 @@ PUT _component_template/metrics-apm.internal@custom
 == `prefer_ilm` required in component templates to create custom lifecycle policies
 
 _Elastic Stack versions: 8.15.1+_
-    
+
 // The conditions in which this issue occurs
 The issue occurs when creating a _new_ cluster using version 8.15.1+.
 The issue occurs for any APM data streams created in 8.15.1+.
 The issue does _not_ occur if custom component template has been created in or before version 8.15.0.
 
 // Describe why it happens
-In 8.15.0, APM Server began using the https://github.com/elastic/elasticsearch/tree/main/x-pack/plugin/apm-data[apm-data plugin] 
-to manage data streams, ingest pipelines, lifecycle policies, and more. In 8.15.1, a fix was introduced to address 
-unmanaged indices in older clusters using default ILM policies. This fix added a fallback to the default ILM policy 
-(if it exists) and set the `prefer_ilm` configuration to `false`. This setting impacts clusters where both ILM and 
-data stream lifecycles (DSL) are in effect—such as when configuring custom ILM policies using `@custom` component 
+In 8.15.0, APM Server began using the https://github.com/elastic/elasticsearch/tree/main/x-pack/plugin/apm-data[apm-data plugin]
+to manage data streams, ingest pipelines, lifecycle policies, and more. In 8.15.1, a fix was introduced to address
+unmanaged indices in older clusters using default ILM policies. This fix added a fallback to the default ILM policy
+(if it exists) and set the `prefer_ilm` configuration to `false`. This setting impacts clusters where both ILM and
+data stream lifecycles (DSL) are in effect—such as when configuring custom ILM policies using `@custom` component
 templates, under the conditions mentioned above.
 
 // How to fix it
@@ -122,7 +144,7 @@ to `true` by following the {observability-guide}/apm-ilm-how-to.html[updated gui
 
 _Elastic Stack versions: 8.15.0, 8.15.1, 8.15.2, 8.15.3_ +
 _Fixed in Elastic Stack version 8.15.4_
-    
+
 // The conditions in which this issue occurs
 The issue only occurs when _upgrading_ the {stack} from 8.12.2 or lower directly to any 8.15.x version prior to 8.15.4.
 The issue does _not_ occur when creating a _new_ cluster using any 8.15.x version, or when upgrading
@@ -132,7 +154,7 @@ from 8.12.2 to 8.13.x or 8.14.x and then to 8.15.x.
 In APM Servers versions prior to 8.13.0, an ingestion pipeline exists to perform a check on the version.
 The version check would fail any APM document produced with a different version of APM server compared to the version of the installed APM’s ingest pipeline.
 In 8.13.0 the version check in the ingest pipeline was removed.
-Due to the combination of an internal change in how apm data management assets are set up from 8.15 onwards and a bug in Elasticsearch, 
+Due to the combination of an internal change in how apm data management assets are set up from 8.15 onwards and a bug in Elasticsearch,
 related to https://github.com/elastic/elasticsearch/issues/112781[lazy rollover of data streams], the ingestion pipeline conducting the version check is not removed on upgrade and prevents the ingestion of data.
 
 // How to fix it