Commit 6010089

OSDOCS-3731: Update to remove tech preview
1 parent c0094cd commit 6010089

4 files changed

+195
-148
lines changed

modules/investigating-kernel-crashes.adoc

Lines changed: 23 additions & 148 deletions
@@ -2,162 +2,37 @@
22
//
33
// * support/troubleshooting/troubleshooting-operating-system-issues.adoc
44

5-
:_content-type: PROCEDURE
5+
:_content-type: CONCEPT
66
[id="investigating-kernel-crashes"]
77
= Investigating kernel crashes
88

9-
== Enabling kdump
9+
The `kdump` service, included in the `kexec-tools` package, provides a crash-dumping mechanism. You can use this service to save the contents of a system's memory for later analysis.
1010

11-
The `kdump` service, included in `kexec-tools`, provides a crash-dumping mechanism. You can use this service to save the contents of the system's memory for later analysis.
11+
The `x86_64` architecture supports kdump in General Availability (GA) status, whereas other architectures support kdump in Technology Preview (TP) status.
1212

13-
The `kdump` service is a Technology Preview feature only. Technology Preview features are not supported with Red Hat production service level agreements (SLAs) and might not be functionally complete. Red Hat does not recommend using them in production. These features provide early access to upcoming product features, enabling customers to test functionality and provide feedback during the development process.
13+
The following table provides details about the support level of kdump for different architectures.
1414

15-
For more information about the support scope of Red Hat Technology Preview
16-
features, see link:https://access.redhat.com/support/offerings/techpreview/[].
15+
.Kdump support in {op-system}
16+
[cols=",^v",width="100%",options="header"]
17+
|===
18+
|Architecture |Support level
1719

18-
{op-system} ships with `kexec-tools`, but manual configuration is required to enable `kdump`.
20+
a|
21+
`x86_64`
22+
| GA
1923

20-
.Procedure
24+
a|
25+
`arm64`
26+
| TP
2127

22-
Perform the following steps to enable `kdump` on {op-system}.
28+
a|
29+
`s390x`
30+
| TP
2331

24-
. To reserve memory for the crash kernel during the first kernel booting, provide kernel arguments by entering the following command:
25-
+
26-
[source, terminal]
27-
----
28-
# rpm-ostree kargs --append='crashkernel=256M'
29-
----
32+
a|
33+
`ppc64le`
34+
| TP
35+
|===
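
As a rough illustration of the support table above, the following sketch maps an architecture name to its kdump support level. The function name `kdump_support_level` is hypothetical and is not part of `kexec-tools`. Treating `aarch64` (the name that `uname -m` reports on 64-bit Arm machines) as equivalent to `arm64` is an assumption made for this example.

```shell
#!/bin/sh
# Hypothetical helper: return the kdump support level from the table
# for a given architecture name. Not part of kexec-tools.
kdump_support_level() {
  case "$1" in
    x86_64)
      echo "GA" ;;
    # Assumption: `uname -m` prints aarch64 on arm64 hardware.
    arm64|aarch64|s390x|ppc64le)
      echo "TP" ;;
    *)
      echo "unknown" ;;
  esac
}

# Report the support level for the machine running this script.
kdump_support_level "$(uname -m)"
```

A node that reports `TP` is covered by the Technology Preview terms, not by General Availability support.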
3036

31-
. Optional: To write the crash dump over the network or to some other location, rather than to the default local `/var/crash` location, edit the `/etc/kdump.conf` configuration file.
32-
+
33-
[NOTE]
34-
====
35-
Network dumps are required when using LUKS. `kdump` does not support local crash dumps on LUKS-encrypted devices.
36-
====
37-
+
38-
For details on configuring the `kdump` service, see the comments in `/etc/sysconfig/kdump`, `/etc/kdump.conf`, and the `kdump.conf` manual page.
39-
ifdef::openshift-enterprise[]
40-
Also refer to the link:https://access.redhat.com/documentation/en-us/red_hat_enterprise_linux/8/html/managing_monitoring_and_updating_the_kernel/configuring-kdump-on-the-command-line_managing-monitoring-and-updating-the-kernel[RHEL `kdump` documentation] for further information on configuring the dump target.
41-
endif::[]
42-
43-
. Enable the `kdump` systemd service.
44-
+
45-
[source, terminal]
46-
----
47-
# systemctl enable kdump.service
48-
----
49-
50-
. Reboot your system.
51-
+
52-
[source, terminal]
53-
----
54-
# systemctl reboot
55-
----
56-
57-
. Ensure that `kdump` has loaded a crash kernel by checking that the `kdump.service` has started and exited successfully and that `cat /sys/kernel/kexec_crash_loaded` prints `1`.
58-
59-
== Enabling kdump on day-1
60-
The `kdump` service is intended to be enabled per node to debug kernel problems. Because there are costs to having `kdump` enabled, and these costs accumulate with each additional `kdump`-enabled node, it is recommended that `kdump` only be enabled on each node as needed. Potential costs of enabling `kdump` on each node include:
61-
62-
* Less available RAM due to memory being reserved for the crash kernel.
63-
* Node unavailability while the kernel is dumping the core.
64-
* Additional storage space being used to store the crash dumps.
65-
* Not being production-ready because the `kdump` service is in link:https://access.redhat.com/support/offerings/techpreview[Technology Preview].
66-
67-
If you are aware of the downsides and trade-offs of having the `kdump` service enabled, it is possible to enable `kdump` in a cluster-wide fashion. Although machine-specific machine configs are not yet supported, you can perform the previous steps through a `systemd` unit in a `MachineConfig` object on day-1 and have kdump enabled on all nodes in the cluster. You can create a `MachineConfig` object and inject that object into the set of manifest files used by Ignition during cluster setup. See "Customizing nodes" in the _Installing -> Installation configuration_ section for more information and examples on how to use Ignition configs.
68-
69-
.Procedure
70-
71-
Create a `MachineConfig` object for cluster-wide configuration:
72-
73-
. Create a Butane config file, `99-worker-kdump.bu`, that configures and enables kdump:
74-
+
75-
[source,yaml]
76-
----
77-
variant: openshift
78-
version: 4.10.0
79-
metadata:
80-
  name: 99-worker-kdump <1>
81-
  labels:
82-
    machineconfiguration.openshift.io/role: worker <1>
83-
openshift:
84-
  kernel_arguments: <2>
85-
    - crashkernel=256M
86-
storage:
87-
  files:
88-
    - path: /etc/kdump.conf <3>
89-
      mode: 0644
90-
      overwrite: true
91-
      contents:
92-
        inline: |
93-
          path /var/crash
94-
          core_collector makedumpfile -l --message-level 7 -d 31
95-
96-
    - path: /etc/sysconfig/kdump <4>
97-
      mode: 0644
98-
      overwrite: true
99-
      contents:
100-
        inline: |
101-
          KDUMP_COMMANDLINE_REMOVE="hugepages hugepagesz slub_debug quiet log_buf_len swiotlb"
102-
          KDUMP_COMMANDLINE_APPEND="irqpoll nr_cpus=1 reset_devices cgroup_disable=memory mce=off numa=off udev.children-max=2 panic=10 rootflags=nofail acpi_no_memhotplug transparent_hugepage=never nokaslr novmcoredd hest_disable"
103-
          KEXEC_ARGS="-s"
104-
          KDUMP_IMG="vmlinuz"
105-
106-
systemd:
107-
  units:
108-
    - name: kdump.service
109-
      enabled: true
110-
----
111-
+
112-
<1> Replace `worker` with `master` in both locations when creating a `MachineConfig` object for control plane nodes.
113-
<2> Provide kernel arguments to reserve memory for the crash kernel. You can add other kernel arguments if necessary.
114-
<3> If you want to change the contents of `/etc/kdump.conf` from the default, include this section and modify the `inline` subsection accordingly.
115-
<4> If you want to change the contents of `/etc/sysconfig/kdump` from the default, include this section and modify the `inline` subsection accordingly.
116-
117-
. Use Butane to generate a machine config YAML file, `99-worker-kdump.yaml`, containing the configuration to be delivered to the nodes:
118-
+
119-
[source,terminal]
120-
----
121-
$ butane 99-worker-kdump.bu -o 99-worker-kdump.yaml
122-
----
123-
124-
. Put the YAML file into manifests during cluster setup. You can also create this `MachineConfig` object after cluster setup with the YAML file:
125-
+
126-
[source,terminal]
127-
----
128-
$ oc create -f ./99-worker-kdump.yaml
129-
----
130-
131-
== Testing the kdump configuration
132-
133-
ifdef::openshift-enterprise[]
134-
See the link:https://access.redhat.com/documentation/en-us/red_hat_enterprise_linux/8/html/managing_monitoring_and_updating_the_kernel/configuring-kdump-on-the-command-line_managing-monitoring-and-updating-the-kernel#testing-the-kdump-configuration_configuring-kdump-on-the-command-line[Testing the kdump configuration] section in the {op-system-base} documentation for `kdump`.
135-
endif::[]
136-
137-
ifdef::openshift-origin[]
138-
See the link:https://fedoraproject.org/wiki/How_to_use_kdump_to_debug_kernel_crashes#Step_2:_Capturing_the_Dump[Capturing the Dump] section in the {op-system-base} documentation for `kdump`.
139-
endif::[]
140-
141-
== Analyzing a core dump
142-
143-
ifdef::openshift-enterprise[]
144-
See the link:https://access.redhat.com/documentation/en-us/red_hat_enterprise_linux/8/html/managing_monitoring_and_updating_the_kernel/analyzing-a-core-dump_managing-monitoring-and-updating-the-kernel[Analyzing a core dump] section in the {op-system-base} documentation for `kdump`.
145-
endif::[]
146-
147-
ifdef::openshift-origin[]
148-
See the link:https://fedoraproject.org/wiki/How_to_use_kdump_to_debug_kernel_crashes#Step_3:_Dump_Analysis[Dump Analysis] section in the {op-system-base} documentation for `kdump`.
149-
endif::[]
150-
151-
[role="_additional-resources"]
152-
.Additional resources
153-
ifdef::openshift-origin[]
154-
* link:https://docs.fedoraproject.org/en-US/fedora-coreos/debugging-kernel-crashes/[Fedora CoreOS Docs on debugging kernel crashes]
155-
* link:https://fedoraproject.org/wiki/How_to_use_kdump_to_debug_kernel_crashes[Setting up kdump in Fedora]
156-
endif::[]
157-
ifdef::openshift-enterprise[]
158-
* link:https://access.redhat.com/documentation/en-us/red_hat_enterprise_linux/8/html/managing_monitoring_and_updating_the_kernel/configuring-kdump-on-the-command-line_managing-monitoring-and-updating-the-kernel[Setting up kdump in RHEL]
159-
endif::[]
160-
* link:https://www.kernel.org/doc/html/latest/admin-guide/kdump/kdump.html[Linux kernel documentation for kdump]
161-
* kdump.conf(5) — a manual page for the `/etc/kdump.conf` configuration file containing the full documentation of available options
162-
* kexec(8) — a manual page for `kexec`
163-
* link:https://access.redhat.com/site/solutions/6038[Red Hat Knowledgebase article] regarding `kexec` and `kdump`.
37+
:FeatureName: Kdump support, for the preceding three architectures in the table,
38+
include::snippets/technology-preview.adoc[leveloffset=+1]

modules/troubleshooting-enabling-kdump-day-one.adoc

Lines changed: 82 additions & 0 deletions
@@ -0,0 +1,82 @@
1+
// Module included in the following assemblies:
2+
//
3+
// * support/troubleshooting/troubleshooting-operating-system-issues.adoc
4+
5+
:_content-type: PROCEDURE
6+
[id="enabling-kdump-day-one"]
7+
= Enabling kdump on day-1
8+
9+
The `kdump` service is intended to be enabled per node to debug kernel problems. Because enabling kdump has costs, and these costs accumulate with each additional kdump-enabled node, enable the `kdump` service only on the nodes that need it. Potential costs of enabling the `kdump` service on each node include:
10+
11+
* Less available RAM due to memory being reserved for the crash kernel.
12+
* Node unavailability while the kernel is dumping the core.
13+
* Additional storage space being used to store the crash dumps.
14+
15+
If you are aware of the downsides and trade-offs of having the `kdump` service enabled, you can enable kdump cluster-wide. Although machine-specific machine configs are not yet supported, you can use a `systemd` unit in a `MachineConfig` object as a day-1 customization and have kdump enabled on all nodes in the cluster. You can create a `MachineConfig` object and inject that object into the set of manifest files used by Ignition during cluster setup.
16+
17+
[NOTE]
18+
====
19+
See "Customizing nodes" in the _Installing -> Installation configuration_ section for more information and examples on how to use Ignition configs.
20+
====
21+
22+
.Procedure
23+
24+
Create a `MachineConfig` object for cluster-wide configuration:
25+
26+
. Create a Butane config file, `99-worker-kdump.bu`, that configures and enables kdump:
27+
+
28+
[source,yaml]
29+
----
30+
variant: openshift
31+
version: 4.11.0
32+
metadata:
33+
  name: 99-worker-kdump <1>
34+
  labels:
35+
    machineconfiguration.openshift.io/role: worker <1>
36+
openshift:
37+
  kernel_arguments: <2>
38+
    - crashkernel=256M
39+
storage:
40+
  files:
41+
    - path: /etc/kdump.conf <3>
42+
      mode: 0644
43+
      overwrite: true
44+
      contents:
45+
        inline: |
46+
          path /var/crash
47+
          core_collector makedumpfile -l --message-level 7 -d 31
48+
49+
    - path: /etc/sysconfig/kdump <4>
50+
      mode: 0644
51+
      overwrite: true
52+
      contents:
53+
        inline: |
54+
          KDUMP_COMMANDLINE_REMOVE="hugepages hugepagesz slub_debug quiet log_buf_len swiotlb"
55+
          KDUMP_COMMANDLINE_APPEND="irqpoll nr_cpus=1 reset_devices cgroup_disable=memory mce=off numa=off udev.children-max=2 panic=10 rootflags=nofail acpi_no_memhotplug transparent_hugepage=never nokaslr novmcoredd hest_disable"
56+
          KEXEC_ARGS="-s"
57+
          KDUMP_IMG="vmlinuz"
58+
59+
systemd:
60+
  units:
61+
    - name: kdump.service
62+
      enabled: true
63+
----
64+
+
65+
<1> Replace `worker` with `master` in both locations when creating a `MachineConfig` object for control plane nodes.
66+
<2> Provide kernel arguments to reserve memory for the crash kernel. You can add other kernel arguments if necessary.
67+
<3> If you want to change the contents of `/etc/kdump.conf` from the default, include this section and modify the `inline` subsection accordingly.
68+
<4> If you want to change the contents of `/etc/sysconfig/kdump` from the default, include this section and modify the `inline` subsection accordingly.
69+
70+
. Use Butane to generate a machine config YAML file, `99-worker-kdump.yaml`, containing the configuration to be delivered to the nodes:
71+
+
72+
[source,terminal]
73+
----
74+
$ butane 99-worker-kdump.bu -o 99-worker-kdump.yaml
75+
----
76+
77+
. Put the YAML file into the `<installation_directory>/manifests/` directory during cluster setup. You can also create this `MachineConfig` object after cluster setup with the YAML file:
78+
+
79+
[source,terminal]
80+
----
81+
$ oc create -f 99-worker-kdump.yaml
82+
----
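
After the machine config rolls out, one way to confirm that the `crashkernel` kernel argument reached a node is to inspect the kernel command line on that node. The following is a minimal sketch, not part of this commit. The function name `has_crashkernel` is hypothetical, and it assumes that you run the script on the node itself, for example from an `oc debug node/<node_name>` session.

```shell
#!/bin/sh
# Hypothetical helper: succeed if the given kernel command line
# contains a crashkernel= argument.
has_crashkernel() {
  # Pad with spaces so that only a whole argument matches.
  case " $1 " in
    *" crashkernel="*) return 0 ;;
    *) return 1 ;;
  esac
}

# Read the command line of the running kernel (empty if unreadable).
cmdline="$(cat /proc/cmdline 2>/dev/null || true)"

if has_crashkernel "$cmdline"; then
  echo "crashkernel argument present"
else
  echo "crashkernel argument missing"
fi
```

The check looks only for the presence of the argument, so it does not validate the reserved size.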

modules/troubleshooting-enabling-kdump.adoc

Lines changed: 48 additions & 0 deletions
@@ -0,0 +1,48 @@
1+
// Module included in the following assemblies:
2+
//
3+
// * support/troubleshooting/troubleshooting-operating-system-issues.adoc
4+
5+
:_content-type: PROCEDURE
6+
[id="enabling-kdump"]
7+
= Enabling kdump
8+
9+
{op-system} ships with the `kexec-tools` package, but manual configuration is required to enable the `kdump` service.
10+
11+
.Procedure
12+
13+
Perform the following steps to enable kdump on {op-system}.
14+
15+
. To reserve memory for the crash kernel while the first kernel boots, provide kernel arguments by entering the following command:
16+
+
17+
[source,terminal]
18+
----
19+
# rpm-ostree kargs --append='crashkernel=256M'
20+
----
21+
22+
. Optional: To write the crash dump over the network or to some other location, rather than to the default local `/var/crash` location, edit the `/etc/kdump.conf` configuration file.
23+
+
24+
[NOTE]
25+
====
26+
If your node uses LUKS-encrypted devices, you must use network dumps, because kdump does not support saving crash dumps to LUKS-encrypted devices.
27+
====
28+
+
29+
For details on configuring the `kdump` service, see the comments in `/etc/sysconfig/kdump`, `/etc/kdump.conf`, and the `kdump.conf` manual page.
30+
ifdef::openshift-enterprise[]
31+
Also refer to the link:https://access.redhat.com/documentation/en-us/red_hat_enterprise_linux/8/html/managing_monitoring_and_updating_the_kernel/configuring-kdump-on-the-command-line_managing-monitoring-and-updating-the-kernel[RHEL kdump documentation] for further information on configuring the dump target.
32+
endif::[]
33+
34+
. Enable the `kdump` systemd service.
35+
+
36+
[source,terminal]
37+
----
38+
# systemctl enable kdump.service
39+
----
40+
41+
. Reboot your system.
42+
+
43+
[source,terminal]
44+
----
45+
# systemctl reboot
46+
----
47+
48+
. Ensure that kdump has loaded a crash kernel by checking that the `kdump.service` systemd service has started and exited successfully and that the command, `cat /sys/kernel/kexec_crash_loaded`, prints the value `1`.
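
The final verification step can be sketched as a small script. This is an illustration and not part of the commit. The function name `crash_kernel_loaded` is hypothetical, and the optional file argument exists only so that the check can be exercised against a test file instead of `/sys/kernel/kexec_crash_loaded`.

```shell
#!/bin/sh
# Hypothetical helper: succeed only if the state file contains the value 1.
# Defaults to the kernel's crash-kernel indicator.
crash_kernel_loaded() {
  state_file="${1:-/sys/kernel/kexec_crash_loaded}"
  [ -r "$state_file" ] && [ "$(cat "$state_file")" = "1" ]
}

if crash_kernel_loaded; then
  echo "crash kernel loaded"
else
  echo "crash kernel not loaded"
fi

# Show the state of the kdump service when systemctl is available.
if command -v systemctl >/dev/null 2>&1; then
  systemctl status kdump.service --no-pager || true
fi
```

If the script prints `crash kernel not loaded`, review the `systemctl status` output and confirm that the `crashkernel` argument is present on the kernel command line.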

support/troubleshooting/troubleshooting-operating-system-issues.adoc

Lines changed: 42 additions & 0 deletions
@@ -10,3 +10,45 @@ toc::[]
1010

1111
// Investigating kernel crashes
1212
include::modules/investigating-kernel-crashes.adoc[leveloffset=+1]
13+
14+
include::modules/troubleshooting-enabling-kdump.adoc[leveloffset=+2]
15+
16+
include::modules/troubleshooting-enabling-kdump-day-one.adoc[leveloffset=+2]
17+
18+
[id="testing-kdump-configuration"]
19+
=== Testing the kdump configuration
20+
21+
ifdef::openshift-enterprise[]
22+
See the link:https://access.redhat.com/documentation/en-us/red_hat_enterprise_linux/8/html/managing_monitoring_and_updating_the_kernel/configuring-kdump-on-the-command-line_managing-monitoring-and-updating-the-kernel#testing-the-kdump-configuration_configuring-kdump-on-the-command-line[Testing the kdump configuration] section in the {op-system-base} documentation for kdump.
23+
endif::[]
24+
25+
ifdef::openshift-origin[]
26+
See the link:https://fedoraproject.org/wiki/How_to_use_kdump_to_debug_kernel_crashes#Step_2:_Capturing_the_Dump[Capturing the Dump] section in the {op-system-base} documentation for kdump.
27+
endif::[]
28+
29+
[id="analyzing-core-dumps"]
30+
=== Analyzing a core dump
31+
32+
ifdef::openshift-enterprise[]
33+
See the link:https://access.redhat.com/documentation/en-us/red_hat_enterprise_linux/8/html/managing_monitoring_and_updating_the_kernel/analyzing-a-core-dump_managing-monitoring-and-updating-the-kernel[Analyzing a core dump] section in the {op-system-base} documentation for kdump.
34+
endif::[]
35+
36+
ifdef::openshift-origin[]
37+
See the link:https://fedoraproject.org/wiki/How_to_use_kdump_to_debug_kernel_crashes#Step_3:_Dump_Analysis[Dump Analysis] section in the {op-system-base} documentation for kdump.
38+
endif::[]
39+
40+
[discrete]
41+
[role="_additional-resources"]
42+
[id="additional-resources_investigating-kernel-crashes"]
43+
=== Additional resources
44+
ifdef::openshift-origin[]
45+
* link:https://docs.fedoraproject.org/en-US/fedora-coreos/debugging-kernel-crashes/[Fedora CoreOS Docs on debugging kernel crashes]
46+
* link:https://fedoraproject.org/wiki/How_to_use_kdump_to_debug_kernel_crashes[Setting up kdump in Fedora]
47+
endif::[]
48+
ifdef::openshift-enterprise[]
49+
* link:https://access.redhat.com/documentation/en-us/red_hat_enterprise_linux/8/html/managing_monitoring_and_updating_the_kernel/configuring-kdump-on-the-command-line_managing-monitoring-and-updating-the-kernel[Setting up kdump in RHEL]
50+
endif::[]
51+
* link:https://www.kernel.org/doc/html/latest/admin-guide/kdump/kdump.html[Linux kernel documentation for kdump]
52+
* kdump.conf(5) — a manual page for the `/etc/kdump.conf` configuration file containing the full documentation of available options
53+
* kexec(8) — a manual page for the `kexec` package
54+
* link:https://access.redhat.com/site/solutions/6038[Red Hat Knowledgebase article] regarding kexec and kdump
