Commit 5b00336

Merge pull request #270140 from dramasamy/log-collector
[NC 3.11.0] NAKS log collector script
2 parents 96b4b16 + 43fcfaf commit 5b00336

2 files changed: +121 -0 lines changed

articles/operator-nexus/TOC.yml

Lines changed: 2 additions & 0 deletions
@@ -165,6 +165,8 @@
   href: how-to-customize-kubernetes-cluster-dns.md
 - name: Customize Worker Nodes
   href: howto-kubernetes-cluster-customize-workers.md
+- name: Collect debug logs for support ticket
+  href: howto-kubernetes-cluster-log-collector-script.md
 - name: Nexus Virtual Machine
   expanded: false
   items:
Lines changed: 119 additions & 0 deletions
@@ -0,0 +1,119 @@
---
title: "Azure Operator Nexus: How to run log collector script"
description: Learn how to run the log collector script.
author: dramasamy
ms.author: dramasamy
ms.service: azure-operator-nexus
ms.topic: how-to
ms.date: 03/25/2024
ms.custom: template-how-to
---

# Run the log collector script on the Azure Operator Nexus Kubernetes cluster node

In some scenarios, Microsoft support needs deeper visibility into the Nexus Kubernetes cluster. For these cases, a log collector script is available for you to run on the cluster node. The script gathers the logs that Microsoft support needs to understand and troubleshoot the issue.

## What it collects

The log collector script gathers diagnostic data from across the system for troubleshooting and analysis. The following sections summarize the types of data it collects:

### System and kernel diagnostics

- Kernel information: Logs, human-readable messages, version, and architecture for in-depth kernel diagnostics.
- Operating system logs: Essential logs detailing system activity, and container logs for system services.

### Hardware and resource usage

- CPU and IO throttled processes: Identifies throttling issues, providing insight into performance bottlenecks.
- Network interface statistics: Detailed statistics for network interfaces to diagnose errors and drops.

### Software and services

- Installed packages: A list of all installed packages, for understanding the system's software environment.
- Active system services: Information on active services, process snapshots, and detailed system and process statistics.
- Container runtime and Kubernetes components logs: Logs for Kubernetes components and other vital services for cluster diagnostics.

### Networking and connectivity

- Network connection tracking information: Conntrack statistics and connection lists for firewall diagnostics.
- Network configuration and interface details: Interface configurations, IP routing, addresses, and neighbor information.
- Additional interface configuration and logs: Logs related to the configuration of all interfaces inside the node.
- Network connectivity tests: Tests of external network connectivity and Kubernetes API server communication.
- DNS resolution configuration: DNS resolver configuration for diagnosing domain name resolution issues.
- Networking configuration and logs: Comprehensive networking data, including connection tracking and interface configurations.
- Container network interface (CNI) configuration: Configuration of CNI for container networking diagnostics.

### Security and compliance

- SELinux status: Reports the SELinux mode to understand access control and security contexts.
- iptables rules: Configuration of iptables rulesets for insight into firewall settings.

### Storage and filesystems

- Mount points and volume information: Detailed information on mount points, volumes, disk usage, and filesystem specifics.

### Configuration and management

- System configuration: Sysctl parameters for a comprehensive view of kernel runtime configuration.
- Kubernetes configuration and health: Kubernetes setup details, including configurations and service listings.
- Container runtime information: Configuration, version information, and details on running containers.
- Container runtime interface (CRI) information: Operations data for the container runtime interface, aiding in container orchestration diagnostics.

## Prerequisite

- Ensure that you have SSH access to the Nexus Kubernetes cluster node. If you have direct IP reachability to the node, establish an SSH connection directly. Otherwise, use Azure Arc for servers with the `az ssh arc` command. For more information about the available connectivity methods, see the [connect to the cluster](./howto-kubernetes-cluster-connect.md) article.
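
When the node is directly reachable by IP, the connection can be a plain OpenSSH call. The following sketch is illustrative only: `ssh_node` is a hypothetical helper, and the user name, key path, and IP address in the example are placeholder assumptions, not values from this article.

```bash
# ssh_node USER KEY_FILE NODE_IP [REMOTE_COMMAND...]
# Hypothetical helper: opens an SSH session to a node that is directly reachable by IP.
ssh_node() {
  local user="$1" key_file="$2" node_ip="$3"
  shift 3
  # -i selects the private key; any remaining arguments run as a remote command.
  ssh -i "$key_file" "$user@$node_ip" "$@"
}

# Example (placeholder values):
#   ssh_node azureuser ~/.ssh/id_rsa 10.0.0.5
```

Without a remote command, the helper opens an interactive session. With one, for example `ssh_node azureuser ~/.ssh/id_rsa 10.0.0.5 uptime`, it runs the command on the node and returns.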

## Execution

After you have SSH access to the node, run the log collector script with the command `sudo /opt/log-collector/collect.sh`.

The script prints output similar to the following:

```output
Trying to check for root...
Trying to check for required utilities...
Trying to create required directories...
Trying to check for disk space...
Trying to start collecting logs... Trying to collect common operating system logs...
Trying to collect mount points and volume information...
Trying to collect SELinux status...
.
.
Trying to archive gathered information...
Finishing up...

Done... your bundled logs are located in /var/log/<node_name_date_time-UTC>.tar.gz
```
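
If the final line of the output is no longer visible, the bundle can be located again afterwards. The following is a minimal sketch, assuming the bundle is the most recently modified `.tar.gz` file in `/var/log`. `latest_bundle` is a hypothetical helper and not part of the log collector script.

```bash
# latest_bundle [DIR]
# Hypothetical helper: prints the most recently modified .tar.gz file in DIR
# (default: /var/log), or nothing if no such file exists.
latest_bundle() {
  local dir="${1:-/var/log}"
  # ls -t sorts by modification time, newest first; head keeps the first entry.
  ls -t "$dir"/*.tar.gz 2>/dev/null | head -n 1
}

# Example on the node:
#   bundle=$(latest_bundle)
#   sudo tar -tzf "$bundle" | head    # list the first few files in the bundle
```

Listing the archive with `tar -tzf` is a quick way to confirm that the bundle was written completely before you download it.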

## How to download the log file

After the log file is generated, you can download it from the cluster node to your local machine by using SCP, SFTP, or Azure CLI. SCP and SFTP work only if you have direct IP reachability to the cluster node. If you don't, use Azure CLI to download the log file.

To download the file with Azure CLI, use the same `az ssh arc` command that you use to SSH into the Nexus Kubernetes cluster node, and add a `cat` command at the end to copy the file to your local machine. First, set the following variables:

```bash
RESOURCE_GROUP="myResourceGroup"
CLUSTER_NAME="myNexusK8sCluster"
SUBSCRIPTION_ID="<Subscription ID>"
USER_NAME="azureuser"
SSH_PRIVATE_KEY_FILE="<vm_ssh_id_rsa>"
# Look up the cluster's managed resource group, which the az ssh arc command targets.
MANAGED_RESOURCE_GROUP=$(az networkcloud kubernetescluster show -n "$CLUSTER_NAME" -g "$RESOURCE_GROUP" --subscription "$SUBSCRIPTION_ID" --output tsv --query managedResourceGroupConfiguration.name)
```

> [!NOTE]
> Replace the placeholder values with the actual values for your Azure environment and Nexus Kubernetes cluster.
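
A leftover placeholder or an empty `MANAGED_RESOURCE_GROUP` (for example, if the lookup fails) makes the download command fail in a less obvious way, so a small guard can help. The following is a sketch only; `check_value` is a hypothetical helper that isn't part of Azure CLI.

```bash
# check_value NAME VALUE
# Hypothetical helper: fails if VALUE is empty or still looks like a <placeholder>.
check_value() {
  local name="$1" value="$2"
  case "$value" in
    ""|"<"*">")
      echo "$name is unset or still a placeholder" >&2
      return 1
      ;;
  esac
}

# Example, run after setting the variables:
#   check_value SUBSCRIPTION_ID "$SUBSCRIPTION_ID" &&
#   check_value SSH_PRIVATE_KEY_FILE "$SSH_PRIVATE_KEY_FILE" &&
#   check_value MANAGED_RESOURCE_GROUP "$MANAGED_RESOURCE_GROUP"
```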

```azurecli
az ssh arc --subscription $SUBSCRIPTION_ID \
    --resource-group $MANAGED_RESOURCE_GROUP \
    --name <VM Name> \
    --local-user $USER_NAME \
    --private-key-file $SSH_PRIVATE_KEY_FILE \
    'sudo cat /var/log/node_name_date_time-UTC.tar.gz' > <Local machine path>/node_name_date_time-UTC.tar.gz
```

In the preceding command, replace `<VM Name>` with the name of the cluster node, `node_name_date_time-UTC.tar.gz` with the name of the log file created on that node, and `<Local machine path>` with the location on your local machine where you want to save the file.
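
Because the file is streamed through `cat` and a shell redirect, it's worth confirming that the local copy is a readable archive before you attach it to a ticket. The following is a minimal sketch; `verify_bundle` is a hypothetical helper that relies only on the standard `tar` utility.

```bash
# verify_bundle FILE
# Hypothetical helper: succeeds only if FILE is a non-empty, readable .tar.gz archive.
verify_bundle() {
  local file="$1"
  # -s is true only for an existing file with a size greater than zero.
  [ -s "$file" ] || { echo "missing or empty: $file" >&2; return 1; }
  # tar -tzf lists the archive contents; a truncated or corrupted download fails here.
  tar -tzf "$file" > /dev/null 2>&1 || { echo "not a valid tar.gz: $file" >&2; return 1; }
  echo "ok: $file"
}

# Example on your local machine:
#   verify_bundle "<Local machine path>/node_name_date_time-UTC.tar.gz"
```

If the check fails, download the file again before you upload it.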

## Next steps

After you download the tar file to your local machine, upload it to the support ticket so that Microsoft support can review the logs.
