Closed
75 commits
dc94ba8
pub telemetry changes
priti-parate Apr 30, 2026
5cedc5a
service to scrape metrics from OTEL collector
priti-parate May 4, 2026
84fd15b
Merge branch 'pub/q2_dev' of https://github.com/priti-parate/omnia in…
priti-parate May 4, 2026
edeab4f
vmservice to scrape metrics from otel collector
priti-parate May 4, 2026
cf00983
update endpoints
priti-parate May 4, 2026
b5c05d0
revert other changes
priti-parate May 4, 2026
57bd382
revert merge changes as per head
priti-parate May 5, 2026
d46e67e
revert variable set
priti-parate May 5, 2026
83bd12c
revert changes
priti-parate May 5, 2026
a09467e
revert changes
priti-parate May 5, 2026
4534675
pylint fixes
priti-parate May 5, 2026
fb86981
ansible lint fixes
priti-parate May 5, 2026
b0d44cc
updating completion message
priti-parate May 5, 2026
61d8b68
telemetry validation while prepare oim
priti-parate May 5, 2026
f91f826
update condition
priti-parate May 5, 2026
e8ba725
added check for LDMS
priti-parate May 5, 2026
65dd6bd
Fix for CrashLoopBackOff state on node reboot
priti-parate May 6, 2026
9240faf
Merge branch 'dell:pub/q2_dev' into pub/q2_dev
priti-parate May 6, 2026
5c299a3
Merge branch 'dell:pub/q2_dev' into pub/q2_dev
priti-parate May 6, 2026
6842efd
addressing review comment to move into vars
priti-parate May 6, 2026
8d75e1b
Merge branch 'pub/q2_dev' of https://github.com/priti-parate/omnia in…
priti-parate May 6, 2026
304026b
remove rsyslog layer, update vmscraper and enabled external health mo…
priti-parate May 6, 2026
93a4c53
Merge branch 'dell:pub/q2_dev' into pub/q2_dev
priti-parate May 7, 2026
4c7fcc1
fix for k8s_server_ip undefined variable
priti-parate May 7, 2026
280daa9
Merge branch 'pub/q2_dev' of https://github.com/priti-parate/omnia in…
priti-parate May 7, 2026
8c78056
fix for syntax error
priti-parate May 7, 2026
531a9a6
fix for UT issues - DNS resolution and keep powerscale configuration …
priti-parate May 7, 2026
7692972
ansible lint fixes
priti-parate May 7, 2026
3d45aea
Native operator-based vmscraper
priti-parate May 7, 2026
3a3af65
Merge branch 'pub/q2_dev' into pub/q2_dev
priti-parate May 7, 2026
e0cb65d
remove unused syslog template
priti-parate May 8, 2026
2598efb
Merge branch 'pub/q2_dev' of https://github.com/priti-parate/omnia in…
priti-parate May 8, 2026
af1649a
fix for nfs_client_param check in telemetry config
priti-parate May 8, 2026
93ab4f0
update storage_config variable
priti-parate May 8, 2026
8acbd41
fix for k8s_nfs_server_path undefined variable
priti-parate May 8, 2026
2ae3050
fix for kustomization error
priti-parate May 8, 2026
901b7d1
remove powerscale syslog configuration
priti-parate May 8, 2026
90f36ab
support for external health monitor metrics
priti-parate May 8, 2026
e173f4f
CSM authorization support
priti-parate May 8, 2026
79f72dd
deploy using template
priti-parate May 8, 2026
47f6db6
Remove powerscale authorization support
priti-parate May 8, 2026
85ba40b
Merge branch 'pub/q2_dev' of https://github.com/priti-parate/omnia in…
priti-parate May 8, 2026
dd7e765
remove authorization specific files
priti-parate May 8, 2026
d5576e2
fix for DNS resolution
priti-parate May 8, 2026
d8c3521
UT issue fixes
priti-parate May 9, 2026
b3091f9
FQDN for log collection
priti-parate May 9, 2026
77bb185
controller side external health monitor
priti-parate May 9, 2026
2fbcd19
custom exporter for scraping powerscale health monitor metrics
priti-parate May 9, 2026
55e500e
revert vlinsert url
priti-parate May 9, 2026
63f3b23
template rendering for CSI volume exporter
priti-parate May 9, 2026
0c64fd7
update condition for csi volume exporter
priti-parate May 9, 2026
881524e
UT issue fixes
priti-parate May 9, 2026
a532ec5
variable declaration in telemetry
priti-parate May 9, 2026
1f40d67
correct setting of volume monitoring enabled
priti-parate May 9, 2026
719277b
DNS resolution issue for syslog
priti-parate May 9, 2026
39d969c
collect volume stat metrics
priti-parate May 10, 2026
3ae01b1
ansible lint fixes
priti-parate May 10, 2026
50d6858
update csi volume exporter
priti-parate May 10, 2026
b953170
input validation of CSI secret file presence when syslog is enabled
priti-parate May 11, 2026
9de81e2
Merge branch 'pub/q2_dev' of https://github.com/priti-parate/omnia in…
priti-parate May 13, 2026
fba93af
remove unwanted task
priti-parate May 13, 2026
59a84d5
remove unused message
priti-parate May 13, 2026
1efcc11
Merge branch 'dell:pub/q2_dev' into pub/q2_dev
priti-parate May 13, 2026
9b551f5
disable and enable telemetry
priti-parate May 13, 2026
d1935f6
Merge branch 'pub/q2_dev' of https://github.com/priti-parate/omnia in…
priti-parate May 13, 2026
aebd242
fix for pxe mapping validation
priti-parate May 13, 2026
382583f
fix for credentials file parsing getting overwritten with flat va…
priti-parate May 13, 2026
1b9142b
ansible lint fixes
priti-parate May 14, 2026
1d7cbaf
update packages path
priti-parate May 14, 2026
0e95115
create separate role
priti-parate May 14, 2026
cbf6bc5
Signed-off-by: priti-parate <140157516+priti-parate@users.noreply.git…
priti-parate May 14, 2026
e532e3a
update input vars
priti-parate May 14, 2026
79afdec
update input_dir task
priti-parate May 14, 2026
a391355
include input_dir
priti-parate May 14, 2026
2a90400
update telemetry enable and disable tasks
priti-parate May 14, 2026
Original file line number Diff line number Diff line change
@@ -394,6 +394,11 @@ def validate_mapping_file_entries(mapping_file_path):
    if not reader.fieldnames:
        raise ValueError("CSV header not found in mapping file.")

    # Check for leading/trailing whitespace in header names
    for fn in reader.fieldnames:
        if fn != fn.strip():
            raise ValueError(f"Header '{fn}' has leading or trailing whitespace. Please remove all whitespace from header names in mapping file.")

    # Map header names case-insensitively to original names
    fieldname_map = {fn.strip().upper(): fn for fn in reader.fieldnames}

@@ -411,6 +416,12 @@ def validate_mapping_file_entries(mapping_file_path):
    row_seen = False
    for row_idx, row in enumerate(reader, start=2):  # start=2 approximates CSV row number
        row_seen = True

        # Check for leading/trailing whitespace in all field values
        for col, val in row.items():
            if val is not None and val != val.strip():
                raise ValueError(f"Field '{col}' at CSV row {row_idx} has leading or trailing whitespace. Please remove all whitespace from field values in mapping file.")

        # Check presence and non-empty for all required headers
        for hdr in required_headers:
            col = fieldname_map[hdr]
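The two hunks above add whitespace checks for header names and field values. A minimal standalone sketch of the same idea follows, assuming the file is read with `csv.DictReader` as the hunk context suggests. The function name `check_mapping_whitespace` and the column names `HOSTNAME` and `ADMIN_MAC` are hypothetical and chosen only for illustration. The sketch also skips non-string values, because `csv.DictReader` stores surplus cells as a list under the `None` key, and calling `.strip()` on that list would raise `AttributeError`.

```python
import csv
import io


def check_mapping_whitespace(csv_text):
    """Raise ValueError if a header or field value has leading/trailing whitespace.

    Hypothetical helper; it mirrors the checks in the diff but is not the PR's code.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    if not reader.fieldnames:
        raise ValueError("CSV header not found in mapping file.")

    for fn in reader.fieldnames:
        if fn != fn.strip():
            raise ValueError(f"Header '{fn}' has leading or trailing whitespace.")

    # start=2 because row 1 of the file is the header line
    for row_idx, row in enumerate(reader, start=2):
        for col, val in row.items():
            # Missing cells are None and surplus cells arrive as a list; skip both
            if isinstance(val, str) and val != val.strip():
                raise ValueError(
                    f"Field '{col}' at CSV row {row_idx} has leading or trailing whitespace."
                )


# Hypothetical sample inputs (column names are assumptions)
good = "HOSTNAME,ADMIN_MAC\nnode1,aa:bb:cc:dd:ee:ff\n"
bad_header = "HOSTNAME, ADMIN_MAC\nnode1,aa:bb:cc:dd:ee:ff\n"
bad_value = "HOSTNAME,ADMIN_MAC\nnode1 ,aa:bb:cc:dd:ee:ff\n"
```

Calling `check_mapping_whitespace(good)` returns without error, while the other two inputs raise `ValueError`.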
Original file line number Diff line number Diff line change
@@ -17,7 +17,6 @@
  when:
    - kube_vip is defined
    - kube_vip | length > 0
  tags: telemetry_deployment
  block:
    - name: Set kube_vip reachability fact to false initially
      ansible.builtin.set_fact:
33 changes: 33 additions & 0 deletions common/tasks/common/load_ha_config.yml
Original file line number Diff line number Diff line change
@@ -0,0 +1,33 @@
# Copyright 2026 Dell Inc. or its subsidiaries. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
---

- name: Load high_availability_config.yml
  ansible.builtin.include_vars:
    file: "{{ ha_config_file }}"
  register: ha_config_loaded
  ignore_errors: true

- name: Set kube_vip fact
  ansible.builtin.set_fact:
    kube_vip: "{{ service_k8s_cluster_ha[0].virtual_ip_address | default('') }}"
  when: ha_config_loaded is succeeded

- name: Fail if kube_vip is empty
  ansible.builtin.fail:
    msg: "kube_vip is not set in high_availability_config.yml. Please configure service_k8s_cluster_ha[0].virtual_ip_address"
  when:
    - ha_config_loaded is succeeded
    - kube_vip is defined
    - kube_vip | length == 0
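The three tasks above read the first HA entry's virtual IP and fail when it is empty. The following is a rough Python sketch of that decision, assuming the config has already been parsed into a dict. The function name `resolve_kube_vip` is hypothetical. Unlike the tasks, which skip quietly when the file fails to load, this sketch treats a missing entry as an error.

```python
def resolve_kube_vip(ha_config):
    """Return the first HA entry's virtual IP, or raise if it is missing or empty.

    Hypothetical helper; `ha_config` stands in for the parsed
    high_availability_config.yml content.
    """
    entries = ha_config.get("service_k8s_cluster_ha") or []
    vip = (entries[0].get("virtual_ip_address") or "") if entries else ""
    if not vip:
        raise ValueError(
            "kube_vip is not set in high_availability_config.yml. "
            "Please configure service_k8s_cluster_ha[0].virtual_ip_address"
        )
    return vip
```

For example, `resolve_kube_vip({"service_k8s_cluster_ha": [{"virtual_ip_address": "10.0.0.5"}]})` returns the address string, and an empty string in that field raises `ValueError`.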
2 changes: 1 addition & 1 deletion provision/roles/telemetry/tasks/main.yml
Original file line number Diff line number Diff line change
@@ -28,7 +28,7 @@
   ansible.builtin.include_tasks: load_service_images.yml

 - name: Check kube_vip reachability for validation
-  ansible.builtin.include_tasks: check_kube_vip_reachability.yml
+  ansible.builtin.include_tasks: "{{ playbook_dir }}/../common/tasks/telemetry/check_kube_vip_reachability.yml"
   when:
     - victoria_metrics_support | default(false) | bool
     - kube_vip is defined
46 changes: 46 additions & 0 deletions provision/roles/telemetry/templates/telemetry/telemetry.sh.j2
Original file line number Diff line number Diff line change
@@ -100,4 +100,50 @@ else
fi
{% endif %}

{% if victoria_logs_support %}
# Check reachability of additional log write endpoints
{% if telemetry_config.telemetry_sinks.victoria_logs.additional_log_write_endpoints | default([]) %}
echo "Checking reachability of additional log write endpoints..."
# Wait for VLAgent to be ready before checking endpoint reachability
echo " Waiting for VLAgent to be ready..."
kubectl wait --for=condition=ready --timeout=300s pod -l app.kubernetes.io/name=vlagent -n telemetry || echo " WARNING: VLAgent not ready within timeout"

VLAGENT_POD=$(kubectl get pod -n telemetry -l app.kubernetes.io/name=vlagent -o jsonpath='{.items[0].metadata.name}' 2>/dev/null)
if [ -n "$VLAGENT_POD" ]; then
{% for endpoint in telemetry_config.telemetry_sinks.victoria_logs.additional_log_write_endpoints %}
echo " Testing connectivity to: {{ endpoint.url }}"
# Test connectivity using wget (more reliable than curl in minimal containers)
kubectl exec -n telemetry "$VLAGENT_POD" -- wget -T 5 -q --spider "{{ endpoint.url }}" 2>/dev/null && \
echo " ✓ Endpoint reachable" || \
echo " WARNING: Endpoint unreachable - logs may not be forwarded to {{ endpoint.url }}"
{% endfor %}
else
echo " WARNING: Could not find VLAgent pod to check endpoint reachability"
fi
{% endif %}
{% endif %}

{% if victoria_metrics_support %}
# Check reachability of additional metric remote write endpoints
{% if telemetry_config.telemetry_sinks.victoria_metrics.additional_metric_remote_write_endpoints | default([]) %}
echo "Checking reachability of additional metric remote write endpoints..."
# Wait for vmagent to be ready before checking endpoint reachability
echo " Waiting for vmagent to be ready..."
kubectl wait --for=condition=available --timeout=300s deployment/vmagent -n telemetry || echo " WARNING: vmagent not ready within timeout"

VMAGENT_POD=$(kubectl get pod -n telemetry -l app.kubernetes.io/name=vmagent -o jsonpath='{.items[0].metadata.name}' 2>/dev/null)
if [ -n "$VMAGENT_POD" ]; then
{% for endpoint in telemetry_config.telemetry_sinks.victoria_metrics.additional_metric_remote_write_endpoints %}
echo " Testing connectivity to: {{ endpoint.url }}"
# Test connectivity using wget (more reliable than curl in minimal containers)
kubectl exec -n telemetry "$VMAGENT_POD" -- wget -T 5 -q --spider "{{ endpoint.url }}" 2>/dev/null && \
echo " ✓ Endpoint reachable" || \
echo " WARNING: Endpoint unreachable - metrics may not be forwarded to {{ endpoint.url }}"
{% endfor %}
else
echo " WARNING: Could not find vmagent pod to check endpoint reachability"
fi
{% endif %}
{% endif %}

echo "===== Telemetry Stack Deployment Complete ====="
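The template additions above probe each extra write endpoint from inside the agent pod and only warn on failure. The same pattern can be sketched in Python as below. The function names `build_probe_command` and `check_endpoints` and the injectable `run` parameter are assumptions made for illustration; the `kubectl exec` and `wget` arguments are copied from the template.

```python
import subprocess


def build_probe_command(pod, url, namespace="telemetry", timeout_s=5):
    """Build the argv for the `kubectl exec ... wget --spider` probe used in the template."""
    return [
        "kubectl", "exec", "-n", namespace, pod, "--",
        "wget", "-T", str(timeout_s), "-q", "--spider", url,
    ]


def check_endpoints(pod, urls, run=subprocess.run):
    """Return {url: reachable}. A failed probe yields False instead of aborting the loop."""
    results = {}
    for url in urls:
        completed = run(build_probe_command(pod, url), capture_output=True, check=False)
        results[url] = completed.returncode == 0
    return results
```

Passing a stub for `run` makes it possible to exercise the loop without a cluster. With the default `subprocess.run`, `kubectl` must be on the `PATH`.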
Original file line number Diff line number Diff line change
@@ -84,8 +84,9 @@ spec:
   command: ["/bin/sh", "-c"]
   args:
     - |
-      pip3 install prometheus_client==0.20.0 kubernetes==33.1.0 \
-        --find-links="{{ offline_pip_module_path }}/prometheus_client==0.20.0/" \
+      pip3 install \
+        "{{ offline_pip_module_path }}/prometheus_client==0.20.0/prometheus_client-0.20.0-py3-none-any.whl" \
+        "{{ offline_pip_module_path }}/kubernetes==33.1.0/kubernetes-33.1.0-py2.py3-none-any.whl" \
         --trusted-host "{{ pulp_server_ip }}" \
         --no-index || \
       pip3 install prometheus_client kubernetes
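The hunk above swaps a single `--find-links` directory for explicit wheel locations, one per package. The path layout it relies on can be sketched as follows. The helper name `offline_wheel_path` and the example base URL are hypothetical; only the `<name>==<version>/<name>-<version>-<tag>.whl` pattern comes from the diff.

```python
def offline_wheel_path(base, name, version, tag="py3-none-any"):
    """Build '<base>/<name>==<version>/<name>-<version>-<tag>.whl'.

    Hypothetical helper; `base` stands in for offline_pip_module_path.
    """
    return f"{base.rstrip('/')}/{name}=={version}/{name}-{version}-{tag}.whl"


# Example with an assumed base URL
wheels = [
    offline_wheel_path("http://pulp.example/pip", "prometheus_client", "0.20.0"),
    offline_wheel_path("http://pulp.example/pip", "kubernetes", "33.1.0", tag="py2.py3-none-any"),
]
```

Because `--no-index` stops pip from consulting a package index, any dependency of these wheels that is not already installed cannot be resolved, so the plain `pip3 install` fallback after `||` remains relevant.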
@@ -132,7 +133,6 @@ spec:
'Total PowerScale PVCs by phase',
['phase'])


# ── Health event metrics (from CSI external-health-monitor-controller) ──
volume_condition_abnormal = Gauge(
'powerscale_volume_health_abnormal',
Original file line number Diff line number Diff line change
@@ -0,0 +1,60 @@
# Copyright 2026 Dell Inc. or its subsidiaries. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
---

- name: Scale down OTEL Collector
  ansible.builtin.command:
    kubectl scale deployment --replicas=0 -n {{ telemetry_namespace }} otel-collector
  delegate_to: "{{ kube_vip }}"
  failed_when: false
  changed_when: false

- name: Scale down karavi-metrics-powerscale
  ansible.builtin.command:
    kubectl scale deployment --replicas=0 -n {{ telemetry_namespace }} karavi-metrics-powerscale
  delegate_to: "{{ kube_vip }}"
  failed_when: false
  changed_when: false

- name: Scale down csi-volume-exporter
  ansible.builtin.command:
    kubectl scale deployment --replicas=0 -n {{ telemetry_namespace }} csi-volume-exporter
  delegate_to: "{{ kube_vip }}"
  failed_when: false
  changed_when: false

- name: Scale down karavi-observability-cert-manager
  ansible.builtin.command:
    kubectl scale deployment --replicas=0 -n {{ telemetry_namespace }} karavi-observability-cert-manager
  delegate_to: "{{ kube_vip }}"
  failed_when: false
  changed_when: false

- name: Scale down karavi-observability-cert-manager-cainjector
  ansible.builtin.command:
    kubectl scale deployment --replicas=0 -n {{ telemetry_namespace }} karavi-observability-cert-manager-cainjector
  delegate_to: "{{ kube_vip }}"
  failed_when: false
  changed_when: false

- name: Scale down karavi-observability-cert-manager-webhook
  ansible.builtin.command:
    kubectl scale deployment --replicas=0 -n {{ telemetry_namespace }} karavi-observability-cert-manager-webhook
  delegate_to: "{{ kube_vip }}"
  failed_when: false
  changed_when: false

- name: Display PowerScale metric workloads scaled down
  ansible.builtin.debug:
    msg: "{{ powerscale_metrics_scaled_down_msg }}"
37 changes: 37 additions & 0 deletions telemetry/roles/telemetry_disable/tasks/main.yml
Original file line number Diff line number Diff line change
@@ -0,0 +1,37 @@
# Copyright 2026 Dell Inc. or its subsidiaries. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
---

- name: Prerequisite setup
  tags: always
  block:
    - name: Fail if no tags provided
      ansible.builtin.fail:
        msg: "{{ tags_required_msg }}"
      when: ansible_run_tags | default(['all']) | length == 1 and 'all' in ansible_run_tags | default(['all'])

    - name: Load telemetry configuration
      ansible.builtin.include_vars:
        file: "{{ telemetry_config_file }}"

    - name: Load HA configuration
      ansible.builtin.include_tasks: "{{ playbook_dir }}/../common/tasks/common/load_ha_config.yml"

- name: Disable PowerScale metrics
  tags:
    - powerscale
  when: kube_vip is defined and kube_vip | length > 0
  block:
    - name: Disable PowerScale metrics
      ansible.builtin.include_tasks: disable_powerscale_metrics.yml
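The "Fail if no tags provided" guard above depends on Ansible setting `ansible_run_tags` to `['all']` when `--tags` is omitted. A small Python sketch of the same predicate follows; the function name `tags_were_provided` is hypothetical.

```python
def tags_were_provided(run_tags=None):
    """Mirror the `when` guard: a run with no --tags reports exactly ['all'].

    Hypothetical helper; `run_tags` stands in for ansible_run_tags.
    """
    tags = run_tags if run_tags is not None else ["all"]
    return not (len(tags) == 1 and "all" in tags)
```

One consequence of this guard is that an explicit `--tags all` is indistinguishable from passing no tags, so it is rejected as well.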
17 changes: 17 additions & 0 deletions telemetry/roles/telemetry_disable/vars/main.yml
Original file line number Diff line number Diff line change
@@ -0,0 +1,17 @@
# Copyright 2026 Dell Inc. or its subsidiaries. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
---

tags_required_msg: "No tags provided. Please run this playbook with the --tags flag. Example: ansible-playbook telemetry_disable.yml --tags powerscale"
powerscale_metrics_scaled_down_msg: "PowerScale metrics workloads have been scaled down"
Original file line number Diff line number Diff line change
@@ -0,0 +1,74 @@
# Copyright 2026 Dell Inc. or its subsidiaries. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
---

- name: Scale up karavi-observability-cert-manager to replica count 1
  ansible.builtin.command:
    kubectl scale deployment --replicas=1 -n {{ telemetry_namespace }} karavi-observability-cert-manager
  delegate_to: "{{ kube_vip }}"
  failed_when: false
  changed_when: false

- name: Scale up karavi-observability-cert-manager-cainjector to replica count 1
  ansible.builtin.command:
    kubectl scale deployment --replicas=1 -n {{ telemetry_namespace }} karavi-observability-cert-manager-cainjector
  delegate_to: "{{ kube_vip }}"
  failed_when: false
  changed_when: false

- name: Scale up karavi-observability-cert-manager-webhook to replica count 1
  ansible.builtin.command:
    kubectl scale deployment --replicas=1 -n {{ telemetry_namespace }} karavi-observability-cert-manager-webhook
  delegate_to: "{{ kube_vip }}"
  failed_when: false
  changed_when: false

- name: Scale up karavi-metrics-powerscale to replica count 1
  ansible.builtin.command:
    kubectl scale deployment --replicas=1 -n {{ telemetry_namespace }} karavi-metrics-powerscale
  delegate_to: "{{ kube_vip }}"
  failed_when: false
  changed_when: false

- name: Scale up csi-volume-exporter to replica count 1
  ansible.builtin.command:
    kubectl scale deployment --replicas=1 -n {{ telemetry_namespace }} csi-volume-exporter
  delegate_to: "{{ kube_vip }}"
  failed_when: false
  changed_when: false

- name: Wait for csi-volume-exporter to be ready
  ansible.builtin.command:
    kubectl wait deployment csi-volume-exporter -n {{ telemetry_namespace }} --for condition=available --timeout=5m
  delegate_to: "{{ kube_vip }}"
  failed_when: false
  changed_when: false

- name: Scale up OTEL Collector to replica count 1
  ansible.builtin.command:
    kubectl scale deployment --replicas=1 -n {{ telemetry_namespace }} otel-collector
  delegate_to: "{{ kube_vip }}"
  failed_when: false
  changed_when: false

- name: Wait for OTEL Collector to be ready
  ansible.builtin.command:
    kubectl wait deployment otel-collector -n {{ telemetry_namespace }} --for condition=available --timeout=5m
  delegate_to: "{{ kube_vip }}"
  failed_when: false
  changed_when: false

- name: Display PowerScale metric workloads scaled up
  ansible.builtin.debug:
    msg: "{{ powerscale_metrics_scaled_up_msg }}"
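The scale-down and scale-up task files repeat one `kubectl` command per deployment, in a specific order, with two waits on the way up. As a compact illustration of that ordering, the sketch below generates the same command strings from lists. The names `DISABLE_ORDER`, `ENABLE_ORDER`, `WAIT_AFTER_SCALE_UP`, and `scale_commands` are hypothetical; the deployment names and command shapes are taken from the tasks.

```python
# Order copied from the scale-down tasks: the collector stops first
DISABLE_ORDER = [
    "otel-collector",
    "karavi-metrics-powerscale",
    "csi-volume-exporter",
    "karavi-observability-cert-manager",
    "karavi-observability-cert-manager-cainjector",
    "karavi-observability-cert-manager-webhook",
]

# Order copied from the scale-up tasks: cert-manager first, the collector last
ENABLE_ORDER = [
    "karavi-observability-cert-manager",
    "karavi-observability-cert-manager-cainjector",
    "karavi-observability-cert-manager-webhook",
    "karavi-metrics-powerscale",
    "csi-volume-exporter",
    "otel-collector",
]

# Deployments that the scale-up tasks wait on before continuing
WAIT_AFTER_SCALE_UP = {"csi-volume-exporter", "otel-collector"}


def scale_commands(namespace, enable):
    """Return the kubectl command strings for enabling or disabling the workloads."""
    replicas = 1 if enable else 0
    order = ENABLE_ORDER if enable else DISABLE_ORDER
    commands = []
    for name in order:
        commands.append(
            f"kubectl scale deployment --replicas={replicas} -n {namespace} {name}"
        )
        if enable and name in WAIT_AFTER_SCALE_UP:
            commands.append(
                f"kubectl wait deployment {name} -n {namespace} "
                "--for condition=available --timeout=5m"
            )
    return commands
```

A data-driven form like this corresponds to what a `loop` over a list variable would do in the role; whether that suits the project's conventions is a judgment for the maintainers.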
37 changes: 37 additions & 0 deletions telemetry/roles/telemetry_enable/tasks/main.yml
Original file line number Diff line number Diff line change
@@ -0,0 +1,37 @@
# Copyright 2026 Dell Inc. or its subsidiaries. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
---

- name: Prerequisite setup
  tags: always
  block:
    - name: Fail if no tags provided
      ansible.builtin.fail:
        msg: "{{ tags_required_msg }}"
      when: ansible_run_tags | default(['all']) | length == 1 and 'all' in ansible_run_tags | default(['all'])

    - name: Load telemetry configuration
      ansible.builtin.include_vars:
        file: "{{ telemetry_config_file }}"

    - name: Load HA configuration
      ansible.builtin.include_tasks: "{{ playbook_dir }}/../common/tasks/common/load_ha_config.yml"

- name: Enable PowerScale metrics
  tags:
    - powerscale
  when: kube_vip is defined and kube_vip | length > 0
  block:
    - name: Enable PowerScale metrics
      ansible.builtin.include_tasks: enable_powerscale_metrics.yml
17 changes: 17 additions & 0 deletions telemetry/roles/telemetry_enable/vars/main.yml
Original file line number Diff line number Diff line change
@@ -0,0 +1,17 @@
# Copyright 2026 Dell Inc. or its subsidiaries. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
---

tags_required_msg: "No tags provided. Please run this playbook with the --tags flag. Example: ansible-playbook telemetry_enable.yml --tags powerscale"
powerscale_metrics_scaled_up_msg: "PowerScale metrics workloads have been scaled up"