Enable GIVC and logging for nvidia targets #1734

Open
vadika wants to merge 1 commit into tiiuae:main from vadika:givc-logging-nvidia
Conversation

@vadika
Contributor

vadika commented Feb 9, 2026

Enable GIVC and logging for nvidia targets

Description of Changes

Type of Change

  • New Feature
  • Bug Fix
  • Improvement / Refactor

Related Issues / Tickets

Checklist

  • Clear summary in PR description
  • Detailed and meaningful commit message(s)
  • Commits are logically organized and squashed if appropriate
  • Contribution guidelines followed
  • Ghaf documentation updated with the commit - https://tiiuae.github.io/ghaf/
  • Author has run make-checks and it passes
  • All automatic GitHub Action checks pass - see actions
  • Author has added reviewers and removed PR draft status

Testing Instructions

Applicable Targets

  • Orin AGX aarch64
  • Orin NX aarch64
  • Lenovo X1 x86_64
  • Dell Latitude x86_64
  • System 76 x86_64

Installation Method

  • Requires full re-installation
  • Can be updated with nixos-rebuild ... switch
  • Other:

Test Steps To Verify:

  1. ...

@vadika force-pushed the givc-logging-nvidia branch from cef9b17 to b56afb9 February 12, 2026 13:33
@vadika added the Needs Testing (CI Team to pre-verify) label Feb 12, 2026
@leivos-unikie
Contributor

leivos-unikie commented Feb 13, 2026

Tested on Orin AGX (flashed qspi, flashed image to USB SSD and booted from USB SSD)

GIVC part seems to work ok:

[ghaf@ghaf-host:~]$ systemctl -l | grep givc
  givc-ghaf-host.service                                                                                                      loaded active running   GIVC remote service manager for the host.

[ghaf@ghaf-1404711486:~]$ systemctl -l | grep givc
  etc-givc.mount                                                                          loaded active mounted   /etc/givc
  givc-dbusproxy-system.service                                                           loaded active running   GIVC local xdg-dbus-proxy system service
  givc-net-vm.service                                                                     loaded active running   GIVC remote service manager for system VMs
  givc-setup.target                                                                       loaded active active    Ghaf givc target

[ghaf@admin-vm:~]$ systemctl -l | grep givc
  etc-givc.mount                                                                          loaded active mounted   /etc/givc
  etc-locale\x2dgivc.conf.mount                                                           loaded active mounted   /etc/locale-givc.conf
  givc-admin.service                                                                      loaded active running   GIVC admin module.
  • I got log forwarding to Grafana working only over a WiFi connection; it didn't work over the RJ45 Ethernet port or an Ethernet adapter.
  • I could see logs only from admin-vm in Grafana, but Vadim said he saw logs from ghaf-host and net-vm too. Could there be something wrong on the Grafana side?

@leivos-unikie
Contributor

leivos-unikie commented Feb 13, 2026

Tested on Orin NX (flashed qspi)

  • givc service found only on admin-vm. No givc services found on ghaf-host or net-vm.
[ghaf@admin-vm:~]$ systemctl -l | grep givc
  etc-locale\x2dgivc.conf.mount                                                           loaded active     mounted      /etc/locale-givc.conf
  • Is this normal?
[ghaf@admin-vm:~]$ journalctl -u givc-admin.service
-- No entries --

(On AGX there was: Starting GIVC admin module.... Started GIVC admin module...)

  • I could not see any logs forwarded to grafana

@leivos-unikie added the bug on Orin NX Cross (Issues found on NVIDIA Jetson NX Orin cross-compiled while checking this PR) label and removed the Needs Testing (CI Team to pre-verify) label Feb 13, 2026
@leivos-unikie
Contributor

Services failing on Orin NX

[ghaf@ghaf-host:~]$ systemctl list-units --state=failed
  UNIT                      LOAD   ACTIVE SUB    DESCRIPTION     
● audit-rules-nixos.service loaded failed failed Load Audit Rules
[ghaf@admin-vm:~]$ systemctl list-units --state=failed
  UNIT                               LOAD   ACTIVE SUB    DESCRIPTION                                                  
● alloy.service                      loaded failed failed alloy.service
● audit-rules-nixos.service          loaded failed failed Load Audit Rules
● ghaf-journal-alloy-recover.service loaded failed failed Recover journald/alloy after time jump
● journal-fss-verify.service         loaded failed failed Verify systemd journal integrity using Forward Secure Sealing
● stunnel.service                    loaded failed failed stunnel TLS tunneling service
[ghaf@ghaf-0291556753:~]$ systemctl list-units --state=failed
  UNIT                      LOAD   ACTIVE SUB    DESCRIPTION     
● audit-rules-nixos.service loaded failed failed Load Audit Rules
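To compare the three listings above at a glance, here is a minimal Python sketch that pulls the unit names out of the pasted `systemctl list-units --state=failed` output and separates the failures common to every machine from the ones specific to one machine. The helper name `failed_units` is made up for this illustration, and the parsing assumes failed rows start with the "●" marker, as they do in the output above.

```python
def failed_units(listing):
    """Return the unit names from `systemctl list-units --state=failed` output."""
    units = set()
    for line in listing.splitlines():
        fields = line.split()
        # Failed units are the rows that start with the "●" marker.
        if len(fields) >= 2 and fields[0] == "●":
            units.add(fields[1])
    return units


# Output pasted from the three machines above, keyed by the shell prompt hostname.
reports = {
    "ghaf-host": """
  UNIT                      LOAD   ACTIVE SUB    DESCRIPTION
● audit-rules-nixos.service loaded failed failed Load Audit Rules
""",
    "admin-vm": """
  UNIT                               LOAD   ACTIVE SUB    DESCRIPTION
● alloy.service                      loaded failed failed alloy.service
● audit-rules-nixos.service          loaded failed failed Load Audit Rules
● ghaf-journal-alloy-recover.service loaded failed failed Recover journald/alloy after time jump
● journal-fss-verify.service         loaded failed failed Verify systemd journal integrity using Forward Secure Sealing
● stunnel.service                    loaded failed failed stunnel TLS tunneling service
""",
    "ghaf-0291556753": """
  UNIT                      LOAD   ACTIVE SUB    DESCRIPTION
● audit-rules-nixos.service loaded failed failed Load Audit Rules
""",
}

failed = {host: failed_units(text) for host, text in reports.items()}
everywhere = set.intersection(*failed.values())

print("failing on every machine:", sorted(everywhere))
for host, units in failed.items():
    print(host, "only:", sorted(units - everywhere))
```

Read this way, audit-rules-nixos.service fails on all three machines, while the alloy, stunnel and journal failures appear only on admin-vm, which may be related to the missing Grafana logs reported above.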

@vadika force-pushed the givc-logging-nvidia branch from 87ae663 to b46666c February 16, 2026 12:30
@leivos-unikie
Contributor

leivos-unikie commented Mar 17, 2026

Tested on Orin AGX
(nix build .#nvidia-jetson-orin-agx-debug-from-x86_64-flash-script)

GIVC services are ok. No failed services.

Checked grafana log forwarding:

  • At first connected to internet via eth adapter: grafana showed logs from admin-vm and ghaf-0784888707 (no logs from ghaf-host)
  • When connected to internet via wifi grafana showed logs from admin-vm, ghaf-0784888707 and net-vm (no logs from ghaf-host)
image

Grafana shows "net-vm" logs only some time after boot (second boot). "ghaf-0784888707" includes those logs and has also later logs.

https://ghaflogs.vedenemo.dev/explore?schemaVersion=1&panes=%7B%226gw%22:%7B%22datasource%22:%22P982945308D3682D1%22,%22queries%22:%5B%7B%22refId%22:%22A%22,%22expr%22:%22%7Bmachine%3D%5C%2200-2e-c8-73-83%5C%22,%20host%3D%5C%22ghaf-0784888707%5C%22%7D%22,%22queryType%22:%22range%22,%22datasource%22:%7B%22type%22:%22loki%22,%22uid%22:%22P982945308D3682D1%22%7D,%22editorMode%22:%22builder%22,%22direction%22:%22backward%22%7D%5D,%22range%22:%7B%22from%22:%221773698400000%22,%22to%22:%221773784799000%22%7D,%22compact%22:false%7D%7D&orgId=1
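The Explore link above carries its query in the percent-encoded `panes` parameter. As a rough sketch, assuming other Explore links follow the same layout as this one, the Loki selector and time range can be recovered with the Python standard library. The function name `explore_queries` is hypothetical.

```python
import json
from urllib.parse import parse_qs, urlsplit

# Explore link copied from the comment above (split only for readability).
EXPLORE_URL = (
    "https://ghaflogs.vedenemo.dev/explore?schemaVersion=1&panes="
    "%7B%226gw%22:%7B%22datasource%22:%22P982945308D3682D1%22,"
    "%22queries%22:%5B%7B%22refId%22:%22A%22,"
    "%22expr%22:%22%7Bmachine%3D%5C%2200-2e-c8-73-83%5C%22,"
    "%20host%3D%5C%22ghaf-0784888707%5C%22%7D%22,"
    "%22queryType%22:%22range%22,"
    "%22datasource%22:%7B%22type%22:%22loki%22,%22uid%22:%22P982945308D3682D1%22%7D,"
    "%22editorMode%22:%22builder%22,%22direction%22:%22backward%22%7D%5D,"
    "%22range%22:%7B%22from%22:%221773698400000%22,%22to%22:%221773784799000%22%7D,"
    "%22compact%22:false%7D%7D&orgId=1"
)


def explore_queries(url):
    """Return (expr, from_ms, to_ms) for every query in an Explore URL."""
    params = parse_qs(urlsplit(url).query)
    panes = json.loads(params["panes"][0])  # parse_qs already percent-decodes
    found = []
    for pane in panes.values():
        time_range = pane["range"]
        for query in pane["queries"]:
            found.append(
                (query["expr"], int(time_range["from"]), int(time_range["to"]))
            )
    return found


for expr, start_ms, end_ms in explore_queries(EXPLORE_URL):
    print(expr, start_ms, end_ms)
```

For this link the selector filters on the `machine` and `host` labels, so it shows only the ghaf-0784888707 stream of that one device.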

@leivos-unikie
Contributor

leivos-unikie commented Mar 17, 2026

Tested also with Orin AGX ghaf image booted from USB SSD.

  • GIVC services are ok
  • No failed services
  • At first boot connected to internet via eth adapter: grafana showed logs from admin-vm, ghaf-1397731447 and net-vm (no logs from ghaf-host). Net-vm logs are only from a short period after boot. Same behavior with WiFi connection.

@leivos-unikie
Contributor

leivos-unikie commented Mar 17, 2026

Checked grafana logs with Orin NX: same as with AGX, logs from admin-vm, ghaf-2161388417 and net-vm show up, no logs from ghaf-host. "net-vm" logs didn't show up at every boot.

@leivos-unikie removed the Needs Testing (CI Team to pre-verify) label Mar 17, 2026
@vadika
Contributor Author

vadika commented Mar 17, 2026

Checked grafana logs with Orin NX: same as with AGX, logs from admin-vm, ghaf-2161388417 and net-vm show up, no logs from ghaf-host. "net-vm" logs didn't show up at every boot.

Fixed the host logs; that was my fault, and it works now. Alloy on admin-vm occasionally logs Loki 429 Too Many Requests (maximum active stream limit exceeded), but I'm not sure I can do anything about it.

@milva-unikie

Alloy on admin-vm logs occasional Loki 429 Too Many Requests (maximum active stream limit exceeded), but I'm not sure I can do anything about it.

This issue seems to be caused by the changes in this PR.

We are already having problems with too many labels for Loki to handle, and this PR adds two more. What is the purpose of adding ghaf_node and ghaf_type labels to the logs?

This PR:
image

Mainline:
image
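For context on the 429 above: Loki treats every distinct combination of label values as a separate stream, so extra labels can raise the number of active streams. The Python sketch below only illustrates that counting. The `unit` label and the values given to ghaf_type and ghaf_node are invented for the example; the thread does not show what values the PR actually assigned, so whether the two labels really split streams here is an assumption.

```python
from itertools import product


def active_streams(label_sets):
    """Count distinct label sets; each distinct set is its own Loki stream."""
    return len({frozenset(labels.items()) for labels in label_sets})


hosts = ["ghaf-host", "admin-vm", "net-vm"]   # host names from this thread
units = ["alloy.service", "stunnel.service"]  # illustrative second label

baseline = [{"host": h, "unit": u} for h, u in product(hosts, units)]

# Hypothetical: ghaf_type takes two values independently of host and unit,
# so every existing stream is split in two.
split = [dict(labels, ghaf_type=t) for labels in baseline for t in ("system", "app")]

# Hypothetical: ghaf_node is fully determined by host, so it adds no streams.
derived = [dict(labels, ghaf_node="node-" + labels["host"]) for labels in baseline]

print(active_streams(baseline), active_streams(split), active_streams(derived))
```

A label whose value varies independently of the existing ones multiplies the stream count, while a label that merely restates an existing one leaves it unchanged.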

@vadika
Contributor Author

vadika commented Mar 18, 2026

Alloy on admin-vm logs occasional Loki 429 Too Many Requests (maximum active stream limit exceeded), but I'm not sure I can do anything about it.

This issue seems to be caused by the changes in this PR.

We are already having problems with too many labels for Loki to handle, and this PR adds two more. What is the purpose of adding ghaf_node and ghaf_type labels to the logs?

This PR:

image

Mainline:

image

They were used during debugging, so I'll remove them. They served no special purpose; thanks for noticing.

@leivos-unikie
Contributor

Log forwarding is now working fine for AGX and NX.

@milva-unikie

ghaf_node and ghaf_type labels should still be removed

@vadika
Contributor Author

vadika commented Mar 19, 2026

ghaf_node and ghaf_type labels should still be removed

They are removed for sure! I forgot to push, my bad!

@leivos-unikie
Contributor

leivos-unikie commented Mar 19, 2026

Milla noted that boot tests on Orin NX have sometimes failed for this PR. I also tried the current state manually, and the internet connection via the USB eth adapter seems unstable. The device may get an IP address, but connecting fails, and running ping 8.8.8.8 on net-vm also fails at these moments. Connecting directly to the RJ45 eth port seems to work reliably.

journalctl -f on net-vm from the time of connecting the eth-USB adapter:

Mar 19 10:14:53 ghaf-1615678653 systemd-networkd[736]: enp0s10u2: Gained carrier
Mar 19 10:14:53 ghaf-1615678653 kernel: ax88179_178a 2-2:1.0 enp0s10u2: ax88179 - Link status is: 1
Mar 19 10:14:54 ghaf-1615678653 systemd[1]: NetworkManager-dispatcher.service: Deactivated successfully.
Mar 19 10:14:54 ghaf-1615678653 audit[1]: SERVICE_STOP pid=1 uid=0 auid=4294967295 ses=4294967295 msg='unit=NetworkManager-dispatcher comm="systemd" exe="/nix/store/k2s1hshmiwzmblvzkdsnknimmdlqmhvz-netvm-systemd-aarch64-unknown-linux-gnu-258.3/lib/systemd/systemd" hostname=? addr=? terminal=? res=success'
Mar 19 10:14:55 ghaf-1615678653 kernel: rpfilter drop: IN=enp0s10u2 OUT= MAC=ff:ff:ff:ff:ff:ff:f4:e2:c6:55:3f:1a:08:00 SRC=172.18.8.18 DST=255.255.255.255 LEN=154 TOS=0x00 PREC=0x00 TTL=255 ID=4797 PROTO=UDP SPT=54498 DPT=10001 LEN=134 
Mar 19 10:14:55 ghaf-1615678653 systemd-networkd[736]: enp0s10u2: Gained IPv6LL
Mar 19 10:14:56 ghaf-1615678653 kernel: rpfilter drop: IN=enp0s10u2 OUT= MAC=ff:ff:ff:ff:ff:ff:f4:e2:c6:51:94:a2:08:00 SRC=172.18.8.11 DST=255.255.255.255 LEN=154 TOS=0x00 PREC=0x00 TTL=255 ID=63196 PROTO=UDP SPT=63837 DPT=10001 LEN=134 
Mar 19 10:14:57 ghaf-1615678653 kernel: rpfilter drop: IN=enp0s10u2 OUT= MAC=ff:ff:ff:ff:ff:ff:d0:21:f9:e7:fd:e5:08:00 SRC=172.18.8.14 DST=255.255.255.255 LEN=154 TOS=0x00 PREC=0x00 TTL=255 ID=55708 PROTO=UDP SPT=63408 DPT=10001 LEN=134 
Mar 19 10:14:57 ghaf-1615678653 kernel: rpfilter drop: IN=enp0s10u2 OUT= MAC=ff:ff:ff:ff:ff:ff:54:af:97:b7:47:a8:08:00 SRC=192.168.0.1 DST=255.255.255.255 LEN=307 TOS=0x00 PREC=0x00 TTL=64 ID=31501 DF PROTO=UDP SPT=68 DPT=67 LEN=287 
Mar 19 10:14:58 ghaf-1615678653 kernel: rpfilter drop: IN=enp0s10u2 OUT= MAC=ff:ff:ff:ff:ff:ff:70:a7:41:c1:f3:a1:08:00 SRC=172.18.8.15 DST=255.255.255.255 LEN=154 TOS=0x00 PREC=0x00 TTL=255 ID=40373 PROTO=UDP SPT=53745 DPT=10001 LEN=134 
Mar 19 10:15:00 ghaf-1615678653 kernel: rpfilter drop: IN=enp0s10u2 OUT= MAC=ff:ff:ff:ff:ff:ff:f4:e2:c6:55:3f:1a:08:00 SRC=172.18.8.18 DST=255.255.255.255 LEN=154 TOS=0x00 PREC=0x00 TTL=255 ID=4800 PROTO=UDP SPT=54501 DPT=10001 LEN=134 
Mar 19 10:15:02 ghaf-1615678653 kernel: rpfilter drop: IN=enp0s10u2 OUT= MAC=ff:ff:ff:ff:ff:ff:f4:e2:c6:51:94:a2:08:00 SRC=172.18.8.11 DST=255.255.255.255 LEN=154 TOS=0x00 PREC=0x00 TTL=255 ID=63199 PROTO=UDP SPT=63840 DPT=10001 LEN=134 
Mar 19 10:15:02 ghaf-1615678653 kernel: rpfilter drop: IN=enp0s10u2 OUT= MAC=ff:ff:ff:ff:ff:ff:d0:21:f9:e7:fd:e5:08:00 SRC=172.18.8.14 DST=255.255.255.255 LEN=154 TOS=0x00 PREC=0x00 TTL=255 ID=55711 PROTO=UDP SPT=63411 DPT=10001 LEN=134 
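To summarise which packets the reverse-path filter dropped, here is a small Python sketch that tallies the `rpfilter drop` lines by source address and destination port. It uses four lines copied verbatim from the log above (one of them not a drop, to show the filtering); the function name `rpfilter_drops` is hypothetical. It only counts what the log shows and does not establish whether these drops are connected to the failing connection.

```python
import re
from collections import Counter

FIELD = re.compile(r"(\w+)=(\S*)")


def rpfilter_drops(lines):
    """Yield the KEY=value fields of each 'rpfilter drop' kernel log line."""
    for line in lines:
        _, marker, payload = line.partition("rpfilter drop: ")
        if marker:  # skip lines that are not rpfilter drops
            yield dict(FIELD.findall(payload))


# Lines copied from the journalctl output above.
LOG = [
    "Mar 19 10:14:53 ghaf-1615678653 systemd-networkd[736]: enp0s10u2: Gained carrier",
    "Mar 19 10:14:55 ghaf-1615678653 kernel: rpfilter drop: IN=enp0s10u2 OUT= MAC=ff:ff:ff:ff:ff:ff:f4:e2:c6:55:3f:1a:08:00 SRC=172.18.8.18 DST=255.255.255.255 LEN=154 TOS=0x00 PREC=0x00 TTL=255 ID=4797 PROTO=UDP SPT=54498 DPT=10001 LEN=134 ",
    "Mar 19 10:14:57 ghaf-1615678653 kernel: rpfilter drop: IN=enp0s10u2 OUT= MAC=ff:ff:ff:ff:ff:ff:54:af:97:b7:47:a8:08:00 SRC=192.168.0.1 DST=255.255.255.255 LEN=307 TOS=0x00 PREC=0x00 TTL=64 ID=31501 DF PROTO=UDP SPT=68 DPT=67 LEN=287 ",
    "Mar 19 10:15:00 ghaf-1615678653 kernel: rpfilter drop: IN=enp0s10u2 OUT= MAC=ff:ff:ff:ff:ff:ff:f4:e2:c6:55:3f:1a:08:00 SRC=172.18.8.18 DST=255.255.255.255 LEN=154 TOS=0x00 PREC=0x00 TTL=255 ID=4800 PROTO=UDP SPT=54501 DPT=10001 LEN=134 ",
]

drops = list(rpfilter_drops(LOG))
by_source = Counter(d["SRC"] for d in drops)
by_dest_port = Counter(d["DPT"] for d in drops)
all_broadcast = all(d["DST"] == "255.255.255.255" for d in drops)

print(by_source, by_dest_port, all_broadcast)
```

In this sample every dropped packet is a UDP broadcast to 255.255.255.255 arriving on enp0s10u2 from another address on the LAN.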

@vadika
Contributor Author

vadika commented Mar 19, 2026

Milla noted that boot tests on Orin NX have sometimes failed for this PR. I also tried the current state manually, and the internet connection via the USB eth adapter seems unstable. The device may get an IP address, but connecting fails, and running ping 8.8.8.8 on net-vm also fails at these moments. Connecting directly to the RJ45 eth port seems to work reliably.

I don't see any connection between this PR and this behaviour, and I have no idea what to fix. This instability is probably worth opening a separate bug and investigating independently.

@milva-unikie

I don't see any connection between this PR and this behaviour, and I have no idea what to fix. This instability is probably worth opening a separate bug and investigating independently.

This is the only PR where we have seen this behavior. I ran the pre-merge tests 4 times, 7 / 8 of the Orin NX boots failed. Other PRs have been tested in between and they had no issues.

@vadika
Contributor Author

vadika commented Mar 19, 2026

I don't see any connection between this PR and this behaviour, and I have no idea what to fix. This instability is probably worth opening a separate bug and investigating independently.

This is the only PR where we have seen this behavior. I ran the pre-merge tests 4 times, 7 / 8 of the Orin NX boots failed. Other PRs have been tested in between and they had no issues.

I have a candidate that breaks it -- e45a8fc

Enable Orin host log forwarding, keep ghaf-givc pinned to latest upstream main, and remove custom Alloy ghaf_node/ghaf_type labels so logs use default labels.

Signed-off-by: vadik likholetov <vadikas@gmail.com>
Labels

  • bug on Orin NX Cross: Issues found on NVIDIA Jetson NX Orin cross-compiled while checking this PR
  • Tested on Orin AGX Cross: This PR has been tested on NVIDIA Jetson AGX Orin cross-compiled

6 participants