feat: extend phonehome facts with pseudonymization #1034

Merged: DavidePrincipi merged 11 commits into main from feat-7829 on Jan 26, 2026

DavidePrincipi (Member) commented Jan 22, 2026

  • Add ui_name to cluster, node, and module objects
  • Additional node facts: timezone, kernel version, uptime in seconds, FQDN, default route IP addresses, creation date
  • Add user_domains facts, with user/group counters and domain type information
  • Add user domain references and FQDNs (as returned by Traefik) to modules
  • Add an update-available flag to nodes and modules
  • Add update status facts: enabled/disabled, scheduled on/off

Sensitive strings, domain names, and host names are pseudonymized with a random seed. The seed is generated once in the cluster's lifetime so that hashes stay stable across runs.
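The seeded approach described above can be illustrated with a minimal sketch. It assumes an MD5 digest of the seed concatenated with the value, truncated to a short prefix. The function names `new_seed` and `pseudonymize` and the 12-character length are hypothetical and are not taken from the `agent.facts` module.

```python
import hashlib
import secrets


def new_seed() -> str:
    """Return a random hex seed (hypothetical helper, meant to be generated once)."""
    return secrets.token_hex(16)


def pseudonymize(value: str, seed: str, length: int = 12) -> str:
    """Return a stable pseudonym: the same seed and value always give the same result."""
    data = (seed + value).encode("utf-8")
    # MD5 is used here only as a non-cryptographic fingerprint (assumption).
    digest = hashlib.md5(data, usedforsecurity=False).hexdigest()
    return digest[:length]


seed = new_seed()
# Repeated calls with the same seed map a name to the same pseudonym,
# while a different seed yields an unrelated one.
stable = pseudonymize("mail.example.org", seed) == pseudonymize("mail.example.org", seed)
```

Because the seed is mixed into every digest, two clusters with different seeds produce unrelated pseudonyms for the same name, while one cluster keeps reporting the same pseudonym over time.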

Refs NethServer/dev#7829

Send the free-text label of cluster entities.

- Initialize a random seed for string pseudonymization and save it in Redis for stable results.
- Enforce pseudonymization if a subscription is not active.
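A "generate once, then reuse" seed can be sketched as follows. This is only an illustration: the key name `cluster/anon_seed` comes from this PR, but storing it as a plain string value, the helper name `ensure_seed`, and the `InMemoryStore` stand-in (used so the example runs without a Redis server) are assumptions. With a real redis-py client, `set(..., nx=True)` writes the value only when the key does not exist yet.

```python
import secrets

SEED_KEY = "cluster/anon_seed"  # key name from the PR; a plain string value is assumed


class InMemoryStore:
    """Tiny stand-in for a Redis client, only for this example."""

    def __init__(self):
        self._data = {}

    def set(self, name, value, nx=False):
        if nx and name in self._data:
            return None  # like redis-py: SET with NX on an existing key does nothing
        self._data[name] = value
        return True

    def get(self, name):
        return self._data.get(name)


def ensure_seed(rdb) -> str:
    """Create the seed only if it is missing, then return the stored value."""
    rdb.set(SEED_KEY, secrets.token_hex(16), nx=True)
    seed = rdb.get(SEED_KEY)
    # A real client may return bytes unless decode_responses=True is set.
    return seed.decode() if isinstance(seed, bytes) else seed


store = InMemoryStore()
first = ensure_seed(store)
second = ensure_seed(store)  # the existing seed is kept, not regenerated
```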
Return new Ansible facts as attributes of the node get-facts action response:

- fqdn (pseudonymized with the new pseudo_domain() function)
- timezone
- uptime in seconds
- kernel version
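The PR text does not describe how pseudo_domain() transforms an FQDN. One possible approach, shown here purely as a sketch, hashes each DNS label separately so that hosts under the same parent domain still share a pseudonymized suffix. The names `pseudo_label` and `pseudo_domain_sketch`, the MD5 choice, and the 8-character label length are all assumptions.

```python
import hashlib


def pseudo_label(label: str, seed: str) -> str:
    """Hash one DNS label with the seed; labels are compared case-insensitively."""
    data = (seed + label.lower()).encode("utf-8")
    return hashlib.md5(data, usedforsecurity=False).hexdigest()[:8]


def pseudo_domain_sketch(fqdn: str, seed: str) -> str:
    """Pseudonymize an FQDN label by label, keeping its dotted structure."""
    labels = [part for part in fqdn.strip(".").split(".") if part]
    return ".".join(pseudo_label(part, seed) for part in labels)


a = pseudo_domain_sketch("node1.example.org", "demo-seed")
b = pseudo_domain_sketch("node2.example.org", "demo-seed")
# a and b differ in the first label but share the two trailing labels.
```

The trade-off of this design is that it reveals which hosts share a domain. Hashing the whole FQDN as one string would hide that relationship as well.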
- Add a list of detailed user domain objects.
- Add module reference to user domains.
- Import agent.facts library under cluster and node get-facts actions
- Return user_domains facts from cluster/get-facts
- Clean up print-phonehome script

Copilot AI (Contributor) left a comment

Pull request overview

This pull request extends the phonehome facts collection functionality with comprehensive pseudonymization support to protect sensitive data. The implementation ensures that systems without a subscription have their sensitive information (domain names, IP addresses, hostnames) pseudonymized using a stable seed, while maintaining data utility for telemetry purposes.

Changes:

  • Added new agent.facts module implementing pseudonymization functions with MD5-based hashing
  • Extended cluster, node, and module facts with additional fields including ui_name, timezone, kernel version, uptime, FQDN, and user domain information
  • Introduced stable pseudonymization seed stored in Redis (cluster/anon_seed) to ensure consistent hashing across collection runs

Reviewed changes

Copilot reviewed 5 out of 6 changed files in this pull request and generated 10 comments.

| File | Description |
| --- | --- |
| core/imageroot/usr/local/agent/pypkg/agent/facts.py | New module implementing pseudonymization functions for strings, domains, and IP addresses |
| core/imageroot/var/lib/nethserver/cluster/bin/print-phonehome | Enhanced to collect ui_name, user_domains, FQDNs from Traefik, and module certification/update status |
| core/imageroot/var/lib/nethserver/cluster/actions/get-facts/50get | Added user domain facts collection with counters and update schedule information |
| core/imageroot/var/lib/nethserver/node/actions/get-facts/50get | Extended node facts with cluster_leader flag, FQDN, IP addresses, timezone, kernel version, and uptime |
| core/imageroot/usr/local/agent/pypkg/cluster/inventory.py | Added fact_user_domain_counters function to count LDAP users and groups per domain |
| docs/core/database.md | Documented new cluster/anon_seed Redis key for pseudonymization |

- Add update_available flag to modules and nodes
- Add module certification level attribute (from repo metadata)
- Report cluster update disabled status and reason
- Report update schedule status flag
- Extract IPv4 and IPv6 addresses from the node's default IP route.
- Implement a pseudo_ip() helper that returns a stable, pseudo-random hash of
  the given IP address, derived from the global anon_seed.
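A seeded IP hash along these lines could look like the sketch below. It is not the PR's pseudo_ip() implementation: the name `pseudo_ip_sketch`, the MD5 digest, the `ipvN-` prefix, and the 12-character length are assumptions. The address is normalized with the standard `ipaddress` module first, so that different spellings of the same IPv6 address map to the same hash.

```python
import hashlib
import ipaddress


def pseudo_ip_sketch(address: str, seed: str) -> str:
    """Return a stable, seed-dependent token for an IPv4 or IPv6 address."""
    ip = ipaddress.ip_address(address)  # raises ValueError on invalid input
    canonical = ip.compressed  # e.g. "2001:0db8:0:0:0:0:0:1" becomes "2001:db8::1"
    data = (seed + canonical).encode("utf-8")
    digest = hashlib.md5(data, usedforsecurity=False).hexdigest()
    # Keep the address family visible, hide the address itself.
    return f"ipv{ip.version}-{digest[:12]}"


short_form = pseudo_ip_sketch("2001:db8::1", "demo-seed")
long_form = pseudo_ip_sketch("2001:0db8:0:0:0:0:0:1", "demo-seed")
# Both spellings normalize to the same address, so the tokens are equal.
```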
- Add the list of FQDNs to the application facts. FQDNs are obtained from
  Traefik instances by looking at the name_module_map fact.

- Traefik learns an application name from set_route() and
  set_certificate() calls.
- Print a warning if the global cluster seed is unset
- Use a temporary self-generated fallback seed to calculate hashes
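The warning-plus-fallback behavior could be sketched like this. The function name `effective_seed`, the warning text, and the module-level cache are hypothetical. The point is that hashes remain consistent within a single run even when no stored seed is available.

```python
import secrets
import sys

_fallback_seed = None  # generated at most once per process


def effective_seed(stored_seed):
    """Return the stored seed, or a temporary fallback seed with a warning."""
    global _fallback_seed
    if stored_seed:
        return stored_seed
    if _fallback_seed is None:
        print("Warning: cluster seed is unset, using a temporary fallback seed",
              file=sys.stderr)
        _fallback_seed = secrets.token_hex(16)
    return _fallback_seed


from_store = effective_seed("seed-from-redis")  # stored seed wins
temporary = effective_seed(None)                # warns once, then reuses the fallback
```

A fallback seed generated this way is not stable across runs, so pseudonyms computed with it cannot be correlated between collections.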

DavidePrincipi marked this pull request as ready for review on January 23, 2026, 18:41

Extract the node creation date by looking at the birth date of some
files created at NS8 installation time.

Use the system's "stat" command because it offers better filesystem
birth date compatibility than Python 3.11.
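Reading a birth date through the "stat" command can be sketched as below. The commit message does not say which files are inspected or which format is used, so the path is left as a parameter and the `%W` format (birth time in seconds since the Epoch, as in GNU coreutils) is an assumption. The helper name `birth_time` is hypothetical.

```python
import subprocess
from datetime import datetime, timezone


def birth_time(path: str):
    """Return the file birth time as an aware UTC datetime, or None if unavailable."""
    try:
        result = subprocess.run(
            ["stat", "-c", "%W", path],
            capture_output=True, text=True, check=True,
        )
        seconds = int(result.stdout.strip())
    except (OSError, subprocess.CalledProcessError, ValueError):
        # stat is missing, the path does not exist, or the output is not a number.
        return None
    if seconds <= 0:
        # GNU stat prints 0 when the filesystem does not record a birth time.
        return None
    return datetime.fromtimestamp(seconds, tz=timezone.utc)


created = birth_time("/")  # may be None, depending on filesystem and stat version
```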

DavidePrincipi merged commit d6a35ac into main on Jan 26, 2026 (2 checks passed)
DavidePrincipi deleted the feat-7829 branch on January 26, 2026, 15:12