Skip to content

Commit eb788cd

Browse files
authored
make tags first class node owner (juanfont#2885)
This PR changes tags to be something that exists on nodes in addition to users, to being its own thing. It is part of moving our tags support towards the correct tailscale compatible implementation. There are probably rough edges in this PR, but the intention is to get it in, and then start fixing bugs from 0.28.0 milestone (long standing tags issue) to discover what works and what doesnt. Updates juanfont#2417 Closes juanfont#2619
1 parent 705b239 commit eb788cd

Some content is hidden

Large Commits have some content hidden by default. Use the searchbox below for content that may be hidden.

49 files changed

+3090
-745
lines changed

.github/workflows/test-integration.yaml

Lines changed: 2 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -55,11 +55,12 @@ jobs:
5555
- TestPreAuthKeyCorrectUserLoggedInCommand
5656
- TestApiKeyCommand
5757
- TestNodeTagCommand
58+
- TestTaggedNodeRegistration
59+
- TestTagPersistenceAcrossRestart
5860
- TestNodeAdvertiseTagCommand
5961
- TestNodeCommand
6062
- TestNodeExpireCommand
6163
- TestNodeRenameCommand
62-
- TestNodeMoveCommand
6364
- TestPolicyCommand
6465
- TestPolicyBrokenConfigCommand
6566
- TestDERPVerifyEndpoint

AGENTS.md

Lines changed: 336 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -237,6 +237,21 @@ headscale/
237237
- `policy.go`: Policy storage and retrieval
238238
- Schema migrations in `schema.sql` with extensive test data coverage
239239

240+
**CRITICAL DATABASE MIGRATION RULES**:
241+
242+
1. **NEVER reorder existing migrations** - Migration order is immutable once committed
243+
2. **ONLY add new migrations to the END** of the migrations array
244+
3. **NEVER disable foreign keys** in new migrations - no new migrations should be added to `migrationsRequiringFKDisabled`
245+
4. **Migration ID format**: `YYYYMMDDHHSS-short-description` (timestamp + descriptive suffix)
246+
- Example: `202511131500-add-user-roles`
247+
- The timestamp must be chronologically ordered
248+
5. **New migrations go after the comment** "As of 2025-07-02, no new IDs should be added here"
249+
6. If you need to rename a column that other migrations depend on:
250+
- Accept that the old column name will exist in intermediate migration states
251+
- Update code to work with the new column name
252+
- Let AutoMigrate create the new column if needed
253+
- Do NOT try to rename columns that later migrations reference
254+
240255
**Policy Engine (`hscontrol/policy/`)**
241256

242257
- `policy.go`: Core ACL evaluation logic, HuJSON parsing
@@ -687,6 +702,326 @@ assert.EventuallyWithT(t, func(c *assert.CollectT) {
687702
}, 10*time.Second, 500*time.Millisecond, "mixed operations")
688703
```
689704

705+
## Tags-as-Identity Architecture
706+
707+
### Overview
708+
709+
Headscale implements a **tags-as-identity** model where tags and user ownership are mutually exclusive ways to identify nodes. This is a fundamental architectural principle that affects node registration, ownership, ACL evaluation, and API behavior.
710+
711+
### Core Principle: Tags XOR User Ownership
712+
713+
Every node in Headscale is **either** tagged **or** user-owned, never both:
714+
715+
- **Tagged Nodes**: Ownership is defined by tags (e.g., `tag:server`, `tag:database`)
716+
- Tags are set during registration via tagged PreAuthKey
717+
- Tags are immutable after registration (cannot be changed via API)
718+
- May have `UserID` set for "created by" tracking, but ownership is via tags
719+
- Identified by: `node.IsTagged()` returns `true`
720+
721+
- **User-Owned Nodes**: Ownership is defined by user assignment
722+
- Registered via OIDC, web auth, or untagged PreAuthKey
723+
- Node belongs to a specific user's namespace
724+
- No tags (empty tags array)
725+
- Identified by: `node.UserID().Valid() && !node.IsTagged()`
726+
727+
### Critical Implementation Details
728+
729+
#### Node Identification Methods
730+
731+
```go
732+
// Primary methods for determining node ownership
733+
node.IsTagged() // Returns true if node has tags OR AuthKey.Tags
734+
node.HasTag(tag) // Returns true if node has specific tag
735+
node.IsUserOwned() // Returns true if UserID set AND not tagged
736+
737+
// IMPORTANT: UserID can be set on tagged nodes for tracking!
738+
// Always use IsTagged() to determine actual ownership, not just UserID.Valid()
739+
```
740+
741+
#### UserID Field Semantics
742+
743+
**Critical distinction**: `UserID` has different meanings depending on node type:
744+
745+
- **Tagged nodes**: `UserID` is optional "created by" tracking
746+
- Indicates which user created the tagged PreAuthKey
747+
- Does NOT define ownership (tags define ownership)
748+
- Example: User "alice" creates tagged PreAuthKey with `tag:server`, node gets `UserID=alice.ID` + `Tags=["tag:server"]`
749+
750+
- **User-owned nodes**: `UserID` defines ownership
751+
- Required field for non-tagged nodes
752+
- Defines which user namespace the node belongs to
753+
- Example: User "bob" registers via OIDC, node gets `UserID=bob.ID` + `Tags=[]`
754+
755+
#### Mapper Behavior (mapper/tail.go)
756+
757+
The mapper converts internal nodes to Tailscale protocol format, handling the TaggedDevices special user:
758+
759+
```go
760+
// From mapper/tail.go:102-116
761+
User: func() tailcfg.UserID {
762+
// IMPORTANT: Tags-as-identity model
763+
// Tagged nodes ALWAYS use TaggedDevices user, even if UserID is set
764+
if node.IsTagged() {
765+
return tailcfg.UserID(int64(types.TaggedDevices.ID))
766+
}
767+
// User-owned nodes: use the actual user ID
768+
return tailcfg.UserID(int64(node.UserID().Get()))
769+
}()
770+
```
771+
772+
**TaggedDevices constant** (`types.TaggedDevices.ID = 2147455555`): Special user ID for all tagged nodes in MapResponse protocol.
773+
774+
#### Registration Flow
775+
776+
**Tagged Node Registration** (via tagged PreAuthKey):
777+
778+
1. User creates PreAuthKey with tags: `pak.Tags = ["tag:server"]`
779+
2. Node registers with PreAuthKey
780+
3. Node gets: `Tags = ["tag:server"]`, `UserID = user.ID` (optional tracking), `AuthKeyID = pak.ID`
781+
4. `IsTagged()` returns `true` (ownership via tags)
782+
5. MapResponse sends `User = TaggedDevices.ID`
783+
784+
**User-Owned Node Registration** (via OIDC/web/untagged PreAuthKey):
785+
786+
1. User authenticates or uses untagged PreAuthKey
787+
2. Node registers
788+
3. Node gets: `Tags = []`, `UserID = user.ID` (required)
789+
4. `IsTagged()` returns `false` (ownership via user)
790+
5. MapResponse sends `User = user.ID`
791+
792+
#### API Validation (SetTags)
793+
794+
The SetTags gRPC API enforces tags-as-identity rules:
795+
796+
```go
797+
// From grpcv1.go:340-347
798+
// User-owned nodes are nodes with UserID that are NOT tagged
799+
isUserOwned := nodeView.UserID().Valid() && !nodeView.IsTagged()
800+
if isUserOwned && len(request.GetTags()) > 0 {
801+
return error("cannot set tags on user-owned nodes")
802+
}
803+
```
804+
805+
**Key validation rules**:
806+
807+
- ✅ Can call SetTags on tagged nodes (tags already define ownership)
808+
- ❌ Cannot set tags on user-owned nodes (would violate XOR rule)
809+
- ❌ Cannot remove all tags from tagged nodes (would orphan the node)
810+
811+
#### Database Layer (db/node.go)
812+
813+
**Tag storage**: Tags are stored in PostgreSQL ARRAY column and SQLite JSON column:
814+
815+
```sql
816+
-- From schema.sql
817+
tags TEXT[] DEFAULT '{}' NOT NULL, -- PostgreSQL
818+
tags TEXT DEFAULT '[]' NOT NULL, -- SQLite (JSON array)
819+
```
820+
821+
**Validation** (`state/tags.go`):
822+
823+
- `validateNodeOwnership()`: Enforces tags XOR user rule
824+
- `validateAndNormalizeTags()`: Validates tag format (`tag:name`) and uniqueness
825+
826+
#### Policy Layer
827+
828+
**Tag Ownership** (policy/v2/policy.go):
829+
830+
```go
831+
func NodeCanHaveTag(node types.NodeView, tag string) bool {
832+
// Checks if node's IP is in the tagOwnerMap IP set
833+
// This is IP-based authorization, not UserID-based
834+
if ips, ok := pm.tagOwnerMap[Tag(tag)]; ok {
835+
if slices.ContainsFunc(node.IPs(), ips.Contains) {
836+
return true
837+
}
838+
}
839+
return false
840+
}
841+
```
842+
843+
**Important**: Tag authorization is based on IP ranges in ACL, not UserID. Tags define identity, ACL authorizes that identity.
844+
845+
### Testing Tags-as-Identity
846+
847+
**Unit Tests** (`hscontrol/types/node_tags_test.go`):
848+
849+
- `TestNodeIsTagged`: Validates IsTagged() for various scenarios
850+
- `TestNodeOwnershipModel`: Tests tags XOR user ownership
851+
- `TestUserTypedID`: Helper method validation
852+
853+
**API Tests** (`hscontrol/grpcv1_test.go`):
854+
855+
- `TestSetTags_UserXORTags`: Validates rejection of setting tags on user-owned nodes
856+
- `TestSetTags_TaggedNode`: Validates that tagged nodes (even with UserID) are not rejected
857+
858+
**Auth Tests** (`hscontrol/auth_test.go:890-928`):
859+
860+
- Tests node registration with tagged PreAuthKey
861+
- Validates tags are applied during registration
862+
863+
### Common Pitfalls
864+
865+
1. **Don't check only `UserID.Valid()` to determine user ownership**
866+
- ❌ Wrong: `if node.UserID().Valid() { /* user-owned */ }`
867+
- ✅ Correct: `if node.UserID().Valid() && !node.IsTagged() { /* user-owned */ }`
868+
869+
2. **Don't assume tagged nodes never have UserID set**
870+
- Tagged nodes MAY have UserID for "created by" tracking
871+
- Always use `IsTagged()` to determine ownership type
872+
873+
3. **Don't allow setting tags on user-owned nodes**
874+
- This violates the tags XOR user principle
875+
- Use API validation to prevent this
876+
877+
4. **Don't forget TaggedDevices in mapper**
878+
- All tagged nodes MUST use `TaggedDevices.ID` in MapResponse
879+
- User ID is only for actual user-owned nodes
880+
881+
### Migration Considerations
882+
883+
When nodes transition between ownership models:
884+
885+
- **No automatic migration**: Tags-as-identity is set at registration and immutable
886+
- **Re-registration required**: To change from user-owned to tagged (or vice versa), node must be deleted and re-registered
887+
- **UserID persistence**: UserID on tagged nodes is informational and not cleared
888+
889+
### Architecture Benefits
890+
891+
The tags-as-identity model provides:
892+
893+
1. **Clear ownership semantics**: No ambiguity about who/what owns a node
894+
2. **ACL simplicity**: Tag-based access control without user conflicts
895+
3. **API safety**: Validation prevents invalid ownership states
896+
4. **Protocol compatibility**: TaggedDevices special user aligns with Tailscale's model
897+
898+
## Logging Patterns
899+
900+
### Incremental Log Event Building
901+
902+
When building log statements with multiple fields, especially with conditional fields, use the **incremental log event pattern** instead of long single-line chains. This improves readability and allows conditional field addition.
903+
904+
**Pattern:**
905+
906+
```go
907+
// GOOD: Incremental building with conditional fields
908+
logEvent := log.Debug().
909+
Str("node", node.Hostname).
910+
Str("machine_key", node.MachineKey.ShortString()).
911+
Str("node_key", node.NodeKey.ShortString())
912+
913+
if node.User != nil {
914+
logEvent = logEvent.Str("user", node.User.Username())
915+
} else if node.UserID != nil {
916+
logEvent = logEvent.Uint("user_id", *node.UserID)
917+
} else {
918+
logEvent = logEvent.Str("user", "none")
919+
}
920+
921+
logEvent.Msg("Registering node")
922+
```
923+
924+
**Key rules:**
925+
926+
1. **Assign chained calls back to the variable**: `logEvent = logEvent.Str(...)` - zerolog methods return a new event, so you must capture the return value
927+
2. **Use for conditional fields**: When fields depend on runtime conditions, build incrementally
928+
3. **Use for long log lines**: When a log line exceeds ~100 characters, split it for readability
929+
4. **Call `.Msg()` at the end**: The final `.Msg()` or `.Msgf()` sends the log event
930+
931+
**Anti-pattern to avoid:**
932+
933+
```go
934+
// BAD: Long single-line chains are hard to read and can't have conditional fields
935+
log.Debug().Caller().Str("node", node.Hostname).Str("machine_key", node.MachineKey.ShortString()).Str("node_key", node.NodeKey.ShortString()).Str("user", node.User.Username()).Msg("Registering node")
936+
937+
// BAD: Forgetting to assign the return value (field is lost!)
938+
logEvent := log.Debug().Str("node", node.Hostname)
939+
logEvent.Str("user", username) // This field is LOST - not assigned back
940+
logEvent.Msg("message") // Only has "node" field
941+
```
942+
943+
**When to use this pattern:**
944+
945+
- Log statements with 4+ fields
946+
- Any log with conditional fields
947+
- Complex logging in loops or error handling
948+
- When you need to add context incrementally
949+
950+
**Example from codebase** (`hscontrol/db/node.go`):
951+
952+
```go
953+
logEvent := log.Debug().
954+
Str("node", node.Hostname).
955+
Str("machine_key", node.MachineKey.ShortString()).
956+
Str("node_key", node.NodeKey.ShortString())
957+
958+
if node.User != nil {
959+
logEvent = logEvent.Str("user", node.User.Username())
960+
} else if node.UserID != nil {
961+
logEvent = logEvent.Uint("user_id", *node.UserID)
962+
} else {
963+
logEvent = logEvent.Str("user", "none")
964+
}
965+
966+
logEvent.Msg("Registering test node")
967+
```
968+
969+
### Avoiding Log Helper Functions
970+
971+
Prefer the incremental log event pattern over creating helper functions that return multiple logging closures. Helper functions like `logPollFunc` create unnecessary indirection and allocate closures.
972+
973+
**Instead of:**
974+
975+
```go
976+
// AVOID: Helper function returning closures
977+
func logPollFunc(req tailcfg.MapRequest, node *types.Node) (
978+
func(string, ...any), // warnf
979+
func(string, ...any), // infof
980+
func(string, ...any), // tracef
981+
func(error, string, ...any), // errf
982+
) {
983+
return func(msg string, a ...any) {
984+
log.Warn().
985+
Caller().
986+
Bool("omitPeers", req.OmitPeers).
987+
Bool("stream", req.Stream).
988+
Uint64("node.id", node.ID.Uint64()).
989+
Str("node.name", node.Hostname).
990+
Msgf(msg, a...)
991+
},
992+
// ... more closures
993+
}
994+
```
995+
996+
**Prefer:**
997+
998+
```go
999+
// BETTER: Build log events inline with shared context
1000+
func (m *mapSession) logTrace(msg string) {
1001+
log.Trace().
1002+
Caller().
1003+
Bool("omitPeers", m.req.OmitPeers).
1004+
Bool("stream", m.req.Stream).
1005+
Uint64("node.id", m.node.ID.Uint64()).
1006+
Str("node.name", m.node.Hostname).
1007+
Msg(msg)
1008+
}
1009+
1010+
// Or use incremental building for complex cases
1011+
logEvent := log.Trace().
1012+
Caller().
1013+
Bool("omitPeers", m.req.OmitPeers).
1014+
Bool("stream", m.req.Stream).
1015+
Uint64("node.id", m.node.ID.Uint64()).
1016+
Str("node.name", m.node.Hostname)
1017+
1018+
if additionalContext {
1019+
logEvent = logEvent.Str("extra", value)
1020+
}
1021+
1022+
logEvent.Msg("Operation completed")
1023+
```
1024+
6901025
## Important Notes
6911026

6921027
- **Dependencies**: Use `nix develop` for consistent toolchain (Go, buf, protobuf tools, linting)
@@ -697,3 +1032,4 @@ assert.EventuallyWithT(t, func(c *assert.CollectT) {
6971032
- **Integration Tests**: Require Docker and can consume significant disk space - use headscale-integration-tester agent
6981033
- **Performance**: NodeStore optimizations are critical for scale - be careful with changes to state management
6991034
- **Quality Assurance**: Always use appropriate specialized agents for testing and validation tasks
1035+
- **Tags-as-Identity**: Tags and user ownership are mutually exclusive - always use `IsTagged()` to determine ownership

CHANGELOG.md

Lines changed: 6 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -21,6 +21,10 @@ at creation time. When listing keys, only the prefix is shown (e.g.,
2121
`hskey-auth-{prefix}-{secret}`. Legacy plaintext keys continue to work for
2222
backwards compatibility.
2323

24+
### Tags
25+
26+
Tags are now implemented following the Tailscale model where tags and user ownership are mutually exclusive. Devices can be either user-owned (authenticated via web/OIDC) or tagged (authenticated via tagged PreAuthKeys). Tagged devices receive their identity from tags rather than users, making them suitable for servers and infrastructure. Applying a tag to a device removes user-based authentication. See the [Tailscale tags documentation](https://tailscale.com/kb/1068/tags) for details on how tags work.
27+
2428
### Database migration support removed for pre-0.25.0 databases
2529

2630
Headscale no longer supports direct upgrades from databases created before
@@ -30,6 +34,8 @@ release.
3034

3135
### BREAKING
3236

37+
- **Tags**: The gRPC `SetTags` endpoint now allows converting user-owned nodes to tagged nodes by setting tags. Once a node is tagged, it cannot be converted back to a user-owned node.
38+
3339
- Database migration support removed for pre-0.25.0 databases [#2883](https://github.com/juanfont/headscale/pull/2883)
3440
- If you are running a version older than 0.25.0, you must upgrade to 0.25.1 first, then upgrade to this release
3541
- See the [upgrade path documentation](https://headscale.net/stable/about/faq/#what-is-the-recommended-update-path-can-i-skip-multiple-versions-while-updating) for detailed guidance

0 commit comments

Comments
 (0)