Commit 387ad86

docs: add RULES
1 parent 6311a38 commit 387ad86

1 file changed

+218
-0
lines changed

RULES.md

Lines changed: 218 additions & 0 deletions
@@ -0,0 +1,218 @@
# Rules

Opinionated rules for creating CDEvents and transformers.

## CDEvents Best Practices

Choosing good values for key fields improves observability, event correlation, and entity tracking.

Always follow the [official CDEvents specification](https://github.com/cdevents/spec/blob/main/spec.md).

### context.source - Event Origin

#### Official definition

Extract from [context.source](https://github.com/cdevents/spec/blob/main/spec.md#source-context):

> Type: URI-Reference
> Description: defines the context in which an event happened. The main purpose of the source is to provide global uniqueness for source + id.
> The source MAY identify a single producer or a group of producer that belong to the same application.
> When selecting the format for the source, it may be useful to think about how clients may use it. Using the root use cases as reference:
>
> - A client may want to react only to events sent by a specific service, like the instance of Tekton that runs in a specific cluster or the instance of Jenkins managed by team X
> - A client may want to collate all events coming from a specific source for monitoring, observability or visualization purposes
>
> Constraints:
>
> - REQUIRED
> - MUST be a non-empty URI-reference
> - An absolute URI is RECOMMENDED

#### Complementary rules

- Use the URI of the most recent service that creates or modifies the event, regardless of what triggered it (webhook, another event, etc.)
- Prefer the URI of the service (or sub-service) generating the event, regardless of subject or event type
- Prefer API URIs over human-facing view URIs
- Use query parameters to provide additional information

**Why**: Allows consumers to identify where the event producer is configured.

```yaml
# ✅ Good - Specific service identifiers
"source": "https://github.com/myorg/myrepo/workflow-a" # Event sent from specific workflow
"source": "https://jenkins.example.com/job/job_name"
"source": "https://cdviz-collector.example.com/?source=source_name" # Use query params when needed

# ❌ Avoid - Too generic, conflicts in larger scopes
"source": "github.com/myorg/myrepo"
"source": "myrepo"
```

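
The distinction between the good and the avoided values can be checked mechanically. The following is a minimal sketch, not part of CDEvents or cdviz-collector: the helper name `looks_like_service_uri` is hypothetical, and requiring both a scheme and a host is an assumption that is stricter than the specification's "absolute URI is RECOMMENDED".

```python
from urllib.parse import urlsplit


def looks_like_service_uri(source: str) -> bool:
    """Heuristic (assumption): accept only values with a scheme and a host."""
    parts = urlsplit(source)
    return bool(parts.scheme) and bool(parts.netloc)


# Values taken from the examples above.
candidates = [
    "https://jenkins.example.com/job/job_name",
    "https://cdviz-collector.example.com/?source=source_name",
    "github.com/myorg/myrepo",
    "myrepo",
]

for candidate in candidates:
    verdict = "ok" if looks_like_service_uri(candidate) else "avoid"
    print(f"{verdict:5} {candidate}")
```

A check like this could run in tests for transformers, so that a scheme-less value such as `github.com/myorg/myrepo` is caught before it reaches consumers.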
### subject.id - Event Subject Identifier

#### Official definition

Extract from [subject.id](https://github.com/cdevents/spec/blob/main/spec.md#id-subject):

> Identifier for a subject. Subsequent events associated to the same subject MUST use the same subject id.
> Constraints:
>
> - REQUIRED
> - MUST be a non-empty string
> - MUST be unique within the given source (in the scope of the producer)

#### Complementary rules

Use **unique, hierarchical identifiers** scoped to your organization or globally.

- Use a URI (URL, PURL, or absolute path starting with `/`)
- Prefer API URIs over human-facing view URIs
- **DO NOT use `subject.source`** - it's confusing and optional. Instead, make `subject.id` globally unique and let `context.source` identify the event origin

**Why**:

- The ID should be a standalone identifier that can be used as a reference or link in any context
- Manipulating a single `id` field is simpler than managing `id` + optional `source`

```yaml
# ✅ Good - Globally unique, hierarchical, semantic
"subject.id": "/namespace/my-service"
"subject.id": "/cluster/us-1/staging"
"subject.id": "https://github.com/org-id/repo-id/workflow-id/run-id"
"subject.id": "https://jenkins.example.com/job/job_name/"

# ❌ Avoid - Not globally unique, not hierarchical, or too generic
"subject.id": "550e8400-e29b-41d4-a716-446655440000" # UUID: opaque, not hierarchical or semantic
"subject.id": "run-12345" # Not globally unique
"subject.id": "production" # Too generic, not a path
```

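
The three accepted forms (URL, PURL, absolute path) can be told apart with a small classifier. This is a sketch under assumptions: the function name `subject_id_form` and its return labels are hypothetical, and the checks are purely syntactic, so they say nothing about whether an identifier is actually unique.

```python
from urllib.parse import urlsplit


def subject_id_form(subject_id: str) -> str:
    """Classify a subject.id as 'purl', 'url', 'path', or 'other' (syntax only)."""
    if subject_id.startswith("pkg:"):
        return "purl"
    parts = urlsplit(subject_id)
    if parts.scheme and parts.netloc:
        return "url"
    if subject_id.startswith("/"):
        return "path"
    return "other"


# Values taken from the examples above.
for value in [
    "/namespace/my-service",
    "https://jenkins.example.com/job/job_name/",
    "550e8400-e29b-41d4-a716-446655440000",
    "run-12345",
    "production",
]:
    print(f"{subject_id_form(value):6} {value}")
```

Anything classified as `other` matches the "Avoid" group and is worth rewriting as a path or URL.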
### environment.id - Deployment Environment

Follow the same rules as `subject.id`, since `environment.id` is a reference to an environment subject. In practice, however:

- The subject/system often doesn't know its environment, so this information isn't in the source event
- Environments may lack clear URIs or scopes (VPC, Kubernetes cluster, region, etc.)

Guidelines:

- Define `environment.id` as an absolute path starting with `/`
- Use your organization name for consistency
- Be consistent across all apps and configurations - use the same naming convention
- Use hierarchical paths like `/level/region/owner` ordered from most to least stable
- Consider how you want to group data in dashboards and reports

**Why**: Enables environment-level dashboards, filtering, and alerts.

```yaml
"environment": {"id": "/production"}
"environment": {"id": "/pro"}
"environment": {"id": "/pro/us-1/cluster-33"}
"environment": {"id": "/staging"}
"environment": {"id": "/dev/ephemeral-42"}
```

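
One way to keep the naming convention consistent across apps is to build every `environment.id` through a single shared helper. The sketch below is illustrative only: `environment_id` is a hypothetical function, and rejecting empty segments is an assumption, not a CDEvents requirement.

```python
def environment_id(*segments: str) -> str:
    """Join segments, ordered from most to least stable, into an absolute path."""
    cleaned = [segment.strip("/") for segment in segments]
    if not cleaned or any(not segment for segment in cleaned):
        raise ValueError("at least one segment is required and none may be empty")
    return "/" + "/".join(cleaned)


print(environment_id("pro", "us-1", "cluster-33"))
print(environment_id("staging"))
```

Because the most stable level comes first, a dashboard can group by path prefix (for example everything under `/pro`).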
### artifactId - Package URL (PURL)

- Follow the same rules as `subject.id` since `artifactId` is a reference to an artifact subject
- Follow the [Package URL specification](https://github.com/package-url/purl-spec) for your artifact type
- Use the appropriate type if supported, otherwise fall back to `generic` (official CDEvents requirement)

**Why**: Enables universal artifact identification, dependency tracking, and interoperability with other tools.

**Common Patterns**:

```yaml
# OCI images (Docker/container registries)
# Note: OCI type doesn't support namespace - use query params for registry/repo
"artifactId": "pkg:oci/my-app@sha256:abc123def456...?repository_url=ghcr.io/myorg/my-app&tag=v1.2.3"
"artifactId": "pkg:oci/nginx@sha256:def456abc123...?repository_url=docker.io/library/nginx&tag=latest"

# NPM packages
"artifactId": "pkg:npm/[email protected]"

# Maven artifacts
"artifactId": "pkg:maven/org.springframework/[email protected]"

# Generic packages
"artifactId": "pkg:generic/[email protected]"
```

**Common Pitfalls**:

- **Digest vs Tag**: Use digest (`@sha256:...`) for immutability - this is the image digest, NOT the source code commit SHA
- **Version Semantics**: For OCI, the version is the image digest, not the git commit that built it
- **OCI Namespace Limitation**: `pkg:oci/` does NOT support namespace in the path - use `repository_url` query parameter
- **Registry Encoding**: OCI requires `repository_url` query parameter; other types encode registries differently
- **Type-Specific Rules**: Each PURL type has unique encoding rules - consult the specification

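
The OCI pitfalls above can be encoded in a small builder that refuses the most common mistakes. This is a sketch under assumptions: `oci_purl` is a hypothetical helper, it performs no percent-encoding and simply mirrors the unencoded `sha256:` form used in the examples above, so check the PURL specification before relying on the exact output. The digest in the usage line is a placeholder, not a real image digest.

```python
from typing import Optional


def oci_purl(name: str, digest: str, repository_url: str, tag: Optional[str] = None) -> str:
    """Build a pkg:oci PURL with the image digest as version (no percent-encoding)."""
    if "/" in name:
        # pkg:oci has no namespace: registry and repository go in repository_url.
        raise ValueError("name must not contain '/'; use repository_url instead")
    if not digest.startswith("sha256:"):
        # The version is the image digest, not a tag or a git commit SHA.
        raise ValueError("digest must look like 'sha256:<hex>'")
    purl = f"pkg:oci/{name}@{digest}?repository_url={repository_url}"
    if tag is not None:
        purl += f"&tag={tag}"
    return purl


placeholder_digest = "sha256:" + "0" * 64  # placeholder value for illustration
print(oci_purl("my-app", placeholder_digest, "ghcr.io/myorg/my-app", tag="v1.2.3"))
```

Passing `ghcr.io/myorg/my-app` as the name, or a tag such as `v1.2.3` as the digest, raises a `ValueError` instead of producing a misleading identifier.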
## Rules for Transformers

### Use metadata for transformer chaining

- Use `metadata` to transfer information between transformers
- Use `metadata` from extractors to initialize information (not available with the `transform` subcommand)
- Use the first transformer to initialize information when:
  - Not possible via extractor (pre-0.19 or `transform` subcommand)
  - Sharing information/transformers between multiple sources and transformer chains

Example of "first" transformer:

```toml
[transformers.init_metadata]
type = "vrl"
template = """
# Make sure .metadata is an object, defaulting to an empty one
.metadata = object(.metadata) ?? {}

# Emit one message with the extra metadata merged in
[{
    "metadata": merge(.metadata, {
        "environment_id": "cluster/A-dev",
    }),
    "headers": .headers,
    "body": .body,
}]
"""
```

### Automatic `context.id` generation

- Let cdviz-collector generate `context.id` by setting it to `"0"`
- Do NOT omit `context.id`: the field is required for the output to be a valid CDEvent
- Do NOT reuse IDs from incoming events (webhooks, Kafka messages, etc.)
- **Exception**: Keep `context.id` when the transformer's purpose is NOT to create a new CDEvent (filtering, normalizing, validating, or adding customData)

**Why**:

- Ensures content-based deduplication
- Enables reproducible, deterministic IDs for testing

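
To see why a content-derived ID gives deduplication and reproducibility, consider the sketch below. It is an illustration of the general idea only and makes no claim about how cdviz-collector actually computes the ID: the function `content_based_id`, the choice of SHA-256, and the canonical JSON encoding are all assumptions.

```python
import hashlib
import json


def content_based_id(event: dict) -> str:
    """Hash a canonical JSON encoding of the event (illustrative algorithm)."""
    canonical = json.dumps(event, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


event = {
    "context": {"id": "0", "timestamp": "2024-01-01T00:00:00Z"},
    "subject": {"id": "/namespace/my-service"},
}
same_content = json.loads(json.dumps(event))  # an independent copy
later = {**event, "context": {**event["context"], "timestamp": "2024-01-02T00:00:00Z"}}

print(content_based_id(event) == content_based_id(same_content))  # identical content
print(content_based_id(event) == content_based_id(later))  # timestamp differs
```

Identical content yields the same ID, so a replayed event can be recognized as a duplicate, while any changed field, including the timestamp, yields a different one.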
### `context.timestamp` generation

- Extract the timestamp from input data (events, files) when available
- Avoid `now()` or automatic timestamps, which make the output non-reproducible

**Why**:

- Creates reproducible output for the same input
- Ensures the same automatic ID generation, enabling reliable testing with the `transform` CLI

### Define `context.source`

As defined in the CDEvents rules above, `context.source` should be the URI of the cdviz-collector service that creates or modifies the event.

The value depends on cdviz-collector's running mode and external address:

- **`connect` mode (server)**: Use the cdviz-collector URI with `source` as a query parameter
- **`send` mode**: Use the URL of the triggering system (pipeline, workflow, etc.)
- **`transform` mode**: Use `http://cdviz-collector.example.com?source=cli-transform`

To simplify development, cdviz-collector provides a suggested value in metadata. Transformers may use or override it.

- Customize the URL using `http.root_url` in `cdviz-collector.toml` (default: `http://cdviz-collector.example.com`)

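
For the `connect` case, combining a root URL with a `source` query parameter can be sketched as follows. The helper `collector_source` is hypothetical and is not how cdviz-collector computes its suggested value; it only shows one safe way to assemble such a URI, producing the `/?source=` form used in the `context.source` examples earlier.

```python
from urllib.parse import urlencode, urlsplit, urlunsplit


def collector_source(root_url: str, source_name: str) -> str:
    """Append ?source=<name> to the collector root URL (illustrative helper)."""
    parts = urlsplit(root_url)
    path = parts.path or "/"  # normalize an empty path to "/"
    query = urlencode({"source": source_name})  # percent-encodes the name if needed
    return urlunsplit((parts.scheme, parts.netloc, path, query, ""))


print(collector_source("http://cdviz-collector.example.com", "source_name"))
```

Using `urlencode` rather than string concatenation keeps the result a valid URI even when a source name contains spaces or reserved characters.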
### Use `customData` for source-specific information

- Use `customData` to preserve complementary information not covered by CDEvents standard fields
- Structure as a JSON object with the source name at the first level (`github`, `argocd`, etc.)
- For webhook events, mirror the original event structure under the first level (can be complete or filtered)
- Additional first-level keys may be added for information useful to other consumers
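
The layout described above can be sketched as a small helper. Everything here is illustrative: `build_custom_data` is a hypothetical function, and the payload fields (`action`, `workflow_run`, `sender`) and the `team` key are invented sample data, not a real webhook schema.

```python
import json
from typing import Optional


def build_custom_data(
    source_name: str,
    payload: dict,
    keep: Optional[set] = None,
    extra: Optional[dict] = None,
) -> dict:
    """Nest the (optionally filtered) payload under the source name."""
    mirrored = payload if keep is None else {k: v for k, v in payload.items() if k in keep}
    custom_data = {source_name: mirrored}
    for key, value in (extra or {}).items():
        if key == source_name:
            raise ValueError(f"extra key {key!r} collides with the source name")
        custom_data[key] = value
    return custom_data


# Invented sample payload for illustration only.
payload = {"action": "completed", "workflow_run": {"id": 1}, "sender": {"login": "someone"}}
custom_data = build_custom_data(
    "github", payload, keep={"action", "workflow_run"}, extra={"team": "platform"}
)
print(json.dumps(custom_data, indent=2))
```

Keeping the source name as the first-level key lets consumers that only understand one source ignore the rest of `customData` safely.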
