Commit b17b3c1

[memory-bank] Document spicedb, scrubber and service-waiter
Tool: gitpod/catfood.gitpod.cloud
1 parent eede8c9 commit b17b3c1

5 files changed

+504
-1
lines changed

memory-bank/activeContext.md

Lines changed: 3 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -40,6 +40,9 @@ Key areas of focus include:
4040
- usage: Tracks, calculates, and manages workspace usage and billing
4141
- common-go: Foundational Go library providing shared utilities across services
4242
- workspacekit: Manages container setup and namespace isolation for workspaces
43+
- spicedb: Provides authorization and permission management
44+
- scrubber: Removes or masks sensitive information from data
45+
- service-waiter: Waits for services to become available
4346

4447
As work progresses, this section will continue to be updated to reflect:
4548
- Additional component documentation

memory-bank/components/scrubber.md

Lines changed: 162 additions & 0 deletions
@@ -0,0 +1,162 @@
1+
# Scrubber Component
2+
3+
## Overview
4+
5+
The Scrubber component in Gitpod is a Go library that provides functionality for removing or masking sensitive information from data. It's designed to protect personally identifiable information (PII) and other sensitive data from being exposed in logs, error messages, and other outputs. The component offers various methods for scrubbing different types of data structures, including strings, key-value pairs, JSON, and Go structs.
6+
7+
## Purpose
8+
9+
The primary purposes of the Scrubber component are:
10+
- Remove or mask personally identifiable information (PII) from data
11+
- Protect sensitive information such as passwords, tokens, and secrets
12+
- Provide consistent data sanitization across the Gitpod platform
13+
- Support various data formats and structures
14+
- Enable customizable scrubbing rules
15+
- Reduce the risk of sensitive data exposure
16+
- Comply with privacy regulations and best practices
17+
- Facilitate safe logging and error reporting
18+
19+
## Architecture
20+
21+
The Scrubber component is structured as a Go library with several key parts:
22+
23+
1. **Core Scrubber Interface**: Defines the methods for scrubbing different types of data
24+
2. **Scrubber Implementation**: Provides the actual scrubbing functionality
25+
3. **Sanitization Functions**: Implements different sanitization strategies (redaction, hashing)
26+
4. **Configuration**: Defines what fields and patterns should be scrubbed
27+
5. **Struct Walking**: Uses reflection to traverse and scrub complex data structures
28+
29+
The component is designed to be used by other Gitpod components that need to sanitize data before logging, storing, or transmitting it.
30+
31+
## Key Features
32+
33+
### Scrubbing Methods
34+
35+
The Scrubber interface provides several methods for scrubbing different types of data:
36+
37+
1. **Value**: Scrubs a single string value using heuristics to detect sensitive data
38+
2. **KeyValue**: Scrubs a key-value pair, using the key as a hint for how to sanitize the value
39+
3. **JSON**: Scrubs a JSON structure, handling nested objects and arrays
40+
4. **Struct**: Scrubs a Go struct in-place, respecting struct tags for customization
41+
5. **DeepCopyStruct**: Creates a scrubbed deep copy of a Go struct
42+
43+
### Sanitization Strategies
44+
45+
The component implements different sanitization strategies:
46+
47+
1. **Redaction**: Replaces sensitive values with `[redacted]` or `[redacted:keyname]`
48+
2. **Hashing**: Replaces sensitive values with an MD5 hash (`[redacted:md5:hash:keyname]`)
49+
3. **URL Path Hashing**: Specially handles URLs by preserving the structure but hashing path segments
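
The three strategies above can be sketched with the standard library alone. This is an illustration under stated assumptions, not the component's code: the function names `redact`, `hash`, and `hashURLPath` are hypothetical, and the exact way URL path segments are rewritten (here, each non-empty segment becomes its bare hex MD5 digest) is a guess at "preserving the structure". Only the `[redacted]`, `[redacted:keyname]`, and `[redacted:md5:hash:keyname]` formats come from the text.

```go
package main

import (
	"crypto/md5"
	"encoding/hex"
	"fmt"
	"net/url"
	"strings"
)

// md5Hex returns the hex-encoded MD5 digest of s.
func md5Hex(s string) string {
	sum := md5.Sum([]byte(s))
	return hex.EncodeToString(sum[:])
}

// redact (hypothetical name) produces the documented redaction marker.
// An empty key yields the short form without a key name.
func redact(key string) string {
	if key == "" {
		return "[redacted]"
	}
	return fmt.Sprintf("[redacted:%s]", key)
}

// hash (hypothetical name) produces the documented hashing marker,
// optionally followed by the key name.
func hash(key, value string) string {
	if key == "" {
		return fmt.Sprintf("[redacted:md5:%s]", md5Hex(value))
	}
	return fmt.Sprintf("[redacted:md5:%s:%s]", md5Hex(value), key)
}

// hashURLPath (hypothetical name) keeps scheme, host, and the number of
// path segments, but replaces every non-empty segment with its MD5 digest.
// Assumption: query strings are left untouched in this sketch.
func hashURLPath(raw string) (string, error) {
	u, err := url.Parse(raw)
	if err != nil {
		return "", err
	}
	segments := strings.Split(u.Path, "/")
	for i, s := range segments {
		if s != "" {
			segments[i] = md5Hex(s)
		}
	}
	u.Path = strings.Join(segments, "/")
	u.RawPath = ""
	return u.String(), nil
}

func main() {
	fmt.Println(redact(""))         // [redacted]
	fmt.Println(redact("password")) // [redacted:password]
	fmt.Println(hash("username", "johndoe"))
	hashed, err := hashURLPath("https://example.com/org/repo")
	fmt.Println(hashed, err)
}
```

Hashing keeps equal inputs correlatable across log lines, while redaction discards the value entirely.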
50+
51+
### Configuration
52+
53+
The scrubber is configured with several lists and patterns:
54+
55+
1. **RedactedFieldNames**: Field names whose values should be completely redacted
56+
2. **HashedFieldNames**: Field names whose values should be hashed
57+
3. **HashedURLPathsFieldNames**: Field names containing URLs whose paths should be hashed
58+
4. **HashedValues**: Regular expressions that, when matched, cause values to be hashed
59+
5. **RedactedValues**: Regular expressions that, when matched, cause values to be redacted
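
How these lists could drive a decision for one key-value pair is sketched below. The `config` type, the `decide` method, and the example entries are hypothetical stand-ins. The matching rule (case-insensitive substring match on the key) and the precedence (field names before value patterns, redaction before hashing) are assumptions made for illustration, not confirmed behavior of the library.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

type strategy int

const (
	keep strategy = iota
	redact
	hash
)

var strategyNames = [...]string{"keep", "redact", "hash"}

func (s strategy) String() string { return strategyNames[s] }

// config is a hypothetical, trimmed-down stand-in for the lists above.
type config struct {
	redactedFieldNames []string
	hashedFieldNames   []string
	redactedValues     []*regexp.Regexp
	hashedValues       []*regexp.Regexp
}

// containsFold reports whether key contains any of names, ignoring case.
func containsFold(names []string, key string) bool {
	k := strings.ToLower(key)
	for _, n := range names {
		if strings.Contains(k, strings.ToLower(n)) {
			return true
		}
	}
	return false
}

// matchesAny reports whether value matches at least one pattern.
func matchesAny(patterns []*regexp.Regexp, value string) bool {
	for _, p := range patterns {
		if p.MatchString(value) {
			return true
		}
	}
	return false
}

// decide picks a strategy. Assumed precedence: key names first, then
// value patterns; redaction wins over hashing at each level.
func (c config) decide(key, value string) strategy {
	switch {
	case containsFold(c.redactedFieldNames, key):
		return redact
	case containsFold(c.hashedFieldNames, key):
		return hash
	case matchesAny(c.redactedValues, value):
		return redact
	case matchesAny(c.hashedValues, value):
		return hash
	}
	return keep
}

// exampleConfig holds illustrative entries only, not the real defaults.
func exampleConfig() config {
	return config{
		redactedFieldNames: []string{"password", "token", "secret"},
		hashedFieldNames:   []string{"username"},
		hashedValues: []*regexp.Regexp{
			regexp.MustCompile(`^[^@\s]+@[^@\s]+\.[^@\s]+$`), // email-like
		},
	}
}

func main() {
	c := exampleConfig()
	fmt.Println(c.decide("password", "secret123")) // redact
	fmt.Println(c.decide("username", "johndoe"))   // hash
	fmt.Println(c.decide("note", "hello"))         // keep
}
```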
60+
61+
### Struct Tag Support
62+
63+
When scrubbing structs, the component respects the `scrub` struct tag:
64+
65+
- `scrub:"ignore"`: Skip scrubbing this field
66+
- `scrub:"hash"`: Hash this field's value
67+
- `scrub:"redact"`: Redact this field's value
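
As a rough sketch of how such tags can be read, the snippet below uses the standard `reflect` package to collect the `scrub` tag of each field. The `account` type and the `scrubTags` helper are hypothetical; the library's own traversal (described above as reflection-based struct walking) is likely more involved.

```go
package main

import (
	"fmt"
	"reflect"
)

// account is a hypothetical type that uses all three documented tag values.
type account struct {
	ID    string `scrub:"ignore"`
	Login string `scrub:"hash"`
	Token string `scrub:"redact"`
	Note  string // no tag: left to the scrubber's default rules
}

// scrubTags returns the scrub tag for every field of a struct (or pointer
// to a struct) that declares one. Fields without the tag are omitted.
func scrubTags(v any) map[string]string {
	tags := make(map[string]string)
	t := reflect.TypeOf(v)
	if t == nil {
		return tags
	}
	if t.Kind() == reflect.Ptr {
		t = t.Elem()
	}
	if t.Kind() != reflect.Struct {
		return tags
	}
	for i := 0; i < t.NumField(); i++ {
		f := t.Field(i)
		if tag, ok := f.Tag.Lookup("scrub"); ok {
			tags[f.Name] = tag
		}
	}
	return tags
}

func main() {
	fmt.Println(scrubTags(account{})) // map[ID:ignore Login:hash Token:redact]
}
```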
68+
69+
### Trusted Values
70+
71+
The component supports a `TrustedValue` interface that allows marking specific values to be exempted from scrubbing:
72+
73+
```go
type TrustedValue interface {
	IsTrustedValue()
}
```
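
A type opts in by implementing the marker method. The sketch below repeats the interface so it compiles on its own; `trustedString` and `isTrusted` are hypothetical names, and how the library actually detects trusted values during traversal is not shown in the source.

```go
package main

import "fmt"

// TrustedValue repeats the marker interface shown above.
type TrustedValue interface {
	IsTrustedValue()
}

// trustedString is a hypothetical wrapper for strings known to be safe.
type trustedString string

// IsTrustedValue marks trustedString as exempt from scrubbing.
func (trustedString) IsTrustedValue() {}

// isTrusted reports whether v carries the marker interface.
func isTrusted(v any) bool {
	_, ok := v.(TrustedValue)
	return ok
}

func main() {
	fmt.Println(isTrusted(trustedString("build-1234"))) // true
	fmt.Println(isTrusted("build-1234"))                // false
}
```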
78+
79+
## Usage Patterns
80+
81+
### Basic Value Scrubbing
82+
```go
// Scrub a single value
scrubbedValue := scrubber.Default.Value("[email protected]")
// Result: "[redacted:md5:hash]" or similar
```
87+
88+
### Key-Value Scrubbing
89+
```go
// Scrub a value with key context
scrubbedValue := scrubber.Default.KeyValue("password", "secret123")
// Result: "[redacted]"
```
94+
95+
### JSON Scrubbing
96+
```go
// Scrub a JSON structure
jsonData := []byte(`{"username": "johndoe", "email": "[email protected]"}`)
scrubbedJSON, err := scrubber.Default.JSON(jsonData)
// Result: {"username": "[redacted:md5:hash]", "email": "[redacted]"}
```
102+
103+
### Struct Scrubbing
104+
```go
// Scrub a struct in-place
type User struct {
	Username string
	Email    string `scrub:"redact"`
	Password string
}

user := User{Username: "johndoe", Email: "[email protected]", Password: "secret123"}
err := scrubber.Default.Struct(&user)
// Result: user.Username is hashed, user.Email is redacted, user.Password is redacted
```
115+
116+
### Deep Copy Struct Scrubbing
117+
```go
// Create a scrubbed copy of a struct
type User struct {
	Username string
	Email    string `scrub:"redact"`
	Password string
}

user := User{Username: "johndoe", Email: "[email protected]", Password: "secret123"}
scrubbedUser := scrubber.Default.DeepCopyStruct(user).(User)
// Original user is unchanged, scrubbedUser has sanitized values
```
128+
129+
## Integration Points
130+
131+
The Scrubber component integrates with:
132+
1. **Logging Systems**: To sanitize log messages
133+
2. **Error Handling**: To sanitize error messages
134+
3. **API Responses**: To sanitize sensitive data in responses
135+
4. **Monitoring Systems**: To sanitize metrics and traces
136+
5. **Other Gitpod Components**: To provide consistent data sanitization
137+
138+
## Dependencies
139+
140+
### Internal Dependencies
141+
None specified in the component's build configuration.
142+
143+
### External Dependencies
144+
- `github.com/hashicorp/golang-lru`: For caching sanitization decisions
145+
- `github.com/mitchellh/reflectwalk`: For traversing complex data structures
146+
147+
## Security Considerations
148+
149+
The component implements several security measures:
150+
151+
1. **Default Deny**: Fields and values that match the configured sensitive names or patterns are scrubbed automatically, without callers having to opt in
152+
2. **Multiple Strategies**: Different sanitization strategies for different types of data
153+
3. **Caching**: Caches sanitization decisions for performance
154+
4. **Customization**: Allows customization of scrubbing rules
155+
5. **Trusted Values**: Supports marking values as trusted to exempt them from scrubbing
156+
157+
## Related Components
158+
159+
- **Common-Go**: Uses the Scrubber for logging
160+
- **Server**: Uses the Scrubber for API request/response sanitization
161+
- **Workspace Services**: Use the Scrubber to protect workspace data
162+
- **Monitoring Components**: Use the Scrubber to sanitize metrics and traces
Lines changed: 154 additions & 0 deletions
@@ -0,0 +1,154 @@
1+
# Service-Waiter Component
2+
3+
## Overview
4+
5+
The Service-Waiter component in Gitpod is a utility service that waits for other services to become available before proceeding. It's designed to be used in initialization and deployment scenarios where services have dependencies on other services being ready. The component can wait for different types of services, including databases, Redis instances, and Kubernetes components, ensuring that a service doesn't start until its dependencies are fully operational.
6+
7+
## Purpose
8+
9+
The primary purposes of the Service-Waiter component are:
10+
- Ensure services start in the correct order during deployment
11+
- Prevent services from starting before their dependencies are ready
12+
- Provide a consistent way to wait for different types of services
13+
- Handle timeouts and failure scenarios gracefully
14+
- Support different types of service readiness checks
15+
- Enable orchestration of complex multi-service deployments
16+
- Improve reliability of service startup sequences
17+
- Provide clear logging and error reporting for troubleshooting
18+
19+
## Architecture
20+
21+
The Service-Waiter component is structured as a command-line tool with several subcommands for different types of services:
22+
23+
1. **Database Waiter**: Waits for a MySQL database to become available
24+
2. **Redis Waiter**: Waits for a Redis instance to become reachable
25+
3. **Component Waiter**: Waits for a Kubernetes component to be ready with the correct image
26+
27+
The component is designed to be run as a Kubernetes init container or as part of a deployment script, blocking until the target service is ready or a timeout is reached.
28+
29+
## Key Features
30+
31+
### Database Waiting
32+
33+
The database waiter:
34+
- Connects to a MySQL database using provided credentials
35+
- Checks if the database is reachable and responsive
36+
- Optionally verifies that the latest migration has been applied
37+
- Supports TLS connections with custom CA certificates
38+
- Retries connections with backoff until success or timeout
39+
40+
### Redis Waiting
41+
42+
The Redis waiter:
43+
- Connects to a Redis instance at the specified host and port
44+
- Verifies connectivity by sending a PING command
45+
- Retries connections until success or timeout
46+
- Provides detailed logging of connection attempts
47+
48+
### Component Waiting
49+
50+
The component waiter:
51+
- Checks if Kubernetes pods with specific labels are running
52+
- Verifies that pods are using the expected container image
53+
- Ensures that pods are in the Ready state
54+
- Monitors pod status until all pods are ready or timeout
55+
- Supports namespace-specific checks
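
The readiness rule itself can be expressed without the Kubernetes client. In the sketch below, `pod` is a hypothetical, simplified view of what would be read from the Kubernetes API, and `componentReady` encodes the checks listed above: at least one pod exists, every pod runs the expected image, and every pod is Ready. Treating an empty pod list as "not ready" is an assumption.

```go
package main

import "fmt"

// pod is a hypothetical, simplified view of a Kubernetes pod.
type pod struct {
	Name   string
	Images []string // images of the pod's containers
	Ready  bool     // whether the pod reports the Ready condition
}

// hasImage reports whether any container of p runs the given image.
func hasImage(p pod, image string) bool {
	for _, img := range p.Images {
		if img == image {
			return true
		}
	}
	return false
}

// componentReady returns true only if there is at least one pod and every
// pod both runs the expected image and is Ready. The second return value
// explains the first failing condition.
func componentReady(pods []pod, image string) (bool, string) {
	if len(pods) == 0 {
		return false, "no pods found"
	}
	for _, p := range pods {
		if !hasImage(p, image) {
			return false, fmt.Sprintf("pod %s does not run image %s", p.Name, image)
		}
		if !p.Ready {
			return false, fmt.Sprintf("pod %s is not ready", p.Name)
		}
	}
	return true, ""
}

func main() {
	pods := []pod{
		{Name: "server-a", Images: []string{"gitpod/server:latest"}, Ready: true},
		{Name: "server-b", Images: []string{"gitpod/server:latest"}, Ready: false},
	}
	ok, reason := componentReady(pods, "gitpod/server:latest")
	fmt.Println(ok, reason) // false pod server-b is not ready
}
```

A waiter would call such a check in a polling loop, refreshing the pod list on each round, until it returns true or the timeout expires.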
56+
57+
## Configuration
58+
59+
The Service-Waiter component can be configured through command-line flags or environment variables:
60+
61+
### Global Configuration
62+
- `--timeout`, `-t`: Maximum time to wait (default: `5m`, or the value of the `SERVICE_WAITER_TIMEOUT` env var if set)
63+
- `--verbose`, `-v`: Enable verbose logging
64+
- `--json-log`, `-j`: Produce JSON log output
65+
66+
### Database Waiter Configuration
67+
- `--host`, `-H`: Database host (from `DB_HOST` env var)
68+
- `--port`, `-p`: Database port (from `DB_PORT` env var, default: 3306)
69+
- `--username`, `-u`: Database username (from `DB_USERNAME` env var, default: gitpod)
70+
- `--password`, `-P`: Database password (from `DB_PASSWORD` env var)
71+
- `--caCert`: Custom CA certificate (from `DB_CA_CERT` env var)
72+
- `--migration-check`: Enable checking if the latest migration has been applied
73+
74+
### Redis Waiter Configuration
75+
- `--host`, `-H`: Redis host (default: redis)
76+
- `--port`, `-p`: Redis port (default: 6379)
77+
78+
### Component Waiter Configuration
79+
- `--image`: Image the component is expected to run, taken from the current installer build
80+
- `--namespace`: Namespace of the deployment
81+
- `--component`: Component name of the deployment
82+
- `--labels`: Labels of the deployment
83+
84+
## Usage Patterns
85+
86+
### Waiting for a Database
87+
```bash
service-waiter database --host mysql.example.com --port 3306 --username gitpod --password secret
```
90+
91+
### Waiting for Redis
92+
```bash
service-waiter redis --host redis.example.com --port 6379
```
95+
96+
### Waiting for a Kubernetes Component
97+
```bash
service-waiter component --namespace default --component server --labels "app=gitpod,component=server" --image gitpod/server:latest
```
100+
101+
### Using as an Init Container
102+
```yaml
initContainers:
  - name: wait-for-db
    image: gitpod/service-waiter:latest
    command: ["service-waiter", "database"]
    env:
      - name: DB_HOST
        value: mysql
      - name: DB_PASSWORD
        valueFrom:
          secretKeyRef:
            name: mysql-secrets
            key: password
```
116+
117+
## Integration Points
118+
119+
The Service-Waiter component integrates with:
120+
1. **MySQL Databases**: For database readiness checks
121+
2. **Redis Instances**: For Redis readiness checks
122+
3. **Kubernetes API**: For component readiness checks
123+
4. **Deployment Systems**: As part of deployment scripts or init containers
124+
5. **Logging Systems**: For reporting readiness status
125+
126+
## Dependencies
127+
128+
### Internal Dependencies
129+
- `components/common-go`: Common Go utilities
130+
- `components/gitpod-db`: For database migration information
131+
132+
### External Dependencies
133+
- MySQL client for database connections
134+
- Redis client for Redis connections
135+
- Kubernetes client for component checks
136+
- Cobra for command-line interface
137+
- Viper for configuration management
138+
139+
## Security Considerations
140+
141+
The component implements several security measures:
142+
143+
1. **TLS Support**: For secure database connections
144+
2. **Password Masking**: Passwords are masked in logs
145+
3. **Minimal Permissions**: Only requires read access to check service status
146+
4. **Timeout Handling**: Prevents indefinite hanging
147+
5. **Error Handling**: Proper handling of connection errors
148+
149+
## Related Components
150+
151+
- **Database**: The service-waiter checks database availability
152+
- **Redis**: The service-waiter checks Redis availability
153+
- **Kubernetes Components**: The service-waiter checks component readiness
154+
- **Deployment Systems**: Use service-waiter to orchestrate deployments
