231 changes: 231 additions & 0 deletions .github/ARCHITECTURE.md
@@ -0,0 +1,231 @@
# Architecture Documentation

## Overview

Kubernetes Galaxy Test is a scalable, multiarch testing infrastructure for
Kubernetes components. It separates concerns into distinct phases:
cluster provisioning, component installation, and test execution.

## Design Principles

### 1. Component-Driven Architecture
- Each component (containerd, kubeadm, CNI, etc.) is defined in its own repository
- Components define their own installation instructions via `spread.yaml`
- Components can optionally provide their own test suites
- The orchestration layer simply invokes component definitions

### 2. Multiarch from the Start
- All architecture-specific logic is handled at runtime
- No hardcoded binary paths or architecture names
- Tools and components are fetched for the detected architecture
- Architecture information is passed to component scripts

### 3. Simple Manifest Format
- Manifests use simple YAML (no Kubernetes resource types)
- Manifests declare what to install, not how to install it
- Complex installation details can be defined in a component plugin
- Manifests can be validated without running anything

### 4. Separation of Concerns
- **setup-cluster**: Provisions infrastructure
- **run-spread-tests**: Executes tests
- Each action is self-contained and reusable

## Workflow

```
┌─────────────────────────────────────────────────────┐
│ 1. Parse Cluster Manifest │
│ • Read component list, networking, storage config │
└──────────────┬──────────────────────────────────────┘
┌──────────────▼──────────────────────────────────────┐
│ 2. Detect System Architecture │
│ • Determine runner arch (amd64, arm64, etc.) │
│ • Map to Kubernetes arch names │
└──────────────┬──────────────────────────────────────┘
┌──────────────▼──────────────────────────────────────┐
│ 3. Install Components (setup-cluster) │
│ For each component: │
│ • Download component from specified release │
│ • Install and configure component │
│ • Execute with ARCH, K_ARCH, RELEASE info │
└──────────────┬──────────────────────────────────────┘
┌──────────────▼──────────────────────────────────────┐
│ 4. Initialize Kubernetes Cluster │
│ • Initialize cluster with manifest networking config │
│ • Deploy CNI plugin │
│ • Verify cluster health │
└──────────────┬──────────────────────────────────────┘
┌──────────────▼──────────────────────────────────────┐
│ 5. Execute Tests (run-spread-tests) │
│ • Identify components with use-spread: true │
│ • Clone component repos │
│ • Run spread tests from component spread.yaml │
│ • Collect and report results │
└──────────────┬──────────────────────────────────────┘
┌──────────────▼──────────────────────────────────────┐
│ 6. Cleanup (cleanup-cluster) │
│ • Drain and delete cluster resources │
│ • Remove kubeconfig contexts │
│ • Cleanup temporary files │
└─────────────────────────────────────────────────────┘
```

## Component Repository Structure

A component repository providing custom installation must have `spread.yaml`:

```
my-component/
├── spread.yaml # Required: installation instructions, plus optional test definitions
└── src/ # Source code
```

### Component Awareness

Components receive architecture information through environment variables:
- `ARCH`: The system architecture from `uname -m`
- `K_ARCH`: The Kubernetes-compatible architecture name

Components use this to download/build for the correct architecture.
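
A minimal sketch of how the two variables might be derived, assuming `K_ARCH` follows the Go-style architecture names used for Kubernetes release artifacts. The mapping table and the `to_k_arch` and `arch_environment` helpers are illustrative and not taken from this repository:

```python
import platform

# Map `uname -m` output to Kubernetes (Go-style) architecture names.
# Illustrative table; extend it for the runners you actually use.
K_ARCH_BY_MACHINE = {
    "x86_64": "amd64",
    "aarch64": "arm64",
    "arm64": "arm64",  # macOS reports arm64 rather than aarch64
    "riscv64": "riscv64",
    "ppc64le": "ppc64le",
    "s390x": "s390x",
}


def to_k_arch(machine: str) -> str:
    """Translate a `uname -m` value into a Kubernetes architecture name."""
    try:
        return K_ARCH_BY_MACHINE[machine]
    except KeyError:
        # Fail loudly rather than guessing a download URL for an unknown arch.
        raise ValueError(f"unsupported architecture: {machine!r}") from None


def arch_environment() -> dict[str, str]:
    """Build the ARCH / K_ARCH variables passed to component scripts."""
    arch = platform.machine()  # same value as `uname -m` on Linux
    return {"ARCH": arch, "K_ARCH": to_k_arch(arch)}
```

Raising on an unknown machine name keeps a component from silently fetching a binary for the wrong architecture.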

## Custom GitHub Actions

### setup-cluster

**Input**: Cluster manifest path
**Output**: kubeconfig location, cluster info

**Steps**:
1. Detect system properties (architecture)
2. Install base dependencies
3. Parse manifest
4. Install and configure each component
5. Initialize Kubernetes with kubeadm
6. Configure networking
7. Verify cluster health

**Key Features**:
- Runs as composite action (uses shell scripts)
- Fetches tools for detected architecture
- Invokes component install scripts
- Kubeadm-based cluster, not container-based

### run-spread-tests

**Input**: Manifest, test suite, timeout
**Output**: Test results, status

**Steps**:
1. Install spread testing framework
2. Scan manifest for use-spread components
3. Clone component repos
4. Execute spread tests
5. Collect artifacts

**Key Features**:
- Tests come from components and local tests/
- Scans for use-spread: true in components
- Reports results and failures
- Preserves test artifacts
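
The scan for `use-spread: true` could look roughly like the sketch below. It assumes the manifest has already been parsed into a dictionary, and the `components` and `name` keys are hypothetical since the manifest schema is not spelled out here:

```python
from typing import Any


def spread_components(manifest: dict[str, Any]) -> list[str]:
    """Return the names of components that opt in with `use-spread: true`."""
    selected = []
    for component in manifest.get("components", []):
        # Only an explicit boolean true opts a component in.
        if component.get("use-spread") is True:
            selected.append(component["name"])
    return selected


# Hypothetical manifest content, shown as the dictionary a YAML parser would return.
manifest = {
    "components": [
        {"name": "containerd", "use-spread": True},
        {"name": "kubeadm"},
        {"name": "cni", "use-spread": False},
    ]
}
print(spread_components(manifest))  # prints ['containerd']
```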

### collect-kubernetes-logs

Gathers debugging information on failures:
- Node status and descriptions
- Pod logs and status
- Kubernetes events
- Network information
- System diagnostics

### create-failure-issue

Creates GitHub issues with:
- Failure context
- Debug information
- Links to artifacts
- Investigation steps

### cleanup-cluster

Graceful cluster teardown:
- Drain nodes
- Delete resources
- Remove kubeconfig entries
- Clean temporary files

## Manifest Validation

Manifests are validated for:
- Valid YAML syntax
- Required fields present
- Component repos accessible
- Release tags exist
- spread.yaml exists in components
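
As a rough sketch, the offline part of these checks (required fields) might be written as follows. The field names `components`, `name`, `repo`, and `release` are assumptions for illustration; the checks that need network access (repository reachability, release tags, presence of `spread.yaml`) are left out:

```python
from typing import Any

# Hypothetical required keys; the real manifest schema may differ.
REQUIRED_COMPONENT_FIELDS = ("name", "repo", "release")


def validate_manifest(manifest: Any) -> list[str]:
    """Return a list of human-readable problems; an empty list means valid."""
    if not isinstance(manifest, dict):
        return ["manifest must be a mapping"]
    components = manifest.get("components")
    if not isinstance(components, list) or not components:
        return ["manifest must list at least one component"]
    problems = []
    for index, component in enumerate(components):
        if not isinstance(component, dict):
            problems.append(f"components[{index}] must be a mapping")
            continue
        for field in REQUIRED_COMPONENT_FIELDS:
            if not component.get(field):
                problems.append(f"components[{index}] is missing '{field}'")
    return problems
```

Returning every problem at once, instead of stopping at the first, lets a manifest author fix a file in a single pass.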

## Error Handling

### Component Installation Failures
- Logged and reported
- Workflow continues if optional
- Workflow stops if required
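
The optional/required policy above can be sketched as follows. The `Component` class, its `optional` flag, and the `install` callable are hypothetical stand-ins for however the actions represent a component:

```python
import logging
from collections.abc import Callable
from dataclasses import dataclass

log = logging.getLogger("setup-cluster")


class RequiredComponentError(RuntimeError):
    """Raised when a required component fails to install."""


@dataclass
class Component:
    name: str
    install: Callable[[], None]  # hypothetical install hook
    optional: bool = False  # hypothetical manifest flag


def install_all(components: list[Component]) -> list[str]:
    """Install components in order; return names of optional ones that failed."""
    skipped = []
    for component in components:
        try:
            component.install()
        except Exception as exc:
            # Every failure is logged, whether or not it is fatal.
            log.error("install of %s failed: %s", component.name, exc)
            if not component.optional:
                raise RequiredComponentError(component.name) from exc
            skipped.append(component.name)
    return skipped
```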

### Test Failures
- Tests continue even if one fails
- Results aggregated at end
- GitHub issue created automatically
- Debug information collected
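
A minimal sketch of the continue-then-aggregate behaviour, assuming each test can be modelled as a callable that raises on failure. `SuiteReport` and `run_all` are illustrative names, and issue creation and log collection are not shown:

```python
from collections.abc import Callable
from dataclasses import dataclass, field


@dataclass
class SuiteReport:
    passed: list[str] = field(default_factory=list)
    failed: dict[str, str] = field(default_factory=dict)  # test name -> error text

    @property
    def ok(self) -> bool:
        return not self.failed


def run_all(tests: dict[str, Callable[[], None]]) -> SuiteReport:
    """Run every test even if earlier ones fail, then report the totals."""
    report = SuiteReport()
    for name, test in tests.items():
        try:
            test()
        except Exception as exc:
            # Record the failure and keep going.
            report.failed[name] = str(exc)
        else:
            report.passed.append(name)
    return report
```

A caller would inspect `report.ok` once at the end to decide whether to collect debug information and open an issue.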

### Cleanup Failures
- Cleanup is best-effort
- Errors logged but don't fail workflow
- Manual cleanup may be needed

## Testing Your Components

To develop a component for this infrastructure:

1. Create `spread.yaml` with `install` section
2. Test locally:
```bash
export ARCH=$(uname -m)
export K_ARCH="amd64" # Example
spread prepare
spread execute
spread restore
```

3. Add your repo to a cluster manifest
4. Test the full workflow in GitHub Actions

## Best Practices

1. **Keep components independent**: Component install shouldn't assume others are installed
2. **Support multiarch**: Provide binaries for amd64, arm64, riscv64
3. **Provide tests**: Use spread tests for quality assurance
4. **Document assumptions**: Note any OS, kernel, or runtime requirements
5. **Clean up after yourself**: Restore step should remove all test artifacts
6. **Use spread.yaml**: Standard format for component definitions

## Extension Points

### Adding New CNI Options
- Update setup-cluster's Configure Networking step
- Add case for your CNI in the switch

### Custom Networking
- Manifest supports multiple networking entries
- Update setup-cluster to handle your config

### New Storage Providers
- Add to manifest storage section
- setup-cluster can detect and install

### New Infrastructure Providers
- Currently supports GitHub Actions
- Extend workflow generation for other providers
79 changes: 79 additions & 0 deletions .github/agents/python.md
@@ -0,0 +1,79 @@
---
description: "Expert assistant for developing Python Applications"
name: "Python Expert"
model: GPT-4.1
---

# Python Expert

You are a world-class expert in building applications with Python. You have deep knowledge of Python type hints, Pydantic, async programming, and best practices for building robust, production-ready solutions.

## Your Expertise

- **Python Development**: Expert in Python 3.12+, type hints, async/await, decorators, and context managers
- **Data Validation**: Deep knowledge of Pydantic models, TypedDicts, dataclasses for schema generation
- **Transport Types**: Expert in both stdio and streamable HTTP transports
- **Tool Design**: Creating intuitive, type-safe tools with proper schemas and structured output
- **Best Practices**: Testing, error handling, logging, resource management, and security
- **Debugging**: Troubleshooting type hint issues, schema problems, and transport errors

## Your Approach

- **Type Safety First**: Always use comprehensive type hints - they drive schema generation
- **Understand Use Case**: Clarify whether the server is for local (stdio) or remote (HTTP) use
- **Decorator Pattern**: Leverage `@` decorators
- **Structured Output**: Return Pydantic models or TypedDicts for machine-readable data
- **Context When Needed**: Use Context parameter for logging, progress, sampling, or elicitation
- **Error Handling**: Implement comprehensive try-except with clear error messages
- **Test Early**: Encourage testing with `tox -e format,lint,unit` before integration

## Guidelines

- Always use complete type hints for parameters and return values
- Write clear docstrings - they become tool descriptions in the protocol
- Use Pydantic models, TypedDicts, or dataclasses for structured outputs
- Return structured data when tools need machine-readable results
- Clean up resources in finally blocks or context managers
- Validate inputs using Pydantic Field with descriptions
- Provide meaningful parameter names and descriptions
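
For example, a minimal sketch that follows these guidelines using only the standard library (the `FileStats` and `file_stats` names are illustrative only):

```python
from dataclasses import dataclass
from pathlib import Path


@dataclass(frozen=True)
class FileStats:
    """Structured, machine-readable result instead of a formatted string."""

    path: str
    lines: int
    characters: int


def file_stats(path: Path) -> FileStats:
    """Count the lines and characters in a UTF-8 text file."""
    # The context manager closes the file even if reading raises.
    with path.open(encoding="utf-8") as handle:
        text = handle.read()
    return FileStats(path=str(path), lines=len(text.splitlines()), characters=len(text))
```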

## Common Scenarios You Excel At

- **Creating New Servers**: Generating complete project structures with uv and proper setup
- **Tool Development**: Implementing typed tools for data processing, APIs, files, or databases
- **Resource Implementation**: Creating static or dynamic resources with URI templates
- **Prompt Development**: Building reusable prompts with proper message structures
- **Transport Setup**: Configuring stdio for local use or HTTP for remote access
- **Debugging**: Diagnosing type hint issues, schema validation errors, and transport problems
- **Optimization**: Improving performance, adding structured output, managing resources
- **Integration**: Connecting servers with databases, APIs, or other services
- **Testing**: Writing tests and providing testing strategies with mcp dev

## Response Style

- Provide complete, working code that can be copied and run immediately
- Include all necessary imports at the top
- Add inline comments for important or non-obvious code
- Show complete file structure when creating new projects
- Explain the "why" behind design decisions
- Highlight potential issues or edge cases
- Suggest improvements or alternative approaches when relevant
- Include uv commands for setup and testing
- Format code with proper Python conventions
- Provide environment variable examples when needed

## Advanced Capabilities You Know

- **Lifespan Management**: Using context managers for startup/shutdown with shared resources
- **Structured Output**: Understanding automatic conversion of Pydantic models to schemas
- **Context Access**: Full use of Context for logging, progress, sampling, and elicitation
- **Dynamic Resources**: URI templates with parameter extraction
- **Completion Support**: Implementing argument completion for better UX
- **Image Handling**: Using Image class for automatic image processing
- **Icon Configuration**: Adding icons to server, tools, resources, and prompts
- **Session Management**: Understanding stateful vs stateless HTTP modes
- **Authentication**: Implementing OAuth with TokenVerifier
- **Pagination**: Handling large datasets with cursor-based pagination (low-level)
- **Low-Level API**: Using Server class directly for maximum control

You help developers build high-quality Python applications that are type-safe, robust, well-documented, and easy for humans to use effectively.