Merged
428 changes: 428 additions & 0 deletions .github/ARCHITECTURE.md

Large diffs are not rendered by default.

79 changes: 79 additions & 0 deletions .github/agents/python.md
@@ -0,0 +1,79 @@
---
description: "Expert assistant for developing Python Applications"
name: "Python Expert"
model: GPT-4.1
---

# Python Expert

You are a world-class expert in building applications using the Python SDK. You have deep knowledge of Python type hints, Pydantic, async programming, and best practices for building robust, production-ready solutions.

## Your Expertise

- **Python Development**: Expert in Python 3.12+, type hints, async/await, decorators, and context managers
- **Data Validation**: Deep knowledge of Pydantic models, TypedDicts, dataclasses for schema generation
- **Transport Types**: Expert in both stdio and streamable HTTP transports
- **Tool Design**: Creating intuitive, type-safe tools with proper schemas and structured output
- **Best Practices**: Testing, error handling, logging, resource management, and security
- **Debugging**: Troubleshooting type hint issues, schema problems, and transport errors

## Your Approach

- **Type Safety First**: Always use comprehensive type hints - they drive schema generation
- **Understand Use Case**: Clarify whether the server is for local (stdio) or remote (HTTP) use
- **Decorator Pattern**: Leverage `@` decorators
- **Structured Output**: Return Pydantic models or TypedDicts for machine-readable data
- **Context When Needed**: Use Context parameter for logging, progress, sampling, or elicitation
- **Error Handling**: Implement comprehensive try-except with clear error messages
- **Test Early**: Encourage testing with `tox -e format,lint,unit` before integration

## Guidelines

- Always use complete type hints for parameters and return values
- Write clear docstrings - they become tool descriptions in the protocol
- Use Pydantic models, TypedDicts, or dataclasses for structured outputs
- Return structured data when tools need machine-readable results
- Clean up resources in finally blocks or context managers
- Validate inputs using Pydantic Field with descriptions
- Provide meaningful parameter names and descriptions
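
A minimal sketch of how several of these guidelines might look together, using only the standard library. The names `WordCount`, `tracked_resource`, and `count_words` are hypothetical and exist only for illustration; a Pydantic model could stand in for the dataclass where schema generation is needed.

```python
from contextlib import contextmanager
from dataclasses import asdict, dataclass
from typing import Iterator, List


@dataclass(frozen=True)
class WordCount:
    """Structured result: machine-readable instead of a free-form string."""

    text: str
    words: int


@contextmanager
def tracked_resource(log: List[str]) -> Iterator[List[str]]:
    """Hypothetical resource whose cleanup runs even when the body raises."""
    log.append("open")
    try:
        yield log
    finally:
        log.append("close")


def count_words(text: str) -> WordCount:
    """Count whitespace-separated words in ``text``.

    Raises:
        ValueError: if ``text`` contains no words.
    """
    if not text.strip():
        raise ValueError("text must contain at least one word")
    return WordCount(text=text, words=len(text.split()))


if __name__ == "__main__":
    events: List[str] = []
    with tracked_resource(events):
        print(asdict(count_words("type hints drive schemas")))
    print(events)
```

The complete type hints and the docstring carry the information a schema or tool description would be derived from, and the context manager keeps cleanup out of the happy path.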

## Common Scenarios You Excel At

- **Creating New Servers**: Generating complete project structures with uv and proper setup
- **Tool Development**: Implementing typed tools for data processing, APIs, files, or databases
- **Resource Implementation**: Creating static or dynamic resources with URI templates
- **Prompt Development**: Building reusable prompts with proper message structures
- **Transport Setup**: Configuring stdio for local use or HTTP for remote access
- **Debugging**: Diagnosing type hint issues, schema validation errors, and transport problems
- **Optimization**: Improving performance, adding structured output, managing resources
- **Integration**: Connecting servers with databases, APIs, or other services
- **Testing**: Writing tests and providing testing strategies with mcp dev

## Response Style

- Provide complete, working code that can be copied and run immediately
- Include all necessary imports at the top
- Add inline comments for important or non-obvious code
- Show complete file structure when creating new projects
- Explain the "why" behind design decisions
- Highlight potential issues or edge cases
- Suggest improvements or alternative approaches when relevant
- Include uv commands for setup and testing
- Format code with proper Python conventions
- Provide environment variable examples when needed

## Advanced Capabilities You Know

- **Lifespan Management**: Using context managers for startup/shutdown with shared resources
- **Structured Output**: Understanding automatic conversion of Pydantic models to schemas
- **Context Access**: Full use of Context for logging, progress, sampling, and elicitation
- **Dynamic Resources**: URI templates with parameter extraction
- **Completion Support**: Implementing argument completion for better UX
- **Image Handling**: Using Image class for automatic image processing
- **Icon Configuration**: Adding icons to server, tools, resources, and prompts
- **Session Management**: Understanding stateful vs stateless HTTP modes
- **Authentication**: Implementing OAuth with TokenVerifier
- **Pagination**: Handling large datasets with cursor-based pagination (low-level)
- **Low-Level API**: Using Server class directly for maximum control

You help developers build high-quality Python applications that are type-safe, robust, well-documented, and easy for humans to use effectively.
204 changes: 204 additions & 0 deletions .github/copilot-instructions.md
@@ -0,0 +1,204 @@
# Kubernetes Galaxy Test - Copilot Instructions

## Project Overview
Scalable Kubernetes testing infrastructure that validates custom-built components (kubeadm, containerd, etc.) across multiple Kubernetes versions (1.33-1.36) on multiarch systems (amd64, arm64, riscv64, etc.) using a Python CLI and portable modules. Tests run with Canonical's [spread](https://github.com/canonical/spread) framework and are sourced from the component repositories.

## Core Architecture

**Manifest-Driven Design** - YAML cluster configurations:
- Define K8s version, node counts, networking, and component versions/sources
- Components specify repo URL, release tag, format (Binary|Container|Binary+Container), and `test` flag (component provides tests)
- 4 baseline manifests targeting K8s 1.33, 1.34, 1.35, 1.36 with 30+ components each (containerd, etcd, coredns, kube-* etc.)

**Python CLI Framework** - Cross-platform testing:
- `kube-galaxy` CLI with command routing and automatic manifest discovery
- `pkg/cluster/setup.py`: Manifest-based provisioning via kubeadm (not container shortcuts)
- `pkg/testing/spread.py`: Executes spread tests from components marked `test: true`
- `pkg/utils/logs.py`: Log collection and debugging utilities
- Library utilities for arch detection, YAML manifest parsing, component installation

**Multiarch Support Built In**:
- Runtime arch detection with mapping to K8s binary formats (amd64, arm64, riscv64, arm, ppc64le, s390x)
- Components receive SYSTEM_ARCH, K8S_ARCH, IMAGE_ARCH environment variables
- Container image tags mapped per architecture (e.g., aarch64→arm64)

**GitHub Actions Integration**:
- Single workflow matrix tests across all K8s versions with customizable runner sizes
- Uses astral-sh/setup-uv for fast Python environment setup
- Direct CLI invocation with no custom action wrappers needed
- Automatic debug log collection and issue creation on failures

## Essential Workflows & Patterns

### Manifest Anatomy
```yaml
name: baseline-k8s-1.35              # Cluster identifier
description: "1.35.0 baseline"       # Human-readable
kubernetes-version: "1.35.0"         # Reference only
nodes:
  control-plane: 1                   # Kubeadm provisioning count
  worker: 2
components:
  - name: containerd                 # Component identifier
    category: containerd             # Organizational
    release: "2.2.1"                 # Git tag or branch
    repo:                            # Repository info object
      base-url: "https://github.com/..."  # Required: fetch source
      subdir: "path/to/component"    # Optional: for monorepo components
      ref: "feature-branch"          # Optional: override release with git ref
    format: Binary|Container|Binary+Container  # Install method
    test: false/true                 # Component provides spread tests
networking:
  - name: calico
    service-cidr: "10.96.0.0/12"
    pod-cidr: "192.168.0.0/16"
```
Key insight: `test: true` means the component repo has `spread.yaml` tests that `kube-galaxy test` will execute.

### Local Development Workflow
```bash
# Validate manifest YAML syntax
kube-galaxy validate

# Provision real cluster with kubeadm (no container shortcuts)
kube-galaxy setup

# Run spread tests from components with test: true
kube-galaxy test spread

# Clean cluster and artifacts
kube-galaxy cleanup all
```

### CI/CD Test Execution
- **Single matrix workflow**: `test-baseline-clusters.yml` tests all K8s versions (1.33-1.36) in parallel
- **Inputs**: manifest path, K8s version (matrix param), test suite name
- **Process**: setup → run tests → collect logs on failure → cleanup (always)
- **Failure handling**: Custom action `create-failure-issue` captures full debug state (pods, nodes, events) in issue body
- **Artifact retention**: 30 days for test results
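
The process ordering above (setup → run tests → collect logs on failure → cleanup always) can be sketched in plain Python. This is an illustration of the control flow only, not the workflow's actual implementation; `run_pipeline` and its four callables are hypothetical names.

```python
from typing import Callable

Step = Callable[[], None]


def run_pipeline(setup: Step, run_tests: Step, collect_logs: Step, cleanup: Step) -> bool:
    """Run the CI phases in order and report success.

    Logs are collected only when setup or the tests raise;
    cleanup runs in every case (the analogue of ``if: always()``).
    """
    try:
        setup()
        run_tests()
        return True
    except Exception:
        collect_logs()
        return False
    finally:
        cleanup()


if __name__ == "__main__":
    calls = []

    def failing_tests() -> None:
        calls.append("test")
        raise RuntimeError("spread suite failed")

    ok = run_pipeline(
        setup=lambda: calls.append("setup"),
        run_tests=failing_tests,
        collect_logs=lambda: calls.append("logs"),
        cleanup=lambda: calls.append("cleanup"),
    )
    print(ok, calls)
```

Placing `cleanup()` in `finally` is what guarantees it runs on both the success and failure paths.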

### Component Installation Pattern
`pkg/cluster/setup.py` provides the standard flow:
1. Fetch component repo at specified release tag (git clone + git checkout)
2. Locate installation method: `spread.yaml` → extract install script path
3. Validate architecture compatibility (SYSTEM_ARCH, K8S_ARCH, IMAGE_ARCH env vars)
4. Execute install script, verify binary in PATH
5. Components specify format: Binary (install to /usr/local/bin), Container (pull image), or both
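
As a rough sketch of step 1, the fetch could be planned as two git commands, with the optional `repo.ref` overriding `release` as described in the manifest anatomy. The function name `plan_fetch` and the destination layout are assumptions for illustration; the real logic in `pkg/cluster/setup.py` may differ. The commands are only built here, not executed.

```python
from typing import Any, Dict, List


def plan_fetch(component: Dict[str, Any], workdir: str) -> List[List[str]]:
    """Build the git commands that would fetch a component's source.

    Hypothetical helper: returns argv lists suitable for ``subprocess.run``.
    """
    repo = component["repo"]
    # An explicit repo.ref takes precedence over the release tag.
    ref = repo.get("ref") or component["release"]
    dest = f"{workdir}/{component['name']}"
    return [
        ["git", "clone", repo["base-url"], dest],
        ["git", "-C", dest, "checkout", ref],
    ]


if __name__ == "__main__":
    example = {
        "name": "containerd",
        "release": "2.2.1",
        "repo": {"base-url": "https://example.invalid/containerd.git"},
    }
    for argv in plan_fetch(example, "/tmp/kube-galaxy"):
        print(" ".join(argv))
```

Returning argv lists rather than shell strings keeps the plan easy to test and avoids shell quoting issues when it is eventually executed.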

### Test Execution Model
- **Test discovery**: Only components with `test: true` are tested
- **Test location**: Component repos contain `spread.yaml` at root
- **Spread execution**: `kube-galaxy test spread` clones each component, finds spread.yaml, runs spread test suite
- **Parallelism**: Spread tests run concurrently if specified in spread.yaml
- **Test results**: Captured and uploaded as GitHub artifacts
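
Test discovery amounts to a filter over the manifest's `components` list. A minimal sketch, assuming the manifest has already been parsed into a dictionary; `components_with_tests` is a hypothetical name, not necessarily a function in `pkg/testing/spread.py`.

```python
from typing import Any, Dict, List


def components_with_tests(manifest: Dict[str, Any]) -> List[str]:
    """Return the names of components explicitly marked ``test: true``."""
    return [
        component["name"]
        for component in manifest.get("components") or []
        if component.get("test") is True
    ]


if __name__ == "__main__":
    manifest = {
        "name": "baseline-k8s-1.35",
        "components": [
            {"name": "containerd", "test": True},
            {"name": "etcd", "test": False},
            {"name": "coredns"},  # no test flag: treated as not tested
        ],
    }
    print(components_with_tests(manifest))
```

Comparing with `is True` means a missing flag or a stray string such as `"true"` does not opt a component into testing by accident.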

### Multiarch Execution
Architecture detection happens at runtime in `pkg/cluster/setup.py`:
- Calls `get_arch_info()` from `pkg/arch/detector.py`
- Sets: `SYSTEM_ARCH` (raw uname), `K8S_ARCH` (Kubernetes format), `IMAGE_ARCH` (container tag format)
- All component install scripts receive these three env vars for architecture-specific behavior
- Example: aarch64 system → K8S_ARCH=arm64, IMAGE_ARCH=arm64
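
A hedged sketch of what the mapping behind `get_arch_info()` might look like. Only the `aarch64` → `arm64` case is stated above; the other table entries follow the usual Go/Kubernetes architecture names and are assumptions, as is the choice to make `IMAGE_ARCH` equal to `K8S_ARCH`. The real `pkg/arch/detector.py` may differ.

```python
import platform
from typing import Dict, Optional

# Raw `uname -m` value -> Kubernetes binary architecture (assumed table).
_K8S_ARCH = {
    "x86_64": "amd64",
    "amd64": "amd64",
    "aarch64": "arm64",
    "arm64": "arm64",
    "riscv64": "riscv64",
    "armv7l": "arm",
    "ppc64le": "ppc64le",
    "s390x": "s390x",
}


def get_arch_info(machine: Optional[str] = None) -> Dict[str, str]:
    """Map a machine string to the three env vars passed to install scripts.

    ``machine`` defaults to ``platform.machine()``; passing it explicitly
    keeps the function free of side effects and easy to test.
    """
    system_arch = machine if machine is not None else platform.machine()
    try:
        k8s_arch = _K8S_ARCH[system_arch.lower()]
    except KeyError:
        raise ValueError(f"unsupported architecture: {system_arch}") from None
    return {
        "SYSTEM_ARCH": system_arch,
        "K8S_ARCH": k8s_arch,
        "IMAGE_ARCH": k8s_arch,  # assumption: image tags use the same names
    }


if __name__ == "__main__":
    print(get_arch_info("aarch64"))
```

Failing loudly on an unknown architecture is deliberate: a silent fallback would let an install script download a binary for the wrong platform.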

## Critical Design Patterns

**State Preservation for Debugging**:
- Cluster state saved to `debug-logs/` directory before cleanup
- Preserve: kubectl dump, pod logs, events, node descriptions
- GitHub Actions auto-creates failure issues with this debug data
- Files survive cleanup and kubeadm reset for post-failure investigation

**Manifest as Single Source of Truth**:
- All behavior (components, versions, networking) defined in YAML manifests
- No hardcoding component lists or versions in Python code
- Each K8s version has its own manifest: baseline-k8s-1.33.yaml through 1.36.yaml
- Python modules parse manifests using `pkg/manifest/loader.py`
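
To illustrate the manifest-first approach, here is a small validation sketch over an already-parsed manifest dictionary. The set of required fields is an assumption drawn from the manifest anatomy (only `repo.base-url` is explicitly marked required there), and `validate_manifest` is a hypothetical name rather than the API of `pkg/manifest/validator.py`.

```python
from typing import Any, Dict, List

ALLOWED_FORMATS = {"Binary", "Container", "Binary+Container"}


def validate_manifest(manifest: Dict[str, Any]) -> List[str]:
    """Return a list of human-readable problems; an empty list means valid."""
    errors: List[str] = []
    for key in ("name", "kubernetes-version", "components"):
        if key not in manifest:
            errors.append(f"missing top-level field: {key}")

    for index, component in enumerate(manifest.get("components") or []):
        label = component.get("name") or f"components[{index}]"
        if not component.get("name"):
            errors.append(f"{label}: missing name")
        if not component.get("release"):
            errors.append(f"{label}: missing release")
        if not (component.get("repo") or {}).get("base-url"):
            errors.append(f"{label}: missing repo.base-url")
        if component.get("format") not in ALLOWED_FORMATS:
            errors.append(f"{label}: format must be one of {sorted(ALLOWED_FORMATS)}")
        if not isinstance(component.get("test", False), bool):
            errors.append(f"{label}: test must be a boolean")
    return errors


if __name__ == "__main__":
    manifest = {
        "name": "baseline-k8s-1.35",
        "kubernetes-version": "1.35.0",
        "components": [
            {
                "name": "containerd",
                "release": "2.2.1",
                "repo": {"base-url": "https://example.invalid/containerd"},
                "format": "Binary",
                "test": True,
            }
        ],
    }
    print(validate_manifest(manifest))
```

Collecting every problem instead of raising on the first one lets a single validation run report all issues in a manifest.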

**Module Organization**:
- `pkg/arch/detector.py`: Pure architecture detection/mapping (no side effects)
- `pkg/manifest/loader.py`: Pure YAML parsing/extraction (no modifications)
- `pkg/manifest/validator.py`: Schema and field validation
- `pkg/cluster/setup.py`: Cluster setup operations
- `pkg/testing/spread.py`: Test execution
- CLI commands in `cmd/` compose these modules for user-facing behavior

**Error Handling & Cleanup**:
- `kube-galaxy setup`: Creates cluster, exits on first error (fail-fast)
- `kube-galaxy cleanup all`: Always runs `kubeadm reset --force`, removes artifacts
- GitHub Actions use `if: always()` to ensure cleanup even on failures

## GitHub Actions Integration

**Current Implementation**:
- **Setup Python**: Uses astral-sh/setup-uv@v7.2.1 for fast environment setup
- **Install CLI**: `pip install -e .` installs kube-galaxy CLI
- **Run Commands**: Direct invocation of `kube-galaxy` CLI commands
- **Failure Handling**: Automatic log collection via `pkg/utils/logs.py`
- **Artifact retention**: 30 days for test results

**Implementation Pattern**:
- Workflow defined in `.github/workflows/test-baseline-clusters.yml`
- Calls Python CLI directly; CLI provides GitHub logging integration
- No external action dependencies needed
- All features (setup, test, logs) provided by Python modules

## Best Practices

**When Adding Components**:
1. Add entry to all 4 `manifests/baseline-k8s-*.yaml` files (don't skip versions)
2. Set `test: true` only if component repo has `spread.yaml` with test definitions
3. Use canonical GitHub repos where available; verify release tag exists
4. Set `format` correctly based on component's build/distribution (Binary, Container, or both)

**When Modifying Python Modules**:
1. Follow existing patterns in `pkg/` modules for business logic
2. Use `pkg/utils/errors.py` custom exceptions for error handling
3. Test locally with `tox -e test` before committing
4. Update docstrings and type hints for all functions
5. Use manifest parsing (`load_manifest()` from loader) instead of hardcoding

**When Debugging Failures**:
1. Check the `debug-logs/` directory for the cluster state preserved before cleanup
2. View GitHub Actions artifact logs from failure runs (30-day retention)
3. Verify the SYSTEM_ARCH/K8S_ARCH/IMAGE_ARCH env vars set by architecture detection
4. Run `kube-galaxy cleanup all` manually if the workflow fails mid-setup

## CLI Command Reference

```bash
# Validation
kube-galaxy validate
kube-galaxy validate --manifest manifests/baseline-k8s-1.35.yaml

# Testing
kube-galaxy setup manifests/baseline-k8s-1.35.yaml
kube-galaxy test manifests/baseline-k8s-1.35.yaml

# Management
kube-galaxy cleanup manifests/baseline-k8s-1.35.yaml
kube-galaxy status
```

## Integration References
- [GitHub Copilot Custom Agents](https://docs.github.com/en/copilot/tutorials/customization-library/custom-agents/your-first-custom-agent)
- [Canonical Spread Testing](https://github.com/canonical/spread)
- [Kubernetes Documentation](https://kubernetes.io/docs/)
- [GitHub Actions Documentation](https://docs.github.com/en/actions)

## Quick Start Commands
```bash
# Generate new cluster manifest
copilot: "Create a new cluster manifest for testing with 5 worker nodes and custom containerd version"

# Create GitHub Action workflow
copilot: "Generate a GitHub Actions workflow for the high-availability cluster manifest"

# Add error handling
copilot: "Add comprehensive error handling and issue creation to the existing workflow"

# Create test suite
copilot: "Create a spread test suite for networking functionality"
```

---

This project leverages GitHub Copilot to accelerate development of scalable Kubernetes testing infrastructure. Follow these instructions to ensure consistent, robust, and maintainable test automation.
47 changes: 47 additions & 0 deletions .github/workflows/lint.yml
@@ -0,0 +1,47 @@
name: Lint

on:
  push:
    branches:
      - main
    paths:
      - 'src/**'
      - 'tests/**'
      - '.github/workflows/lint.yml'
      - 'pyproject.toml'
  pull_request:
    paths:
      - 'src/**'
      - 'tests/**'
      - '.github/workflows/lint.yml'
      - 'pyproject.toml'

concurrency:
  group: ${{ github.workflow }}-${{ github.ref }}
  cancel-in-progress: true

jobs:
  call-inclusive-naming-check:
    name: Inclusive Naming
    uses: canonical/inclusive-naming/.github/workflows/woke.yaml@main
    with:
      fail-on-error: "true"
  linting:
    runs-on: ubuntu-latest
    steps:
      - name: Checkout
        uses: actions/checkout@v4

      - name: Setup Astral UV
        uses: astral-sh/setup-uv@v7.2.1
        with:
          python-version: "3.12"

      - name: Install kube-galaxy
        shell: bash
        run: |
          uv tool install tox --with tox-uv
      - name: Run lint checks
        run: |
          tox -e lint