Work in progress
A Juju machine charm for deploying Concourse CI - a modern, scalable continuous integration and delivery system. This charm supports flexible deployment patterns including single-unit, multi-unit with automatic role assignment, and separate web/worker configurations.
Note: This is a machine charm designed for bare metal, VMs, and LXD deployments. For Kubernetes deployments, see https://charmhub.io/concourse-web and https://charmhub.io/concourse-worker.
- Flexible Deployment Modes: Deploy as auto-scaled web/workers or explicit roles
- Automatic Role Detection: Leader unit becomes web server, followers become workers
- Fully Automated Key Distribution: TSA keys automatically shared via peer relations - zero manual setup!
- Secure Random Passwords: Auto-generated admin password stored in Juju peer data
- Latest Version Detection: Automatically downloads the latest Concourse release from GitHub
- PostgreSQL 16+ Integration: Full support with Juju secrets API for secure credential management
- Dynamic Port Configuration: Change web port on-the-fly with automatic service restart
- Privileged Port Support: Run on port 80 with proper Linux capabilities (CAP_NET_BIND_SERVICE)
- Auto External-URL: Automatically detects unit IP for external-url configuration
- Ubuntu 24.04 LTS: Optimized for Ubuntu 24.04 LTS
- Container Runtime: Uses containerd with LXD-compatible configuration
- Automatic Key Management: TSA keys, session signing keys, and worker keys auto-generated
- Prometheus Metrics: Optional metrics endpoint for monitoring
- Download Progress: Real-time installation progress in Juju status
- GPU Support: NVIDIA (CUDA) and AMD (ROCm) GPU workers for ML/AI workloads (GPU Guide)
- Dataset Mounting: Automatic dataset injection for GPU tasks (Dataset Guide)
- 🆕 General Folder Mounting: Automatic discovery and mounting of ANY folder under /srv (General Mounting Guide) - see the example below
  - ✅ Zero configuration - just mount folders to /srv and go
  - ✅ Read-only by default for data safety
  - ✅ Writable folders with _writable or _rw suffix
  - ✅ Multiple concurrent folders (datasets, models, outputs, caches)
  - ✅ Works on both GPU and non-GPU workers
  - ✅ Automatic permission validation and fail-fast
  - ✅ Backward compatible with existing GPU dataset mounting
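For example, preparing a read-only dataset folder and a writable cache on a worker machine might look like this (a minimal sketch; the folder names and the unit name worker/0 are illustrative, and the exact in-task mount paths are described in the General Mounting Guide):

# On the worker unit: anything under /srv is discovered automatically
juju ssh worker/0 -- sudo mkdir -p /srv/datasets /srv/cache_rw

# datasets  -> mounted read-only (the default)
# cache_rw  -> mounted writable thanks to the _rw suffix
juju ssh worker/0 -- ls -la /srv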
- Juju 3.x
- Ubuntu 24.04 LTS (on Juju-managed machines)
- PostgreSQL charm 16/stable (for web server)
# Create a Juju model
juju add-model concourse
# Deploy PostgreSQL
juju deploy postgresql --channel 16/stable --base ubuntu@24.04
# Deploy Concourse CI charm as application "concourse-ci"
juju deploy concourse-ci-machine concourse-ci --config mode=auto
# Relate to database (uses PostgreSQL 16 client interface with Juju secrets)
juju integrate concourse-ci:postgresql postgresql:database
# Expose the web interface (opens port in Juju)
juju expose concourse-ci
# Wait for deployment (takes ~5-10 minutes)
juju status --watch 1s

The charm automatically:
- Reads database credentials from Juju secrets
- Configures the external URL based on unit IP
- Opens the configured web port (default: 8080)
- Generates and stores admin password in peer relation data
Naming Convention:
- Charm name: concourse-ci-machine (what you deploy from Charmhub)
- Application name: concourse-ci (used throughout this guide)
- Unit names: concourse-ci/0, concourse-ci/1, etc.
Once deployed, get credentials with juju run concourse-ci/leader get-admin-password
Deploy multiple units with automatic role assignment and key distribution:
# Deploy PostgreSQL
juju deploy postgresql --channel 16/stable --base ubuntu@24.04
# Deploy Concourse charm (named "concourse-ci") with 1 web + 2 workers
juju deploy concourse-ci-machine concourse-ci -n 3 --config mode=auto
# Relate to database (using application name "concourse-ci")
juju relate concourse-ci:postgresql postgresql:database
# Check deployment
juju status

Result:
- concourse-ci/0 (leader): Web server
- concourse-ci/1-2: Workers
- All keys automatically distributed via peer relations! ✨
Note: Application is named concourse-ci for easier reference (shorter than concourse-ci-machine)
For maximum flexibility with separate applications:
# Deploy PostgreSQL
juju deploy postgresql --channel 16/stable --base ubuntu@24.04
# Deploy web server (1 unit)
juju deploy concourse-ci-machine web --config mode=web
# Deploy workers (2 units)
juju deploy concourse-ci-machine worker -n 2 --config mode=worker
# Relate web to database
juju relate web:postgresql postgresql:database
# Relate web and worker for automatic TSA key exchange
juju relate web:tsa worker:flight
# Check deployment
juju status

Result:
- web/0: Web server only
- worker/0, worker/1: Workers only, connected via TSA
Note: The tsa / flight relation automatically handles SSH key exchange between web and worker applications, eliminating the need for manual key management.
The charm supports three deployment modes via the mode configuration:
Leader unit runs web server, non-leader units run workers. Keys automatically distributed via peer relations!
Note: You need at least 2 units for this mode to have functional workers (Unit 0 = Web, Unit 1+ = Workers).
juju deploy concourse-ci-machine concourse-ci -n 3 --config mode=auto
juju relate concourse-ci:postgresql postgresql:database

Best for: Production, scalable deployments
Key Distribution: ✅ Fully automatic - zero manual intervention required!
Deploy web and workers as separate applications for independent scaling.
# Web application
juju deploy concourse-ci-machine web --config mode=web
# Worker application (scalable)
juju deploy concourse-ci-machine worker -n 2 --config mode=worker
# Relate web to PostgreSQL
juju relate web:postgresql postgresql:database
# Relate web and worker for automatic TSA key exchange
juju relate web:tsa worker:flight

Best for: Independent scaling of web and workers
Key Distribution: ✅ Automatic via tsa / flight relation
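For instance, once the web and worker applications are related, each side scales independently (a minimal sketch using the application names from this guide):

# Add two more worker units without touching the web application
juju add-unit worker -n 2

# Scale the web application separately if needed
juju add-unit web -n 1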
| Option | Type | Default | Description |
|---|---|---|---|
| mode | string | auto | Deployment mode: auto, web, or worker |
| version | string | latest | Concourse version to install (auto-detects latest from GitHub) |
| web-port | int | 8080 | Web UI and API port |
| worker-procs | int | 1 | Number of worker processes per unit |
| log-level | string | info | Log level: debug, info, warn, error |
| enable-metrics | bool | true | Enable Prometheus metrics on port 9391 |
| external-url | string | (auto) | External URL for webhooks and OAuth |
| initial-admin-username | string | admin | Initial admin username |
| container-placement-strategy | string | volume-locality | Container placement: volume-locality, random, etc. |
| max-concurrent-downloads | int | 10 | Max concurrent resource downloads |
| containerd-dns-proxy-enable | bool | false | Enable containerd DNS proxy |
| containerd-dns-server | string | 1.1.1.1,8.8.8.8 | DNS servers for containerd containers |
Configuration changes are applied dynamically with automatic service restart.
# Set custom web port (automatically restarts service)
juju config concourse-ci web-port=9090
# Change to privileged port 80 (requires CAP_NET_BIND_SERVICE - already configured)
juju config concourse-ci web-port=80
# Enable debug logging
juju config concourse-ci log-level=debug
# Set external URL (auto-detects unit IP if not set)
juju config concourse-ci external-url=https://ci.example.com

Use the upgrade action to change the Concourse CI version. Update the version configuration first so the change persists across charm refreshes, then trigger the charm's upgrade action (it automatically upgrades all workers):

# Set the version configuration first (essential for persistence)
juju config concourse-ci version=7.14.3

# Downgrade is also supported (update the config, then run the upgrade action)
juju config concourse-ci version=7.12.1

Auto-upgrade behavior:
- When the web server (leader in mode=auto) is upgraded, all workers automatically upgrade to match
- Works across separate applications connected via TSA relations
- Workers show "Auto-upgrading Concourse CI to X.X.X..." during automatic upgrades
Note: The web-port configuration supports dynamic changes including privileged ports (< 1024) thanks to AmbientCapabilities=CAP_NET_BIND_SERVICE in the systemd service.
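To confirm the capability on a running unit (a quick sketch; the systemd unit name concourse-server matches the service layout described later in this README):

# Show the AmbientCapabilities setting of the web service
juju ssh concourse-ci/0 -- systemctl show concourse-server --property=AmbientCapabilities
# The output should include cap_net_bind_service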
- Get the web server IP:
  juju status
- Check the exposed port (shown in the Ports column):
  juju status concourse-ci
  # Look for: Ports column showing "80/tcp" or "8080/tcp"
- Open in browser: http://<web-unit-ip>:<port>
- Get the admin credentials:
  juju run concourse-ci/leader get-admin-password

Example output:
message: Use these credentials to login to Concourse web UI
password: 01JfF@I!9W^0%re!3I!hyy3C
username: admin
Security: A random password is automatically generated on first deployment and stored securely in Juju peer relation data. All units in the deployment share the same credentials.
The Fly CLI is Concourse's command-line tool for managing pipelines:
# Download fly from your Concourse instance
curl -Lo fly "http://<web-unit-ip>:8080/api/v1/cli?arch=amd64&platform=linux"
chmod +x fly
sudo mv fly /usr/local/bin/
# Get credentials
ADMIN_PASSWORD=$(juju run concourse-ci/leader get-admin-password --format=json | jq -r '."unit-concourse-ci-2".results.password')
# Login
fly -t prod login -c http://<web-unit-ip>:8080 -u admin -p "$ADMIN_PASSWORD"
# Sync fly version
fly -t prod sync
- Create a pipeline file hello.yml:
jobs:
- name: hello-world
  plan:
  - task: say-hello
    config:
      platform: linux
      image_resource:
        type: registry-image
        source:
          repository: busybox
      run:
        path: sh
        args:
        - -c
        - |
          echo "=============================="
          echo "Hello from Concourse CI!"
          echo "Date: $(date)"
          echo "=============================="

- Set the pipeline:
fly -t prod set-pipeline -p hello -c hello.yml
fly -t prod unpause-pipeline -p hello

- Trigger the job:

fly -t prod trigger-job -j hello/hello-world -w

Note: Common lightweight images: busybox (~2MB), alpine (~5MB), ubuntu (~28MB)
# Add 2 more worker units to the concourse-ci application
juju add-unit concourse-ci -n 2
# Verify workers
juju ssh concourse-ci/0 # SSH to unit 0 of concourse-ci application
fly -t local workers

# Remove specific unit
juju remove-unit concourse-ci/3

The web server requires a PostgreSQL database:
juju relate concourse-ci:postgresql postgresql:database

Supported PostgreSQL Charms:
- postgresql (16/stable recommended)
- Any charm providing the postgresql interface
Concourse exposes Prometheus metrics on port 9391:
juju relate concourse-ci:monitoring prometheus:target

Units automatically coordinate via the peers relation (no action needed).
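To sanity-check the metrics endpoint once enable-metrics is set (a sketch; the /metrics path on port 9391 follows Concourse's Prometheus defaults):

# Fetch a sample of the exposed Prometheus metrics from the web unit
curl -s http://<web-unit-ip>:9391/metrics | head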
The charm uses Juju storage for persistent data:
# Deploy with specific storage
juju deploy concourse-ci-machine concourse-ci --storage concourse-data=20G
# Add storage to existing unit
juju add-storage concourse-ci/0 concourse-data=10G

Storage is mounted at /var/lib/concourse.
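You can confirm the allocation with Juju's storage commands, for example:

# List storage instances and their attachment state
juju storage

# Show details for a specific storage instance (the ID will vary)
juju show-storage concourse-data/0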
Concourse workers can utilize NVIDIA GPUs for ML/AI workloads, GPU-accelerated builds, and compute-intensive tasks.
- NVIDIA GPU hardware on the host machine
- NVIDIA drivers installed on the host (tested with driver 580.95+)
- For LXD/containers: GPU passthrough configured (see below)
Note: The charm automatically installs nvidia-container-toolkit and configures the GPU runtime. No manual setup required!
Complete deployment from scratch:
# 1. Deploy PostgreSQL
juju deploy postgresql --channel 16/stable --base ubuntu@24.04
# 2. Deploy web server
juju deploy concourse-ci-machine web --config mode=web
# 3. Deploy GPU-enabled worker
juju deploy concourse-ci-machine worker \
--config mode=worker \
--config compute-runtime=cuda
# 4. Add GPU to LXD container (only manual step for localhost cloud)
lxc config device add <container-name> gpu0 gpu
# Example: lxc config device add juju-abc123-0 gpu0 gpu
# 5. Create relations
juju relate web:postgresql postgresql:database
juju relate web:tsa worker:flight
# 6. Check status
juju status worker
# Expected: "Worker ready (GPU: 1x NVIDIA)"

# Enable NVIDIA GPU on already deployed worker
juju config worker compute-runtime=cuda
# Enable AMD GPU on already deployed worker
juju config worker compute-runtime=rocm
# Disable GPU
juju config worker compute-runtime=none

If deploying on LXD (localhost cloud), add GPU to the container:
# Find your worker container name
lxc list | grep juju
# Add GPU device (requires container restart)
lxc config device add <container-name> gpu0 gpu
# Example:
lxc config device add juju-abc123-0 gpu0 gpu

Everything else is automated! The charm will:
- ✅ Install nvidia-container-toolkit
- ✅ Create GPU wrapper script
- ✅ Configure runtime for GPU passthrough
- ✅ Set up automatic GPU device injection
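A quick spot-check from the Juju client after the worker settles (a sketch; unit and application names depend on your deployment):

# GPU visible inside the worker machine/container
juju ssh worker/0 -- nvidia-smi

# NVIDIA container runtime installed by the charm
juju ssh worker/0 -- which nvidia-container-runtime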
| Option | Default | Description |
|---|---|---|
| compute-runtime | none | GPU compute runtime: none, cuda (NVIDIA), or rocm (AMD) |
| gpu-device-ids | all | GPU devices to expose: "all" or "0,1,2" |
When GPU is enabled, workers are automatically tagged:
- cuda - NVIDIA GPU worker (when compute-runtime=cuda)
- rocm - AMD GPU worker (when compute-runtime=rocm)
- gpu-count=N - Number of GPUs available
- gpu-devices=0,1 - Specific device IDs (if configured)
Create a pipeline that targets GPU-enabled workers:
jobs:
- name: train-model-nvidia
  plan:
  - task: gpu-training
    tags: [cuda]  # Target NVIDIA GPU workers
    config:
      platform: linux
      image_resource:
        type: registry-image
        source:
          repository: nvidia/cuda
          tag: 13.1.0-runtime-ubuntu24.04
      run:
        path: sh
        args:
        - -c
        - |
          # Verify GPU access
          nvidia-smi
          # Run your GPU workload
          python train.py --use-gpu

- name: gpu-benchmark
  plan:
  - task: benchmark
    tags: [cuda, gpu-count=1]  # More specific targeting
    config:
      platform: linux
      image_resource:
        type: registry-image
        source:
          repository: nvidia/cuda
          tag: 13.1.0-base-ubuntu24.04
      run:
        path: nvidia-smi

# Check worker status
juju status worker
# Should show: "Worker ready (GPU: 1x NVIDIA)"
# Verify GPU tags in Concourse
fly -t local workers
# Worker should show tags: cuda, gpu-count=1

- nvidia/cuda:13.1.0-base-ubuntu24.04 - CUDA base (~174MB)
- nvidia/cuda:13.1.0-runtime-ubuntu24.04 - CUDA runtime (~1.38GB)
- nvidia/cuda:13.1.0-devel-ubuntu24.04 - CUDA development (~3.39GB)
- tensorflow/tensorflow:latest-gpu - TensorFlow with GPU
- pytorch/pytorch:latest - PyTorch with GPU
Worker shows "GPU enabled but no GPU detected"
- Verify GPU present: nvidia-smi
- Check driver installation: nvidia-smi

Container cannot access GPU
- Verify nvidia-container-runtime: which nvidia-container-runtime
- Check containerd config: cat /etc/containerd/config.toml
- Restart containerd: sudo systemctl restart containerd

GPU not showing in task
- Ensure using NVIDIA CUDA base image
- Run nvidia-smi in the task to debug
- Check worker tags: fly -t local workers
Concourse workers can utilize AMD GPUs with ROCm for ML/AI workloads, GPU-accelerated computations, and HPC tasks.
- AMD GPU hardware on the host machine (e.g., Radeon RX 6000/7000 series, MI series)
- AMD GPU drivers installed on the host
- ROCm tools (optional, for host-side management)
- For LXD/containers: GPU passthrough configured (see below)
Note: The charm automatically installs amd-container-toolkit, generates CDI specifications, and configures the ROCm runtime. No manual setup required!
Complete deployment from scratch:
# 1. Deploy PostgreSQL
juju deploy postgresql --channel 16/stable --base ubuntu@24.04
# 2. Deploy web server
juju deploy concourse-ci-machine web --config mode=web
# 3. Deploy ROCm-enabled worker
juju deploy concourse-ci-machine worker \
--config mode=worker \
--config compute-runtime=rocm
# 4. Add AMD GPU to LXD container (use specific GPU ID for multi-GPU systems)
# Note: On systems with multiple GPU vendors, use 'id=N' to target specific GPU
lxc query /1.0/resources | jq '.gpu.cards[] | {id: .drm.id, vendor, driver, product_id, vendor_id, pci_address}'
lxc config device add <container-name> gpu1 gpu id=1
# Example: lxc config device add juju-abc123-0 gpu1 gpu id=1
# 5. Create relations
juju relate web:postgresql postgresql:database
juju relate web:tsa worker:flight
# 6. Check status
juju status worker
# Expected: "Worker ready (v7.14.2) (GPU: 1x AMD)"

# Enable AMD GPU on already deployed worker
juju config worker compute-runtime=rocm

If deploying on LXD (localhost cloud), add AMD GPU to the container:
# Find available GPUs and their IDs
lxc query /1.0/resources | jq '.gpu.cards[] | {id: .drm.id, vendor, driver, product_id, vendor_id, pci_address}'
# Output example:
# {
# "id": 0,
# "vendor": "NVIDIA Corporation",
# "driver": "nvidia",
# "product": "GA104 [GeForce RTX 3070]"
# }
# {
# "id": 1,
# "vendor": "Advanced Micro Devices, Inc. [AMD/ATI]",
# "driver": "amdgpu",
# "product": "Navi 31 [Radeon RX 7900 XT]"
# }
# Add AMD GPU device using specific ID (GPU 1 in this example)
lxc config device add <container-name> gpu1 gpu id=1
# Add /dev/kfd device (required for ROCm compute)
lxc config device add <container-name> kfd unix-char source=/dev/kfd path=/dev/kfd
# Example:
lxc config device add juju-abc123-0 gpu1 gpu id=1
lxc config device add juju-abc123-0 kfd unix-char source=/dev/kfd path=/dev/kfd

- A generic lxc config device add ... gpu passes ALL GPUs to the container
- This causes ambiguity when both NVIDIA and AMD GPUs are present
- Always use id=N to target the specific AMD GPU
- GPU ID corresponds to /dev/dri/cardN (e.g., id=1 → /dev/dri/card1)
- /dev/kfd (Kernel Fusion Driver) is required for ROCm compute workloads
- Without /dev/kfd, GPU monitoring works but PyTorch/TensorFlow cannot use the GPU
- Must be added as a separate device after GPU passthrough
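Once both devices are attached, a quick check from the Juju client confirms the worker can see them (a sketch; the application name worker follows this guide's examples):

# Render nodes and cards for the AMD GPU
juju ssh worker/0 -- ls -la /dev/dri/

# /dev/kfd must exist for ROCm compute (PyTorch/TensorFlow)
juju ssh worker/0 -- ls -la /dev/kfd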
- Discrete GPUs (fully supported): RX 6000/7000 series, Radeon Pro, Instinct MI series - work natively
- Integrated GPUs (requires workaround): APUs like Phoenix1 (gfx1103), Renoir, Cezanne
  - ✅ CAN work with the HSA_OVERRIDE_GFX_VERSION environment variable (see below)
  - ⚠️ Lower performance due to shared system memory
  - Recommended for development/testing, not production ML workloads
- Check ROCm compatibility: https://rocm.docs.amd.com/en/latest/release/gpu_os_support.html
Everything else is automated! The charm will:
- ✅ Install amd-container-toolkit
- ✅ Generate CDI specification
- ✅ Install rocm-smi for GPU monitoring
- ✅ Create AMD GPU wrapper script
- ✅ Configure runtime for ROCm GPU passthrough
- ✅ Set up automatic GPU device injection into task containers (including /dev/kfd)
| Option | Default | Description |
|---|---|---|
| compute-runtime | none | GPU compute runtime: none, cuda (NVIDIA), or rocm (AMD) |
| gpu-device-ids | all | GPU devices to expose: "all" or "0,1,2" |
When ROCm GPU is enabled, workers are automatically tagged:
- rocm - AMD GPU worker (when compute-runtime=rocm)
- gpu-count=N - Number of AMD GPUs available
- gpu-devices=0,1 - Specific device IDs (if configured)
Create a pipeline that targets ROCm-enabled workers:
jobs:
- name: rocm-benchmark
  plan:
  - task: gpu-test
    tags: [rocm]  # Target ROCm workers
    config:
      platform: linux
      image_resource:
        type: registry-image
        source:
          repository: rocm/dev-ubuntu-24.04
          tag: latest
      run:
        path: sh
        args:
        - -c
        - |
          # Verify GPU access
          rocm-smi
          # Check available devices
          ls -la /dev/dri/
          # Run your ROCm workload
          python train.py --rocm

- name: amd-gpu-compute
  plan:
  - task: compute
    tags: [rocm, gpu-count=1]  # More specific targeting
    config:
      platform: linux
      image_resource:
        type: registry-image
        source:
          repository: rocm/pytorch
          tag: latest
      run:
        path: sh
        args:
        - -c
        - |
          # For integrated AMD GPUs (Phoenix1/gfx1103, etc.)
          export HSA_OVERRIDE_GFX_VERSION=11.0.0
          python3 -c "import torch; print('CUDA available:', torch.cuda.is_available()); x = torch.rand(5,3).cuda(); print('Result:', x * 2)"

# Check worker status
juju status worker
# Should show: "Worker ready (v7.14.2) (GPU: 1x AMD)"
# Verify GPU tags in Concourse
fly -t local workers
# Worker should show tags: rocm, gpu-count=1
# Test GPU access in a task
fly -t local execute -c test-gpu.yml --tag=rocm

- rocm/dev-ubuntu-24.04:latest - ROCm development base (~1.1GB)
- rocm/tensorflow:latest - TensorFlow with ROCm
- rocm/pytorch:latest - PyTorch with ROCm (~6GB, includes PyTorch 2.9.1+rocm7.2.0)
- rocm/rocm-terminal:latest - ROCm with utilities
Integrated AMD GPUs (APUs) like Phoenix1 (gfx1103), Renoir, and Cezanne are not officially supported by ROCm, but can work with the HSA_OVERRIDE_GFX_VERSION environment variable.
Why it's needed:
- ROCm checks GPU architecture (GFX version) and rejects unsupported GPUs
- Integrated GPUs often use newer GFX versions without full ROCm kernel support
- Override tells ROCm to use kernels from a supported architecture
How to use:
jobs:
- name: pytorch-rocm-integrated-gpu
  plan:
  - task: test-gpu
    tags: [rocm]
    config:
      platform: linux
      image_resource:
        type: registry-image
        source:
          repository: rocm/pytorch
          tag: latest
      run:
        path: sh
        args:
        - -c
        - |
          # Set override for gfx1103 (Phoenix1) - use gfx11.0.0 kernels
          export HSA_OVERRIDE_GFX_VERSION=11.0.0
          # Your PyTorch code
          python3 -c "
          import torch
          print('CUDA (ROCm) available:', torch.cuda.is_available())
          x = torch.rand(5, 3).cuda()
          y = x * 2
          print('GPU computation succeeded!')
          print('Result:', y)
          "

Override values for common integrated GPUs:
| GPU Architecture | GFX Version | Override Value |
|---|---|---|
| Phoenix1 (780M) | gfx1103 | 11.0.0 |
| Renoir (4000 series) | gfx90c | 9.0.0 |
| Cezanne (5000 series) | gfx90c | 9.0.0 |
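To find the GFX target of your GPU before choosing an override value (a sketch; rocminfo ships with ROCm and reports the gfx name of each GPU agent):

# Print the GFX target reported by ROCm (e.g., gfx1103)
rocminfo | grep -i gfx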
Limitations:
- ⚠️ Uses suboptimal kernels → lower performance than discrete GPUs
- ⚠️ Shared system memory → memory bandwidth limitations
- ⚠️ May not support all ROCm features
- ✅ Good for development, testing, and light compute workloads
- ❌ Not recommended for production ML training
Testing on host (before deploying pipeline):
# Test if your integrated GPU works with override
docker run --rm -it --device=/dev/kfd --device=/dev/dri \
  rocm/pytorch:latest sh -c "
export HSA_OVERRIDE_GFX_VERSION=11.0.0
python3 -c 'import torch; x = torch.rand(5,3).cuda(); print(x * 2)'
"

Worker shows "GPU enabled but no GPU detected"
- Verify AMD GPU present: lspci | grep -i amd
- Check driver: lsmod | grep amdgpu
- Check devices: ls -la /dev/dri/
Container cannot access AMD GPU
- Verify LXD device passthrough: lxc config device show <container-name>
- Check devices in container: juju ssh worker/0 -- ls -la /dev/dri/
- Ensure using the correct GPU ID on multi-GPU systems
- Check /dev/kfd: must be present for compute workloads
PyTorch/TensorFlow shows "CUDA (ROCm) available: False"
- Most common: missing /dev/kfd device
  - Check in container: ls -la /dev/kfd
  - Add if missing: lxc config device add <container-name> kfd unix-char source=/dev/kfd path=/dev/kfd
- Integrated GPU without override: try the HSA_OVERRIDE_GFX_VERSION workaround (see above)
  - Verify GPU model: lspci | grep -i vga
  - Check PCI ID: cat /sys/class/drm/card*/device/uevent | grep PCI_ID
  - For gfx1103 (Phoenix1): export HSA_OVERRIDE_GFX_VERSION=11.0.0
- HSA_STATUS_ERROR_OUT_OF_RESOURCES: usually indicates an unsupported GPU or missing drivers
rocm-smi works but PyTorch doesn't detect GPU
- This indicates /dev/kfd is missing or inaccessible
- rocm-smi only needs /dev/dri/* for monitoring
- PyTorch needs /dev/kfd for compute operations
- Solution: add the /dev/kfd device to the container (see above)
rocm-smi not working in container
- Ensure using a ROCm-enabled image (rocm/dev-ubuntu-24.04 or similar)
- Check device permissions: ls -la /dev/dri/ in the task
- ROCm version mismatch: host and container ROCm versions should be compatible
GPU not showing in task
- Ensure using ROCm-enabled image
- Run ls -la /dev/dri/ in the task to debug device availability
- Check worker tags: fly -t local workers
- Verify the task uses the correct tags: --tag=rocm
Multi-GPU system issues
- If worker detects wrong GPU type, check LXD device configuration
- Use a specific GPU ID: lxc config device add ... gpu id=1 (not a generic gpu)
- Query GPU IDs: lxc query /1.0/resources | jq '.gpu.cards[] | {id: .drm.id, vendor, driver, product_id, vendor_id, pci_address}'
Integrated GPU performance issues
- If compute works but is slow, this is expected (shared memory bandwidth)
- Consider discrete GPU for production workloads
- Use integrated GPU for testing/development only
- Monitor memory usage: integrated GPUs share system RAM
Cause: Usually means PostgreSQL relation is missing (for web units).
Fix:
juju relate concourse-ci:postgresql postgresql:database

Check logs:
juju debug-log --include concourse-ci/0 --replay --no-tail | tail -50
# Or SSH and check systemd
juju ssh concourse-ci/0
sudo journalctl -u concourse-server -f

Common issues:
- Database not configured: Check PostgreSQL relation
- Auth configuration missing: check /var/lib/concourse/config.env
- Port already in use: change the web-port config
Check worker status:
juju ssh concourse-ci/1 # Worker unit
sudo systemctl status concourse-worker
sudo journalctl -u concourse-worker -f

Common issues:
- TSA keys not generated: check /var/lib/concourse/keys/
- Containerd not running: sudo systemctl status containerd
- Network connectivity: ensure workers can reach the web server
juju ssh concourse-ci/0
sudo cat /var/lib/concourse/config.env

┌─────────────────────────────────────────────────────────┐
│ Web Server │
│ ┌────────────┐ ┌────────────┐ ┌────────────┐ │
│ │ Web UI/API │ │ TSA │ │ Scheduler │ │
│ └────────────┘ └────────────┘ └────────────┘ │
│ │ │ │ │
│ └──────────────┴─────────────────┘ │
│ │ │
└────────────────────────┼────────────────────────────────┘
│
│ (SSH over TSA)
│
┌────────────────┴────────────────┐
│ │
┌─────▼──────┐ ┌─────▼──────┐
│ Worker 1 │ │ Worker 2 │
│┌──────────┐│ │┌──────────┐│
││Container ││ ││Container ││
││Runtime ││ ││Runtime ││
│└──────────┘│ │└──────────┘│
└────────────┘ └────────────┘
... see https://concourse-ci.org/internals.html
- /opt/concourse/: Concourse binaries
- /var/lib/concourse/: Data and configuration
- /var/lib/concourse/keys/: TSA and worker keys
- /var/lib/concourse/worker/: Worker runtime directory

- concourse-server.service: Web server (runs as the concourse user)
- concourse-worker.service: Worker (runs as root)
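To inspect the services and key material on the units (a sketch following the layout above):

# Service health on web and worker units
juju ssh concourse-ci/0 -- systemctl status concourse-server --no-pager
juju ssh concourse-ci/1 -- systemctl status concourse-worker --no-pager

# Key material and data directory
juju ssh concourse-ci/0 -- sudo ls -la /var/lib/concourse/keys/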
# Install charmcraft
sudo snap install charmcraft --classic
# Clone repository
git clone https://github.com/fourdollars/concourse-ci-machine.git
cd concourse-ci-machine
# Build charm
charmcraft pack
# Deploy locally
juju deploy ./concourse-ci-machine_amd64.charm

concourse-ci-machine/
├── src/
│ └── charm.py # Main charm logic
├── lib/
│ ├── concourse_common.py # Shared utilities
│ ├── concourse_installer.py # Installation logic
│ ├── concourse_web.py # Web server management
│ └── concourse_worker.py # Worker management
├── metadata.yaml # Charm metadata
├── config.yaml # Configuration options
├── charmcraft.yaml # Build configuration
├── actions.yaml # Charm actions
└── README.md # This file
- Rotate the admin password if needed (a random password is generated at deployment; retrieve it with the get-admin-password action):
  fly -t prod login -c http://<ip>:8080 -u admin -p <admin-password>
  # Use the web UI to change the password in team settings
- Configure proper authentication:
- Set up OAuth providers (GitHub, GitLab, etc.)
- Use Juju secrets for credentials
- Enable HTTPS with reverse proxy (nginx/haproxy)
- Network security:
- Use Juju spaces to isolate networks
- Configure firewall rules to restrict access
- Use private PostgreSQL endpoints
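As one example, host-level filtering on a unit can restrict the web port to a trusted subnet (a sketch assuming ufw and a 10.0.0.0/24 management network; adapt to your firewall of choice):

# Allow the Concourse web port only from the trusted subnet, then deny it for everyone else
sudo ufw allow from 10.0.0.0/24 to any port 8080 proto tcp
sudo ufw deny 8080/tcp
sudo ufw reload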
Database credentials are passed securely via Juju relations, not environment variables.
Contributions are welcome! Please:
- Fork the repository
- Create a feature branch (
git checkout -b feature/amazing-feature) - Make your changes
- Commit your changes (
git commit -m 'Add amazing feature') - Push to the branch (
git push origin feature/amazing-feature) - Open a Pull Request
This charm is licensed under the Apache 2.0 License. See LICENSE for details.
- Concourse CI: https://concourse-ci.org/
- Documentation: https://concourse-ci.org/docs.html
- Charm Hub: https://charmhub.io/concourse-ci
- Source Code: https://github.com/fourdollars/concourse-ci-machine
- Issue Tracker: https://github.com/fourdollars/concourse-ci-machine/issues
- Juju: https://juju.is/
- Community Support: Open an issue on GitHub
- Commercial Support: Contact maintainers
- Concourse CI team for the amazing CI/CD system
- Canonical for Juju and the Operator Framework
- Contributors to this charm