feat: Add enterprise air-gap and mirror support for toolchain downloads #208

@avrabe

Enterprise Air-Gap & Mirror Support - Solution Proposal

Date: 2025-11-14
Problem: rules_wasm_component cannot build in air-gapped/corporate environments
Impact: Blocks enterprise adoption
Effort: 2-3 weeks


Problem Statement

Current Situation (CRITICAL BLOCKER)

Downloads Required: ~554 MB from 5 external registries

  • GitHub (10 tools): wasm-tools, wit-bindgen, wac, wkg, wasmtime, wasi-sdk, wizer, TinyGo, Binaryen, wasmsign2
  • npmjs.org: jco + dependencies
  • nodejs.org: Node.js runtime
  • go.dev: Go SDK
  • registry.wasm.io: WKG packages (runtime)

Enterprise Requirements NOT Met:

  • ❌ Air-gap builds (no internet access)
  • ❌ Corporate proxy with authentication
  • ❌ Custom internal mirrors (JFrog Artifactory, Sonatype Nexus, Harbor)
  • ❌ Security scanning before downloads
  • ❌ License compliance verification
  • ❌ Download audit trail for compliance

Solution Analysis: 4 Approaches

Approach 1: Environment Variable Mirror Override ⭐ RECOMMENDED

Design:

# Existing code (Starlark has no f-strings, so the URL is built with .format()):
url = "https://github.com/{}/releases/download/v{}/{}".format(repo, version, asset)

# New code:
mirror_base = repository_ctx.os.environ.get("BAZEL_WASM_GITHUB_MIRROR", "https://github.com")
url = "{}/{}/releases/download/v{}/{}".format(mirror_base, repo, version, asset)

Pros:

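  • ✅ Minimal code changes (about 20 lines in secure_download.bzl, plus about 15 in jco_toolchain.bzl)
  • ✅ Backward compatible (defaults to public URLs)
  • ✅ Works with any HTTP mirror that can serve files at GitHub's release paths (JFrog, Nexus, MinIO, S3)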
  • ✅ Per-registry configuration (GitHub, npm, Go separately)
  • ✅ Immediately enables air-gap with distfiles
  • ✅ No Bazel API changes needed

Cons:

  • ⚠️ Requires environment variables (must document)
  • ⚠️ Mirror must replicate exact GitHub URL structure
  • ⚠️ No automatic fallback to public if mirror fails

Implementation:

# toolchains/secure_download.bzl
def secure_download_tool(repository_ctx, tool_name, version, platform):
    # NEW: Read mirror configuration
    github_mirror = repository_ctx.os.environ.get(
        "BAZEL_WASM_GITHUB_MIRROR",
        "https://github.com",
    )

    # Look up tool metadata and the expected checksum (unchanged)
    tool_info = get_tool_info(tool_name)
    checksum = get_tool_checksum(tool_name, version, platform)

    # Construct URL with mirror
    url = construct_url(github_mirror, tool_info, version, platform)

    # Download with checksum verification (unchanged)
    repository_ctx.download_and_extract(
        url = url,
        sha256 = checksum,
        # ...
    )
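
The mirror override comes down to swapping the URL base while keeping GitHub's release path intact. The following is a minimal Python sketch of that string logic, for illustration only: the real rule would be written in Starlark, and the function name `release_url` and the sample repo, version, and asset values are assumptions rather than part of the existing code.

```python
DEFAULT_GITHUB_BASE = "https://github.com"


def release_url(repo, version, asset, env=None):
    """Build a GitHub-style release URL, honoring BAZEL_WASM_GITHUB_MIRROR.

    `env` is a mapping of environment variables; pass os.environ in real use.
    """
    env = env or {}
    # An unset or empty variable falls back to public GitHub.
    base = env.get("BAZEL_WASM_GITHUB_MIRROR") or DEFAULT_GITHUB_BASE
    # Strip trailing slashes so the join never produces a double slash.
    base = base.rstrip("/")
    return "{}/{}/releases/download/v{}/{}".format(base, repo, version, asset)


if __name__ == "__main__":
    # Hypothetical mirror and asset names, for demonstration only.
    mirror = {"BAZEL_WASM_GITHUB_MIRROR": "https://artifacts.corp.com/github-mirror/"}
    print(release_url("bytecodealliance/wasm-tools", "1.0.0", "tool.tar.gz", mirror))
```

Normalizing the trailing slash matters because a mirror that receives `//` in the path may answer with a 404 even when the artifact is present.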

Environment Variables:

export BAZEL_WASM_GITHUB_MIRROR="https://artifacts.corp.com/github-mirror"
export BAZEL_NPM_REGISTRY="https://npm.corp.com"
export BAZEL_GO_MIRROR="https://go-mirror.corp.com"
export BAZEL_NODEJS_MIRROR="https://nodejs-mirror.corp.com"

Corporate Setup Steps:

  1. Mirror GitHub releases to JFrog: https://jfrog.corp.com/github/{owner}/{repo}/...
  2. Set env var: BAZEL_WASM_GITHUB_MIRROR=https://jfrog.corp.com/github
  3. Build works identically, just different source

Effort: 3-5 days


Approach 2: Bazel Repository Cache + Distfiles

Design: Leverage Bazel's --repository_cache and --distdir

Pros:

  • ✅ Uses native Bazel features
  • ✅ No code changes needed
  • ✅ Works for all repository downloads

Cons:

  • ❌ Requires manual pre-population of cache
  • ❌ Complex to distribute cache to air-gapped systems
  • ❌ No built-in cache mirroring
  • ❌ Doesn't solve npm/Go SDK downloads

Usage:

# Step 1: Build on internet-connected machine
bazel build --repository_cache=/tmp/bazel-cache //...

# Step 2: Copy cache to air-gapped machine
rsync -av /tmp/bazel-cache airgap-server:/opt/bazel-cache

# Step 3: Build on air-gapped machine
bazel build --repository_cache=/opt/bazel-cache //...

Issues:

  • Cache isn't human-readable (content-addressed by SHA256)
  • npm packages still require internet
  • Go SDK still requires internet
  • Pre-populating the cache requires a connected machine to run a fetch or build first (for example, bazel fetch //...)
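
Because the repository cache is content-addressed, an archive that is already on disk can in principle be placed into it by hash. The sketch below assumes the layout `<cache>/content_addressable/sha256/<digest>/file`, which matches my understanding of current Bazel releases but should be checked against the Bazel version in use. The function names are hypothetical.

```python
import hashlib
import shutil
from pathlib import Path


def sha256_of(path):
    """Return the hex SHA-256 digest of a file, read in 1 MiB chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def seed_repository_cache(artifact, cache_root):
    """Copy `artifact` into a Bazel-style repository cache; return the entry path.

    Assumed layout: <cache_root>/content_addressable/sha256/<digest>/file
    """
    digest = sha256_of(artifact)
    entry = Path(cache_root) / "content_addressable" / "sha256" / digest / "file"
    entry.parent.mkdir(parents=True, exist_ok=True)
    shutil.copyfile(artifact, entry)
    return entry
```

A cache entry is only consulted when the download call supplies the matching `sha256`, which the existing checksum verification already does.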

Effort: 1 week (documentation + testing)


Approach 3: Bazel Module Extension with Mirror Config

Design: Add mirror configuration to Bazel module extension

Pros:

  • ✅ Type-safe configuration
  • ✅ Per-project customization
  • ✅ Version-controlled configuration
  • ✅ Better UX than environment variables

Cons:

  • ❌ Requires MODULE.bazel changes (users must update)
  • ❌ Not backward compatible
  • ❌ More complex implementation
  • ❌ Harder to apply globally across projects

Usage:

# MODULE.bazel (user configuration)
wasm_toolchain = use_extension("@rules_wasm_component//wasm:extensions.bzl", "wasm_toolchain")

wasm_toolchain.configure_mirrors(
    github_base = "https://artifacts.corp.com/github",
    npm_registry = "https://npm.corp.com",
    go_mirror = "https://go.corp.com",
    nodejs_mirror = "https://nodejs.corp.com",
)

wasm_toolchain.register(name = "wasm_tools")

Implementation:

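# wasm/extensions.bzl (new file)
# Repository rule that records the chosen mirrors in a generated mirrors.bzl.
def _mirror_config_impl(repository_ctx):
    repository_ctx.file("BUILD.bazel", "")
    repository_ctx.file("mirrors.bzl", content = """
GITHUB_MIRROR = "{github}"
NPM_REGISTRY = "{npm}"
GO_MIRROR = "{go}"
NODEJS_MIRROR = "{nodejs}"
""".format(
        github = repository_ctx.attr.github_base,
        npm = repository_ctx.attr.npm_registry,
        go = repository_ctx.attr.go_mirror,
        nodejs = repository_ctx.attr.nodejs_mirror,
    ))

# Tag class behind wasm_toolchain.configure_mirrors(...); the extension
# implementation forwards these values to the repository rule above.
configure_mirrors = tag_class(attrs = {
    "github_base": attr.string(default = "https://github.com"),
    "npm_registry": attr.string(default = "https://registry.npmjs.org"),
    # ...
})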

Effort: 1-2 weeks


Approach 4: Vendoring Script + Offline Mode

Design: Pre-download all dependencies to third_party/ directory

Pros:

  • ✅ Complete offline capability
  • ✅ No runtime configuration needed
  • ✅ Works identically on all machines
  • ✅ Audit trail (vendored files in repo)

Cons:

  • ❌ Large repo size (~554 MB)
  • ❌ Complex vendoring script
  • ❌ Must re-vendor for version updates
  • ❌ Git doesn't handle large binaries well

Usage:

# Step 1: Vendor all toolchains (internet required)
bazel run //tools:vendor_toolchains

# Step 2: Commit vendored files
git add third_party/toolchains/
git commit -m "vendor: toolchain binaries for v1.0.0"

# Step 3: Build offline
bazel build --config=offline //...

Implementation:

# tools/vendor_toolchains.py
def vendor_all_toolchains(output_dir):
    registry = load_registry()

    for tool in registry.tools:
        for version in tool.versions:
            for platform in tool.platforms:
                url = construct_url(tool, version, platform)
                checksum = get_checksum(tool, version, platform)

                # Download to third_party/
                download_file(
                    url=url,
                    output=f"{output_dir}/{tool}/{version}/{platform}",
                    verify_sha256=checksum
                )
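
The `download_file` helper above is left undefined. One possible shape for it is a streamed download that keeps the file only when its SHA-256 matches. This is a sketch under assumptions: the name `download_verified`, the `.part` temporary suffix, and the error wording are illustrative, not taken from the repository.

```python
import hashlib
import os
import urllib.request


def download_verified(url, dest, expected_sha256):
    """Download `url` to `dest`, keeping the file only if its SHA-256 matches."""
    os.makedirs(os.path.dirname(os.path.abspath(dest)), exist_ok=True)
    tmp = dest + ".part"
    digest = hashlib.sha256()
    # Stream in 1 MiB chunks so large toolchain archives are never held in memory.
    with urllib.request.urlopen(url) as response, open(tmp, "wb") as out:
        for chunk in iter(lambda: response.read(1 << 20), b""):
            digest.update(chunk)
            out.write(chunk)
    actual = digest.hexdigest()
    if actual != expected_sha256:
        os.remove(tmp)
        raise ValueError(
            "checksum mismatch for {}: expected {}, got {}".format(url, expected_sha256, actual)
        )
    # Rename only after verification so a partial file never looks valid.
    os.replace(tmp, dest)
    return dest
```

Writing to a temporary name first means an interrupted vendoring run cannot leave a truncated archive under `third_party/` that a later offline build would pick up.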

Effort: 2 weeks


Recommended Solution: Hybrid Approach

Combine Approach 1 (env var mirrors) + Approach 4 (vendoring script) for maximum flexibility.

Architecture

┌─────────────────────────────────────────────────────────┐
│              Build Environment Detection                 │
├─────────────────────────────────────────────────────────┤
│                                                           │
│  1. Check BAZEL_WASM_OFFLINE=1                           │
│     ├─ YES → Use vendored files in third_party/          │
│     └─ NO  → Continue to step 2                          │
│                                                           │
│  2. Check BAZEL_WASM_GITHUB_MIRROR set?                  │
│     ├─ YES → Download from corporate mirror              │
│     └─ NO  → Download from public GitHub                 │
│                                                           │
│  3. Download with SHA256 verification                    │
│     ├─ SUCCESS → Cache in Bazel repository cache         │
│     └─ FAIL → Error with troubleshooting hints           │
│                                                           │
└─────────────────────────────────────────────────────────┘
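
The decision flow in the diagram can be sketched as a small pure function. This is an illustration in Python rather than the Starlark the rule would use. The `prefer` value comes from Scenario 4 below; raising an error when `BAZEL_WASM_OFFLINE=1` is set but nothing is vendored is an assumption about the desired behavior, and the function name is hypothetical.

```python
def resolve_source(env, vendored_available):
    """Pick where a toolchain archive should come from.

    Returns one of "vendored", "mirror", or "public".
    """
    offline = env.get("BAZEL_WASM_OFFLINE", "")
    mirror = env.get("BAZEL_WASM_GITHUB_MIRROR", "")

    if offline == "1":
        # Strict air-gap: never fall through to the network.
        if not vendored_available:
            raise RuntimeError(
                "BAZEL_WASM_OFFLINE=1 but no vendored copy was found in third_party/"
            )
        return "vendored"
    if offline == "prefer" and vendored_available:
        # Mixed mode: use the vendored copy when present, otherwise download.
        return "vendored"
    return "mirror" if mirror else "public"
```

Keeping the choice separate from the download itself makes each branch easy to test without network access.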

Usage Scenarios

Scenario 1: Public Internet (Default)

# No configuration needed
bazel build //examples/basic:hello_component
# Downloads from github.com, npmjs.org, etc.

Scenario 2: Corporate Mirror

# Shell or CI/CD environment (in .bazelrc, use build --repo_env=NAME=value instead of export)
export BAZEL_WASM_GITHUB_MIRROR=https://jfrog.corp.com/github
export BAZEL_NPM_REGISTRY=https://npm.corp.com

bazel build //examples/basic:hello_component
# Downloads from corporate mirrors

Scenario 3: Air-Gap (Vendored)

# Step 1: Vendor on internet-connected machine
bazel run //tools:vendor_toolchains -- --platform=linux_amd64,darwin_arm64

# Step 2: Transfer repo to air-gapped machine

# Step 3: Build offline
export BAZEL_WASM_OFFLINE=1
bazel build //examples/basic:hello_component
# Uses third_party/toolchains/ (no internet required)

Scenario 4: Mixed (Partial Air-Gap)

# Use vendored files + corporate mirror for new tools
export BAZEL_WASM_OFFLINE=prefer  # Try vendored first, fallback to mirror
export BAZEL_WASM_GITHUB_MIRROR=https://jfrog.corp.com/github

bazel build //examples/basic:hello_component

Implementation Plan

Phase 1: Environment Variable Mirrors (Week 1)

  • Add mirror URL environment variable support to secure_download.bzl
  • Add npm registry configuration to jco_toolchain.bzl
  • Add Go mirror configuration to tinygo_toolchain.bzl
  • Add Node.js mirror configuration to jco_toolchain.bzl
  • Update all toolchain files to use configurable mirrors
  • Add retry logic with exponential backoff
  • Document mirror setup for JFrog, Nexus, Harbor
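
The retry item in Phase 1 could follow the usual exponential backoff pattern. The Python sketch below shows the idea only: the parameter names and defaults are assumptions, and a Starlark repository rule would need a different mechanism for waiting between attempts.

```python
import time


def retry_with_backoff(operation, attempts=4, base_delay=1.0, sleep=time.sleep):
    """Call `operation` until it succeeds or `attempts` calls have failed.

    After the n-th failure (n starting at 0) it waits base_delay * 2**n seconds.
    Only OSError is retried; urllib's URLError is a subclass of OSError.
    """
    for attempt in range(attempts):
        try:
            return operation()
        except OSError:
            if attempt == attempts - 1:
                # Out of attempts: surface the last error to the caller.
                raise
            sleep(base_delay * (2 ** attempt))
```

Checksum mismatches should not be retried this way, since a mirror that serves the wrong bytes will keep doing so.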

Phase 2: Vendoring Support (Week 2)

  • Create tools/vendor_toolchains.py script
  • Add offline mode detection to secure_download.bzl
  • Support file:// URLs in download infrastructure
  • Add third_party/toolchains/.gitignore (optional vendoring)
  • Test complete offline build workflow
  • Document vendoring process

Phase 3: Testing & Documentation (Week 3)

  • Test with JFrog Artifactory setup
  • Test with air-gap environment
  • Test with corporate proxy
  • Write enterprise deployment guide
  • Create mirror setup scripts
  • Add troubleshooting documentation

Proof of Concept: Environment Variable Mirrors

Code Changes Required

File 1: toolchains/secure_download.bzl (20 lines changed)

def secure_download_tool(repository_ctx, tool_name, version, platform):
    """Download and verify tool with configurable mirror support."""

    # NEW: Read mirror configuration from environment
    github_mirror = repository_ctx.os.environ.get(
        "BAZEL_WASM_GITHUB_MIRROR",
        "https://github.com"  # Default to public GitHub
    )

    # Load tool info from registry
    tool_info = get_tool_info(tool_name)
    checksum = get_tool_checksum(tool_name, version, platform)

    # Construct URL with configurable mirror
    if github_mirror != "https://github.com":
        # Corporate mirror: replace github.com with mirror
        url = construct_mirror_url(github_mirror, tool_info, version, platform)
    else:
        # Public GitHub: use standard URL construction
        url = construct_github_url(tool_info, version, platform)

    # Download with verification (unchanged)
    repository_ctx.download_and_extract(
        url = url,
        sha256 = checksum,
        type = archive_type,
    )

File 2: toolchains/jco_toolchain.bzl (15 lines changed)

def _jco_toolchain_impl(repository_ctx):
    # NEW: Read NPM registry from environment
    npm_registry = repository_ctx.os.environ.get(
        "BAZEL_NPM_REGISTRY",
        "https://registry.npmjs.org"
    )

    # NEW: Read Node.js mirror from environment
    nodejs_mirror = repository_ctx.os.environ.get(
        "BAZEL_NODEJS_MIRROR",
        "https://nodejs.org"
    )

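    # Download Node.js from configurable mirror
    # (node_version and platform are assumed to be defined elsewhere in the rule;
    # Starlark has no f-strings, so use .format())
    node_url = "{}/dist/v{}/node-v{}-{}.tar.gz".format(
        nodejs_mirror, node_version, node_version, platform,
    )

    # Configure npm to use corporate registry
    npm_config = "registry={}\n".format(npm_registry)
    repository_ctx.file(".npmrc", content = npm_config)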

File 3: .bazelrc (documentation)

# Corporate mirror configuration (optional)
# Uncomment and customize for your environment:

# build --repo_env=BAZEL_WASM_GITHUB_MIRROR=https://artifacts.corp.com/github
# build --repo_env=BAZEL_NPM_REGISTRY=https://npm.corp.com
# build --repo_env=BAZEL_GO_MIRROR=https://go-mirror.corp.com
# build --repo_env=BAZEL_NODEJS_MIRROR=https://nodejs-mirror.corp.com

# Air-gap mode (use vendored files)
# build:offline --repo_env=BAZEL_WASM_OFFLINE=1

Testing the POC

Test 1: Mirror URL Construction

# Set mirror
export BAZEL_WASM_GITHUB_MIRROR=https://jfrog.corp.com/github-mirror

# Verify URL construction
bazel build --repository_cache=/tmp/test-cache //examples/basic:hello_component 2>&1 | grep "Downloading"
# Expected: https://jfrog.corp.com/github-mirror/bytecodealliance/wasm-tools/...
# NOT: https://github.com/bytecodealliance/wasm-tools/...

Test 2: Fallback to Default

# No mirror set
unset BAZEL_WASM_GITHUB_MIRROR

# Should use public GitHub
bazel build //examples/basic:hello_component 2>&1 | grep "Downloading"
# Expected: https://github.com/...

Test 3: NPM Registry Override

export BAZEL_NPM_REGISTRY=https://npm.corp.com

# Check npm configuration
bazel build //toolchains/jco:jco_toolchain --repository_cache=/tmp/test
cat $(bazel info output_base)/external/jco_toolchain/.npmrc
# Expected: registry=https://npm.corp.com

Corporate Mirror Setup Guide

JFrog Artifactory

Step 1: Create Remote Repository

# Artifactory → Repositories → New Remote Repository
Repository Type: Generic
Repository Key: github-releases
URL: https://github.com

Step 2: Configure URL Rewriting

# Artifactory → Remote Repositories → github-releases → Advanced
Remote Repository URL: https://github.com
Path Pattern: **/*

Step 3: Set Environment Variable

export BAZEL_WASM_GITHUB_MIRROR=https://artifactory.corp.com/artifactory/github-releases

Sonatype Nexus

Step 1: Create Raw Proxy Repository

# Nexus → Repositories → Create Repository → raw (proxy)
Name: github-proxy
Remote Storage: https://github.com

Step 2: Configure

export BAZEL_WASM_GITHUB_MIRROR=https://nexus.corp.com/repository/github-proxy

Harbor (OCI Registry)

Challenge: Harbor is OCI-only, GitHub releases are not OCI
Solution: Use Harbor for WASM components only, different mirror for binaries

export BAZEL_WASM_GITHUB_MIRROR=https://storage.corp.com/github-mirror  # S3/Minio
export WKG_REGISTRY=https://harbor.corp.com  # For WASM components

Risk Assessment

Technical Risks

| Risk | Likelihood | Impact | Mitigation |
|------|------------|--------|------------|
| Mirror URL format mismatch | Medium | High | Document exact URL structure required |
| npm registry incompatibility | Low | Medium | Test with common registries (Verdaccio, Nexus) |
| Checksum verification fails | Low | High | Mirror must preserve exact file contents |
| Environment variable not propagated | Medium | Medium | Document bazel --repo_env usage |

Organizational Risks

| Risk | Likelihood | Impact | Mitigation |
|------|------------|--------|------------|
| Corporate IT blocks setup | Low | High | Provide security/compliance docs |
| Mirror maintenance burden | Medium | Medium | Document automated mirroring |
| Users don't read docs | High | Low | Fail with helpful error messages |
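
As one illustration of the "fail with helpful error messages" mitigation, a mirror variable could be validated up front, with the variable name and a working example included in the error. This Python sketch is an assumption about what such a check might look like. The function name and message wording are hypothetical, and `file://` is accepted only because Phase 2 plans to support it.

```python
from urllib.parse import urlparse


def validate_mirror(var_name, value):
    """Return a normalized mirror base URL, or raise with a fix-it hint."""
    parsed = urlparse(value)
    if parsed.scheme not in ("http", "https", "file"):
        raise ValueError(
            "{name}={value!r} is not a usable mirror URL: expected an http(s):// "
            "or file:// base, for example "
            "{name}=https://artifacts.corp.com/github-mirror".format(
                name=var_name, value=value
            )
        )
    if parsed.scheme != "file" and not parsed.netloc:
        raise ValueError("{}={!r} has no host name".format(var_name, value))
    # Drop trailing slashes so later path joins stay predictable.
    return value.rstrip("/")
```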

Success Criteria

Must Have:

  • ✅ Builds succeed with BAZEL_WASM_GITHUB_MIRROR set to test mirror
  • ✅ Builds succeed in complete air-gap with vendored files
  • ✅ Backward compatible (no env vars = current behavior)
  • ✅ Works with JFrog Artifactory
  • ✅ Works with npm registries (Verdaccio/Nexus)

Should Have:

  • ✅ Retry logic for transient failures
  • ✅ Helpful error messages for mirror misconfiguration
  • ✅ Documentation for common corporate setups
  • ✅ Vendoring script for air-gap preparation

Nice to Have:

  • Mirror health checking
  • Automatic fallback to public if mirror fails
  • Download audit logging

Estimated Effort

| Phase | Effort | Risk |
|-------|--------|------|
| Env var mirrors | 3-5 days | Low |
| Vendoring support | 5-7 days | Medium |
| Testing & docs | 3-5 days | Low |
| Total | 11-17 days | Low-Medium |

Recommendation

Implement Hybrid Approach (Env Var + Vendoring):

  1. Start with Phase 1 (env var mirrors) - delivers 80% of value in 1 week
  2. Add Phase 2 (vendoring) - completes air-gap story
  3. Polish in Phase 3 - documentation and edge cases

Why This Approach:

  • ✅ Minimal code changes (proven pattern used by Bazel rules_docker, rules_oci)
  • ✅ Backward compatible (zero breaking changes)
  • ✅ Flexible (works with any mirror system)
  • ✅ Quick to implement (2-3 weeks total)
  • ✅ Addresses root enterprise blocker

Alternative If Timeline Critical:

  • Implement only Phase 1 (env var mirrors) in 1 week
  • Document manual vendoring workaround
  • Add formal vendoring support later based on demand
