diff --git a/devlog/2025-03-sandboxing.md b/devlog/2025-03-sandboxing.md
new file mode 100644
index 000000000..2dce780d0
--- /dev/null
+++ b/devlog/2025-03-sandboxing.md
@@ -0,0 +1,135 @@
+# Sandboxing POC for Source Declarative Manifest
+
+## Overview
+
+This document describes the proof-of-concept (POC) implementation of two sandboxing solutions for the `source-declarative-manifest` connector:
+
+1. **Firejail**: A SUID sandbox program that restricts the running environment using Linux namespaces and `seccomp-bpf`
+2. **gVisor**: A user-space kernel that implements a substantial portion of the Linux system call interface
+
+The implementation is available in [PR #399](https://github.com/airbytehq/airbyte-python-cdk/pull/399).
+
+## Implementation Details
+
+Both POC implementations:
+
+- Start from the `airbyte/source-declarative-manifest` Docker image
+- Add the respective sandboxing solution
+- Wrap the original entry point with the sandboxing solution
+- Preserve all command-line arguments and functionality
+
+### Firejail Implementation
+
+Firejail provides a lightweight sandboxing solution using Linux namespaces and `seccomp-bpf`. The implementation:
+
+- Installs Firejail via apt-get
+- Creates a wrapper script that runs the original entry point through Firejail
+- Uses the `--noprofile`, `--quiet`, and `--private` flags for basic isolation
+
+Key benefits of Firejail:
+
+- Lightweight with minimal overhead
+- Easy to configure with profiles
+- Mature and well-documented
+
+Resources:
+
+- [Firejail Documentation](https://firejail.wordpress.com/)
+- [Firejail GitHub Repository](https://github.com/netblue30/firejail)
+
+### gVisor Implementation
+
+gVisor provides a more comprehensive sandboxing solution by implementing a user-space kernel. The implementation:
+
+- Installs gVisor's runsc via the official repository
+- Creates a wrapper script that attempts to run the original entry point through runsc
+- Falls back to direct execution if runsc fails (due to permission constraints in Docker)
+
+The gVisor implementation uses the OCI bundle approach with runsc:
+
+1. Creates a temporary directory for the OCI bundle
+2. Generates a minimal config.json for the OCI bundle
+3. Attempts to run the command with `runsc -TESTONLY-unsafe-nonroot run`
+4. Falls back to direct execution if runsc fails
+
+Key benefits of gVisor:
+
+- Strong isolation through a user-space kernel
+- Compatible with OCI runtime specification
+- Active development by Google
+
+Resources:
+
+- [gVisor Documentation](https://gvisor.dev/docs/)
+- [gVisor GitHub Repository](https://github.com/google/gvisor)
+- [OCI Runtime Specification](https://github.com/opencontainers/runtime-spec)
+
+## Testing Results
+
+Both Docker images were built and tested locally with the `spec` command to verify basic functionality:
+
+### Firejail Test Results
+
+```bash
+$ cd docker/sandbox-poc
+...
+$ docker build -f Dockerfile.firejail -t airbyte/source-declarative-manifest-firejail .
+...
+$ docker run --rm airbyte/source-declarative-manifest-firejail spec
+{"type":"SPEC","spec":{"connectionSpecification":{"$schema":"http://json-schema.org/draft-07/schema#","title":"Low-code source spec","type":"object","required":["__injected_declarative_manifest"],"additionalProperties":true,"properties":{"__injected_declarative_manifest":{"title":"Low-code manifest","type":"object","description":"The low-code manifest that defines the components of the source."}}},"documentationUrl":"https://docs.airbyte.com/integrations/sources/low-code","supportsNormalization":false,"supportsDBT":false}}
+```
+
+### gVisor Test Results
+
+```bash
+$ cd docker/sandbox-poc
+...
+$ docker build -f Dockerfile.gvisor -t airbyte/source-declarative-manifest-gvisor .
+...
+$ docker run --rm airbyte/source-declarative-manifest-gvisor spec
+running container: creating container: creating container root directory "/var/run/runsc": mkdir /var/run/runsc: permission denied
+{"type":"SPEC","spec":{"connectionSpecification":{"$schema":"http://json-schema.org/draft-07/schema#","title":"Low-code source spec","type":"object","required":["__injected_declarative_manifest"],"additionalProperties":true,"properties":{"__injected_declarative_manifest":{"title":"Low-code manifest","type":"object","description":"The low-code manifest that defines the components of the source."}}},"documentationUrl":"https://docs.airbyte.com/integrations/sources/low-code","supportsNormalization":false,"supportsDBT":false}}
+```
+
+Note that the gVisor implementation attempts to use runsc but falls back to direct execution due to permission constraints in Docker, so the `spec` output above was produced by the unsandboxed fallback path. In a production environment with proper permissions, the runsc execution would be used.
+
+## Challenges Encountered
+
+During implementation, the following challenges were encountered:
+
+1. **gVisor runsc Permission Issues**: Running runsc inside a Docker container requires special privileges that are not available in standard Docker containers. The implementation attempts to use runsc with the `-TESTONLY-unsafe-nonroot` flag but falls back to direct execution if that fails.
+
+2. **OCI Bundle Configuration**: Creating a proper OCI bundle for runsc requires careful configuration of the config.json file. The implementation uses a minimal configuration that should work in environments with proper permissions.
+
+3. **Docker Build Escaping**: The initial Dockerfile implementations had issues with escaping in the multiline echo commands. This was fixed by using multiple echo commands with redirection.
+
+## Considerations for Production Use
+
+For production use, consider:
+
+1. **Proper gVisor Integration**: For a production implementation of gVisor, consider:
+   - Using Docker's runtime configuration to specify runsc as the runtime
+   - Running containers with the necessary privileges for runsc
+   - Using a more complete OCI bundle configuration
+
+2. **Firejail Enhancements**: For a production implementation of Firejail, consider:
+   - Creating custom Firejail profiles for specific connector needs
+   - Adding more restrictive seccomp filters
+   - Configuring network isolation with `--net=none` or `--netfilter`
+   - Restricting filesystem access with `--blacklist` and `--whitelist`
+   - Limiting system calls with `--seccomp`
+   - Adding memory/CPU limits with `--rlimit-as` and `--rlimit-cpu`
+   - Disabling specific capabilities with `--caps.drop=all`
+
+3. **Performance Impact**: Both sandboxing solutions add overhead:
+   - Firejail has minimal overhead but less isolation
+   - gVisor provides stronger isolation but with more significant performance impact
+
+4. **Security Requirements**: Choose between the solutions based on:
+   - Threat model and security requirements
+   - Performance constraints
+   - Compatibility with existing infrastructure
+
+## Conclusion
+
+This POC demonstrates two approaches to sandboxing the `source-declarative-manifest` connector. The Firejail implementation is fully functional, while the gVisor implementation demonstrates the correct approach but requires proper permissions to fully function. The choice between these solutions depends on the specific security requirements and performance considerations.
diff --git a/devlog/README.md b/devlog/README.md
new file mode 100644
index 000000000..7955a2e84
--- /dev/null
+++ b/devlog/README.md
@@ -0,0 +1,15 @@
+# Developer Log
+
+This directory contains logs and experiences from specific work in the repository. These logs are meant to share knowledge, document approaches, and provide insights for future developers working on similar tasks.
+
+Each log should:
+
+- Be named with a `YYYY-MM-description.md` format.
+  - Continuations (including stacked PRs) can be named as `YYYY-MM-description-N.md`, with `N` beginning at the ordinal `2`.
+- Include links to relevant PRs and resources.
+- Document challenges, solutions, and learnings.
+- Provide context that might be helpful for future work.
+- Include a FAQ section for anticipated questions and answers about the iteration.
+- Include a Closing & Next Steps section, where out-of-scope to-do items or follow-on investigations can be logged.
+
+These logs are not meant to replace formal documentation but to supplement it with practical experiences and insights.
diff --git a/docker/sandbox-poc/Dockerfile.firejail b/docker/sandbox-poc/Dockerfile.firejail
new file mode 100644
index 000000000..bd48368a0
--- /dev/null
+++ b/docker/sandbox-poc/Dockerfile.firejail
@@ -0,0 +1,18 @@
+# Dockerfile for Firejail POC
+FROM airbyte/source-declarative-manifest:latest
+
+USER root
+
+# Install firejail
+RUN apt-get update && \
+    apt-get install -y firejail && \
+    apt-get clean && \
+    rm -rf /var/lib/apt/lists/*
+
+# Copy the wrapper script
+COPY scripts/firejail-wrapper.sh /usr/local/bin/
+RUN chmod +x /usr/local/bin/firejail-wrapper.sh
+
+# Set the new entry point
+ENTRYPOINT ["/usr/local/bin/firejail-wrapper.sh"]
+USER airbyte
diff --git a/docker/sandbox-poc/Dockerfile.gvisor b/docker/sandbox-poc/Dockerfile.gvisor
new file mode 100644
index 000000000..090cb080d
--- /dev/null
+++ b/docker/sandbox-poc/Dockerfile.gvisor
@@ -0,0 +1,26 @@
+# Dockerfile for gVisor POC
+FROM airbyte/source-declarative-manifest:latest
+
+USER root
+
+# Install dependencies
+RUN apt-get update && \
+    apt-get install -y curl gnupg apt-transport-https ca-certificates && \
+    apt-get clean && \
+    rm -rf /var/lib/apt/lists/*
+
+# Add gVisor repo and install runsc
+RUN curl -fsSL https://gvisor.dev/archive.key | apt-key add - && \
+    echo 'deb https://storage.googleapis.com/gvisor/releases release main' > /etc/apt/sources.list.d/gvisor.list && \
+    apt-get update && \
+    apt-get install -y runsc && \
+    apt-get clean && \
+    rm -rf /var/lib/apt/lists/*
+
+# Copy the wrapper script
+COPY scripts/gvisor-wrapper.sh /usr/local/bin/
+RUN chmod +x /usr/local/bin/gvisor-wrapper.sh
+
+# Set the new entry point
+ENTRYPOINT ["/usr/local/bin/gvisor-wrapper.sh"]
+USER airbyte
diff --git a/docker/sandbox-poc/README.md b/docker/sandbox-poc/README.md
new file mode 100644
index 000000000..b90d4f7c6
--- /dev/null
+++ b/docker/sandbox-poc/README.md
@@ -0,0 +1,41 @@
+# Sandbox POC Dockerfiles
+
+This directory contains Dockerfiles for proof-of-concept (POC) implementations of sandboxing solutions for the source-declarative-manifest connector.
+
+## Firejail
+
+The `Dockerfile.firejail` adds [Firejail](https://firejail.wordpress.com/) to the source-declarative-manifest image. Firejail is a SUID sandbox program that restricts the running environment of untrusted applications using Linux namespaces and seccomp-bpf.
+
+To build the image:
+
+```bash
+cd docker/sandbox-poc
+docker build -f Dockerfile.firejail -t airbyte/source-declarative-manifest-firejail .
+```
+
+To test the image:
+
+```bash
+docker run --rm airbyte/source-declarative-manifest-firejail spec
+```
+
+## gVisor
+
+The `Dockerfile.gvisor` adds [gVisor](https://gvisor.dev/) (via runsc) to the source-declarative-manifest image. gVisor is a user-space kernel, written in Go, that implements a substantial portion of the Linux system call interface. It provides an additional layer of isolation between running applications and the host operating system.
+
+To build the image:
+
+```bash
+cd docker/sandbox-poc
+docker build -f Dockerfile.gvisor -t airbyte/source-declarative-manifest-gvisor .
+```
+
+To test the image:
+
+```bash
+docker run --rm airbyte/source-declarative-manifest-gvisor spec
+```
+
+## Usage
+
+Both images wrap the original entry point of the source-declarative-manifest connector with their respective sandboxing solution. The wrapped entry point handles all the same command-line arguments as the original entry point.
diff --git a/docker/sandbox-poc/scripts/firejail-wrapper.sh b/docker/sandbox-poc/scripts/firejail-wrapper.sh
new file mode 100755
index 000000000..489335c65
--- /dev/null
+++ b/docker/sandbox-poc/scripts/firejail-wrapper.sh
@@ -0,0 +1,3 @@
+#!/bin/bash
+# Firejail wrapper for source-declarative-manifest
+firejail --noprofile --quiet --private -- python /airbyte/integration_code/main.py "$@"
diff --git a/docker/sandbox-poc/scripts/gvisor-wrapper.sh b/docker/sandbox-poc/scripts/gvisor-wrapper.sh
new file mode 100755
index 000000000..654e58a29
--- /dev/null
+++ b/docker/sandbox-poc/scripts/gvisor-wrapper.sh
@@ -0,0 +1,41 @@
+#!/bin/bash
+# gVisor wrapper for source-declarative-manifest
+# Serialize argv as a JSON array so each argument stays a separate, escaped element
+ARGS_JSON=$(python -c 'import json, sys; print(json.dumps(["python", "/airbyte/integration_code/main.py"] + sys.argv[1:]))' "$@")
+
+# Create a temporary OCI bundle directory
+BUNDLE_DIR=$(mktemp -d)
+mkdir -p "$BUNDLE_DIR/rootfs"
+
+# Create a simple config.json for the OCI bundle
+cat > "$BUNDLE_DIR/config.json" << EOFINNER
+{
+  "ociVersion": "1.0.0",
+  "process": {
+    "terminal": false,
+    "user": {
+      "uid": 0,
+      "gid": 0
+    },
+    "args": $ARGS_JSON,
+    "env": [
+      "PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin",
+      "TERM=xterm"
+    ],
+    "cwd": "/"
+  },
+  "root": {
+    "path": "rootfs"
+  },
+  "linux": {}
+}
+EOFINNER
+
+# Run the command with runsc; fall back to direct execution if runsc exits non-zero
+cd "$BUNDLE_DIR"
+runsc -TESTONLY-unsafe-nonroot run --bundle="$BUNDLE_DIR" container1 || python /airbyte/integration_code/main.py "$@"
+STATUS=$?
+
+# Clean up and propagate the exit status of the command that ran last
+rm -rf "$BUNDLE_DIR"
+exit "$STATUS"
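
The Firejail enhancements that the devlog lists under "Considerations for Production Use" could be combined into a stricter wrapper roughly as follows. This is only a sketch under stated assumptions, not part of the PR: the function name `run_sandboxed`, the `SANDBOX_DRY_RUN` variable, and the limit values are hypothetical, while the flags are the ones the devlog already names. `--net=none` is left out because it leaves the sandbox with only a loopback interface, which would likely break any command that has to reach a remote API.

```shell
#!/bin/bash
# Hypothetical hardened variant of firejail-wrapper.sh (illustrative sketch only).
# Assumed limits, not measured connector requirements:
#   --rlimit-as=2147483648   address-space cap in bytes (2 GiB)
#   --rlimit-cpu=600         CPU-time cap in seconds

run_sandboxed() {
  # When SANDBOX_DRY_RUN is set and non-empty, print the command instead of running it.
  ${SANDBOX_DRY_RUN:+echo} firejail \
    --noprofile --quiet --private \
    --caps.drop=all \
    --seccomp \
    --rlimit-as=2147483648 \
    --rlimit-cpu=600 \
    -- python /airbyte/integration_code/main.py "$@"
}

# Run only when a connector command (for example spec) was passed,
# so that sourcing this file merely defines the function.
if [ "$#" -gt 0 ]; then
  run_sandboxed "$@"
fi
```

Invoking the script with `SANDBOX_DRY_RUN=1` set prints the assembled command line instead of executing it, which makes the flag set easy to review before it is baked into an image.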