
aricanduva

Simple S3 endpoint API that proxies requests to an IPFS node.

This service allows projects that require S3-like endpoints to read and write files, without porting their code to work with IPFS. It is useful in a homelab scenario, where you'd like to integrate with off-the-shelf services but store and distribute files through your IPFS node. Example: gotosocial profile pictures.

The name aricanduva is a reference to the Ponte Aricanduva in São Paulo, Brazil, because this service is a bridge between S3 and IPFS.

Running it

Docker images are available on GHCR and Docker Hub.

If you are running Kubo in a container, you can use the host-gateway to access it, or any other connectivity mechanism, such as docker compose with links:, container orchestration service discovery, etc.

docker run \
   --add-host host.docker.internal:host-gateway \
   -e RPC_ADDRESS=http://host.docker.internal:5001/api/v0 \
   bltavares/aricanduva

Binaries are also provided for Windows, Mac and Linux as GitHub Releases.

Project Overview

The project is designed specifically for a self-hosted scenario. To make it portable, a few design decisions were taken:

  • Containerized runtime
  • Configuration using env-vars or CLI arguments
  • Lightweight runtime (Rust)
  • Streaming responses when possible (async)
  • Low operational requirements (SQLite)
  • Logs for troubleshooting (export RUST_LOG=debug)
  • Portable coding practices targeting Linux, Mac and Windows
  • Healthcheck endpoint for container orchestrators (/healthz)

In order to run this proxy, you must provide an IPFS node RPC connection address, as this project does not run an IPFS node itself. You most likely want to use a Kubo container.

Tip

Ensure the Kubo RPC address is accessible to aricanduva, either by binding it to all addresses plus a Basic auth password, or by using the startup script to listen on private networks only.

As long as aricanduva can talk to an IPFS node, it should work, whether on the same machine or in a clustered scenario.

By default, it runs with mode: auto, an experimental and non-standard S3 API mode that allows aricanduva to be exposed to the internet without generating too much traffic, which matters in a homelab scenario.

Important

If exposing aricanduva to the internet, ensure it sits behind a reverse proxy for HTTPS/SSL and runs with auth enabled; otherwise anonymous users may add files to your IPFS node.

Example flags: --ip-extraction=RightmostXForwardedFor --auth-access-key=banana --auth-secret-key=bananabanana --mode auto

IPFS ecosystem integration

The service supports the x-ipfs-path header in responses, which provides the full IPFS path for retrieved objects. This allows IPFS Companion to intercept requests and use your preferred gateway and IPFS node.

  • x-ipfs-path: Contains the IPFS path (e.g., /ipfs/<CID>) for the object.
  • x-ipfs-roots: Contains the CID of the object.

The project uses file streaming to return content in mode: proxy, which allows users with the IPFS Companion browser plugin to abort the request without causing excessive memory usage or data transfer on the proxy.

Supported operations

Architecture

aricanduva works by exposing AWS S3 endpoints to services and customers, allowing them to interact with an IPFS node (such as Kubo) as if they were talking to AWS S3.

This means that any service in a cluster that allows configuring AWS S3 integration can point to aricanduva to read and store files in IPFS.

This allows a single bridge service to offer IPFS storage for many services at once.

architecture-beta
    group customer(cloud)[Client]
    service mobile(internet)[Mobile app] in customer
    service browser(internet)[Browser] in customer


group ipfs(cloud)[IPFS Ecosystem]
    service gateway(internet)[Public Gateway] in ipfs

    group api(cloud)[Cluster]

    service kubo(database)[Kubo] in api
    service aricanduva(server)[aricanduva] in api
    service s1(server)[Service A] in api
    service s2(server)[Service B] in api
    service s3(server)[Service C] in api


    kubo:L -- R:aricanduva
    s1:R -- L:aricanduva
    s2:B -- T:aricanduva
    s3:T -- B:aricanduva

    mobile:R -- L:s1
    browser:R -- R:aricanduva

    browser:L -- L:gateway
    aricanduva:T -- B:gateway

Experimental

Auto mode: Redirect publicly and proxy locally

Considering this project is designed to run on a homelab, keeping resource utilization low is very important, especially as bandwidth might be limited by the ISP.

For this scenario, aricanduva has a mode: auto with a non-standard GetObject redirect to an IPFS public gateway.

This mode has two scenarios:

  • When the request comes from an IP on a Private Network range, it will return the content directly
  • When the request comes from an IP on a Public Network range, it will return a 307 Temporary Redirect to the configured public gateway address, which then serves the content
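The decision between the two scenarios can be sketched with Python's standard ipaddress module; this is an illustration of the routing rule, not aricanduva's actual implementation.

```python
import ipaddress

def serve_directly(client_ip: str) -> bool:
    """True: proxy the content directly (private-range client).
    False: answer with a 307 redirect to the public gateway."""
    # is_private covers RFC 1918 ranges, loopback and link-local.
    return ipaddress.ip_address(client_ip).is_private

print(serve_directly("192.168.1.10"))  # True  -> content returned directly
print(serve_directly("203.0.113.7"))   # False -> 307 to the public gateway
```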

Important

A 307 redirect is a non-standard response for the S3 API, and many clients/SDKs will not follow the redirect, causing errors in services. It does work as expected in browsers, though.

This is useful on the following scenario:

  • A service uses aricanduva to store files via the S3 API, such as profile pictures
  • The service also has a web interface to render the files, using a PreSignedUrl pointing to aricanduva
  • The client connects to the service and sees the PreSignedUrl
  • When retrieving, it fetches the content from the public gateway instead of the aricanduva instance

This means aricanduva can be exposed to the internet together with the service, providing a compatible S3 API to the service while still serving content (via a non-standard protocol) to browsers and other clients that follow redirects.

Since services are on the private network range and receive the content directly from aricanduva, compatibility with SDKs and clients is not broken.

architecture-beta
    group customer(cloud)[Client]
    service browser(internet)[Browser] in customer

    group ipfs(cloud)[IPFS Ecosystem]
    service gateway(internet)[Public Gateway] in ipfs

    group api(cloud)[Cluster]

    service kubo(database)[Kubo] in api
    service aricanduva(server)[aricanduva] in api
    service s1(server)[Service A] in api


    browser:R -- L:aricanduva
    browser:R -- L:gateway
    browser:R -- L:s1
    aricanduva:T -- B:gateway
    s1:T -- B:aricanduva
    aricanduva:R -- L:kubo

If you deploy split-horizon DNS resolution on your network, you can optimize file transfer paths.

Alternatively, you can run with --mode=proxy to serve content directly from aricanduva and always operate in a standard mode.

architecture-beta
    group customer(cloud)[Client]
    service browser(internet)[Browser] in customer

    group api(cloud)[Cluster]
    service kubo(database)[Kubo] in api
    service aricanduva(server)[aricanduva] in api
    service s1(server)[Service A] in api


    browser:R -- L:aricanduva
    browser:R -- L:s1
    s1:T -- B:aricanduva
    aricanduva:R -- L:kubo

Note

There is also --mode redirect, but it's mostly for testing: it always returns a 307 redirect, which breaks most S3 SDKs and clients that don't follow redirects.

Trimming empty dirs from MFS

When a file is removed, aricanduva will try to trim empty folders from the MFS layer on the IPFS node, to keep it tidy.

At the moment, due to limitations in SQLite's feature set, this generates N+1 queries, one per path segment. This means a deeply nested key might take a while to clean up and may tax the database.
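To see why this is N+1, consider the ancestor directories a key produces: each one needs its own "is this folder now empty?" check. A small sketch (illustrative only, aricanduva's actual SQL differs):

```python
from pathlib import PurePosixPath

def parent_dirs(key: str) -> list[str]:
    """Every ancestor directory of an object key, deepest first.
    Each entry costs one extra emptiness-check query during trimming."""
    return [str(p) for p in PurePosixPath(key).parents if str(p) != "."]

# A deeply nested key generates one query per path segment:
print(parent_dirs("a/b/c/d/file.png"))
# ['a/b/c/d', 'a/b/c', 'a/b', 'a']
```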

Note

If possible, avoid using deeply nested keys.

If services use deeply nested keys and you'd like to avoid trimming, disable it with --experimental-trim-empty-folders=false

Auto Mime detection

aricanduva stores the Content-Type header on PutObject operations, but some clients might not send this information in the request.

If there is no content-type header, the service will attempt to guess it based on the file extension.

Disable content-type guessing with --experimental-auto-mime=false
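Extension-based guessing can be sketched with Python's standard mimetypes module; the fallback type shown here is an assumption for illustration, not necessarily the default aricanduva uses.

```python
import mimetypes

def guess_content_type(key: str) -> str:
    """Fallback Content-Type from the file extension, mirroring the
    behaviour when a client omits the header on PutObject."""
    guessed, _encoding = mimetypes.guess_type(key)
    return guessed or "application/octet-stream"

print(guess_content_type("avatars/profile.png"))  # image/png
print(guess_content_type("data/unknown.blob"))    # application/octet-stream
```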

MultiPartUpload

MultiPartUpload was a required feature to implement even for small files, based on testing with a few SDKs and S3 clients. Ideally, when the size is known, a client should use a single PUT PutObject request, yet many of them perform a more complicated multi-call flow using the POST MultiPartUpload implementation.

Important

The implementation is naive and stores files in-memory. If the service restarts mid-upload, the client must try again.

This also means large files uploaded via MultiPartUpload will require a lot of memory from the process.

To get it done quickly, MultiPartUpload stages parts in-memory and collates all parts on the CompleteMultiPartUpload call. Content is only stored on IPFS and in the metadata database once the upload completes.

If you expect large files, run with high memory limits to avoid restarts mid-upload and losing data.
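The staging-then-collation flow described above can be sketched as follows; the class and method names are illustrative, not aricanduva's actual types.

```python
class MultiPartStaging:
    """In-memory staging: parts accumulate per upload and are only
    collated (and would only hit IPFS/the database) on completion."""

    def __init__(self) -> None:
        self._parts: dict[str, dict[int, bytes]] = {}

    def upload_part(self, upload_id: str, part_number: int, body: bytes) -> None:
        self._parts.setdefault(upload_id, {})[part_number] = body

    def complete(self, upload_id: str) -> bytes:
        """Collate staged parts in ascending part order and drop the staging
        entry; a restart before this point loses every staged part."""
        staged = self._parts.pop(upload_id)
        return b"".join(staged[n] for n in sorted(staged))

staging = MultiPartStaging()
staging.upload_part("u1", 2, b"world")
staging.upload_part("u1", 1, b"hello ")
print(staging.complete("u1"))  # b'hello world'
```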

Local Development

The project declares integration with cargo-run-bin to help set up the environment.

Hot-Reloading with systemfd

For development with hot-reloading, use systemfd to manage the socket:

make run

This setup allows the server to automatically reload when code changes are detected.

Update database schemas

To test database queries, use cargo sqlx commands (from sqlx-cli).

make prepare

Building locally

You can build on Docker, which uses cargo-chef to optimize layers.

make docker DOCKER_IMAGE=example/aricanduva

TODOs

Known issues and future work, so I don't forget when I revisit this project in the future:

  • Support GET RANGE and pass forward ipfs cat_range
  • Refactor the ipfs-api crate to support impl AsyncRead + !Send as axum::Body: !Send
  • Implement integrity check of validation of headers with hashing of the body in single chunk
  • Implement integrity check of validation of headers with hashing of chunk reader

Acknowledgements
