Commit ee80b1e — IPFS @ Maven (1 parent cf414a4; 1 file changed: content/en/blog/cstamas/2026/01/ipfs, +137 −0)

---
title: "Maven @ IPFS"
date: 2026-01-22T10:55:53+01:00
draft: false
authors: cstamas
author: cstamas ([@cstamas](https://bsky.app/profile/cstamas.bsky.social))
categories:
- Blog
tags:
- maven
- ipfs
projects:
- Maven
---

Lately I have been toying with IPFS to achieve content sharing without centralized infrastructure. In other words, instead
of free-riding on some (centralized) infrastructure that may be a public good, or some commercial offering, I wanted to solve publishing
(and also "owning" and "hosting" the data) by myself. This reminded me of the early 2000s, when many ran
Maven repositories in their own basements, making access to those repositories fragile, with longevity, accessibility
and uptime totally unpredictable. That was before Central, which consolidated things but also promoted itself into a single point of
failure. Hence it ended up over- and misused; see [Maven Central and the Tragedy of the Commons](https://www.sonatype.com/blog/maven-central-and-the-tragedy-of-the-commons).

Hence, I wanted a plan B for our Maveniverse organization. What if we -- aside from continued publishing to Central -- offered
other ways to get artifacts for those interested in them, let them keep the artifacts any way they want (not possible with Central,
where mirroring is still not an option), and gave them full access to the artifacts (i.e. indexing, scanning, whatever)?

## Enter IPFS

I will not spend a lot of time explaining IPFS; it already has nice [documentation available](https://docs.ipfs.tech/).
If nothing else, at least read [this page](https://docs.ipfs.tech/concepts/ipfs-solves/).

In short, IPFS is a decentralized network of IPFS nodes implementing Content Addressable Storage (CAS) and
more. A term worth knowing is [CID](https://docs.ipfs.tech/concepts/content-addressing/#what-is-a-cid),
which, in very simplified form, can be understood as a "hash" (derived solely from the content), pointing to some
content (any content). The content behind a CID can be one of several things (but once set, it is immutable): it can
be the contents of a JAR file, or in fact any file, but it can also be a DAG backed by an IPLD [file system](https://docs.ipfs.tech/concepts/file-systems/) schema.
This means a CID may even point to a "file hierarchy", like on Linux systems, with directories, files and everything.
These structures are colloquially called "Merkle trees", and are built "bottom up", from the leaves (files) toward the root.
Hence, in case of a change (e.g. another file added to a directory), the existing file CIDs remain unchanged, but because their
shared parent directory changed, its CID and the root CID will change.
Producing a CID is free for everyone: just run an IPFS node, upload some content to it, and you will get a CID for it.
An important thing to consider is that if two persons independently upload some content and both end up with the same CID
(with some fine print; more on that later), it means they both independently published bit-by-bit the _same content_ (IPFS is
content addressable).

Having the CID is like having the "address", but where is the content behind the address? For a start, it is on the node
you uploaded the content to in order to get the CID. With "pinning", you can make your node pull down the content backing
any CID (assuming it is reachable). Basically, you maintain your node, letting it stick to the content you want/need, while merely
caching the rest (IPFS node storage performs regular garbage collection, dropping unpinned or stale content). Furthermore,
there are (paid or free) "pinning services"; by using them, you can make sure content is pinned
and swiftly served to any node that wants it. But that is out of scope for this article.

Another term is [IPNS](https://docs.ipfs.tech/concepts/ipns/), which is like a "mutable CID" (think of it a bit like DNS). Creating an IPNS entry
requires a private cryptographic key, and each key can produce _one IPNS entry_. At the same time, one node (or user)
can manage as many keys as they want. And this is the important part: as explained above, anyone can create a CID, but
to answer "which CID did the publisher publish?", all a consumer has to do is resolve the IPNS name to get the right CID, as the IPNS
entry is a function of the private key and cannot be faked. Neither CIDs nor IPNS names are "human friendly", kinda, but there
are solutions for that, like [DNSLink](https://dnslink.dev/), where an IPNS name can be exposed via DNS under a "human friendly" domain.

{{% pageinfo color="info" %}}

I have to note several key things to reassure IPFS users:
* IPFS works in a similar fashion to torrents: it uses a DHT and various other means to let nodes discover each other. In short,
  the more nodes the merrier. In other words, "popular" content may be cached on multiple nodes and will hence be
  faster to get.
* Each IPFS node participates in traffic direction (i.e. passing messages), telling others about discovered nodes
  and offering local content if asked for it.
* An important thing to note: if you run your own node, **it will store only the content you tell it to store**
  (by locally pushing or pinning); it will NOT store random content from the internet.
* Pretty much nothing is needed on your side (network setup or the like) to publish to IPFS.

{{% /pageinfo %}}

## Rounding it up

So what does this give us?
* a CID points to content/structure and is immutable (the same CID will always return the same content, if the content is accessible)
* an IPNS name points to the "up to date" CID, and you can be sure the entry was published by the key owner and nobody else
* DNS points to IPNS; assuming you trust the domain owner, you can delegate your trust to the IPNS entry it points to. This trust delegation
  is very similar to Central, where you need to provide proof to get a publishing namespace (which is ideally a reverse domain).

In short, we have a series of indirections: `domain -> IPNS -> CID`. If you get to the CID by hopping over these stops, all is
fine. But what happens if your private key (used for publishing IPNS) is compromised? Just create a new key, republish the
content with it, and update the DNSLink for your domain (and of course, communicate it). After all, we can still GPG sign
artifacts, so IPNS + GPG is good enough.

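The indirection chain above can be followed by hand with standard tools; the placeholders below stand for values returned by each previous step:

```shell
# 1. Domain -> IPNS: DNSLink is a TXT record on the _dnslink subdomain.
dig +short TXT _dnslink.ipfs.maveniverse.eu

# 2. IPNS -> CID: resolve the IPNS name on any IPFS node.
ipfs name resolve /ipns/<ipns-name>

# 3. CID -> content: list or fetch the content itself.
ipfs ls /ipfs/<root-cid>
```
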
An example of this setup can be seen at [ipfs.maveniverse.eu](https://ipfs.maveniverse.eu/).

Details:
* the `ipfs.maveniverse.eu` domain uses DNSLink to publish a TXT record containing the IPNS name (try `dig _dnslink.ipfs.maveniverse.eu TXT`)
* the IPNS entry is in the form `/ipns/xxxxxxxxxx`
* an IPFS node can resolve the `/ipns/xxxxx` address to an `/ipfs/xxxxx` CID

## Maven @ IPFS

Maven release repositories seem like perfect candidates to be put onto IPFS: they are immutable. Or, to be more precise,
the leaves (artifacts) in a repository remain immutable; when more artifacts are deployed, only the parent and grandparent
directories change, and in essence the root CID changes. Aside from parents, the G- and A-level metadata changes as well, but those are not leaves.

The Maveniverse [IPFS](https://github.com/maveniverse/ipfs) extension provides support for the setup above, and is even
usable on CI to consume (and later to deploy).

The extension requires Java 11+ and a reachable Kubo RPC API (simplest is to have it running on localhost), and
adds the following components to Maven:
* an IPFS transporter supporting `ipfs:/` URLs
* IPFS publishing support via a lifecycle participant

The IPFS URL looks like `ipfs:/name[/subpath]`, where the parts are:
* the protocol `ipfs:/` is fixed and mandatory
* for consuming, the `name` element should be **resolvable**: it can be a CID, an IPNS name or a DNSLink-ed domain
* for deploying, the `name` element, aside from the above, should be the name of a **private key** present in the IPFS node, used to publish the IPNS entry
* the optional `/subpath` defines the path prefix within `name`

I have to mention that when using a CID for `name`, it is the user's responsibility to _ensure_ the proper CID is used, since, as explained
above, CIDs can be created by anyone, and the content may be fake or even contain malicious artifacts. When using an IPNS record,
it is a similar thing: the user has to ensure they resolve the proper IPNS name (but once trust is established, all is good). Finally,
in case of using an (IPFS-resolvable) domain, the same level of trust can be established as with Central: one can
safely assume that the domain owner publishes the right thing (same as on Central).

A little digression here: in case `name` is a domain, I was tinkering with **limiting** Maven @ IPFS to serve only artifacts
from the domain's namespace; for example, `maveniverse.eu` should offer **only the `eu.maveniverse` namespace**. Any ideas welcome!

{{% pageinfo color="info" %}}

Important: the current workflow Maven @ IPFS implements works at a small scale, i.e. at the Maveniverse forge level, which has a handful
of megabytes of artifacts. Due to refresh/pinning (downloading the whole blob), the current workflow does not scale, but it works
pretty nicely at a small scale.

{{% /pageinfo %}}

## Mimir @ IPFS

As mentioned above, in repositories published with Maven @ IPFS, the _leaves will not change_. That means their
CIDs remain unchanged. The next level would be _global caching over IPFS_, for example with Mimir (which already offers a similar
service on LANs using JGroups). Here, some translation needs to be done that begins with a GAV and ends with a CID.

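The first half of that translation is mechanical, since a GAV maps to a standard Maven repository layout path; the remaining (open) half is looking that path up under the repository's root CID. A sketch of the mechanical half:

```python
def gav_to_path(gav: str, extension: str = "jar") -> str:
    # Standard Maven repository layout:
    # groupId/artifactId/version/artifactId-version.extension
    group_id, artifact_id, version = gav.split(":")
    return (f"{group_id.replace('.', '/')}/{artifact_id}/{version}/"
            f"{artifact_id}-{version}.{extension}")

assert gav_to_path("eu.maveniverse.maven.mimir:core:1.0.0") == \
    "eu/maveniverse/maven/mimir/core/1.0.0/core-1.0.0.jar"
```

The resulting path could then be resolved against the root CID (e.g. as `/ipfs/<root-cid>/<path>`), at which point the artifact's own leaf CID becomes a stable, globally shareable cache key.
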
Once something is in place, I will report back! Cheers and have fun!
