Commit 65f13b0

docs: add glossary, add explainer, and content routing FAQ (#233)
* Initial import from Notion
* renae
* gramma and spelling fixes
* Incorporate feedback from Claude session
* crosslinking to glossary terms
* Added diagram
* More diagram updates
* diagram fix
* Link to filecoin-pin-website in glossary
* Incorporate copilot feedback
* Incorporating PR feedback part 1
* Incorporated feedback and updated README
* Factored out content routing docs
* minor fixups
* Converted Storage Provider to Service Provider
* fix typos
1 parent 196d055 commit 65f13b0

6 files changed: +426 -0 lines changed

AGENTS.md

Lines changed: 9 additions & 0 deletions
@@ -73,6 +73,15 @@ src/
7373
4. Clean up CAR files and Helia instances on failure
7474
5. Browser vs Node variants (check package.json exports)
7575

76+
## Documentation
77+
78+
**Glossary**: `documentation/glossary.md` is the authoritative terminology reference.
79+
80+
- Documentation files in `documentation/` should reference and link to glossary entries for key terms (e.g., `[Service Provider](glossary.md#service-provider)`)
81+
- Glossary entries should cross-link to related terms using anchor links (e.g., `[Data Set](#data-set)`)
82+
- Avoid overlinking: link the first mention of a term in each section, not every occurrence
83+
- Update glossary when introducing new concepts or terminology
84+
7685
## CLI & Environment
7786

7887
**Commands**: `payments setup --auto`, `add <file>`, `payments status`, `data-set <id>`, `server`

README.md

Lines changed: 3 additions & 0 deletions
@@ -68,6 +68,9 @@ Web-based management console for monitoring and managing your Filecoin Pin deplo
6868
- **Status**: Planned
6969
- **Tracking**: See [issue #74](https://github.com/filecoin-project/filecoin-pin/issues/74) for updates. Please leave a comment about your usecase if this would be particularly beneficial.
7070

71+
## Documentation
72+
See [/documentation](/documentation/).
73+
7174
## Examples
7275

7376
See Filecoin Pin in action:

documentation/README.md

Lines changed: 4 additions & 0 deletions
@@ -0,0 +1,4 @@
1+
* [Filecoin Pin glossary](glossary.md) - A unified place for defining the plethora of terms for the various technologies coming together under Filecoin Pin (e.g., existing Filecoin blockchain and storage providers, new Filecoin initiatives including Filecoin Onchain Cloud, IPFS).
2+
* [Explainer: behind the scenes of adding a file](behind-the-scenes-of-adding-a-file.md) - Provides more technical info about what happens for a Filecoin Pin `add` as it uses the underlying [Synapse library](glossary.md#synapse) and [Filecoin Onchain Cloud](glossary.md#filecoin-onchain-cloud) offering.
3+
* [Content Routing FAQ](content-routing-faq.md) - Frequently asked questions about content routing with IPNI, which Filecoin Pin relies upon, including caching behavior, provider management, and indexer operations.
4+
* [Builder Cookbook](https://docs.filecoin.io/builder-cookbook/filecoin-pin)

documentation/behind-the-scenes-of-adding-a-file.md

Lines changed: 196 additions & 0 deletions
@@ -0,0 +1,196 @@
1+
## Purpose
2+
3+
This document explains what happens "behind the scenes" when you "add a file with Filecoin Pin", which relies on underlying libraries like [`synapse`](glossary.md#synapse) and the [Filecoin Onchain Cloud](glossary.md#filecoin-onchain-cloud) offering. The steps are outlined below.
4+
5+
## Diagram
6+
7+
_Key_
8+
Blue | Orange | FP | SP
-- | -- | -- | --
non-blockchain step | blockchain step | Filecoin Pin | Service Provider
11+
12+
13+
```mermaid
graph TD

    %% Node Definitions
    %% Non-blockchain nodes
    Start([User: Select File or Directory to Add])
    CreateCAR[FP: Create CAR<br/><br/>✓ IPFS Root CID known]
    UploadCAR[FP: Upload CAR to SP<br/><br/>✓ SP /piece/$pieceCid retrieval]
    IndexCAR{{SP: Index CAR CIDs<br/><br/>✓ SP /ipfs/$cid retrieval}}
    AdvertiseCAR{{SP: Advertise CAR CIDs to IPNI}}
    AwaitIPNIIndexing[FP: Await IPNI Indexing<br/><br/>✓ IPNI provider records<br/>✓ IPFS Mainnet retrieval possible]
    RetrieveData(Any: Retrieve with IPFS Mainnet<br/><br/>✓ ipfs://$cid works)

    %% Blockchain nodes
    ConnectWallet([User: Connect Wallet<br/><br/>✓ Wallet balances visible])
    SetupPay[FP: Setup Filecoin Pay Account<br/><br/>✓ Filecoin Pay balance visible]
    IdentifyDataSet[FP: Identify Data Set SP and ID]
    CreateDataSet{{SP: Create Data Set & Add Piece<br/><br/>✓ Blockchain Transaction}}
    ConfirmBlockchainTx[FP: Confirm Blockchain Transaction<br/><br/>✓ Data Set & Piece metadata onchain]
    ProveData{{SP: Prove Data Possession<br/><br/>✓ Cryptographic proofs onchain and visible on explorers}}

    %% Milestone
    FilecoinPinAddDone{FP: `Add` Done}

    %% Relationships
    %% Non-blockchain flow
    Start --> CreateCAR
    CreateCAR --> UploadCAR
    UploadCAR --> IndexCAR
    IndexCAR --> AdvertiseCAR
    AdvertiseCAR --> AwaitIPNIIndexing
    AwaitIPNIIndexing --> FilecoinPinAddDone
    FilecoinPinAddDone --> RetrieveData

    %% Blockchain flow
    ConnectWallet --> SetupPay
    SetupPay --> IdentifyDataSet
    IdentifyDataSet --> CreateDataSet

    %% Convergence
    UploadCAR --> CreateDataSet
    CreateDataSet --> ConfirmBlockchainTx
    ConfirmBlockchainTx --> FilecoinPinAddDone
    FilecoinPinAddDone --> ProveData

    %% Styling
    classDef nonBlockchain fill:#e1f5ff,stroke:#0288d1,stroke-width:2px,color:#000
    classDef blockchain fill:#fff3e0,stroke:#f57c00,stroke-width:2px,color:#000

    class Start,CreateCAR,UploadCAR,IndexCAR,AdvertiseCAR,AwaitIPNIIndexing,RetrieveData nonBlockchain
    class ConnectWallet,SetupPay,IdentifyDataSet,CreateDataSet,ConfirmBlockchainTx,ProveData blockchain
```
65+
66+
## Steps without Blockchain Interactions
67+
68+
These are the steps performed client side (i.e., where the Filecoin Pin code is running) and with a [Service Provider](glossary.md#service-provider) (SP) that don't involve the Filecoin blockchain. In isolation, these steps don't yield a committed cryptographic proof that the data is possessed by and retrievable from an SP, but they are necessary preconditions.
69+
70+
### Create CAR
71+
72+
*What/why:*
73+
74+
The provided file needs to be turned into a Merkle DAG and have the DAG's blocks transported to a [Service Provider](glossary.md#service-provider) (SP). [CAR](glossary.md#car) is a common container format for transporting blocks in the IPFS ecosystem and is used with Filecoin Pin. SPs store and prove contiguous sequences of bytes, but generating IPFS-compatible data from files and directories creates potentially very many small "blocks" of data, which we pack into a single CAR container for SPs to store and prove. The process of packing in IPFS form, or "DAGifying" the file data, allows us to reference and verify smaller units of our content, and gives us the ability to interact _trustlessly_ with SPs to serve and retrieve our file data.
75+
76+
Implementation notes:
77+
- The CAR file is created using Helia for UnixFS DAG creation.
78+
- Individual files are wrapped in a directory so that the filename is preserved via UnixFS metadata.
79+
80+
*Outputs:*
81+
82+
v1 CAR containing the Merkle DAG representing the provided file. There is one root in the CAR, and it represents the root of the DAG for the input file. This is referred to as the "[IPFS Root CID](glossary.md#ipfs-root-cid)".
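
As a rough illustration of what "DAGifying" means, the sketch below chunks input bytes into fixed-size blocks, hashes each block, and derives a single root from the block hashes. It is a toy model, not the real pipeline: Filecoin Pin uses Helia to build a UnixFS DAG with real CIDs and writes the blocks into a CAR. The `CHUNK_SIZE` value, the `build_toy_dag` and `verify_block` names, and the use of SHA-256 hex digests in place of CIDs are all assumptions made for illustration.

```python
import hashlib

CHUNK_SIZE = 256 * 1024  # assumed chunk size, for illustration only


def build_toy_dag(data: bytes, chunk_size: int = CHUNK_SIZE) -> tuple[str, dict[str, bytes]]:
    """Return (root_id, blocks) for a toy two-level Merkle DAG.

    Leaf ids are SHA-256 hex digests of the chunk bytes (a stand-in for CIDs).
    The root block lists the leaf ids, and the root id is the digest of that list.
    """
    blocks: dict[str, bytes] = {}
    leaf_ids: list[str] = []
    # max(..., 1) ensures empty input still produces a single empty leaf
    for offset in range(0, max(len(data), 1), chunk_size):
        chunk = data[offset:offset + chunk_size]
        leaf_id = hashlib.sha256(chunk).hexdigest()
        blocks[leaf_id] = chunk
        leaf_ids.append(leaf_id)
    root_block = "\n".join(leaf_ids).encode()
    root_id = hashlib.sha256(root_block).hexdigest()
    blocks[root_id] = root_block
    return root_id, blocks


def verify_block(block_id: str, block: bytes) -> bool:
    """Any single block can be checked against the id that references it."""
    return hashlib.sha256(block).hexdigest() == block_id
```

Because every block is addressed by its own hash, a retriever can verify each block independently of who served it, which is the property that makes trustless retrieval possible.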
83+
84+
*Expected duration:*
85+
86+
This is a function of the size of the input file and the hardware. Typical DAGification of files and directories is relatively quick as it's simply a matter of chunking and hashing using common algorithms. The most time-consuming part is the generation of the [Piece CID](glossary.md#piece-cid) of the whole CAR on the client side prior to upload, where a 1 GB input can take upwards of a minute. As the CAR is being created, it can be streamed to an SP; that upload is most likely the bottleneck.
87+
88+
### Upload CAR
89+
90+
*What/why:*
91+
92+
The [Service Provider](glossary.md#service-provider) needs to be given the bytes to store so it can serve retrievals and prove to the chain that it possesses them. This is done via an HTTP `PUT /pdp/piece/upload`.
93+
94+
The upload includes [metadata](glossary.md#metadata) that will be stored onchain:
95+
- `ipfsRootCid`: The IPFS Root CID, linking the [Piece](glossary.md#piece) back to IPFS
96+
- `withIPFSIndexing`: Signals the SP to index and advertise to [IPNI](glossary.md#ipni)
97+
98+
*Outputs:*
99+
100+
The SP parks the Piece and queues it up for processing, while the client gets an HTTP response with the [Piece CID](glossary.md#piece-cid). The server calculates the Piece CID for the data and confirms that it matches the Piece CID calculated and provided by the Filecoin Pin client, which assures both sides that the SP received exactly the bytes the client intended to send.
101+
102+
Since the SP has the data for the Piece, it can be retrieved from `https://sp.domain/piece/$pieceCid`.
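
The integrity check described above can be sketched as follows. This is a minimal illustration, not the SP's actual code: the real Piece CID uses a Filecoin-specific hash function, so the SHA-256 digest here is only a stand-in, and the function names `toy_piece_digest`, `sp_accepts_upload`, and `piece_retrieval_url` are hypothetical.

```python
import hashlib


def toy_piece_digest(data: bytes) -> str:
    """Stand-in for Piece CID calculation (the real one uses a Filecoin-specific hash)."""
    return hashlib.sha256(data).hexdigest()


def sp_accepts_upload(received: bytes, claimed_digest: str) -> bool:
    """SP side: recompute the digest over the received bytes and compare it to the client's claim."""
    return toy_piece_digest(received) == claimed_digest


def piece_retrieval_url(sp_base_url: str, piece_cid: str) -> str:
    """Build the /piece/$pieceCid retrieval URL for a given SP base URL."""
    return f"{sp_base_url.rstrip('/')}/piece/{piece_cid}"
```

If even one byte differs in transit, the recomputed digest no longer matches and the upload can be rejected before anything is committed onchain.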
103+
104+
*Expected duration:*
105+
106+
This is a function of the CAR size and the throughput between the client and the SP.
107+
108+
### Index and Advertise CAR CIDs
109+
110+
*What/why:*
111+
112+
At some point after receiving the uploaded [CAR](glossary.md#car), an SP indexing task processes the CAR and creates a local mapping of CIDs to offsets within the CAR so it can serve IPFS-style retrievals. Following that, an SP [IPNI](glossary.md#ipni) task picks up the local index, makes an IPNI advertisement chain, and then announces the advertisement chain to IPNI indexers like filecoinpin.contact and cid.contact so they know to come and get the advertisement chain to build up their own index.
113+
114+
Filecoin Pin validates the IPNI advertisement process by polling `https://filecoinpin.contact/cid/$cid` (NOT cid.contact due to [negative caching issues discussed in the Content Routing FAQ](content-routing-faq.md#how-long-does-an-ipni-indexer-cache-results)).
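
The polling step might look roughly like the sketch below. It is not Filecoin Pin's implementation: the function names, the retry count, and the delay are hypothetical, and it assumes the indexer answers with HTTP 200 once provider records exist and 404 while there are none. The status fetcher is passed in so the loop can be exercised without network access.

```python
import time
import urllib.error
import urllib.request
from typing import Callable

INDEXER_URL = "https://filecoinpin.contact/cid/{cid}"  # endpoint named in the text above


def http_status(url: str) -> int:
    """Return the HTTP status code for a GET request to `url`."""
    try:
        with urllib.request.urlopen(url, timeout=10) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        return err.code


def wait_for_ipni(
    cid: str,
    fetch_status: Callable[[str], int] = http_status,
    attempts: int = 10,            # assumed retry budget
    delay_seconds: float = 5.0,    # assumed pause between polls
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Poll the indexer until it reports provider records for `cid`.

    Assumes 200 means provider records exist and any other status means "not yet".
    """
    url = INDEXER_URL.format(cid=cid)
    for attempt in range(attempts):
        if fetch_status(url) == 200:
            return True
        if attempt < attempts - 1:
            sleep(delay_seconds)
    return False
```

Only after `wait_for_ipni` returns `True` would it be safe to hand out retrieval URLs whose request path may hit cid.contact.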
115+
116+
*Outputs:*
117+
118+
Once the SP has indexed the CAR, it can be directly retrieved from the SP (i.e., bypassing IPFS Mainnet content routing) via `https://sp.domain/ipfs/$cid`.
119+
120+
The SP produces a new or updated advertisement chain. By the end, IPNI indexers should have additional provider records for the advertised CIDs.
121+
122+
*Expected duration:*
123+
124+
Local indexing of the CAR is quick as the CAR already contains a list of CIDs and their offsets, which is verified and reused. Creating/updating an advertisement chain and announcing it to IPNI indexers is also quick. An IPNI indexer then takes on the order of seconds to come and fetch the advertisements, plus some ingestion delay on the indexer side.
125+
126+
## Blockchain related steps
127+
128+
Below are the steps that differ from traditional IPFS usage, as they involve authorization, payment, and cryptographic proofs.
129+
130+
### Connect Wallet
131+
132+
*What/why:*
133+
134+
Filecoin Pin needs to interface with the Filecoin blockchain to authorize and send payment to [service providers](glossary.md#service-provider) (SPs) for their work of storing and proving possession of data. This requires having a secret key to sign messages sent to the blockchain.
135+
136+
Currently `filecoin-pin` expects to be explicitly passed a private key via environment variable or command line argument. [filecoin-pin-website](glossary.md#filecoin-pin-website), deployed at [pin.filecoin.cloud](http://pin.filecoin.cloud), uses a global [session key](glossary.md#session-key), which can be embedded into source code because it scopes down the set of actions that can be performed. This alleviates the need for a user to provide a wallet to perform operations.
137+
138+
*Outputs:*
139+
140+
Once a wallet is connected, [USDFC](glossary.md#usdfc) and [FIL](glossary.md#fil) balances in the wallet itself can be inspected.
141+
142+
*Expected duration:*
143+
144+
Less than 1 second once a wallet private key is provided.
145+
146+
### Setup Filecoin Pay account
147+
148+
*What/why:*
149+
150+
To prepare to make a "deal" with an SP to store data, these actions need to occur:
151+
152+
1. Permit the user's [Filecoin Pay](glossary.md#filecoin-pay) account to use [USDFC](glossary.md#usdfc). This is a one-time authorization.
153+
2. Approve FilecoinWarmStorage as an operator of Filecoin Pay funds. This is a one-time authorization.
154+
3. Deposit at least enough funds into Filecoin Pay to cover the lock-up period for the created [CAR](glossary.md#car).
155+
156+
If these actions haven't occurred before, they are handled as part of the first deposit into the Filecoin Pay account from Filecoin Pin. A single `depositWithPermitAndApproveOperator` transaction handles all of them.
157+
158+
*Outputs:*
159+
160+
The Filecoin Pay account has a non-zero balance.
161+
162+
*Expected duration:*
163+
164+
As a single transaction, this takes ~30 seconds to be confirmed onchain.
165+
166+
### Identify a Data Set SP and ID
167+
168+
*What/why:*
169+
170+
In order to upload a [CAR](glossary.md#car), Filecoin Pin needs to identify the SP to upload to. This strategy is followed (assuming no overrides are provided):
171+
172+
1. If the chain has a record of a [Data Set](glossary.md#data-set) created by the wallet with the Data Set [metadata key](glossary.md#metadata) `source` set to 'filecoin-pin', then that Data Set ID and corresponding SP are used. If there are multiple, then the one storing the most data will be used.
173+
2. If there is no existing Data Set, then a new Data Set is created using an approved [Service Provider](glossary.md#service-provider) from the [Service Provider Registry](glossary.md#service-provider-registry).
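
The selection strategy above can be sketched as a small pure function. The `DataSet` fields and the `pick_data_set` name are hypothetical and exist only for illustration; the real implementation reads this state from the chain via RPC.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class DataSet:
    # Hypothetical shape; real Data Set records come from chain state.
    data_set_id: int
    provider_id: int
    stored_bytes: int
    metadata: dict[str, str] = field(default_factory=dict)


def pick_data_set(data_sets: list[DataSet]) -> Optional[DataSet]:
    """Pick the wallet's existing 'filecoin-pin' Data Set that stores the most data.

    Returns None when no such Data Set exists, signalling that a new one
    should be created with an approved Service Provider.
    """
    candidates = [ds for ds in data_sets if ds.metadata.get("source") == "filecoin-pin"]
    if not candidates:
        return None
    return max(candidates, key=lambda ds: ds.stored_bytes)
```

A `None` result corresponds to the second case in the list: no reusable Data Set, so one is created as part of the Add Piece transaction.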
174+
175+
*Outputs:*
176+
177+
- An existing Data Set ID to use or empty if a new Data Set should be created
178+
- SP ID to use for CAR upload and Data Set creation (if needed)
179+
180+
*Expected duration:*
181+
182+
This should take less than a couple of seconds as it involves hitting RPC providers to get chain state.
183+
184+
### Create Data Set if necessary and Add Piece
185+
186+
*What/why:*
187+
188+
A single blockchain transaction creates a [Data Set](glossary.md#data-set) if one doesn't already exist and adds a [Piece](glossary.md#piece) to the Data Set for the corresponding [CAR](glossary.md#car) file. This is done as one operation rather than as separate "Create Data Set" and "Add Piece" operations to improve interaction latency. The Piece uses a Filecoin-internal hash function resulting in a [Piece CID](glossary.md#piece-cid), which is what is stored onchain. The [Filecoin Warm Storage Service](glossary.md#filecoin-warm-storage-service) then has a record of which SP is storing which data, and the SP must periodically prove that it still possesses that data. Filecoin Pin stores additional [metadata](glossary.md#metadata) on the piece denoting that the uploaded data should be indexed by the SP and advertised to [IPNI](glossary.md#ipni) indexers.
189+
190+
*Outputs:*
191+
192+
A record onchain denoting the data that needs to periodically be proven to be in the possession of the Data Set’s SP.
193+
194+
*Expected duration:*
195+
196+
As a single transaction, this takes ~30 seconds to be confirmed onchain.

documentation/content-routing-faq.md

Lines changed: 38 additions & 0 deletions
@@ -0,0 +1,38 @@
1+
# Content Routing FAQ
2+
3+
[Content Routing](glossary.md#content-routing) is essential for making the data stored with Filecoin Pin actually retrievable by [standard IPFS tooling](glossary.md#standard-ipfs-tooling). This document answers questions about the content routing systems Filecoin Pin relies on.
4+
5+
## Will indexed CIDs from Calibration be mixed with CIDs from Mainnet?
6+
7+
Yes. [IPNI](glossary.md#ipni) indexers are not chain aware. They key on the CID and will point to whatever providers have "recently" advertised the CID. This means that if a given piece is created with a [Calibration](glossary.md#calibration-network) SP and also with a Mainnet SP, IPNI will list both SPs as providers for its CIDs.
8+
9+
## What happens when a piece is deleted?
10+
11+
When an SP is instructed to delete a [piece](glossary.md#piece), it announces a new advertisement to [IPNI](glossary.md#ipni) that includes the removal of the CIDs within the piece. This update to IPNI goes through the normal IPNI flow of receiving advertisement announcements and then asynchronously fetching the advertisements from the provider. As a result, IPNI index state should reflect a deleted piece within seconds to a few minutes.
12+
13+
## What happens if an SP goes offline?
14+
15+
In this case, the [IPNI](glossary.md#ipni) indexer will still attempt to auto-sync with the publisher until 7 days (168 hours) have passed. Once this timeout is hit, the offline SP's advertised CIDs will be removed from the index.
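
The timeout arithmetic can be sketched as below. The 168-hour value comes from the answer above; measuring it from the last successful sync, and the `records_expired` name, are assumptions made for illustration rather than a description of the indexer's internals.

```python
from datetime import datetime, timedelta, timezone

SYNC_TIMEOUT = timedelta(hours=168)  # 7 days, per the answer above


def records_expired(last_successful_sync: datetime, now: datetime) -> bool:
    """True once the offline provider's records would be dropped from the index.

    Assumes the timeout is measured from the last successful sync.
    """
    return now - last_successful_sync >= SYNC_TIMEOUT
```

For example, an SP whose last successful sync was at midnight UTC on the 1st would still be listed on the 7th and would drop out at midnight UTC on the 8th.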
16+
17+
## What happens if an SP loses index state?
18+
19+
In the event that an SP wipes their existing index state, the previously announced advertisements will still be stored by the [IPNI](glossary.md#ipni) indexer if no further action is taken. If the underlying advertisement disappears, but has already been processed by IPNI, this does not affect the availability of records, so long as the provider is still reachable. For the records to disappear, it is necessary to either:
20+
21+
1. publish a removal advertisement for the CIDs that need to be deleted OR
22+
2. have the SP create a new advertisement chain under a new peer ID so as to let the old provider records die out (7 days per above)
23+
24+
## How long does an IPNI indexer cache results?
25+
26+
This depends on both the [IPNI](glossary.md#ipni) indexer instance (e.g., cid.contact, filecoinpin.contact) and whether there is a cache hit or cache miss.
27+
28+
[cid.contact](http://cid.contact) for example tends to cache hits for multiple hours and cache misses (negative cache) for minutes. As a result of this, there are "gotchas" we have to be careful to avoid or can unavoidably fall into.
29+
30+
- [cid.contact](http://cid.contact) cache miss "gotcha" - Because cid.contact caches misses (i.e., negative cache), it's important for Filecoin Pin to not poll cid.contact after an advertisement has been announced. The act of polling could cause the empty result set to get cached for minutes. Instead, Filecoin Pin polls [filecoinpin.contact](http://filecoinpin.contact), which doesn't have negative caching. Once Filecoin Pin sees the expected results from filecoinpin.contact, it then proceeds to give IPFS Mainnet retrieval URLs, since it should now be safe to invoke a request path that hits cid.contact: cid.contact should now get a non-empty result.
31+
- [cid.contact](http://cid.contact) cache hit "gotcha" - If cid.contact has provider record(s) for CID X, but CID X is not currently retrievable from any of those provider(s), then cid.contact could be caching a non-retrievable result for hours even though filecoinpin.contact has a provider that makes CID X retrievable. We currently don't have a workaround for this…
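
The cache miss "gotcha" is easier to see in a toy simulation. The `ToyIndexer` class below is purely illustrative and does not model the real cid.contact or filecoinpin.contact services; the 300-second negative TTL is an assumed value standing in for "minutes".

```python
class ToyIndexer:
    """Toy indexer with an optional negative cache (illustration only)."""

    def __init__(self, negative_ttl: float):
        self.negative_ttl = negative_ttl            # seconds a miss stays cached; 0 disables
        self.records: dict[str, list[str]] = {}     # cid -> providers
        self._miss_until: dict[str, float] = {}     # cid -> time the cached miss expires

    def ingest(self, cid: str, provider: str) -> None:
        """Record that `provider` advertised `cid`."""
        self.records.setdefault(cid, []).append(provider)

    def lookup(self, cid: str, now: float) -> list[str]:
        """Return providers for `cid`, honouring any cached miss."""
        if now < self._miss_until.get(cid, float("-inf")):
            return []  # cached miss is served even if records exist by now
        providers = self.records.get(cid, [])
        if not providers and self.negative_ttl > 0:
            self._miss_until[cid] = now + self.negative_ttl
        return list(providers)
```

With `negative_ttl=300`, a lookup made before ingestion keeps returning an empty result for the next 300 seconds even after the record has been ingested, whereas with `negative_ttl=0` the record is visible on the very next lookup. That difference is why polling is done against the indexer without negative caching.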
32+
33+
## Why is there filecoinpin.contact and cid.contact?
34+
35+
[filecoinpin.contact](http://filecoinpin.contact) serves two purposes currently:
36+
37+
1. Serve as a fallback in case [cid.contact](http://cid.contact) has issues keeping its global index updated. To help with availability, cid.contact has the ability to delegate requests to other [IPNI](glossary.md#ipni) indexers like [filecoinpin.contact](http://filecoinpin.contact) in case they have results.
38+
2. Validate IPNI announcing/advertising independently of [cid.contact](http://cid.contact). Per the "[cid.contact](http://cid.contact) cache miss gotcha" above, the act of polling cid.contact can actually delay how long it takes before cid.contact returns a non-empty result for a given CID. [filecoinpin.contact](http://filecoinpin.contact) has a different caching configuration so that polling can be done safely.
