|
| 1 | +## Purpose |
| 2 | + |
| 3 | +This document describes the steps taken to "add a file with Filecoin Pin" and explains what happens "behind the scenes" as Filecoin Pin uses underlying libraries like [`synapse`](glossary.md#synapse) and the [Filecoin Onchain Cloud](glossary.md#filecoin-onchain-cloud) offering. |
| 4 | + |
| 5 | +## Diagram |
| 6 | + |
| 7 | +_Key_ |
| 8 | +Blue | Orange | FP | SP |
| 9 | +-- | -- | -- | -- |
| 10 | +non-blockchain step | blockchain step | Filecoin Pin | Service Provider |
| 11 | + |
| 12 | + |
| 13 | +```mermaid |
| 14 | +graph TD |
| 15 | +
|
| 16 | + %% Node Definitions |
| 17 | + %% Non-blockchain nodes |
| 18 | + Start([User: Select File or Directory to Add]) |
| 19 | + CreateCAR[FP: Create CAR<br/><br/>✓ IPFS Root CID known] |
| 20 | + UploadCAR[FP: Upload CAR to SP<br/><br/>✓ SP /piece/$pieceCid retrieval] |
| 21 | + IndexCAR{{SP: Index CAR CIDs<br/><br/>✓ SP /ipfs/$cid retrieval}} |
| 22 | + AdvertiseCAR{{SP: Advertise CAR CIDs to IPNI}} |
| 23 | + AwaitIPNIIndexing[FP: Await IPNI Indexing<br/><br/>✓ IPNI provider records<br/>✓ IPFS Mainnet retrieval possible] |
| 24 | + RetrieveData(Any: Retrieve with IPFS Mainnet<br/><br/>✓ ipfs://$cid works) |
| 25 | +
|
| 26 | + %% Blockchain nodes |
| 27 | + ConnectWallet([User: Connect Wallet<br/><br/>✓ Wallet balances visible]) |
| 28 | + SetupPay[FP: Setup Filecoin Pay Account<br/><br/>✓ Filecoin Pay balance visible] |
| 29 | + IdentifyDataSet[FP: Identify Data Set SP and ID] |
| 30 | + CreateDataSet{{SP: Create Data Set & Add Piece<br/><br/>✓ Blockchain Transaction}} |
| 31 | + ConfirmBlockchainTx[FP: Confirm Blockchain Transaction<br/><br/>✓ Data Set & Piece metadata onchain] |
| 32 | + ProveData{{SP: Prove Data Possession<br/><br/>✓ Cryptographic proofs onchain and visible on explorers}} |
| 33 | +
|
| 34 | + %% Milestone |
| 35 | + FilecoinPinAddDone{FP: `Add` Done} |
| 36 | +
|
| 37 | + %% Relationships |
| 38 | + %% Non-blockchain flow |
| 39 | + Start --> CreateCAR |
| 40 | + CreateCAR --> UploadCAR |
| 41 | + UploadCAR --> IndexCAR |
| 42 | + IndexCAR --> AdvertiseCAR |
| 43 | + AdvertiseCAR --> AwaitIPNIIndexing |
| 44 | + AwaitIPNIIndexing --> FilecoinPinAddDone |
| 45 | + FilecoinPinAddDone --> RetrieveData |
| 46 | +
|
| 47 | + %% Blockchain flow |
| 48 | + ConnectWallet --> SetupPay |
| 49 | + SetupPay --> IdentifyDataSet |
| 50 | + IdentifyDataSet --> CreateDataSet |
| 51 | +
|
| 52 | + %% Convergence |
| 53 | + UploadCAR --> CreateDataSet |
| 54 | + CreateDataSet --> ConfirmBlockchainTx |
| 55 | + ConfirmBlockchainTx --> FilecoinPinAddDone |
| 56 | + FilecoinPinAddDone --> ProveData |
| 57 | +
|
| 58 | + %% Styling |
| 59 | + classDef nonBlockchain fill:#e1f5ff,stroke:#0288d1,stroke-width:2px,color:#000 |
| 60 | + classDef blockchain fill:#fff3e0,stroke:#f57c00,stroke-width:2px,color:#000 |
| 61 | +
|
| 62 | + class Start,CreateCAR,UploadCAR,IndexCAR,AdvertiseCAR,AwaitIPNIIndexing,RetrieveData nonBlockchain |
| 63 | + class ConnectWallet,SetupPay,IdentifyDataSet,CreateDataSet,ConfirmBlockchainTx,ProveData blockchain |
| 64 | +``` |
| 65 | + |
| 66 | +## Steps without Blockchain Interactions |
| 67 | + |
| 68 | +These are the steps performed client side (i.e., where the Filecoin Pin code is running) and with a [Service Provider](glossary.md#service-provider) that don't involve the Filecoin blockchain. In isolation, these steps don't yield a committed cryptographic proof that the data is possessed by and retrievable from an SP, but they are necessary preconditions. |
| 69 | + |
| 70 | +### Create CAR |
| 71 | + |
| 72 | +*What/why:* |
| 73 | + |
| 74 | +The provided file needs to be turned into a Merkle DAG and have the DAG's blocks transported to an SP. [CAR](glossary.md#car) is a common container format for transporting blocks in the IPFS ecosystem and is used with Filecoin Pin. [Service Providers](glossary.md#service-provider) (SPs) store and prove contiguous sequences of bytes, but generating IPFS-compatible data from files and directories can create very many small "blocks" of data, which we pack into a single CAR container for SPs to store and prove. Packing the data in IPFS form, or "DAGifying" the file data, lets us reference and verify smaller units of our content and interact _trustlessly_ with SPs to serve and retrieve our file data. |
| 75 | + |
| 76 | +Implementation notes: |
| 77 | +- The CAR file is created using Helia for UnixFS DAG creation. |
| 78 | +- Individual files are wrapped in a directory so that the filename is preserved via UnixFS metadata. |
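To illustrate how DAGifying lets smaller units of content be referenced and verified individually, here is a deliberately simplified Python sketch of a two-level Merkle structure. It is not what Helia produces: real UnixFS DAGs use CIDs, encoded DAG nodes, and configurable layouts. The names `build_toy_dag` and `verify_chunk`, the use of plain SHA-256 hex digests, and the default chunk size are all assumptions made for illustration.

```python
import hashlib


def split_into_chunks(data: bytes, size: int) -> list:
    """Split data into fixed-size chunks (the last chunk may be shorter)."""
    if not data:
        return [b""]
    return [data[i:i + size] for i in range(0, len(data), size)]


def build_toy_dag(data: bytes, size: int = 256 * 1024) -> dict:
    """Hash every chunk, then hash the list of chunk hashes to get a root."""
    leaves = [hashlib.sha256(c).hexdigest() for c in split_into_chunks(data, size)]
    root = hashlib.sha256("".join(leaves).encode("ascii")).hexdigest()
    return {"root": root, "leaves": leaves}


def verify_chunk(dag: dict, index: int, chunk: bytes) -> bool:
    """Check one chunk against its leaf hash without needing the other chunks."""
    return hashlib.sha256(chunk).hexdigest() == dag["leaves"][index]
```

Because each leaf hash commits to one chunk and the root commits to all leaf hashes, a retriever who trusts only the root can check any chunk it receives from an untrusted server. That is the property the IPFS Root CID provides for the real DAG.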
| 79 | + |
| 80 | +*Outputs:* |
| 81 | + |
| 82 | +v1 CAR containing the Merkle DAG representing the provided file. There is one root in the CAR, and it represents the root of the DAG for the input file. This is referred to as the "[IPFS Root CID](glossary.md#ipfs-root-cid)". |
| 83 | + |
| 84 | +*Expected duration:* |
| 85 | + |
| 86 | +This is a function of the size of the input file and the hardware. Typical DAGification of files and directories is relatively quick, as it's simply a matter of chunking and hashing using common algorithms. The most time-consuming part is generating the [Piece CID](glossary.md#piece-cid) of the whole CAR on the client side prior to upload, where a 1 GB input can take upwards of a minute. As the CAR is being created, it can be streamed to an SP; that upload is most likely the bottleneck. |
| 87 | + |
| 88 | +### Upload CAR |
| 89 | + |
| 90 | +*What/why:* |
| 91 | + |
| 92 | +The [Service Provider](glossary.md#service-provider) needs to be given the bytes to store so it can serve retrievals and prove to the chain that it possesses them. This is done via an HTTP `PUT /pdp/piece/upload`. |
| 93 | + |
| 94 | +The upload includes [metadata](glossary.md#metadata) that will be stored onchain: |
| 95 | +- `ipfsRootCid`: The IPFS Root CID, linking the [Piece](glossary.md#piece) back to IPFS |
| 96 | +- `withIPFSIndexing`: Signals the SP to index and advertise to [IPNI](glossary.md#ipni) |
| 97 | + |
| 98 | +*Outputs:* |
| 99 | + |
| 100 | +The SP parks the Piece and queues it up for processing, while the client gets an HTTP response with the [Piece CID](glossary.md#piece-cid). The server calculates the Piece CID for the data and confirms that it matches the Piece CID calculated and provided by the Filecoin Pin client, which assures that the SP received exactly the bytes we intended to send. |
| 101 | + |
| 102 | +Since the SP has the data for the Piece, it can be retrieved from `https://sp.domain/piece/$pieceCid`. |
| 103 | + |
| 104 | +*Expected duration:* |
| 105 | + |
| 106 | +This is a function of the CAR size and the throughput between the client and the SP. |
| 107 | + |
| 108 | +### Index and Advertise CAR CIDs |
| 109 | + |
| 110 | +*What/why:* |
| 111 | + |
| 112 | +At some point after receiving the uploaded [CAR](glossary.md#car), an SP indexing task processes the CAR and creates a local mapping of CIDs to offsets within the CAR so it can serve IPFS-style retrievals. Following that, an SP [IPNI](glossary.md#ipni) task picks up the local index, builds an IPNI advertisement chain, and then announces the advertisement chain to IPNI indexers like filecoinpin.contact and cid.contact so they know to fetch it and build up their own index. |
| 113 | + |
| 114 | +Filecoin Pin validates the IPNI advertisement process by polling `https://filecoinpin.contact/cid/$cid` (NOT cid.contact due to [negative caching issues discussed below](#how-long-does-an-ipni-indexer-cache-results)). |
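The polling described above can be sketched roughly as follows. This is an illustrative Python sketch, not Filecoin Pin's actual code. It assumes that an HTTP 200 from the indexer means provider records exist and that any other status means "not indexed yet". The function names, attempt count, and delay are hypothetical. A real check would likely also inspect the response body to confirm that the expected provider is listed.

```python
import time
import urllib.error
import urllib.request

INDEXER_URL = "https://filecoinpin.contact/cid/{cid}"


def http_status(url: str) -> int:
    """Return the HTTP status code for a GET request, including 4xx/5xx codes."""
    try:
        with urllib.request.urlopen(url, timeout=10) as response:
            return response.status
    except urllib.error.HTTPError as err:
        return err.code


def wait_for_ipni(cid, fetch_status=http_status, attempts=10,
                  delay_seconds=5.0, sleep=time.sleep) -> bool:
    """Poll the indexer until it reports the CID or the attempts run out."""
    url = INDEXER_URL.format(cid=cid)
    for attempt in range(attempts):
        if fetch_status(url) == 200:  # assumption: 200 means records were found
            return True
        if attempt < attempts - 1:
            sleep(delay_seconds)
    return False
```

Passing `fetch_status` and `sleep` as parameters keeps the loop testable without network access or real waiting.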
| 115 | + |
| 116 | +*Outputs:* |
| 117 | + |
| 118 | +Once the SP has indexed the CAR, its content can be retrieved directly from the SP (i.e., bypassing IPFS Mainnet content routing) at `https://sp.domain/ipfs/$cid`. |
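The two SP retrieval URL shapes mentioned in this document (`/piece/$pieceCid` after upload and `/ipfs/$cid` after indexing) can be built with small helpers like the ones below. This is a minimal sketch. The helper names are hypothetical, and `https://sp.domain` stands in for whatever base URL a given SP exposes.

```python
def piece_url(sp_base_url: str, piece_cid: str) -> str:
    """URL for fetching the whole Piece (the CAR bytes) from the SP."""
    return f"{sp_base_url.rstrip('/')}/piece/{piece_cid}"


def ipfs_url(sp_base_url: str, cid: str) -> str:
    """URL for fetching content by CID from the SP once it has been indexed."""
    return f"{sp_base_url.rstrip('/')}/ipfs/{cid}"
```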
| 119 | + |
| 120 | +The SP produces a new or updated advertisement chain. By the end, IPNI indexers should have additional provider records for the advertised CIDs. |
| 121 | + |
| 122 | +*Expected duration:* |
| 123 | + |
| 124 | +Local indexing of the CAR is quick, as the CAR already contains a list of CIDs and their offsets, which is verified and reused. Creating or updating an advertisement chain and announcing it to IPNI indexers is also quick. IPNI indexers then take on the order of seconds to fetch the advertisements, plus some ingestion delay on the indexer side. |
| 125 | + |
| 126 | +## Blockchain related steps |
| 127 | + |
| 128 | +Below are the steps that differ most from traditional IPFS usage, as they involve authorization, payment, and cryptographic proofs. |
| 129 | + |
| 130 | +### Connect Wallet |
| 131 | + |
| 132 | +*What/why:* |
| 133 | + |
| 134 | +Filecoin Pin needs to interface with the Filecoin blockchain to authorize and send payment to [service providers](glossary.md#service-provider) (SPs) for their work of storing and proving possession of data. This requires having a secret key to sign messages sent to the blockchain. |
| 135 | + |
| 136 | +Currently `filecoin-pin` expects to be explicitly passed a private key via environment variable or command line argument. [filecoin-pin-website](glossary.md#filecoin-pin-website), deployed at [pin.filecoin.cloud](http://pin.filecoin.cloud), uses a global [session key](glossary.md#session-key), which can be embedded into source code because it scopes down the set of actions that can be performed. This removes the need for a user to provide a wallet to perform operations. |
| 137 | + |
| 138 | +*Outputs:* |
| 139 | + |
| 140 | +Once a wallet is connected, [USDFC](glossary.md#usdfc) and [FIL](glossary.md#fil) balances in the wallet itself can be inspected. |
| 141 | + |
| 142 | +*Expected duration:* |
| 143 | + |
| 144 | +Less than 1 second once a wallet private key is provided. |
| 145 | + |
| 146 | +### Setup Filecoin Pay account |
| 147 | + |
| 148 | +*What/why:* |
| 149 | + |
| 150 | +To prepare to make a "deal" with an SP to store data, these actions need to occur: |
| 151 | + |
| 152 | +1. Permit the user's [Filecoin Pay](glossary.md#filecoin-pay) account to use [USDFC](glossary.md#usdfc). This is a one-time authorization. |
| 153 | +2. Approve FilecoinWarmStorage as an operator of Filecoin Pay funds. This is a one-time authorization. |
| 154 | +3. Deposit at least enough funds into Filecoin Pay to cover the lock-up period for the created [CAR](glossary.md#car). |
| 155 | + |
| 156 | +If these actions haven't occurred before, they are handled as part of the first deposit into the Filecoin Pay account from Filecoin Pin. A single `depositWithPermitAndApproveOperator` transaction handles all of them. |
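The three preconditions above amount to a small checklist. The Python sketch below shows one way to reason about which parts are still outstanding. It is only an illustration: `PayAccountState`, `plan_setup`, and their fields are hypothetical names, amounts are assumed to be integers in the token's smallest unit, and the required lock-up is taken as an input because this document does not specify how it is computed.

```python
from dataclasses import dataclass


@dataclass
class PayAccountState:
    """Hypothetical snapshot of a user's Filecoin Pay setup."""
    usdfc_permitted: bool    # has Filecoin Pay been permitted to use USDFC?
    operator_approved: bool  # is FilecoinWarmStorage approved as operator?
    balance: int             # current Filecoin Pay balance


def plan_setup(state: PayAccountState, required_lockup: int) -> dict:
    """Work out which setup actions are still needed before adding a CAR."""
    deposit = max(0, required_lockup - state.balance)
    needs_permit = not state.usdfc_permitted
    needs_approval = not state.operator_approved
    return {
        "needs_permit": needs_permit,
        "needs_operator_approval": needs_approval,
        "deposit_amount": deposit,
        "transaction_needed": needs_permit or needs_approval or deposit > 0,
    }
```

For a brand new account all three items are outstanding, which is the case the combined `depositWithPermitAndApproveOperator` transaction covers in one step.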
| 157 | + |
| 158 | +*Outputs:* |
| 159 | + |
| 160 | +The Filecoin Pay account has a non-zero balance. |
| 161 | + |
| 162 | +*Expected duration:* |
| 163 | + |
| 164 | +As a single transaction, this takes ~30 seconds to be confirmed onchain. |
| 165 | + |
| 166 | +### Identify a Data Set SP and ID |
| 167 | + |
| 168 | +*What/why:* |
| 169 | + |
| 170 | +In order to upload a [CAR](glossary.md#car), Filecoin Pin needs to identify the SP to upload to. It follows this strategy (assuming no overrides are provided): |
| 171 | + |
| 172 | +1. If the chain has a record of a [Data Set](glossary.md#data-set) created by the wallet with the Data Set [metadata key](glossary.md#metadata) `source` set to `filecoin-pin`, then that Data Set ID and corresponding SP are used. If there are multiple, then the one storing the most data is used. |
| 173 | +2. If there is no existing Data Set, then a new Data Set is created using an approved [Service Provider](glossary.md#service-provider) from the [Service Provider Registry](glossary.md#service-provider-registry). |
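The selection strategy above can be sketched as a small pure function. This is a Python illustration under stated assumptions, not Filecoin Pin's implementation: the `DataSet` fields and the `choose_data_set` name are hypothetical, and the input list is assumed to contain only Data Sets created by the connected wallet.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class DataSet:
    """Hypothetical view of a Data Set record read from chain state."""
    data_set_id: int
    provider_id: int
    size_bytes: int
    metadata: dict = field(default_factory=dict)


def choose_data_set(data_sets: list) -> Optional[DataSet]:
    """Pick the largest Data Set tagged source=filecoin-pin, or None if absent."""
    candidates = [d for d in data_sets if d.metadata.get("source") == "filecoin-pin"]
    if not candidates:
        # No match: the caller would create a new Data Set with an approved SP.
        return None
    return max(candidates, key=lambda d: d.size_bytes)
```

Returning `None` corresponds to the "empty" Data Set ID output: the caller then picks an approved SP from the Service Provider Registry and creates a new Data Set.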
| 174 | + |
| 175 | +*Outputs:* |
| 176 | + |
| 177 | +- An existing Data Set ID to use, or empty if a new Data Set should be created. |
| 178 | +- The SP ID to use for CAR upload and Data Set creation (if needed). |
| 179 | + |
| 180 | +*Expected duration:* |
| 181 | + |
| 182 | +This should take less than a couple of seconds as it involves hitting RPC providers to get chain state. |
| 183 | + |
| 184 | +### Create Data Set if necessary and Add Piece |
| 185 | + |
| 186 | +*What/why:* |
| 187 | + |
| 188 | +A single blockchain transaction creates a [Data Set](glossary.md#data-set) if one doesn't already exist and adds a [Piece](glossary.md#piece) to the Data Set for the corresponding [CAR](glossary.md#car) file. This is done as one operation rather than as separate "Create Data Set" and "Add Piece" operations to improve interaction latency. The Piece uses a Filecoin-internal hash function resulting in a [Piece CID](glossary.md#piece-cid), which is what is stored onchain. The [Filecoin Warm Storage Service](glossary.md#filecoin-warm-storage-service) then has a record of which SP is storing which data, and therefore what that SP needs to periodically prove it possesses. Filecoin Pin stores additional [metadata](glossary.md#metadata) on the Piece denoting that the uploaded data should be indexed by the SP and advertised to [IPNI](glossary.md#ipni) indexers. |
| 189 | + |
| 190 | +*Outputs:* |
| 191 | + |
| 192 | +A record onchain denoting the data that needs to periodically be proven to be in the possession of the Data Set’s SP. |
| 193 | + |
| 194 | +*Expected duration:* |
| 195 | + |
| 196 | +As a single transaction, this takes ~30 seconds to be confirmed onchain. |