CAR based Gateway implementation #62
Description
Done Criteria
There is an implementation of gateway.IPFSBackend that can leverage retrievals of CAR files with the relevant data in them.
It should implement the proposed version of the API here, which shouldn't have major changes before the above PR lands.
Why Important
Implementation Phases
- (1) Fetch the CAR into a per-request in-memory blockstore and serve the response (see the sketch after this list)
- (2) Fetch CAR into shared memory blockstore and serve response along with a blockservice that does block requests for missing data
- (3) Start doing the walk locally and then, if a path segment is incomplete, send a request for a CAR/blocks; upon every received block, try to continue using the blockservice
- (4) Start doing the walk locally and keep a list of "plausible" blocks; if, after issuing a request, we get a non-plausible block, report it and attempt to recover by redoing the last segment
- (5) Don't redo the last segment fully if it's part of a UnixFS file and we can do range requests
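A minimal sketch of phase (1), assuming the upstream speaks the trustless gateway protocol and returns a CAR for an /ipfs/... content path (the upstream URL layout and the helper name are illustrative assumptions, not existing bifrost-gateway code):

```go
package carbackend

import (
	"context"
	"fmt"
	"io"
	"net/http"

	"github.com/ipfs/boxo/blockstore"
	"github.com/ipfs/go-datastore"
	dssync "github.com/ipfs/go-datastore/sync"
	car "github.com/ipld/go-car/v2"
)

// fetchCARIntoMemory fetches a CAR for the given content path from an upstream
// and loads every block into a fresh in-memory blockstore scoped to this
// request, per implementation phase (1).
func fetchCARIntoMemory(ctx context.Context, upstream, contentPath string) (blockstore.Blockstore, error) {
	req, err := http.NewRequestWithContext(ctx, http.MethodGet, upstream+contentPath, nil)
	if err != nil {
		return nil, err
	}
	// Ask for a CAR response (trustless gateway media type).
	req.Header.Set("Accept", "application/vnd.ipld.car")

	resp, err := http.DefaultClient.Do(req)
	if err != nil {
		return nil, err
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusOK {
		return nil, fmt.Errorf("upstream returned %s", resp.Status)
	}

	// Per-request, in-memory blockstore: thrown away when the request ends.
	bs := blockstore.NewBlockstore(dssync.MutexWrap(datastore.NewMapDatastore()))

	br, err := car.NewBlockReader(resp.Body)
	if err != nil {
		return nil, err
	}
	for {
		blk, err := br.Next()
		if err == io.EOF {
			break
		}
		if err != nil {
			return nil, err // e.g. truncated CAR stream
		}
		if err := bs.Put(ctx, blk); err != nil {
			return nil, err
		}
	}
	return bs, nil
}
```

Phase (2) would replace the per-request store with a shared blockstore wrapped in a blockservice (e.g. boxo's blockservice.New(sharedStore, exchange)), where the exchange falls back to individual block requests for anything the CAR responses did not include.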
Details and Dependencies
ECD: 2023-03-27
- Refactor Gateway API so can extract out the request layer ipfs/boxo#173 (resolved by feat: refactor gateway api to operate on higher level semantics ipfs/boxo#176). It will now be possible to build an IPFS HTTP Gateway implementation where individual HTTP requests are more closely tied to Go API calls into a configurable backend; a wiring sketch follows.
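As a rough illustration of that configurable-backend wiring, assuming boxo's gateway.NewHandler constructor and a caller-supplied value implementing gateway.IPFSBackend (its construction, i.e. the CAR-fetching logic, is not shown here):

```go
package gatewayserver

import (
	"net/http"

	"github.com/ipfs/boxo/gateway"
)

// serveGateway mounts boxo's gateway HTTP handler over whatever IPFSBackend
// implementation it is given (for this issue, a CAR-fetching one), so that
// each inbound HTTP request is translated into calls on that backend.
func serveGateway(backend gateway.IPFSBackend, addr string) error {
	handler := gateway.NewHandler(gateway.Config{}, backend)

	mux := http.NewServeMux()
	mux.Handle("/ipfs/", handler)
	mux.Handle("/ipns/", handler)

	return http.ListenAndServe(addr, mux)
}
```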
Blockers for mirroring traffic for Rhea
ECD: 2023-03-29
- Resolve memory issues
- Add more metrics tracking to the new implementation (a sketch of one candidate metric follows below)
The work is happening in #61. See there for more details.
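For example, a sketch of the kind of metric that could be added, using the Prometheus Go client (the metric name and labels here are illustrative assumptions, not the ones used in #61):

```go
package metrics

import (
	"net/http"

	"github.com/prometheus/client_golang/prometheus"
	"github.com/prometheus/client_golang/prometheus/promauto"
	"github.com/prometheus/client_golang/prometheus/promhttp"
)

// carFetchDuration tracks how long upstream CAR fetches take, labelled by
// outcome; the name and label are placeholders for illustration only.
var carFetchDuration = promauto.NewHistogramVec(prometheus.HistogramOpts{
	Name:    "gateway_car_fetch_duration_seconds",
	Help:    "Duration of upstream CAR fetches performed to serve gateway requests.",
	Buckets: prometheus.DefBuckets,
}, []string{"outcome"})

// ObserveFetch records one fetch observation.
func ObserveFetch(outcome string, seconds float64) {
	carFetchDuration.WithLabelValues(outcome).Observe(seconds)
}

// Handler exposes the default Prometheus registry for scraping.
func Handler() http.Handler {
	return promhttp.Handler()
}
```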
Blockers for production traffic for Rhea
ECD: TBD; a date/plan is expected by 2023-03-30
We need to have sufficient testing of the bifrost-gateway code given that we aren't able to run Kubo's battery of sharness tests against it (per #58).
Options being considered:
- Enough testing in CI (add gateway-conformance test suite to the CI #66) that we can be reasonably confident in the new implementation
  - Note: we may want to be cautious in some of our implementation work here to increase the chance that Kubo sharness tests will catch errors while the conformance tests improve (i.e. use something like the current strategy with the same BlocksGateway implementation Kubo uses, but with DAG prefetching of blocks happening underneath)
- Can happen alongside some confidence building by comparing production ipfs.io/dweb.link traffic status codes + response sizes to Rhea ones (a comparison sketch follows below)
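A rough sketch of that kind of comparison, assuming we mirror a sampled request to both a production gateway and a Rhea endpoint and diff the status code and body size (the Rhea URL and the example CID below are placeholders):

```go
package main

import (
	"fmt"
	"io"
	"log"
	"net/http"
)

// probe fetches a content path from one gateway and returns the status code
// and the number of body bytes read.
func probe(base, contentPath string) (int, int64, error) {
	resp, err := http.Get(base + contentPath)
	if err != nil {
		return 0, 0, err
	}
	defer resp.Body.Close()
	n, err := io.Copy(io.Discard, resp.Body)
	if err != nil {
		return 0, 0, err
	}
	return resp.StatusCode, n, nil
}

func main() {
	// The Rhea endpoint below is a placeholder; substitute the real one.
	const prod, rhea = "https://ipfs.io", "https://rhea.example.net"
	path := "/ipfs/bafybeigdyrzt5sfp7udm7hu76uh7y26nf3efuylqabf3oclgtqy55fbzdi" // example CID

	ps, pn, err := probe(prod, path)
	if err != nil {
		log.Fatal(err)
	}
	rs, rn, err := probe(rhea, path)
	if err != nil {
		log.Fatal(err)
	}
	if ps != rs || pn != rn {
		fmt.Printf("mismatch for %s: prod=(%d, %d bytes) rhea=(%d, %d bytes)\n", path, ps, pn, rs, rn)
	}
}
```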
Completion tasks to mark this done-done-done
- Turning an inbound gateway.IPFSBackend request into a CAR request (should be relatively straightforward)
- Doing incremental verification of the responses (see the sketch after this list)
- Handle what happens if the CAR response sends back bad data (e.g. for Caboose report the problem upstream)
- Handle what happens if the CAR response dies in the middle (i.e. resumption or restarting of download)
- Handle OOM/out-of-disk-space errors
  - Because the CAR responses do not have duplicate blocks, but a block may be reused in a graph traversal, either the entire graph needs to be buffered/stored before the blocks are thrown away, or it needs to be possible to re-issue block requests for data we recently received but might have thrown away
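A minimal sketch of the incremental-verification step, assuming blocks are consumed one at a time as they arrive from the CAR stream; each block's bytes are re-hashed with its CID's own multihash prefix, and a mismatch is surfaced so the caller (e.g. Caboose) can report the bad upstream:

```go
package carverify

import (
	"fmt"
	"io"

	blocks "github.com/ipfs/go-block-format"
	car "github.com/ipld/go-car/v2"
)

// verifyCARStream reads blocks from a CAR stream one at a time, re-hashing
// each block's payload against its claimed CID before handing it to use.
// A hash mismatch is returned immediately so the bad response can be
// reported upstream instead of being served to the client.
func verifyCARStream(r io.Reader, use func(blocks.Block) error) error {
	br, err := car.NewBlockReader(r)
	if err != nil {
		return err
	}
	for {
		blk, err := br.Next()
		if err == io.EOF {
			return nil
		}
		if err != nil {
			return fmt.Errorf("CAR stream died mid-response: %w", err)
		}
		// Recompute the multihash over the raw bytes using the CID's prefix
		// and compare it with the CID the CAR claims for this block.
		got, err := blk.Cid().Prefix().Sum(blk.RawData())
		if err != nil {
			return err
		}
		if !got.Equals(blk.Cid()) {
			return fmt.Errorf("bad block in CAR response: claimed %s, computed %s", blk.Cid(), got)
		}
		if err := use(blk); err != nil {
			return err
		}
	}
}
```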
Additional Notes
There already is an implementation of gateway.IPFSBackend that uses the existing tooling for block-based storage/retrieval here (and related to #57).
Some details related to Caboose:
- Since Caboose is in charge of selecting which Saturn peers to ask for which content, there may be some affinity information (perhaps just what already exists) that it wants in order to optimize which nodes it sends requests to (e.g. for a given CAR request that fulfills an IPFS HTTP Gateway request, understanding whether it wants to split the load, send it all to a specific L1, send it to a set of L1s, etc.).
- IIUC the current plan is to send all data for a given high-level IPFS HTTP Gateway request to a single L1, which shouldn't be too bad. Note: it may not be exactly 1 IPFS HTTP Gateway request -> 1 CAR file request due to various optimizations; however, the total number of requests should certainly go down dramatically.
If we need to make some compromises in the implementation here in order to start collecting some data, that's doable, but if so they should be explicitly called out and issues filed. Additionally, it should continue to be possible to use a blocks gateway implementation here via config (sketched below).
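For instance, a sketch of what keeping the blocks-backed implementation selectable via config might look like (the environment variable name and the constructor parameters are hypothetical placeholders, not existing bifrost-gateway configuration):

```go
package config

import (
	"os"

	"github.com/ipfs/boxo/gateway"
)

// NewBackend picks between the CAR-based backend and the existing
// blocks-based backend at startup. The constructors passed in and the
// environment variable name are placeholders that only illustrate keeping
// the blocks gateway selectable via config.
func NewBackend(newCARBackend, newBlocksBackend func() (gateway.IPFSBackend, error)) (gateway.IPFSBackend, error) {
	if os.Getenv("GATEWAY_FETCH_STRATEGY") == "blocks" { // assumed variable name
		return newBlocksBackend()
	}
	return newCARBackend() // default: CAR-based implementation
}
```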