@olizilla olizilla commented Oct 5, 2022

Adds treeleaf strategy to put all non-leaf nodes in the first CAR, and then all leaf nodes in subsequent CARs.

The idea, for feedback, is to get something like this working:

import { TreeleafCarSplitter } from 'carbites/treeleaf'
import { CarReader } from '@ipld/car/reader'
import { printTree } from 'imaginary-car-tree-lib'
import fs from 'fs'

const bigCar = await CarReader.fromIterable(fs.createReadStream('/path/to/big.car'))
const targetSize = 1024 * 1024 * 100 // chunk to ~100MB CARs
const splitter = new TreeleafCarSplitter(bigCar, targetSize)

let count = 0 // incremented in the loop below, so it cannot be const
for await (const car of splitter.cars()) {
  const reader = await CarReader.fromIterable(car)
  const [splitCarRootCid] = await reader.getRoots()
  console.log(`## car ${count} - root: ${splitCarRootCid}`)
  console.log(printTree(reader))
  count++
}
/* output:
## car 0 - root: CID(bafybeidpq6auelplqkozwd3kobvgv4jow72r2yt6ffjzenemqhylmmxngi)
bafybeidpq6auelplqkozwd3kobvgv4jow72r2yt6ffjzenemqhylmmxngi
└─┬ bafybeibw77p4afchrvmo6ep4fpv7j4aehrn2yqs3tv3uuofc6oag36zq2q
  ├─┬ bafybeih2w5euyf3sodc6efxtpahgwkata46fupjysi4bbnazb4n2b25rry
  │ └─┬ bafybeif5xvhik6thha5ykg3g73jo3mqd5mhb7orzeznnvwquashmwryhai
  │   ├── bafkreidkmypp3asq3xiu6mmtcfbzmmhcobpmnaqcyns6535w3u7bz7el64  ❌ missing
  │   ├── bafkreid3pefuwyvhsnlrz6xk67f4f6opgqhcrle4bf47kvachwojdgt7re  ❌ missing
  │   └── bafkreigr3beehu5ebbgoisx4rl2vyaqebmohubauc6avefodb5jauz3tyi  ❌ missing
  └─┬ bafybeia2i6eqwfqrixh446pavwev37kywowmdpgvx34fbv7asdh3qwpm3y
    └── bafkreih2bhak5yv7g4vft5c37j7dw5rqnnsyyuzsifczehhhpm3t655oae    ❌ missing

## car 1 - root: CID(bafkqaaa)
bafkreidkmypp3asq3xiu6mmtcfbzmmhcobpmnaqcyns6535w3u7bz7el64
bafkreid3pefuwyvhsnlrz6xk67f4f6opgqhcrle4bf47kvachwojdgt7re
bafkreigr3beehu5ebbgoisx4rl2vyaqebmohubauc6avefodb5jauz3tyi
bafkreih2bhak5yv7g4vft5c37j7dw5rqnnsyyuzsifczehhhpm3t655oae
*/

Note: no attempt is made to restrict the size of the tree CAR. If you have a huge tree and a smol target size, the tree CAR may well exceed it. This PR is a sketch of the idea: put all the tree metadata in one CAR, and the actual file data (leaf nodes) in other CARs.
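To illustrate the grouping described above, here is a minimal sketch that works on plain in-memory objects rather than real CAR files. It is not the code in this PR: the block shape (`cid`, `bytes`, `links`), the function name `partitionTreeleaf`, and the rule "a leaf is a block with no links" are all assumptions made for the example.

```javascript
// Hypothetical in-memory block shape: { cid: string, bytes: Uint8Array, links: string[] }.
// Assumption: a "leaf" is a block with no links to other blocks.

/**
 * Group blocks the way the treeleaf strategy is described: every non-leaf
 * block goes in group 0, and leaf blocks fill the following groups up to
 * roughly targetSize bytes each.
 */
function partitionTreeleaf (blocks, targetSize) {
  const tree = []
  const leafGroups = []
  let current = []
  let currentSize = 0

  for (const block of blocks) {
    if (block.links.length > 0) {
      tree.push(block) // no size limit is applied to the tree group
      continue
    }
    // Start a new leaf group when adding this block would pass the target.
    if (current.length > 0 && currentSize + block.bytes.length > targetSize) {
      leafGroups.push(current)
      current = []
      currentSize = 0
    }
    current.push(block)
    currentSize += block.bytes.length
  }
  if (current.length > 0) leafGroups.push(current)

  return [tree, ...leafGroups]
}

const blocks = [
  { cid: 'root', bytes: new Uint8Array(10), links: ['dir'] },
  { cid: 'dir', bytes: new Uint8Array(10), links: ['a', 'b', 'c'] },
  { cid: 'a', bytes: new Uint8Array(60), links: [] },
  { cid: 'b', bytes: new Uint8Array(60), links: [] },
  { cid: 'c', bytes: new Uint8Array(30), links: [] }
]

const groups = partitionTreeleaf(blocks, 100)
for (const [i, group] of groups.entries()) {
  console.log(`group ${i}: ${group.map(b => b.cid).join(', ')}`)
}
// group 0: root, dir
// group 1: a
// group 2: b, c
```

Because the tree group is never checked against `targetSize`, a DAG with many intermediate nodes can push group 0 past the target, which is the caveat in the note above.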

License: MIT
Signed-off-by: Oli Evans [email protected]

@olizilla olizilla requested a review from alanshaw October 5, 2022 21:18

@alanshaw alanshaw left a comment

I think this is a super cool idea, but I'm not sure I understand why it belongs in carbites. What's the next step? How do we make use of a DAG where the tree is split from the leaves?

* @typedef {import('@ipld/car/api').BlockReader & import('@ipld/car/api').RootsReader} ICarReader
*/

export class TreeleafCarJoiner {

Isn't this just the simple joiner?

}
}
tree.writer.close()
yield tree.out

If the DAG is a single block, this is an empty CAR... what to do? Is that OK?
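
To make the single-block case concrete, here is a minimal sketch. It does not use the PR's code: `splitTreeAndLeaves` and the block shape are hypothetical, and it assumes a leaf is any block without links.

```javascript
// Assumption: a block is a leaf when it has no links (hypothetical shape).
function splitTreeAndLeaves (blocks) {
  const treeBlocks = blocks.filter(b => b.links.length > 0)
  const leafBlocks = blocks.filter(b => b.links.length === 0)
  return { treeBlocks, leafBlocks }
}

// A DAG made of a single raw block: the root is itself a leaf.
const single = [{ cid: 'only', bytes: new Uint8Array(5), links: [] }]
const { treeBlocks, leafBlocks } = splitTreeAndLeaves(single)

console.log(treeBlocks.length) // 0: the first CAR would carry no blocks
console.log(leafBlocks.length) // 1: the only block lands in a leaf CAR
```

Under these assumptions the tree group is empty, so whatever is yielded first has a root but no blocks. Whether to emit that empty CAR or skip it is the open question.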
