Skip to content

Commit 569969d

Browse files
warpforkZenGround0
andauthored
Extend documentation around Flush and cached nodes (#61)
* Extend documentation around Flush and cached nodes It took me quite a while to understand what's going on here, and I agree with the existing comments from rvagg that it's probably not very optimal. There's not a lot in the way of options for controlling memory usage by this "cache"; and it's also deeply critical to the actual operating of writes, so it sort of feels like a misnomer to call it "cache" at all. However, I'm going to keep my mission in this commit to "document" rather than "change": if nothing else, I think some documentation about how to actually persist changes is useful, because it wasn't at all obvious to me that it takes muliple function calls with a variety of different parameters, and the order of those calls is consequential. * Rework comments to describe new caching behavior authored-by: warpfork Co-authored-by: ZenGround0 <[email protected]>
1 parent 83589ca commit 569969d

File tree

1 file changed

+50
-6
lines changed

1 file changed

+50
-6
lines changed

hamt.go

Lines changed: 50 additions & 6 deletions
Original file line numberDiff line numberDiff line change
@@ -109,7 +109,29 @@ type Pointer struct {
109109
KVs []*KV `refmt:"v,omitempty"`
110110
Link cid.Cid `refmt:"l,omitempty"`
111111

112-
// cached node to avoid too many serialization operations
112+
// cache is a pointer to an in-memory Node, which may or may not be
113+
// present, and corresponds to the Link field, which also may or may not
114+
// be present.
115+
//
116+
// If present, the cached Node should be semantically substitutable with
117+
// the Link field. It makes no sense for a cache Node to be present if KVs
118+
// is set. Link might not be set, if cache is present and is describing
119+
// data that has never yet been serialized and stored.
120+
//
121+
// `loadChild` will short circut to return this node if the pointer isn't
122+
// nil;
123+
// `loadChild` will also set this pointer when loading a node that wasn't
124+
// yet present cached.
125+
// `Flush` on a `Node` will iterate through each `Pointer` and `Put` its
126+
// cache node if:
127+
// 1. The Pointer's cache is not nil
128+
// 2. The Pointer's dirty flag is true
129+
// (and also recurse to `Flush` on that `Node`) -- in other words,
130+
// `Flush` writes out the cached data
131+
// `Flush` will assign `Link` in the process of `Put`'ing the 'cache' data.
132+
// `Copy` will copy any cached nodes, Link fields and dirty flags.
133+
//
134+
// `Link` becomes defined on`Flush`
113135
cache *Node
114136
// dirty flag to indicate that the cached node needs to be flushed
115137
dirty bool
@@ -438,10 +460,23 @@ func (n *Node) checkSize(ctx context.Context) (uint64, error) {
438460
return totsize, nil
439461
}
440462

441-
// Flush saves and purges any cached Nodes recursively from this Node through
442-
// its (cached) children. Cached nodes primarily exist through the use of
443-
// Copy() operations where the entire graph is instantiated in memory and each
444-
// child pointer exists in cached form.
463+
// Flush has two effectis, it (partially!) persists data and resets dirty flag
464+
//
465+
// Flush operates recursively, telling each "cache" child node to flush;
466+
// Put'ing that "cache" node to the store;
467+
// updating this node's Link to the CID resulting from the store Put;
468+
// clearing the dirty flag of that pointer to flase
469+
// and then returning.
470+
// Flush doesn't operate unless there's a "cache" node.
471+
//
472+
// "cache" nodes were previously storing either updated values,
473+
// or, simply storing previously loaded data; these are disambiguated by the
474+
// dirty flag which is true when the cache node's data has not been persisted
475+
//
476+
// Notice that Flush _does not_ Put _this node_.
477+
// To fully persist changes, the caller still needs to Put this node to the
478+
// store themselves, and store the new resulting Link wherever they expect the
479+
// updated HAMT to be seen.
445480
func (n *Node) Flush(ctx context.Context) error {
446481
for _, p := range n.Pointers {
447482
if p.cache != nil && p.dirty {
@@ -463,6 +498,10 @@ func (n *Node) Flush(ctx context.Context) error {
463498

464499
// Set key k to value v, where v is has a MarshalCBOR(bytes.Buffer) method to
465500
// encode it.
501+
//
502+
// To fully commit the change, it is necessary to Flush the root Node,
503+
// and then additionally Put the root node to the store itself,
504+
// and save the resulting CID wherever you expect the HAMT root to persist.
466505
func (n *Node) Set(ctx context.Context, k string, v interface{}) error {
467506
var d *cbg.Deferred
468507

@@ -596,7 +635,12 @@ func (n *Node) modifyValue(ctx context.Context, hv *hashBits, k []byte, v *cbg.D
596635
child := n.getPointer(cindex)
597636
if child.isShard() {
598637
// if isShard, we have a pointer to a child that we need to load and
599-
// delegate our modify operation to
638+
// delegate our modify operation to.
639+
// Note that this loadChild operation will cause the loaded node to be
640+
// "cached" and this pointer to be marked as dirty;
641+
// it is an eventual Flush passing back over this "cache" node which
642+
// causes the updates made to the in-memory "cache" node to eventually
643+
// be persisted.
600644
chnd, err := child.loadChild(ctx, n.store, n.bitWidth, n.hash)
601645
if err != nil {
602646
return err

0 commit comments

Comments
 (0)