Skip to content

Conversation

@albe
Copy link
Owner

@albe albe commented Feb 27, 2021

This change checks the storage for torn writes when force unlocking and truncates all partitions (and indexes) to the last valid write. After a crash, this can be done by doing the following (once):

const eventstore = new EventStore('my-event-store', { storageConfig: { lock: EventStore.LOCK_RECLAIM } });

Related to #31 and #107

@albe albe added enhancement P: Storage Affects the storage layer labels Feb 27, 2021
@coveralls
Copy link

coveralls commented Feb 27, 2021

Coverage Status

Coverage decreased (-0.2%) to 97.428% when pulling 03da3a7 on repair-torn-writes into f02cad3 on master.

This was referenced Feb 27, 2021
@albe
Copy link
Owner Author

albe commented Mar 28, 2021

Torn writes always coincide with a still locked writer. This can currently only be unlocked with an explicit unlock() call, hence the check for torn writes only needs to happen there.

if (!this.lock()) {
return true;
}
this.checkTornWrites();
Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Doing this here is a waste, because in most cases there was no previously failed write.
This check should only happen when a lock still exists, but is reclaimed. That in turn can only happen when:

  • the user explicitly forces the reclaiming (e.g. by a constructor option)
  • the previous lock has "timed out"

Regarding timing out the lock, a possible way to do this sensibly is this (see proper-lockfile):

  • touch the lock file every X timeframe (to update mtime)
  • when opening, check if the lock is at least 2*X "old" and if so, reclaim it

Only checking if the lock is X time old without touching it would cause trouble with long running applications. Touching the lock every X time is additional work. So if implemented this way, this auto-reclaiming needs to be optional.

For the manual reclaiming, the API needs to be reconsidered:

try {
    eventstore = new EventStore(...);
} catch (EventStoreLockedError e) {
   eventstore = new EventStore(..., { forceLock: true });
}

wrapping class instanciation in a try/catch is awkward at best and if the reclaiming is known to be safe at instanciation time this can always be reduced to eventstore = new EventStore(..., { forceLock: true }); anyway.

eventstore = new EventStore(..., { lock: EventStore.LOCK_AUTO, lockTimeout: 60 }); // Use lock dir with timeout (60s), reclaim+repair if timed out
eventstore = new EventStore(..., { lock: EventStore.LOCK_FORCE }); // Always take the lock, reclaim+repair if lock exists
eventstore = new EventStore(..., { lock: EventStore.LOCK_MANUAL }); // Throw if an second instance is created, needs manual intervention (delete lock) after a crash

Right now the behaviour is LOCK_MANUAL. What would be a good default once the other methods are implemented? Depends on the use-case - LOCK_FORCE is the least safe, LOCK_MANUAL is safest but hard to fix on error and LOCK_AUTO is the dont-make-me-think mode

@albe albe merged commit 8f83faf into master May 16, 2021
@albe albe deleted the repair-torn-writes branch May 16, 2021 15:32
@albe albe mentioned this pull request May 18, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

enhancement P: Storage Affects the storage layer

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants