DOCUMENTATION.md: 29 additions & 29 deletions
@@ -476,49 +476,49 @@ The most significant effect of storing event in tip on reactors (and projection
476
476
477
477
## Things to get right
478
478
479
+
- Track versions of inputs for all derived state
480
+
- At least once delivery is a fact of life - you need to be able to handle the sequence S1E0, S1E1, S1E2, S1E1, S1E3 correctly
481
+
- simplest approach is to maintain a high water mark source `version` alongside any derived data you maintain. This enables you to easily disregard inputs that have already been ingested into your model
479
482
- Idempotency is critical; replays should not trigger writes
480
-
- safe idempotent processing is non-negotiable
481
-
- being able to apply updates without triggering null writes is key for most target stores, regardless of whether there is a direct Request Unit measure associated with the consumption
482
-
- replays should neer expose historic states of the view
483
-
- Think in spans
484
-
- Similar to how a SQL database woks best when you think in terms of sets, applying events to a derived state should never be thought of as happening one event at a time
485
-
- At least once delivery is a fact of life
486
-
- as part of managing at least once delivery, the simplest approach is to maintain a high water mark source `version` alongside any derived data you maintain. This enables you to simply disregard inputs that have already been ingested into your model
487
-
- Tolerate gaps
488
-
- assuming that you are dealing with a contiguous span of events and/or considering a single event's position (sequence number) in its stream is counterproductive
489
-
- removing this assumption allows you (or the source streams) to be trimmed or filtered in response to the inevitable change that any system worth considering will undergo
483
+
- safe idempotent processing is non-negotiable due to at least once delivery
484
+
- avoiding null writes is key, regardless of whether there is a direct Request Unit measure associated
485
+
- doing this correctly also avoids exposing historic states during replays
486
+
- if items can get removed when they reach a terminal state, consider a cached read of the upstream rather than maintaining a tombstone
487
+
- All processing should handle batches of events
488
+
- Similar to how a SQL database works best when you think in terms of sets, applying events to a derived state should never be thought of as happening one event at a time
489
+
- Tolerate gaps; do not depend on individual event `Index` values
490
+
- this allows the batch (or the source streams) to be trimmed or filtered in response to the inevitable change that any system worth considering will undergo
490
491
- Set limits
491
-
- every view needs a basis whereby you guarantee it will not overflow
492
492
- there are no unlimited data structures in CosmosDb - items are max 4MB, logical streams max 20/50GB
493
-
- while cross-partition queries should not be a first choice in CosmosDb, careful consideration should be given to whether fitting everything into a single logical stream is the correct design
494
-
495
-
## Sweet spots
496
-
497
-
- Think in units of 1MB of state. See limits, above. If the back of the envelope says you'll be merging inputs into larger datasets, find a way to split/shard/epoch in order to maintain Billing Model Sympathy
498
-
- If you can get stuff into 1-4MB, you can cache it with periodic checks on read costing 1RU. The state can be compressed internally, so that amount of data can go further than you'd think
499
-
- If you need to do cross-document queries, you can selectively expose (uncompressed) indexes and use CosmosStore.Linq querying to locate items to be loaded efficiently
500
-
501
-
## Conventions
502
-
493
+
- every view needs a basis whereby you guarantee it will not overflow
494
+
- be careful about accumulating tombstones unless something limits the number that can be created over time
495
+
- ensure upstreams have guards in place so Views can be kept simple
503
496
- Don't version view data structures; just roll a new Category Name
504
-
- Because replays are efficient, it's easier to lay down a restructured parallel variant of a given view until the last consumer of the old structure is decommissioned, than to have complexiyt in your data structures that you'd normally accept as a fact of life for an actual event sourced model
497
+
- avoid complexity and tricks in View data structures that you'd normally accept as a fact of life for an actual event sourced model
498
+
- Because replays are efficient, it's easier to lay down a restructured parallel variant of a given view until the last consumer of the old structure is decommissioned
505
499
- Name your Categories starting with `$Name0.rc1`, à la semantic versioning
506
-
-if in doubt, or you need to validate you can safely and correctly build a given set of views, simply increment the suffix
507
-
- when a view is stable and/or you're promoting it to production, remove the `.rcn` appendages
500
+
- this lets you validate at will that you can safely and correctly build a given set of views, simply by incrementing the suffix
501
+
- when a view is stable and/or you're promoting it to production, remove the `.rcN` suffix
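The high water mark approach from "Things to get right" can be sketched as follows. This is a minimal illustration in Python rather than the library's own API: the `View` type, the `apply_batch` function and the `(index, amount)` event shape are all hypothetical names introduced for the example. It takes a whole batch at once, ignores anything at or below the stored `version`, does not require indices to be contiguous, and reports whether anything changed so the caller can skip a null write.

```python
from dataclasses import dataclass


@dataclass
class View:
    """Hypothetical derived state, stored together with its high water mark."""
    version: int = -1  # index of the last source event folded in (-1 = nothing yet)
    total: int = 0     # illustrative derived value


def apply_batch(view: View, events: list[tuple[int, int]]) -> bool:
    """Fold a batch of (index, amount) events into the view.

    Returns True only if the view changed, so a caller can avoid
    writing when a replay or duplicate delivery adds nothing new.
    """
    changed = False
    # Sorting lets duplicates inside one batch be skipped by the same check
    # that skips events already ingested from earlier batches.
    for index, amount in sorted(events):
        if index <= view.version:
            continue  # already ingested: at-least-once delivery or a replay
        # No contiguity check: gaps in the index are tolerated by design.
        view.total += amount
        view.version = index
        changed = True
    return changed
```

With this shape, the sequence S1E0, S1E1, S1E2, S1E1, S1E3 yields the same state as S1E0 through S1E3 delivered once, and a full replay returns `False` without touching the stored view.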
508
502
509
-
## TODO
503
+
## Heuristics
504
+
505
+
- Think in units of 1MB of state
506
+
- If the back of the envelope says you'll be merging inputs into larger datasets, find a way to split/shard/epoch in order to maintain Billing Model Sympathy
507
+
- If you can get stuff into 1-4MB, you can cache it with periodic checks on read costing 1RU. The state can be compressed internally, so that amount of data can go further than you'd think
508
+
- while cross-partition queries should not be a first choice in CosmosDb, careful consideration should be given to whether fitting everything into a single logical stream is the correct design
509
+
- If you need to do cross-document queries, you can selectively expose (uncompressed) indexes and use `CosmosStore.Linq` to efficiently pick items to be loaded
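As a rough illustration of the split/shard/epoch heuristic above, the sketch below starts a new epoch whenever adding an entry would push the current one past a size budget. It is written in Python for illustration only; `split_into_epochs`, `MAX_EPOCH_BYTES` and the JSON-based size estimate are assumptions of this example, not part of any library, and the estimate ignores container overhead and any compression.

```python
import json

MAX_EPOCH_BYTES = 1_000_000  # assumed budget of roughly 1MB of state per item


def split_into_epochs(entries, max_bytes=MAX_EPOCH_BYTES):
    """Group entries into consecutive epochs that each stay within max_bytes.

    Size is approximated as the UTF-8 length of each entry's JSON encoding.
    An entry larger than the budget still gets an epoch of its own.
    """
    epochs, current, size = [], [], 0
    for entry in entries:
        entry_size = len(json.dumps(entry).encode("utf-8"))
        if current and size + entry_size > max_bytes:
            epochs.append(current)  # close the full epoch and start a new one
            current, size = [], 0
        current.append(entry)
        size += entry_size
    if current:
        epochs.append(current)
    return epochs
```

Running the same back-of-the-envelope calculation against expected volumes before building a view shows early whether a single item will stay within budget or needs to be split.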
510
510
511
-
###Testing
511
+
# Testing
512
512
513
-
####Unit testing projections
513
+
## Unit testing projections
514
514
515
515
- No Propulsion required, but MemoryStore can help