Commit ce93f0b
When multiple fragments have the same timestamp, the sparse global order
reader (and likely others) does not have "repeatable reads". That is, if
you run the same read query multiple times, you may not receive the
same result each time.
The story provides a reproducer which does something like:
```
for i in range(0, 10):
    write a new fragment A[0] = i at timestamp t=1
    for j in range(0, 10):
        read A[0]
```
Below is the result:
```
python tmp/write_consistency.py
OrderedDict([('a', array([], dtype=int32)), ('d1', array([], dtype=int64))])
Wrote 0, read [0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
Wrote 1, read [0, 1, 1, 1, 1, 1, 1, 1, 1, 0]
Wrote 2, read [0, 1, 0, 0, 1, 2, 1, 0, 1, 1]
Wrote 3, read [0, 0, 0, 0, 3, 2, 2, 1, 2, 1]
Wrote 4, read [0, 3, 4, 1, 4, 2, 1, 2, 1, 4]
Wrote 5, read [1, 1, 5, 4, 3, 0, 0, 4, 0, 0]
Wrote 6, read [4, 6, 0, 2, 0, 3, 4, 4, 4, 4]
Wrote 7, read [4, 4, 4, 4, 7, 4, 5, 4, 4, 7]
Wrote 8, read [5, 0, 0, 7, 7, 0, 0, 4, 4, 4]
Wrote 9, read [8, 4, 3, 5, 2, 7, 9, 8, 2, 7]
```
We can see that the reads do not all produce the same value.
In the story we discussed whether we would be satisfied with "repeatable
read", in which each row from the reproducing script contains just one
distinct value; or whether we need "strict write order", in which each
row `i` contains only values `i`.
Repeatable read is sufficient, so that's what is implemented in this
pull request. An example new result of the above script is:
```
OrderedDict([('a', array([], dtype=int32)), ('d1', array([], dtype=int64))])
Wrote 0, read [0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
Wrote 1, read [0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
Wrote 2, read [0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
Wrote 3, read [0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
Wrote 4, read [0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
Wrote 5, read [0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
Wrote 6, read [6, 6, 6, 6, 6, 6, 6, 6, 6, 6]
Wrote 7, read [6, 6, 6, 6, 6, 6, 6, 6, 6, 6]
Wrote 8, read [6, 6, 6, 6, 6, 6, 6, 6, 6, 6]
Wrote 9, read [6, 6, 6, 6, 6, 6, 6, 6, 6, 6]
```
# Implementation
The implementation is ultimately straightforward. The coordinates
comparator breaks ties in coordinate values using the tile timestamp. We
add an additional tiebreaker which compares the fragment index. The
fragment index is itself determined by the lexicographic ordering of
fragment names, which includes the timestamp and UUID; as such for a
fixed array the fragment indices will always be the same for each read.
The greatest fragment index in typical cases is the most recently
written, so it wins the comparison.
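
A minimal Python sketch of the tiebreak described above. The `Cell` record and the `pick_winner` helper are hypothetical names used only for illustration, not the actual C++ comparator. The sketch assumes that among cells with equal coordinates the greatest timestamp wins, and that among equal timestamps the greatest fragment index wins.

```python
from dataclasses import dataclass
import random


@dataclass(frozen=True)
class Cell:
    """Hypothetical stand-in for one cell seen by the reader."""
    coord: int
    timestamp: int
    fragment_index: int
    value: int


def pick_winner(cells):
    """Return the winning cell among duplicates of one coordinate.

    Ties on timestamp are broken by fragment index, so the result
    does not depend on the order in which the cells are presented.
    """
    return max(cells, key=lambda c: (c.timestamp, c.fragment_index))


# Ten fragments, all written at timestamp 1 to coordinate 0.
cells = [Cell(coord=0, timestamp=1, fragment_index=i, value=i) for i in range(10)]

reads = []
for _ in range(10):
    shuffled = cells[:]
    random.shuffle(shuffled)  # simulate an arbitrary processing order
    reads.append(pick_winner(shuffled).value)

print(reads)
```

Here the fragment index happens to equal the write order, so every read returns the last value written. With only the timestamp as the key, `max` would return whichever tied cell it encountered first, which is the non-repeatable behavior shown in the first output above.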
("In typical cases"... meaning what? If two fragments are written at the
same timestamp which is `now()`, then there is logic in the UUID
generation to ensure that the first fragment's UUID is lexicographically
less than the second fragment's UUID. However, this logic is incorrect
if the array was opened at a fixed non-`now()` timestamp. Multiple
fragments written at a fixed timestamp will have uncorrelated UUIDs and
thus can appear in any order when reading the array. Hence we get a
repeatable read order but not always one which matches the physical
order of the writes.)
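
The effect described in this parenthetical can be sketched in Python. The name format `__<t>_<t>_<uuid>` below is a simplified assumption for illustration, and `fragment_indices` is a hypothetical helper rather than a real API. The point is only that sorting names with equal timestamps and random UUIDs gives an order that is stable across reads but unrelated to write order.

```python
import uuid


def fragment_indices(names):
    """Map each fragment name to its index in lexicographic order."""
    return {name: idx for idx, name in enumerate(sorted(names))}


# Ten fragments written in order at the fixed timestamp t=1, each with
# an uncorrelated random UUID (simplified, assumed name format).
written = [f"__1_1_{uuid.uuid4().hex}" for _ in range(10)]

first = fragment_indices(written)
second = fragment_indices(list(reversed(written)))  # different listing order

# The mapping is identical no matter how the names are listed ...
assert first == second

# ... but the winner (greatest index) need not be the last write.
winner = max(written, key=first.__getitem__)
print("write position of winner:", written.index(winner))
```

This matches the new example output above, where the value read stays fixed within each row but is not always the most recently written value.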
In any case, I spent a bit of time diving into the above, and there are
some artifacts which I elected to leave in here because they are both
harmless and tested - specifically some additional accessors to
components of `FragmentID`.
---
TYPE: BUG
DESC: repeatable read for multiple fragments written at fixed timestamp
1 parent 022af98 commit ce93f0b
5 files changed, +299 -21 lines changed:
- test
- src
- support/src
- tiledb/sm
- fragment
- test
- misc
0 commit comments