specification/changes/2025-08-25-bucket-beacons/background.md
Lines changed: 77 additions & 3 deletions
@@ -4,7 +4,7 @@
4
4
# Bucket Beacons
5
5
6
6
`Bucket Beacons` refers to a way to add a little bit more randomness to your [beacons](../../searchable-encryption/beacons.md),
7
-
so they will be less likely to leak information when your data distribution is uneven.
7
+
to add anonymity when your data distribution is uneven.
8
8
9
9
Probably read [changes](./change.md) first, as it gives a brief overview of the interface.
10
10
@@ -23,7 +23,7 @@ In this case, some hashes will have many many occurrences, and some will have on
23
23
24
24
An external observer can look at census data and make an educated guess that the hashes with many occurrences are probably "Jones" or "Smith". Pretty soon, you've leaked real information.
25
25
26
-
## Enter Bucket Beacons
26
+
## Introducing Bucket Beacons
27
27
28
28
One strategy to combat this is to further divide each item's beacons into separate `buckets`,
29
29
so that the same value in different records might produce different hashes.
@@ -41,14 +41,15 @@ The hash for a value in bucket number zero is the same as the hash in an unbucke
41
41
therefore, there is no difference between "one bucket" and "not using buckets".
42
42
43
43
Unfortunately, when items are distributed across N buckets, retrieving all of them requires N separate queries.
44
+
44
45
- Only the `Query` operation is affected.
45
46
-`Scan` and `Get` operations continue to work as usual.
46
47
47
48
The reason Query behaves differently is that it searches against an index, which requires an exact match. This leaves no opportunity to optimize the query with a “this OR that OR other” approach, unlike a Scan which examines all items.
48
49
49
50
Note: There is no way to determine an item’s bucket just by looking at its encrypted value.
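As a rough illustration of the "N separate queries" point above, the sketch below builds one query input per bucket by tagging each copy with the `:aws_dbe_bucket` value that the ddb-support specification reads from `ExpressionAttributeValues`. The helper name `per_bucket_query_inputs`, the table and index names, and the use of plain dictionaries in place of a real client call are assumptions made for this example only.

```python
import copy

# Name taken from the ddb-support specification text.
BUCKET_KEY = ":aws_dbe_bucket"


def per_bucket_query_inputs(query_input, number_of_queries):
    """Return one copy of query_input per bucket, each tagged with its bucket number.

    Hypothetical helper: the caller would send each returned input as a
    separate Query and merge the results.
    """
    inputs = []
    for bucket in range(number_of_queries):
        tagged = copy.deepcopy(query_input)  # leave the caller's input untouched
        values = tagged.setdefault("ExpressionAttributeValues", {})
        values[BUCKET_KEY] = {"N": str(bucket)}  # DynamoDB numbers travel as strings
        inputs.append(tagged)
    return inputs


# Illustrative input; the table, index, and attribute names are made up.
base = {
    "TableName": "ExampleTable",
    "IndexName": "ExampleIndex",
    "KeyConditionExpression": "LastName = :name",
    "ExpressionAttributeValues": {":name": {"S": "Jones"}},
}
queries = per_bucket_query_inputs(base, 5)
```

Each element of `queries` differs only in its bucket number, which is why the cost grows linearly with the number of buckets.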
50
51
51
-
### Performance
52
+
### Performance Penalties
52
53
53
54
Multiple queries will always take longer than one query; however,
54
55
if the number of "pages" of results returned by DynamoDB for a query is large compared to the number of buckets,
@@ -57,6 +58,18 @@ then the overall query performance is not impacted very much.
57
58
If the query is only expected to return one result, then of course, making five queries will take five times as long,
58
59
with four of the queries returning nothing.
59
60
61
+
### Performance Advantages
62
+
63
+
On the other hand, bucket beacons can provide performance enhancements as well.
64
+
65
+
Sometimes a common value can share the same hash as a rare one.
66
+
In this case, a search for the rare one pays the penalty of retrieving and discarding all the
67
+
matches for the common one.
68
+
Bucket beacons reduce the maximum number of occurrences of a single hash, reducing this penalty.
69
+
70
+
Increasing the number of buckets can allow you to increase the length of a standard beacon,
71
+
further decreasing the number of results that must be retrieved and discarded.
72
+
60
73
### Constrained Beacons
61
74
62
75
Maybe you need a different number of buckets for different standard beacons.
@@ -245,3 +258,64 @@ which tells the interceptor to calculate beacon A for bucket 4 and beacon B for
245
258
If you know exactly what data has been written, a long enough list of these would find all of the items.
246
259
247
260
We don't want to implement this until we're sure someone would actually use it.
261
+
262
+
## Migration
263
+
264
+
A non-bucketed table is the same as a table with `maximumNumberOfBuckets == 1`.
265
+
266
+
Further, each individual beacon can either be considered to be unconstrained,
267
+
or constrained to one bucket.
268
+
269
+
Given that, migration from non-bucketed to bucketed follows the same rules as any other change in bucket settings.
270
+
271
+
You can always increase maximumNumberOfBuckets.
272
+
It won't magically improve the anonymity of your existing data,
273
+
but new data will be properly anonymized and both old and new data will be found when doing bucketed searches.
274
+
275
+
The individual beacon constraints must remain unchanged.
276
+
Thus, when you first move to multiple buckets,
277
+
each beacon must either be constrained to one bucket, or be unconstrained.
278
+
279
+
## Test Strategy
280
+
281
+
### Normal Operation
282
+
283
+
Create a table with a maximum of 5 buckets, a variety of beacons with different numbers of buckets,
284
+
and with GSIs built on a variety of combinations of those beacons.
285
+
286
+
Put 100 items in that table, with different PK attributes,
287
+
but the exact same values for all the other attributes.
288
+
We expect around 20 items per bucket, and it is all but guaranteed that no bucket is empty.
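The arithmetic behind that expectation can be checked with a short calculation. This sketch assumes each item lands in a bucket uniformly at random and independently of the others; the specification text above does not state how the default bucket selection is distributed, so treat that as an assumption. The function names are hypothetical.

```python
from math import comb


def expected_items_per_bucket(items, buckets):
    """Average number of items per bucket under uniform random assignment."""
    return items / buckets


def prob_some_bucket_empty_bound(items, buckets):
    """Union bound on the probability that at least one bucket receives nothing."""
    return buckets * ((buckets - 1) / buckets) ** items


def prob_no_bucket_empty(items, buckets):
    """Exact probability that every bucket receives an item (inclusion-exclusion)."""
    return sum(
        (-1) ** k * comb(buckets, k) * ((buckets - k) / buckets) ** items
        for k in range(buckets + 1)
    )


average = expected_items_per_bucket(100, 5)
bound = prob_some_bucket_empty_bound(100, 5)
exact = prob_no_bucket_empty(100, 5)
```

Under these assumptions the chance of an empty bucket is on the order of one in a billion, so a test that asserts "each bucket returns at least one value" should not be flaky in practice.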
289
+
290
+
Perform a variety of queries against the table.
291
+
For each, assert that
292
+
293
+
- GetNumberOfQueries returns the expected value
294
+
- When performing `GetNumberOfQueries` queries,
295
+
- each bucket returns at least one value
296
+
- every item is returned exactly once
297
+
- When performing queries with the bucket number set to `GetNumberOfQueries` or greater, an error is returned.
298
+
299
+
Perform a very large number of Scans against the table.
300
+
These are easier to test, as they do not require a different index for each one, as the Query ones do.
301
+
Test that a single Scan returns every item exactly once.
302
+
303
+
### Test Bucket Selector
304
+
305
+
Repeat [Normal Operation](#normal-operation), but
306
+
307
+
- add an attribute to hold a bucket number
308
+
- include a [Bucket Selector](../../searchable-encryption/search-config.md#bucket-selector)
309
+
that puts each item in the indicated bucket.
310
+
- when searching on a bucket, assert that the item was in the correct bucket.
311
+
312
+
### Test Filter Expressions
313
+
314
+
The functionality of the [Filter Expressions for Query](../../dynamodb-encryption-client/ddb-support.md#filter-expressions-for-query) is well tested by the above.
315
+
316
+
For a few scan operations, assert that the text of the calculated filter expressions is as expected.
317
+
318
+
### Test Compound Beacons
319
+
320
+
Create some complex compound beacons, create indexes with them,
321
+
and repeat the same test as in [Normal Operation](#normal-operation).
This operation MUST take as input the QueryInput structure under consideration.
24
+
25
+
This operation MUST return the number of queries necessary.
26
+
27
+
### Behavior
28
+
29
+
Based on the [standard beacons](../searchable-encryption/beacons.md#standard-beacon)
30
+
used in the [KeyConditionExpression](https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/Query.KeyConditionExpressions.html) of the query, calculate the required number of queries.
31
+
32
+
This value is the minimum of the calculated value,
33
+
and the [maximum number of buckets](../searchable-encryption/search-config.md#max-buckets)
34
+
configured for the table.
35
+
36
+
The calculated value is the least common multiple of the number of buckets for each of the beacons involved.
37
+
38
+
This is not needed for a [scan](https://docs.aws.amazon.com/amazondynamodb/latest/APIReference/API_Scan.html)
39
+
operation. Only one Scan is needed, regardless of bucket settings.
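The calculation described in this Behavior section can be sketched as follows. The function name and the list-of-integers input are simplifications for illustration (the real operation takes a QueryInput structure), and returning 1 when no standard beacons are involved is an assumption of this sketch, not something the text above states.

```python
from math import lcm  # available in Python 3.9 and later


def get_number_of_queries(beacon_bucket_counts, max_buckets):
    """Least common multiple of the per-beacon bucket counts, capped at max_buckets.

    beacon_bucket_counts: number of buckets for each standard beacon used in
    the KeyConditionExpression (hypothetical, pre-extracted input).
    max_buckets: the maximum number of buckets configured for the table.
    """
    calculated = lcm(*beacon_bucket_counts)  # lcm() with no arguments is 1
    return min(calculated, max_buckets)
```

For example, beacons with 2 and 3 buckets in a table configured for 5 buckets give a least common multiple of 6, which is then capped at 5.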
specification/dynamodb-encryption-client/ddb-support.md
Lines changed: 47 additions & 1 deletion
@@ -78,6 +78,10 @@ and [Encrypt Item Output](./encrypt-item.md#output).
78
78
To obtain [Beacon Key Materials] GetEncryptedBeacons
79
79
MUST call [Get beacon key after encrypt](../searchable-encryption/search-config.md#get-beacon-key-after-encrypt).
80
80
81
+
A [bucket number](search-config.md#bucketnumber) MUST be generated
82
+
by calling the [bucket selector](search-config.md#bucket-selector).
83
+
This [bucket number](search-config.md#bucketnumber) is to be used for all [standard beacons](./searchable-encryption/beacons.md#standard-beacon) in the item.
84
+
81
85
GetEncryptedBeacons MUST NOT operate on [compound beacons](../searchable-encryption/beacons.md#compound-beacon)
82
86
that only have [signed parts](../searchable-encryption/beacons.md#compound-beacon-initialization).
83
87
@@ -154,6 +158,18 @@ from [Get beacon key for query](../searchable-encryption/search-config.md#get-be
154
158
If the [QueryObject does not have encrypted values](#queryobject-has-encrypted-values)
155
159
then QueryInputForBeacons MUST NOT attempt to obtain [Beacon Key Materials](../searchable-encryption/search-config.md#beacon-key-materials).
156
160
161
+
When querying, a [bucket number](search-config.md#bucketnumber) MUST be determined by examining
162
+
the `:aws_dbe_bucket` value in the `ExpressionAttributeValues`.
163
+
164
+
If this value is absent, a bucket number of `0` MUST be used.
165
+
166
+
If this value is not of type `N` or fails to hold an integer value
167
+
greater than or equal to zero and less than the [max buckets](search-config.md#max-buckets),
168
+
an error MUST be returned.
169
+
170
+
If this value is valid, then this value is used and the `:aws_dbe_bucket` field MUST
171
+
be removed from the `ExpressionAttributeValues`.
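The bucket number rules above can be sketched as a small validation routine. The function name `extract_bucket_number` and the use of `ValueError` are assumptions for this example. It also makes one simplification that is not in the specification: only plain decimal digit strings are accepted, so a value such as `3.0` is rejected even though it holds an integer.

```python
BUCKET_KEY = ":aws_dbe_bucket"


def extract_bucket_number(expression_attribute_values, max_buckets):
    """Return (bucket_number, values_without_bucket_key).

    Hypothetical helper mirroring the rules above: an absent value means
    bucket 0, a wrong type or out-of-range value is an error, and a valid
    value is removed from the returned map.
    """
    values = dict(expression_attribute_values)  # shallow copy; input is not mutated
    if BUCKET_KEY not in values:
        return 0, values

    attr = values.pop(BUCKET_KEY)
    if not isinstance(attr, dict) or set(attr) != {"N"}:
        raise ValueError(f"{BUCKET_KEY} must be of type N")

    text = attr["N"]
    # Simplification: accept only ASCII decimal digits, which also rules out negatives.
    if not (isinstance(text, str) and text.isascii() and text.isdigit()):
        raise ValueError(f"{BUCKET_KEY} must hold a non-negative integer")

    bucket = int(text)
    if bucket >= max_buckets:
        raise ValueError(f"{BUCKET_KEY} must be less than {max_buckets}")
    return bucket, values
```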
172
+
157
173
For any operand in the KeyConditionExpression or FilterExpression which is a beacon name,
158
174
the name MUST be replaced by the internal beacon name (i.e. NAME replaced by aws_dbe_b_NAME).
159
175
@@ -174,6 +190,9 @@ MUST be obtained from the [Beacon Key Materials](../searchable-encryption/search
174
190
[HMAC Keys map](../searchable-encryption/search-config.md#hmac-keys) using the beacon name
175
191
as the key.
176
192
193
+
If [Bucket Beacons](../changes/2025-08-25-bucket-beacons/background.md) are being used,
194
+
then the [FilterExpression](https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/Query.FilterExpression.html) must be augmented as described in [Filter Expressions for Query](#filter-expressions-for-query).
195
+
177
196
For example if the query is
178
197
"MyBeacon = :value" and ExpressionAttributeValues holds (:value = banana),
179
198
then the ExpressionAttributeValues must be changed to (:value = 13fd),
@@ -192,6 +211,31 @@ in this situation, but that risks leaking the connection between the two beacons
192
211
Similarly, if one of the two was not a beacon, then we would be leaking the fact that
193
212
this beacon came from that text.
194
213
214
+
### Filter Expressions for Query
215
+
216
+
If [GetNumberOfQueries](./ddb-get-number-of-queries.md) returns a number less than the configured
217
+
[maximum number of buckets](../searchable-encryption/search-config.md#max-buckets)
218
+
then the FilterExpression MUST be augmented to match against buckets greater than
219
+
the limit returned from GetNumberOfQueries.
220
+
221
+
For each bucket number that would map to the current bucket,
222
+
calculate the Filter Expression as for that bucket.
223
+
The FilterExpression sent to DynamoDB MUST be the `OR` combination of all of these expressions.
224
+
225
+
The text of the FilterExpression is unlikely to change between buckets.
226
+
What will change is the values in the ExpressionAttributeValues,
227
+
which will be different if they involve standard beacons calculated with different buckets.
228
+
This implies that additional unique values will need to be added to ExpressionAttributeValues.
229
+
230
+
As an example, if a table is configured with five buckets,
231
+
and GetNumberOfQueries returns two, and `foo[n]` represents the expression as calculated for bucket `n`,
232
+
then when `:aws_dbe_bucket = 0` the filter expression must be `(foo[0]) OR (foo[2]) OR (foo[4])`
233
+
and when `:aws_dbe_bucket = 1` the filter expression must be `(foo[1]) OR (foo[3])`.
234
+
235
+
The resulting FilterExpression might look something like this:
236
+
237
+
`(aws_dbe_b_Attr3 = :attr3 AND aws_dbe_b_Attr4 = :attr4) OR (aws_dbe_b_Attr3 = :attr3AA AND aws_dbe_b_Attr4 = :attr4AA) OR (aws_dbe_b_Attr3 = :attr3AB AND aws_dbe_b_Attr4 = :attr4AB)`
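The bucket grouping in the five-bucket example can be reproduced with a short sketch. It assumes that bucket `n` maps to query number `n mod GetNumberOfQueries`, which is inferred from the example above rather than stated as a rule; both function names are hypothetical.

```python
def buckets_for_query(bucket, number_of_queries, max_buckets):
    """Buckets whose items must be matched by the query for `bucket`.

    Assumes bucket n is served by query (n mod number_of_queries).
    """
    return list(range(bucket, max_buckets, number_of_queries))


def or_combine(expressions):
    """Join per-bucket filter expressions with OR, parenthesizing each one."""
    return " OR ".join(f"({expression})" for expression in expressions)


# Five buckets, two queries: which buckets does each query cover?
first = buckets_for_query(0, 2, 5)
second = buckets_for_query(1, 2, 5)
combined = or_combine(f"foo[{n}]" for n in first)
```

In a real implementation each `foo[n]` would share the same expression text but refer to distinct entries in ExpressionAttributeValues, as described above.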
238
+
195
239
### QueryObject has encrypted values
196
240
197
241
Determines if a Query Object has encrypted values (ENCRYPT_AND_SIGN fields)
@@ -260,7 +304,9 @@ any error encountered during filtering MUST result in a failure of the query ope
260
304
261
305
Transform an unencrypted ScanInput object for searchable encryption.
262
306
263
-
The ScanInput is transformed in the same way as [QueryInputForBeacons](#queryinputforbeacons).
307
+
The ScanInput is transformed in the same way as [QueryInputForBeacons](#queryinputforbeacons),
308
+
except that [Filter Expressions for Query](#filter-expressions-for-query) is calculated