
Commit ddf3cca

1 parent 483ba2e commit ddf3cca

6 files changed

+272
-17
lines changed

TestVectors/dafny/DDBEncryption/src/TestVectors.dfy

Lines changed: 4 additions & 4 deletions
@@ -761,10 +761,10 @@ module {:options "-functionSyntax:4"} DdbEncryptionTestVectors {
761761
TestBucketQueries(rClient, 5, GetBucketQuery51(), trans, "bucket query 51");
762762
TestBucketQueries(rClient, 5, GetBucketQuery23(), trans, "bucket query 23");
763763

764-
TestBucketQueries(rClient, 1, GetBucketQuery15F(), trans, "bucket query 15F", false);
765-
TestBucketQueries(rClient, 2, GetBucketQuery25F(), trans, "bucket query 25F", false);
766-
TestBucketQueries(rClient, 3, GetBucketQuery35F(), trans, "bucket query 35F", false);
767-
TestBucketQueries(rClient, 4, GetBucketQuery45F(), trans, "bucket query 45F", false);
764+
TestBucketQueries(rClient, 1, GetBucketQuery15F(), trans, "bucket query 15F");
765+
TestBucketQueries(rClient, 2, GetBucketQuery25F(), trans, "bucket query 25F");
766+
TestBucketQueries(rClient, 3, GetBucketQuery35F(), trans, "bucket query 35F");
767+
TestBucketQueries(rClient, 4, GetBucketQuery45F(), trans, "bucket query 45F");
768768

769769
var scanCount4 := 0;
770770
var scanCount5 := 0;

specification/changes/2025-08-25-bucket-beacons/background.md

Lines changed: 77 additions & 3 deletions
@@ -4,7 +4,7 @@
44
# Bucket Beacons
55

66
`Bucket Beacons` refers to a way to add a little bit more randomness to your [beacons](../../searchable-encryption/beacons.md),
7-
so they will be less likely to leak information when your data distribution is uneven.
7+
to add anonymity when your data distribution is uneven.
88

99
Probably read [changes](./change.md) first, as it gives a brief overview of the interface.
1010

@@ -23,7 +23,7 @@ In this case, some hashes will have many many occurrences, and some will have on
2323

2424
An external observer can look at census data and make an educated guess that the hashes with many occurrences are probably "Jones" or "Smith". Pretty soon, you've leaked real information.
2525

26-
## Enter Bucket Beacons
26+
## Introducing Bucket Beacons
2727

2828
One strategy to combat this is to further divide each item's beacons into separate `buckets`,
2929
so that the same value in different records might produce different hashes.
@@ -41,14 +41,15 @@ The hash for a value in bucket number zero is the same as the hash in an unbucke
4141
therefore, there is no difference between "one bucket" and "not using buckets".
4242

4343
Unfortunately, when items are distributed across N buckets, retrieving all of them requires N separate queries.
44+
4445
- Only the `Query` operation is affected.
4546
- `Scan` and `Get` operations continue to work as usual.
4647

4748
The reason Query behaves differently is that it searches against an index, which requires an exact match. This leaves no opportunity to optimize the query with a “this OR that OR other” approach, unlike a Scan which examines all items.
4849

4950
Note: There is no way to determine an item’s bucket just by looking at its encrypted value.
5051

51-
### Performance
52+
### Performance Penalties
5253

5354
Multiple queries will always take longer than one query; however,
5455
if the number of "pages" of results returned by DynamoDB for a query is large compared to the number of buckets,
@@ -57,6 +58,18 @@ then the overall query performance is not impacted very much.
5758
If the query is only expected to return one result, then of course, making five queries will take five times as long,
5859
with four of the queries returning nothing.
5960

61+
### Performance Advantages
62+
63+
On the other hand, bucket beacons can provide performance enhancements as well.
64+
65+
Sometimes a common value can share the same hash as a rare one.
66+
In this case, a search for the rare one pays the penalty of retrieving and discarding all the
67+
matches for the common one.
68+
Bucket beacons reduce the maximum number of occurrences of a single hash, reducing this penalty.
69+
70+
Increasing the number of buckets can allow you to increase the length of a standard beacon,
71+
further decreasing the number of results that must be retrieved and discarded.
72+
6073
### Constrained Beacons
6174

6275
Maybe you need a different number of buckets for different standard beacons.
@@ -245,3 +258,64 @@ which tells the interceptor to calculate beacon A for bucket 4 and beacon B for
245258
If you know exactly what data has been written, a long enough list of these would find all of the items.
246259

247260
We don't want to implement this until we're sure someone would actually use it.
261+
262+
## Migration
263+
264+
A non-bucketed table is the same as a table with `maximumNumberOfBuckets == 1`.
265+
266+
Further, each individual beacon can either be considered to be unconstrained,
267+
or constrained to one bucket.
268+
269+
Given that, migration from non-bucketed to bucketed follows the same rules as any other change in bucket settings.
270+
271+
You can always increase maximumNumberOfBuckets.
272+
It won't magically improve the anonymity of your existing data,
273+
but new data will be properly anonymized and both old and new data will be found when doing bucketed searches.
274+
275+
The individual beacon constraints must remain unchanged.
276+
Thus, when you first move to multiple buckets,
277+
each beacon must either be constrained to one bucket, or be unconstrained.
278+
279+
## Test Strategy
280+
281+
### Normal Operation
282+
283+
Create a table with a maximum of 5 buckets, a variety of beacons with different numbers of buckets,
284+
and with GSIs built on a variety of combinations of those indexes.
285+
286+
Put 100 items in that table, with different PK attributes,
287+
but the exact same values for all the other attributes.
288+
We expect around 20 items per bucket, and are pretty guaranteed that no bucket is empty.
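
The claim that no bucket is empty can be checked with a quick estimate. The sketch below assumes each item is assigned to a bucket uniformly and independently, which this document does not state explicitly; `empty_bucket_probability_bound` is a hypothetical helper that applies a union bound over the buckets.

```python
def empty_bucket_probability_bound(items: int, buckets: int) -> float:
    """Upper bound on P(at least one bucket is empty).

    Assumes uniform, independent bucket assignment (an assumption, not
    something the specification guarantees). A single bucket stays empty
    with probability ((buckets - 1) / buckets) ** items; the union bound
    multiplies that by the number of buckets.
    """
    if items < 0 or buckets < 1:
        raise ValueError("items must be >= 0 and buckets must be >= 1")
    return buckets * ((buckets - 1) / buckets) ** items


# Expected items per bucket and the bound for the test described above.
expected_per_bucket = 100 / 5
bound = empty_bucket_probability_bound(items=100, buckets=5)
```

Under that assumption the bound for 100 items and 5 buckets is on the order of one in a billion, so a test failure caused by an empty bucket would be extremely unlikely.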
289+
290+
Perform a variety of queries against the table.
291+
For each, assert that
292+
293+
- GetNumberOfQueries returns the expected value
294+
- When performing `GetNumberOfQueries` queries,
295+
- each bucket returns at least one value
296+
- every item is returned exactly once
297+
- When performing queries with the bucket number set to `GetNumberOfQueries` or greater, an error is returned.
298+
299+
Perform a very large number of Scans against the table.
300+
These are easier to test, as they do not require a different index for each one, as the Query ones do.
301+
Test that a single Scan returns every item exactly once.
302+
303+
### Test Bucket Selector
304+
305+
Repeat [Normal Operation](#normal-operation), but
306+
307+
- add an attribute to hold a bucket number
308+
- include a [Bucket Selector](../../searchable-encryption/search-config.md#bucket-selector)
309+
that puts each item in the indicated bucket.
310+
- when searching on a bucket, assert that the item was in the correct bucket.
311+
312+
### Test Filter Expressions
313+
314+
The functionality of the [Filter Expressions for Query](../../dynamodb-encryption-client/ddb-support.md#filter-expressions-for-query) is well tested by the above.
315+
316+
For a few scan operations, assert that the text of the calculated filter expressions is as expected.
317+
318+
### Test Compound Beacons
319+
320+
Create some complex compound beacons, create indexes with them,
321+
and repeat the same test as in [Normal Operation](#normal-operation).
Lines changed: 39 additions & 0 deletions
@@ -0,0 +1,39 @@
1+
[//]: # "Copyright Amazon.com Inc. or its affiliates. All Rights Reserved."
2+
[//]: # "SPDX-License-Identifier: CC-BY-SA-4.0"
3+
4+
# Get Number of Queries
5+
6+
## Overview
7+
8+
When using [Bucket Beacons](../changes/2025-08-25-bucket-beacons/background.md),
9+
more than one [query](https://docs.aws.amazon.com/amazondynamodb/latest/APIReference/API_Query.html)
10+
can be necessary to retrieve all of the desired results,
11+
leading to code something like this:
12+
13+
```text
14+
for i = 0 to transformClient.GetNumberOfQueries(query) - 1
15+
query.ExpressionAttributeValues.Add(":aws_dbe_bucket", N(i.to_string()))
16+
dynamoClient.query(query)
17+
```
18+
19+
## Operation
20+
21+
### Input
22+
23+
This operation MUST take as input the QueryInput structure under consideration.
24+
25+
This operation MUST return the number of queries necessary.
26+
27+
### Behavior
28+
29+
Based on the [standard beacons](../searchable-encryption/beacons.md#standard-beacon)
30+
used in the [KeyConditionExpression](https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/Query.KeyConditionExpressions.html) of the query, calculate the required number of queries.
31+
32+
This value is the minimum of the calculated value,
33+
and the [maximum number of buckets](../searchable-encryption/search-config.md#max-buckets)
34+
configured for the table.
35+
36+
The calculated value is the least common multiple of the number of buckets for each of the beacons involved.
37+
38+
This is not needed for a [scan](https://docs.aws.amazon.com/amazondynamodb/latest/APIReference/API_Scan.html)
39+
operation. Only one Scan is needed, regardless of bucket settings.
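
The calculation in the Behavior section can be sketched as follows. This is an illustration only, not the library's implementation: `get_number_of_queries` and its parameters are hypothetical, and returning `1` when no standard beacon appears in the KeyConditionExpression is an assumption of this sketch.

```python
from math import lcm  # available in Python 3.9 and later


def get_number_of_queries(beacon_bucket_counts, max_buckets):
    """Return min(LCM of the per-beacon bucket counts, max_buckets).

    beacon_bucket_counts: number of buckets for each standard beacon used
        in the KeyConditionExpression (hypothetical input shape).
    max_buckets: the maximum number of buckets configured for the table.
    """
    if max_buckets < 1:
        raise ValueError("max_buckets must be at least 1")
    if any(count < 1 for count in beacon_bucket_counts):
        raise ValueError("every beacon needs at least one bucket")
    # lcm() with no arguments returns 1, so an empty list yields one query.
    calculated = lcm(*beacon_bucket_counts)
    return min(calculated, max_buckets)
```

For example, beacons constrained to 2 and 3 buckets have a least common multiple of 6, which is capped at 5 for a table configured with five buckets.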

specification/dynamodb-encryption-client/ddb-support.md

Lines changed: 47 additions & 1 deletion
@@ -78,6 +78,10 @@ and [Encrypt Item Output](./encrypt-item.md#output).
7878
To obtain [Beacon Key Materials] GetEncryptedBeacons
7979
MUST call [Get beacon key after encrypt](../searchable-encryption/search-config.md#get-beacon-key-after-encrypt).
8080

81+
A [bucket number](search-config.md#bucketnumber) MUST be generated
82+
by calling the [bucket selector](search-config.md#bucket-selector).
83+
This [bucket number](search-config.md#bucketnumber) is to be used for all [standard beacons](./searchable-encryption/beacons.md#standard-beacon) in the item.
84+
8185
GetEncryptedBeacons MUST NOT operate on [compound beacons](../searchable-encryption/beacons.md#compound-beacon)
8286
that only have [signed parts](../searchable-encryption/beacons.md#compound-beacon-initialization).
8387

@@ -154,6 +158,18 @@ from [Get beacon key for query](../searchable-encryption/search-config.md#get-be
154158
If the [QueryObject does not have encrypted values](#queryobject-has-encrypted-values)
155159
then QueryInputForBeacons MUST NOT attempt to obtain [Beacon Key Materials](../searchable-encryption/search-config.md#beacon-key-materials).
156160

161+
When querying, a [bucket number](search-config.md#bucketnumber) MUST be determined by examining
162+
the `:aws_dbe_bucket` value in the `ExpressionAttributeValues`.
163+
164+
If this value is absent, a bucket number of `0` MUST be used.
165+
166+
If this value is not of type `N` or fails to hold an integer value
167+
greater than or equal to zero and less than the [max buckets](search-config.md#max-buckets),
168+
an error MUST be returned.
169+
170+
If this value is valid, then this value is used and the `:aws_dbe_bucket` field MUST
171+
be removed from the `ExpressionAttributeValues`.
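
The rules above for `:aws_dbe_bucket` can be sketched as follows. This is a minimal illustration, assuming the DynamoDB low-level attribute format (`{"N": "3"}`); `extract_bucket_number` is a hypothetical helper, and accepting only plain decimal digit strings is a simplifying assumption that may be stricter than a real implementation.

```python
BUCKET_KEY = ":aws_dbe_bucket"


def extract_bucket_number(expression_attribute_values, max_buckets):
    """Return (bucket_number, values_without_the_bucket_key).

    Raises ValueError if the value is present but is not an N attribute
    holding an integer in the range 0 <= value < max_buckets.
    """
    values = dict(expression_attribute_values)  # do not mutate the input
    if BUCKET_KEY not in values:
        return 0, values  # absent: bucket 0 is used

    attr = values[BUCKET_KEY]
    if not isinstance(attr, dict) or set(attr) != {"N"}:
        raise ValueError(f"{BUCKET_KEY} must be of type N")

    text = attr["N"]
    # Assumption: only ASCII digit strings count as non-negative integers.
    if not isinstance(text, str) or not text.isascii() or not text.isdigit():
        raise ValueError(f"{BUCKET_KEY} must hold a non-negative integer")

    bucket = int(text)
    if bucket >= max_buckets:
        raise ValueError(f"{BUCKET_KEY} must be less than {max_buckets}")

    del values[BUCKET_KEY]  # valid: remove before sending to DynamoDB
    return bucket, values
```

Returning a copy of the map keeps the caller's query object untouched, which makes it easier to reuse the same input across the loop over bucket numbers.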
172+
157173
For any operand in the KeyConditionExpression or FilterExpression which is a beacon name,
158174
the name MUST be replaced by the internal beacon name (i.e. NAME replaced by aws_dbe_b_NAME).
159175

@@ -174,6 +190,9 @@ MUST be obtained from the [Beacon Key Materials](../searchable-encryption/search
174190
[HMAC Keys map](../searchable-encryption/search-config.md#hmac-keys) using the beacon name
175191
as the key.
176192

193+
If [Bucket Beacons](../changes/2025-08-25-bucket-beacons/background.md) are being used,
194+
then the [FilterExpression](https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/Query.FilterExpression.html) must be augmented as described in [Filter Expressions for Query](#filter-expressions-for-query).
195+
177196
For example if the query is
178197
"MyBeacon = :value" and ExpressionAttributeValues holds (:value = banana),
179198
then the ExpressionAttributeValues must be changed to (:value = 13fd),
@@ -192,6 +211,31 @@ in this situation, but that risks leaking the connection between the two beacons
192211
Similarly, if one of the two was not a beacon, then we would be leaking the fact that
193212
this beacon came from that text.
194213

214+
### Filter Expressions for Query
215+
216+
If [GetNumberOfQueries](./ddb-get-number-of-queries.md) returns a number less than the configured
217+
[maximum number of buckets](../searchable-encryption/search-config.md#max-buckets)
218+
then the FilterExpression MUST be augmented to match against buckets greater than
219+
the limit returned from GetNumberOfQueries.
220+
221+
For each bucket number that would map to the current bucket,
222+
calculate the Filter Expression as for that bucket.
223+
The FilterExpression sent to DynamoDB MUST be the `OR` combination of all of these expressions.
224+
225+
The text of the FilterExpression is unlikely to change between buckets.
226+
What will change is the values in the ExpressionAttributeValues,
227+
which will be different if they involve standard beacons calculated with different buckets.
228+
This implies that additional unique values will need to be added to ExpressionAttributeValues.
229+
230+
As an example, if a table is configured with five buckets,
231+
and GetNumberOfQueries returns two, and `foo[n]` represents the expression as calculated for bucket `n`,
232+
then when `:aws_dbe_bucket = 0` the filter expression must be `(foo[0]) OR (foo[2]) OR (foo[4])`
233+
and when `:aws_dbe_bucket = 1` the filter expression must be `(foo[1]) OR (foo[3])`.
234+
235+
The resulting FilterExpression might look something like this:
236+
237+
`(aws_dbe_b_Attr3 = :attr3 AND aws_dbe_b_Attr4 = :attr4) OR (aws_dbe_b_Attr3 = :attr3AA AND aws_dbe_b_Attr4 = :attr4AA) OR (aws_dbe_b_Attr3 = :attr3AB AND aws_dbe_b_Attr4 = :attr4AB)`
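
The bucket mapping in the example above can be sketched as follows. This reads "maps to the current bucket" as "is congruent to the current bucket modulo the value returned by GetNumberOfQueries", an interpretation inferred from the `foo[n]` example rather than stated outright; both function names are hypothetical.

```python
def buckets_for_query(current_bucket, number_of_queries, max_buckets):
    """List every bucket number that one query must cover.

    Starts at current_bucket and steps by number_of_queries, staying
    below max_buckets.
    """
    if not 0 <= current_bucket < number_of_queries:
        raise ValueError("current_bucket must be in [0, number_of_queries)")
    return list(range(current_bucket, max_buckets, number_of_queries))


def combine_filter_expressions(expressions):
    """OR together the per-bucket expressions, each wrapped in parentheses."""
    return " OR ".join(f"({expression})" for expression in expressions)


# Five buckets, GetNumberOfQueries == 2, query for bucket 0.
covered = buckets_for_query(0, 2, 5)
combined = combine_filter_expressions(f"foo[{n}]" for n in covered)
```

With `number_of_queries` set to `1`, the same helper covers every bucket, which matches how a Scan is described as being handled.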
238+
195239
### QueryObject has encrypted values
196240

197241
Determines if a Query Object has encrypted values (ENCRYPT_AND_SIGN fields)
@@ -260,7 +304,9 @@ any error encountered during filtering MUST result in a failure of the query ope
260304

261305
Transform an unencrypted ScanInput object for searchable encryption.
262306

263-
The ScanInput is transformed in the same way as [QueryInputForBeacons](#queryinputforbeacons).
307+
The ScanInput is transformed in the same way as [QueryInputForBeacons](#queryinputforbeacons),
308+
except that [Filter Expressions for Query](#filter-expressions-for-query) is calculated
309+
as if GetNumberOfQueries returned `1`.
264310

265311
## ScanOutputForBeacons
266312

specification/searchable-encryption/beacons.md

Lines changed: 39 additions & 7 deletions
@@ -5,10 +5,18 @@
55

66
## Version
77

8-
1.0.0
8+
1.1.0
99

1010
### Changelog
1111

12+
- 1.1.0
13+
- add bucket beacons
14+
15+
## Conventions used in this document
16+
17+
The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL"
18+
in this document are to be interpreted as described in [RFC 2119](https://tools.ietf.org/html/rfc2119).
19+
1220
## Overview
1321

1422
Beacons use stable hashes of plaintext values of encrypted fields
@@ -276,11 +284,6 @@ If `NAME` appears in an record to be written,
276284
and `NAME` can also be constructed from other parts of the record,
277285
then the write must fail if the constructed and supplied values are not equal.
278286

279-
### Conventions used in this document
280-
281-
The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL"
282-
in this document are to be interpreted as described in [RFC 2119](https://tools.ietf.org/html/rfc2119).
283-
284287
### Standard Beacon Initialization
285288

286289
On initialization of a Standard Beacon, the caller MUST provide:
@@ -291,13 +294,17 @@ On initialization of a Standard Beacon, the caller MUST provide:
291294
On initialization of a Standard Beacon, the caller MAY provide:
292295

293296
- a [terminal location](virtual.md#terminal-location) -- a string
294-
- a [beacon style](beacon-style-initialization)
297+
- a [beacon style](#beacon-style-initialization)
298+
- a [number of buckets](#beacon-constraint)
295299

296300
If no [terminal location](virtual.md#terminal-location) is provided,
297301
the `name` MUST be used as the [terminal location](virtual.md#terminal-location).
298302

299303
Initialization MUST fail if two standard beacons are configured with the same location.
300304

305+
Initialization MUST fail if [number of buckets](#beacon-constraint) is specified, and is greater than or equal to
306+
the maximum number of buckets specified in the [beacon version](search-config.md#beacon-version-initialization).
307+
301308
### Beacon Style Initialization
302309

303310
On initialization of a Beacon Style, the caller MUST provide exactly one of
@@ -401,6 +408,19 @@ This name MUST match the name of one of the [encrypted](#encrypted-part-initiali
401408
These parts may come from these locally defined parts lists, or from the
402409
[Global Parts List](search-config.md#global-parts-list), in any combination.
403410

411+
#### Beacon Constraint
412+
413+
A beacon may be constrained to fewer buckets than is specified in the [beacon version](search-config.md#beacon-version-initialization).
414+
415+
If an item is being written or queried as bucket `X`, but the [standard beacon](#standard-beacon-initialization) is constrained to only `N` buckets,
416+
then the bucket used to [encode](#bucket-beacon-encoding) the beacon MUST be `X % N`, where `%` is the modulo or remainder operation.
417+
418+
Examples:
419+
420+
- If a beacon is constrained to one bucket, then it is always encoded as bucket `0`.
421+
- If a beacon is constrained to three buckets, then the beacon is always encoded as bucket `0`, `1` or `2`.
422+
If the current bucket is `7`, then the beacon is encoded as for bucket `1`.
423+
404424
### Default Construction
405425

406426
- If no constructors are configured, a default constructor MUST be generated.
@@ -471,11 +491,23 @@ Both standard and compound beacons define two operations
471491

472492
- This operation MUST convert the attribute value of the associated field to
473493
a sequence of bytes, as per [attribute serialization](../dynamodb-encryption-client/ddb-attribute-serialization.md).
494+
- The serialized form MUST be augmented as per [bucket beacon encoding](#bucket-beacon-encoding).
474495
- This operation MUST return the [basicHash](#basichash) of the resulting bytes and the configured [beacon length](#beacon-length).
475496
- The returned
476497
[AttributeValue](https://docs.aws.amazon.com/amazondynamodb/latest/APIReference/API_AttributeValue.html)
477498
MUST be type "S" String.
478499

500+
#### Bucket Beacon Encoding
501+
502+
Bucket Beacon Encoding converts a sequence of bytes into a possibly different sequence of bytes.
503+
504+
Calculate the [bucket number](search-config.md#bucketnumber) for this beacon as `X % N`,
505+
where `X` is the bucket specified and `N` is the [number of buckets](#beacon-constraint) specified for the beacon.
506+
507+
If this number is zero, then the input sequence of bytes MUST be returned unchanged.
508+
509+
Otherwise, a single byte with a value equal to this calculated bucket number MUST be appended to the input sequence of bytes.
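
A minimal sketch of this encoding, together with the `X % N` constraint rule, might look like the following. The function name and parameters are hypothetical, and the range check reflects the assumption that the calculated bucket number must fit in one byte.

```python
def bucket_beacon_encode(serialized: bytes, bucket: int, number_of_buckets: int) -> bytes:
    """Apply bucket beacon encoding to an already serialized attribute value.

    bucket: the bucket the item is being written or queried as (X).
    number_of_buckets: the number of buckets this beacon is constrained to (N).
    """
    if bucket < 0 or number_of_buckets < 1:
        raise ValueError("bucket must be >= 0 and number_of_buckets >= 1")
    effective = bucket % number_of_buckets
    if effective == 0:
        # Bucket zero hashes exactly like an unbucketed beacon.
        return serialized
    if effective > 255:
        raise ValueError("bucket number does not fit in a single byte")
    # Append one byte whose value is the calculated bucket number.
    return serialized + bytes([effective])
```

Because bucket zero leaves the bytes unchanged, a beacon constrained to one bucket always produces the same hash as a beacon written without buckets.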
510+
479511
### value for a set standard beacon
480512

481513
- This operation MUST convert the value of each item in the set to
