Commit d2753e5

chore(ADR): Create fractional-non-string-rand-units.md (#1783)
## This PR

- Proposes `Support Non-String Inputs for Fractional Bucketing` ADR

### Related Issues

#1737

---------

Signed-off-by: cupofcat <[email protected]>
Signed-off-by: Maks Osowski <[email protected]>
1 parent 44edcc9 commit d2753e5

1 file changed: 169 additions & 0 deletions
@@ -0,0 +1,169 @@
---
# Valid statuses: draft | proposed | rejected | accepted | superseded
status: draft
author: Maks Osowski (@cupofcat)
created: 2025-08-21
updated: 2025-09-02
---

# Harden Hashing Consistency And Add Support For Non-string Attributes in Fractional Evaluation

This proposal aims to enhance the `fractional` operator to:

1. Explicitly ensure hashes are consistent across all providers and platforms.
2. Support non-string values as the hashing input (i.e., the randomization unit).

Currently, all inputs are coerced to strings before hashing, which, in some rare cases, can lead to inconsistent bucketing across different provider implementations (e.g., a Java provider running on a non-UTF-8 platform). With this change, targeting attributes of various types will be supported and will always be explicitly encoded in a consistent, language- and platform-independent manner in every provider.

This change will be backward-compatible in terms of the flag schema but will be a breaking behavioral change for all users due to rebucketing.

```json
"fractional": [
  {
    // This will now work for non-string types
    "var": "my-non-string-var"
  },
  ["a", 50],
  ["b", 50]
]
```

## Background

The `fractional` operator in flagd determines bucket allocation (e.g., for percentage rollouts) by hashing an input value. Currently, there are two primary methods for providing this input:

1. **Implicitly:** Providing a string-type `targetingKey` in the evaluation context, which is used if the `fractional` block only contains the variant distribution.
2. **Explicitly:** Providing an expression as the first element of the `fractional` array. Today, this expression *must* evaluate to a string (the standard recommendation is to use the `"cat"` operator with `$flagd.flagKey` and `"var"`); that string is used as the hashing input, usually via murmur's `StringSum32` method.

The requirement that the input evaluates to a string has two main drawbacks:

* **Inconsistent Hashing:** Different providers (Go, PHP, Java) may encode the same string into bytes differently (e.g., UTF-8 vs. UTF-16). Since hashing functions like MurmurHash3 operate on bytes, this leads to different hash results and thus different bucket assignments for the same logical input across platforms.
* **Unnecessary Coercion:** If a user wishes to bucket based on a numeric ID (e.g., `userId: 12345`), they must first explicitly cast it to a string (`"12345"`) within the flag definition using an operator like `"cat"`.

This proposal seeks to resolve these issues by allowing `fractional` to operate directly on the byte representation of non-string inputs and by explicitly encoding values to bytes with deterministic encoders.
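
The inconsistent-hashing drawback can be reproduced in a few lines. The sketch below is illustrative only: `murmur3_32` is a plain-Python rendering of the public MurmurHash3 x86 32-bit algorithm with an assumed seed of 0, not the code any flagd provider ships, and the key `user-123` is made up.

```python
def murmur3_32(data: bytes, seed: int = 0) -> int:
    """MurmurHash3 x86 32-bit over raw bytes (illustrative, not provider code)."""
    c1, c2 = 0xCC9E2D51, 0x1B873593
    h = seed & 0xFFFFFFFF
    n = len(data)
    rounded = n - (n % 4)

    def mix(k: int) -> int:
        k = (k * c1) & 0xFFFFFFFF
        k = ((k << 15) | (k >> 17)) & 0xFFFFFFFF  # rotate left by 15
        return (k * c2) & 0xFFFFFFFF

    # Body: process the input four bytes at a time, little-endian.
    for i in range(0, rounded, 4):
        h ^= mix(int.from_bytes(data[i:i + 4], "little"))
        h = ((h << 13) | (h >> 19)) & 0xFFFFFFFF  # rotate left by 13
        h = (h * 5 + 0xE6546B64) & 0xFFFFFFFF

    # Tail: the remaining one to three bytes, if any.
    tail = data[rounded:]
    if tail:
        h ^= mix(int.from_bytes(tail, "little"))

    # Finalization: mix in the length and avalanche the bits.
    h ^= n
    h ^= h >> 16
    h = (h * 0x85EBCA6B) & 0xFFFFFFFF
    h ^= h >> 13
    h = (h * 0xC2B2AE35) & 0xFFFFFFFF
    h ^= h >> 16
    return h


key = "user-123"  # hypothetical randomization unit
utf8_bytes = key.encode("utf-8")
utf16_bytes = key.encode("utf-16-le")

# Same logical string, different bytes, hence different hash inputs.
print(utf8_bytes.hex(), hex(murmur3_32(utf8_bytes)))
print(utf16_bytes.hex(), hex(murmur3_32(utf16_bytes)))
```

Because the hash only ever sees bytes, two providers that agree on the string but not on its encoding feed different inputs to the same function.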

## Requirements

### 1. Users must be able to use both string and non-string variables (e.g., integers, booleans) as the primary input for `fractional` evaluation

### 2. The same "value" (e.g., 57.2, "some text", true) should result in the same bucket assignment regardless of the provider's language and platform

Please note:

* Some languages (e.g., Python) do not distinguish fixed-width numeric types by default (e.g., int32 vs. int64).
* [OpenFeature spec requirement 3.1.2](https://openfeature.dev/specification/sections/evaluation-context/#requirement-312) dictates that the evaluation context needs to support `boolean` | `string` | `number` | `structure` | `datetime` types.
* JSON supports 6 fundamental types: `boolean` | `string` | `number` | `object` | `array` | `null`

As such, the encodings for the following types as the first argument (either as literals or as results of evaluation) will be standardized:

1. boolean
2. string
3. integer (any integer number, Python style)
4. float (any floating point number, Python style)
5. object (structure / map)
6. datetime
7. null

**array / sequence** is explicitly not supported as the first argument of `fractional`, so that the hashing input can be distinguished from a variant bucket. Nevertheless, an array can be part of an object, so its encoding needs to be standardized as well.

## Non-requirements

* This change does not need to be backward-compatible.
* Support for advanced features such as salting non-string types directly in JSON (that will be a separate ADR).
* Bucketing improvements (that will be a separate ADR).

## Considered Options

1. **Proposed:** *Type-Aware Hashing:* Extend the current behavior to support non-string types as first arguments to `fractional`.
2. *New Operator:* Introduce a new operator, such as `"bytesVar"`, to explicitly signal that the variable's raw bytes should be hashed.
3. *Operator Overloading:* Reuse an existing operator (e.g., `"merge"`) or structure (e.g., providing a list) to imply byte-based hashing.

Option 1 was chosen for its ergonomics and zero impact on existing schemas. Option 2 adds unnecessary complexity to the flag definition language, and Option 3 creates confusing and non-obvious semantics.

## Proposal

We will modify the evaluation logic for the `fractional` operator.

When inspecting the first element of the `fractional` array:

1. If the first element in `fractional` evaluates to a non-array type, deterministically encode it to a well-defined byte array and hash the bytes.
2. Otherwise, if `targetingKey` is a string, build a two-element array of `flagKey` and `targetingKey`, deterministically encode that and hash it (**NOTE:** This is different from the string concatenation used today).
3. Otherwise, if `targetingKey` is non-string, report an error and return nil (as this breaks the [OpenFeature spec](https://openfeature.dev/specification/glossary/#targeting-key)).
4. Otherwise, if `targetingKey` is missing, report an error and return nil.

```json
// Will use the new logic
"fractional": [
  {
    "var": "my-non-string-var"
  },
  ["a", 50], ...
]

// Will use new logic
"fractional": [
  {
    "cat": [{"var" : "$flagd.flagKey"}, {"var" : "some-var"}]
  },
  ["a", 50], ...
]

// Will use targetingKey
"fractional": [
  ["a", 50], ...
]

// Will use targetingKey
"fractional": [
  {
    "merge": [{"var" : "evaluates-to-some-variant-name"}, {"var" : "evaluates-to-some-int"}]
  },
  ["a", 50], ...
]
```
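
The four-step selection logic above can be sketched as a small function. This is a minimal illustration, not provider code: `select_hash_input` and `FractionalInputError` are hypothetical names, the first argument is assumed to be the already-evaluated first element of the `fractional` array, and the sketch raises an exception where the proposal says "report an error and return nil" (since `None` is itself a valid hashing input under this proposal).

```python
from typing import Any, Optional


class FractionalInputError(ValueError):
    """Hypothetical error type: no valid hashing input could be determined."""


def select_hash_input(first_element: Any, flag_key: str,
                      targeting_key: Optional[Any]) -> Any:
    """Return the value that would be deterministically encoded and hashed."""
    # Step 1: any non-array result is used directly as the hashing input.
    if not isinstance(first_element, list):
        return first_element
    # Step 4: the first element is an array (e.g. a variant bucket), so the
    # targetingKey is needed, and it is missing.
    if targeting_key is None:
        raise FractionalInputError("targetingKey is missing")
    # Step 3: a non-string targetingKey breaks the OpenFeature spec.
    if not isinstance(targeting_key, str):
        raise FractionalInputError("targetingKey must be a string")
    # Step 2: two-element array of flagKey and targetingKey.
    return [flag_key, targeting_key]


print(select_hash_input(12345, "my-flag", "user-1"))
print(select_hash_input(["a", 50], "my-flag", "user-1"))
```

The `"merge"` example falls through to the `targetingKey` branch because `"merge"` evaluates to an array, which step 1 excludes.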

### Deterministic and consistent byte encodings

To meet requirement 2, [RFC 8949 Concise Binary Object Representation (CBOR)](https://www.rfc-editor.org/rfc/rfc8949.html) will be used to define the byte encodings.

* `boolean` is major type 7
* `null` is major type 7
* `string` is major type 3
* `integer`:
  * `unsigned integer` is major type 0
  * `negative integer` is major type 1
* `float` is major type 7
* `map` (object, structure, dict) is major type 5
* `array` (list, sequence) is major type 4
* `datetime` is converted to POSIX epoch time (including fractional seconds for sub-second precision) and CBOR Tag 1 is used

**ATTENTION: When encoding strings, CBOR prepends a header that carries the byte length of the string. Even though the string payload itself is still UTF-8, the resulting byte array differs from the raw UTF-8 encoding. Consequently, after this change, all hashes will change, which will result in rebucketing.**
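
To make the difference from raw UTF-8 concrete, here is a minimal sketch of the CBOR text string encoding (major type 3) following RFC 8949. `cbor_text` is a hypothetical helper written for this illustration, and `user-123` is a made-up key.

```python
def cbor_text(value: str) -> bytes:
    """Encode a str as a CBOR text string: length header, then UTF-8 payload."""
    payload = value.encode("utf-8")
    n = len(payload)
    major = 3 << 5  # major type 3 lives in the top three bits of the first byte
    if n < 24:
        head = bytes([major | n])  # length fits in the initial byte
    elif n < 1 << 8:
        head = bytes([major | 24]) + n.to_bytes(1, "big")
    elif n < 1 << 16:
        head = bytes([major | 25]) + n.to_bytes(2, "big")
    elif n < 1 << 32:
        head = bytes([major | 26]) + n.to_bytes(4, "big")
    else:
        head = bytes([major | 27]) + n.to_bytes(8, "big")
    return head + payload


raw = "user-123".encode("utf-8")
encoded = cbor_text("user-123")
print(raw.hex())      # 757365722d313233
print(encoded.hex())  # 68757365722d313233
```

The single extra header byte (`0x68` for an 8-byte string) is enough to change the hash of every existing string input.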

Additionally, it is required to use [4.2.1. Core Deterministic Encoding Requirements](https://www.rfc-editor.org/rfc/rfc8949.html#section-4.2.1) (which includes Preferred Serialization) to ensure:

1. **Map Key Ordering**: Implementations must strictly adhere to the requirement that keys in maps (objects/structures) are sorted in bytewise lexicographic order of their deterministic encodings.
2. **Preferred Serialization (Numbers)**: CBOR mandates using the shortest possible encoding. Providers must ensure consistency, especially between integer and float representations, and across different precisions. For example, if a value can be represented exactly as a 32-bit float, that encoding must be used instead of a 64-bit float, regardless of the native type in the provider's language.
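
Both rules can be illustrated with a compact encoder. This is a sketch under stated assumptions, not a complete or official implementation: `encode`, `_head` and `_float` are hypothetical names, `datetime` (Tag 1) is left out, and Python `int` and `float` values are kept as distinct CBOR types because the proposal does not spell out how integer-valued floats should be treated.

```python
import math
import struct
from typing import Any


def _head(major: int, n: int) -> bytes:
    """Major type plus argument, using the shortest form that fits."""
    if n < 24:
        return bytes([(major << 5) | n])
    for info, size in ((24, 1), (25, 2), (26, 4), (27, 8)):
        if n < 1 << (8 * size):
            return bytes([(major << 5) | info]) + n.to_bytes(size, "big")
    raise ValueError("integer does not fit in 64 bits")


def _float(x: float) -> bytes:
    """Shortest IEEE 754 width (16, 32, 64 bits) that round-trips exactly."""
    if math.isnan(x):
        return b"\xf9\x7e\x00"  # single canonical NaN
    for info, fmt in ((25, ">e"), (26, ">f")):
        try:
            packed = struct.pack(fmt, x)
        except OverflowError:  # value too large for this width
            continue
        if struct.unpack(fmt, packed)[0] == x:
            return bytes([0xE0 | info]) + packed
    return bytes([0xE0 | 27]) + struct.pack(">d", x)


def encode(value: Any) -> bytes:
    """Deterministically encode a JSON-like value to CBOR bytes."""
    if value is None:
        return b"\xf6"
    if isinstance(value, bool):  # checked before int: bool subclasses int
        return b"\xf5" if value else b"\xf4"
    if isinstance(value, int):
        return _head(0, value) if value >= 0 else _head(1, -1 - value)
    if isinstance(value, float):
        return _float(value)
    if isinstance(value, str):
        payload = value.encode("utf-8")
        return _head(3, len(payload)) + payload
    if isinstance(value, list):
        return _head(4, len(value)) + b"".join(encode(v) for v in value)
    if isinstance(value, dict):
        # Sort entries by the bytes of the encoded key, not by the key itself.
        items = sorted(((encode(k), encode(v)) for k, v in value.items()),
                       key=lambda kv: kv[0])
        return _head(5, len(items)) + b"".join(k + v for k, v in items)
    raise TypeError(f"unsupported type: {type(value).__name__}")


# Insertion order of map keys does not affect the output.
print(encode({"b": [2, 3], "a": 1}).hex())
print(encode({"a": 1, "b": [2, 3]}).hex())
# 1.0 fits in 16 bits, 100000.0 needs 32, 1.1 needs 64.
print(encode(1.0).hex(), encode(100000.0).hex(), encode(1.1).hex())
```

Under these assumptions, the two-element array from step 2 of the proposal would be produced by `encode(["my-flag", "user-1"])`, where both strings are placeholder values.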

### API changes

There are **no** changes to the flagd JSON schema. The change is purely semantic, affecting the evaluation logic within providers.

### Consequences

* Good, because any variable can be used for hashing.
* Good, because it avoids unnecessary casting.
* Bad, because all users will experience rebucketing.

### Timeline

Prior to flagd 1.0 launch.

## More Information

Today, flagd recommends salting the variable with flagKey directly in the `fractional` logic, using the `"cat"` operator. This will not be possible for non-string types. Advanced features like that will be considered in a separate ADR.

Salting of string types will continue to be possible using the `"cat"` operator, as it is built directly into JSON Logic.

### Testing considerations

As part of implementing this ADR, the current Gherkin suite will need to be updated to test consistency in more depth (e.g., by looking at the distribution of buckets over many samples) and to cover the newly supported types.
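
A distribution check of the kind mentioned above could look roughly like the sketch below. Everything here is an assumption made for illustration: SHA-256 from the standard library stands in for the real hash purely to keep the example short, the `point % 100` bucketing rule is a simplification rather than flagd's actual bucketing, and the 5% tolerance and `user-<i>` keys are arbitrary.

```python
import hashlib
from collections import Counter


def bucket(key: str, variants: list[tuple[str, int]]) -> str:
    """Map a key to a variant; weights are percentages that sum to 100."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()  # stand-in hash
    point = int.from_bytes(digest[:4], "big") % 100
    upper = 0
    for name, weight in variants:
        upper += weight
        if point < upper:
            return name
    raise ValueError("weights must sum to 100")


def distribution(n: int, variants: list[tuple[str, int]]) -> dict[str, float]:
    """Share of n synthetic keys that lands in each variant."""
    counts = Counter(bucket(f"user-{i}", variants) for i in range(n))
    return {name: counts[name] / n for name, _ in variants}


variants = [("a", 50), ("b", 50)]
shares = distribution(20_000, variants)
print(shares)

# Each observed share should be close to its configured weight.
for name, weight in variants:
    assert abs(shares[name] - weight / 100) < 0.05
```

Running the same keys through every provider and comparing the per-key assignments, not just the aggregate shares, is what would actually demonstrate cross-provider consistency.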
