Commit b85c6b9
Fix max number of tokens for synthetic data generator (#170)
When using `prompt_tokens_max` (and not using `prompt_tokens_stdev`),
there will occasionally be one token more than the maximum number
specified. This can be tested as follows:
```
from guidellm.utils import IntegerRangeSampler
MIN_VALUE = 5
MAX_VALUE = 15
irs = IntegerRangeSampler(average=(MAX_VALUE - MIN_VALUE) // 2, variance=None, min_value=MIN_VALUE, max_value=MAX_VALUE, random_seed=None)
it = iter(irs)
for _ in range(10000):
    # 16 is MAX_VALUE + 1, so it should never be produced
    assert next(it) != 16
```
The assertion will fire, despite the max being set to 15. This happens
because `random.randint`, which is used by `IntegerRangeSampler`,
generates numbers up to and including the max value it is given. This PR
fixes that.
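The inclusive upper bound of `random.randint` can be checked in isolation with the standard library alone. The sketch below is not the `IntegerRangeSampler` code; it assumes, for illustration only, that the off-by-one came from passing `MAX_VALUE + 1` as the upper bound, and the names `rng`, `off_by_one`, and `in_range` are hypothetical.

```python
import random

MIN_VALUE = 5
MAX_VALUE = 15

# Fixed seed so the run is reproducible (the seed value is arbitrary).
rng = random.Random(0)

# randint(a, b) draws from the closed interval [a, b]. Passing
# MAX_VALUE + 1 as the upper bound (assumed here to mirror the bug)
# therefore allows MAX_VALUE + 1 itself to be returned.
off_by_one = {rng.randint(MIN_VALUE, MAX_VALUE + 1) for _ in range(10_000)}

# Passing MAX_VALUE directly keeps every draw inside the requested range.
in_range = {rng.randint(MIN_VALUE, MAX_VALUE) for _ in range(10_000)}

print("largest value with max + 1 bound:", max(off_by_one))
print("largest value with max bound:", max(in_range))
```

If an exclusive upper bound is wanted instead, `random.randrange(a, b)` draws from the half-open interval `[a, b)`.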
Co-authored-by: Mark Kurtz <[email protected]>
1 parent 6d8f10c commit b85c6b9
1 file changed, +1 -1 lines changed (line 40 replaced)