Skip to content

Conversation

@ostronaut
Copy link
Contributor

Rationale for this change

Fix small typos in VariantShredding documentation

What changes are included in this PR?

Changes to MD file

Do these changes have PoC implementations?

No

@alamb alamb changed the title Fix typos in VariantShredding Improve wording about error cases in VariantShredding Oct 8, 2025
Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you @ostronaut and @emkornfield

I think this is an improvement. I updated the title of this PR to reflect what I think this PR now represents

cc @scovich in case you would also like to review

Copy link

@scovich scovich left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well this opens a can of worms...

| INVALID | `02 00` (object with 0 fields) | null | | | | | INVALID: `typed_value` is null for object |
| INVALID CASE: `{"event_type": "login", "event_ts": 1729795057774}` | `{"event_type": "login"}` | non-null | null | `login` | null | 1729795057774 | INVALID: Shredded field is present in `value` |
| INVALID CASE: `"a"` | `"a"` | non-null | null | null | null | null | INVALID: `typed_value` is present for non-object |
| INVALID CASE: `{}` | `02 00` (object with 0 fields) | null | | | | | INVALID: `typed_value` is null for object |
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What about this case?

| `{"event_type": "login"}`                 | `02 00` (object with 0 fields) | non-null      | null                           | `login`

Is it legal for an object that shredded perfectly to still have an empty object in the value column? It's sub-optimal but is it incorrect?

Conversely, what about this case?

| `{"event_type": "login"}`                 | null | non-null      | null                           | `login`

Is it allowed for an object that shredded perfectly to have NULL in the value column? Or is it required to have an empty object there? (guessing it's allowed to have NULL, and maybe even required to do so)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These are good points! I could add them but first need to have confirmation on expected behaviour from other collaborators.
For the first point, can we say that that the following is correct:

| Event object                            | `value`                        | `typed_value` | `typed_value.event_type.value` | `typed_value.event_type.typed_value` | `typed_value.event_ts.value` | `typed_value.event_ts.typed_value` | Notes                                                                                     |
|-----------------------------------------|--------------------------------|---------------|--------------------------------|--------------------------------------|------------------------------|------------------------------------|-------------------------------------------------------------------------------------------|
| INVALID CASE: `{"event_type": "login"}` | `02 00` (object with 0 fields) | non-null      | null                           | `login`                              | null                         | null                               | INVALID CASE: an empty object (object with 0 fields) is present for fully shredded object |

cc: @alamb @emkornfield

For the second point, i think it is described in the case one (fully shredded object), where value is null.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For the first point, can we say that that the following is correct:
[value of a perfectly shredded object is a variant object with no fields instead of NULL]

That's exactly my question... it's sub-optimal, but well-defined, and and I'm not immediately seeing why it would confuse or break readers?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

True! Lets here from the other collaborators maybe? I can add this case to the table once we agree on it.
cc: @alamb @emkornfield

Copy link

@scovich scovich left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM (FWIW), modulo some wording nits

Comment on lines 217 to 220
| INVALID CASE: `{"event_type": "login", "event_ts": 1729795057774}` | `{"event_type": "login"}` | non-null | null | `login` | null | 1729795057774 | INVALID CASE: Shredded field is present in `value` |
| INVALID CASE: `{"event_type": "login"}` | `{"event_type": "login"}` | null | | | | | INVALID CASE: Shredded field is present in `value`, while `typed_value` is null |
| INVALID CASE: `"a"` | `"a"` | non-null | null | null | null | null | INVALID CASE: `typed_value` is present for non-object while being null, even though `value` is not an object |
| INVALID CASE: `{}` | `02 00` (object with 0 fields) | null | | | | | INVALID CASE: `typed_value` is null for object |
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The table is already super wide; I'm not sure CASE adds enough information to be worth the space it consumes?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added it for more clarity, but agree, it should be understandable just with INVALID as well

| missing | null | null | | | | | Object/value is missing |
| INVALID CASE: `{"event_type": "login", "event_ts": 1729795057774}` | `{"event_type": "login"}` | non-null | null | `login` | null | 1729795057774 | INVALID CASE: Shredded field is present in `value` |
| INVALID CASE: `{"event_type": "login"}` | `{"event_type": "login"}` | null | | | | | INVALID CASE: Shredded field is present in `value`, while `typed_value` is null |
| INVALID CASE: `"a"` | `"a"` | non-null | null | null | null | null | INVALID CASE: `typed_value` is present for non-object while being null, even though `value` is not an object |
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not understanding the "while being null" part? What about just:

INVALID CASE: typed_value is present and value is not an object

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Rephrased as per suggestion!

@ostronaut
Copy link
Contributor Author

hey @alamb, @scovich. Can we merge the PR now?

@emkornfield
Copy link
Contributor

LGTM, seems like others are happy with it, will merge once CI passes. Thank you @ostronaut

@ostronaut
Copy link
Contributor Author

@emkornfield do you think we can merge this PR now?

@emkornfield
Copy link
Contributor

Yes, apologies for the delay.

@emkornfield emkornfield merged commit 786142e into apache:master Oct 20, 2025
4 checks passed
@ostronaut ostronaut deleted the patch-1 branch October 21, 2025 09:50
@alamb
Copy link
Contributor

alamb commented Oct 21, 2025

Thank you @ostronaut @emkornfield and @scovich

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants