-
Notifications
You must be signed in to change notification settings - Fork 243
refactor: Use a common from_value
method
#472
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
refactor: Use a common from_value
method
#472
Conversation
This comment was marked as outdated.
This comment was marked as outdated.
6ca707b
to
1c8ed59
Compare
Explicit `ValueKind` mapping instead of implicitly inferred `Value` type. Matches the `format/json5.rs` logic. Signed-off-by: Brennan Kinney <[email protected]>
DRY: The `json`, `json5` and `toml` parsers all leverage `serde` and can share a common enum to deserialize data into, instead of individual methods performing roughly the same transformations. - While `ron` improves their support for serde untagged enums with `v0.9`, it is still not compatible with this approach (_Their README details why_). - The `yaml` support doesn't leverage `serde`, thus is not compatible. `from_parsed_value()` is based on the approached used by `format/json5.rs`: - It has been adjusted to reflect the `ValueKind` enum, which could not directly be used due to the `Table` and `Array` types using `Value` as their value storage type instead of self-referencing the enum. - Very similar to a `impl From`, but supports the complimentary `uri` parameter for each `Value` derived. Signed-off-by: Brennan Kinney <[email protected]>
- The enum did not properly handle the `Datetime` TOML value which needed to be converted into a `String` type. - This workaround approach avoids duplicating `from_parsed_value()` logic to support one enum variant. Signed-off-by: Brennan Kinney <[email protected]>
The enum is only intended as a helper for this deserializer into `String` type, it can be bundled inside. Likewise, no need to `impl Display`. Signed-off-by: Brennan Kinney <[email protected]>
If no other variant could be deserialized into successfully, the type is not supported and treated as the `Nil` type. This better communicates failures within tests when a type is compared and is not expected to be `Nil`. Signed-off-by: Brennan Kinney <[email protected]>
1c8ed59
to
cfabdba
Compare
- `Char` variant needed to be converted to `String`. - `Option` variant could be introduced generically. NOTE: With `v0.9` of Ron, more types are introduced. Without tests, if these do not deserialize into a supported type they will be treated as `Nil` type. Signed-off-by: Brennan Kinney <[email protected]>
68df817
to
c82013e
Compare
- Required switching to `serde_yaml`. - Like with `yaml-rust`, requires special handling for table keys. Signed-off-by: Brennan Kinney <[email protected]>
c82013e
to
d4f2f35
Compare
This PR has been updated to also include support for formats Ron and Yaml.
Ron:
Yaml:
During the integration of Ron and Yaml into this PR (like with earlier TOML test for datetime), both failed running their tests due to the more implicit Resolving requires either:
It's not too different from the prior approach, these types were in the formats own For TOML with the
Which IIRC I had read was due to how
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some inline context if it assists with review.
If it's easier to follow/digest, I've staged out changes into scoped commits. Each commit has an associated commit message describing the changes 👍
let key = match key { | ||
serde_yaml::Value::Number(k) => Some(k.to_string()), | ||
serde_yaml::Value::String(k) => Some(k), | ||
_ => None, | ||
}; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Originally I tried to use:
let key = serde_yaml::from_value::<String>(k).ok();
However this resulted in a "stack overflow" according to my notes at the time.
IIRC this failure wasn't restricted to the untagged enum approach here, and the explicit match
is also required for the existing approach used (including for the map variant in the previous yaml-rust
crate).
I also have this failure in the notes that was caused by the keys string being mismatched (eg: instead of the string expected_key
, a similar string like unexpected_key
) was not caught well by the tests failure output:
Test failure output
running 7 tests
test test_error_parse ... ok
test test_override_uppercase_value_for_struct ... FAILED
test test_override_uppercase_value_for_enums ... FAILED
test test_override_lowercase_value_for_struct ... FAILED
test test_file ... FAILED
test test_yaml_parsing_key ... FAILED
test test_override_lowercase_value_for_enums ... FAILED
failures:
---- test_override_uppercase_value_for_struct stdout ----
thread 'test_override_uppercase_value_for_struct' panicked at tests/file_yaml.rs:151:66:
called `Result::unwrap()` on an `Err` value: missing field `bar`
---- test_override_uppercase_value_for_enums stdout ----
thread 'test_override_uppercase_value_for_enums' panicked at tests/file_yaml.rs:203:54:
called `Result::unwrap()` on an `Err` value: value of enum EnumSettings should be represented by either string or table with exactly one key
---- test_override_lowercase_value_for_struct stdout ----
thread 'test_override_lowercase_value_for_struct' panicked at tests/file_yaml.rs:186:56:
called `Result::unwrap()` on an `Err` value: missing field `foo`
---- test_file stdout ----
thread 'test_file' panicked at tests/file_yaml.rs:43:43:
called `Result::unwrap()` on an `Err` value: missing field `debug`
---- test_yaml_parsing_key stdout ----
thread 'test_yaml_parsing_key' panicked at tests/file_yaml.rs:115:10:
called `Result::unwrap()` on an `Err` value: missing field `inner_string`
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
---- test_override_lowercase_value_for_enums stdout ----
thread 'test_override_lowercase_value_for_enums' panicked at tests/file_yaml.rs:221:54:
called `Result::unwrap()` on an `Err` value: value of enum EnumSettings should be represented by either string or table with exactly one key
// Option to Result: | ||
match (key, value) { | ||
(Some(k), Some(v)) => Ok((k, v)), | ||
_ => Err(serde::de::Error::custom("should not be serialized to Map")), | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a bit verbose, but easier to grok?
An alternative with .zip().ok_or_else()
was considered, including here if you prefer it.
When I completed my first iteration and was after some feedback for better handling this .map()
call having mixed error return types (serde_yaml
+ de::Error::custom
), it was advised:
-
Have the
key
+value
conversions convert theirResult
toOption
. -
Then if either
key
orvalue
wasNone
use the desired serde deserialization error here (whichdeserialize_any
uses to understand this variant was unsuccessful and to try deserializing the next variant of theParsedValue
enum). -
They had refactored to use separate functions for
key
andvalue
, instead of nested inline within themap()
closure.While instead of this
match
they usedzip()
on the tuple:let key = key_to_string(k); let value = val_to_parsed_value(v); key.zip(value).ok_or_else(|| serde::de::Error::custom("should not be serialized to Map") )
Which accomplishes the same by creating an iterator on the single value of key
and value
:
- Produces a tuple (
a.zip(b)
) ofSome(key, value)
until one of the iterators encounters aNone
. - Return the option as a
Result
(ok_or_else()
).
match ParsedMap::deserialize(deserializer)? { | ||
ParsedMap::Table(v) => Ok(v), | ||
#[cfg(feature = "yaml")] | ||
ParsedMap::YamlMap(table) => { | ||
table | ||
.into_iter() | ||
.map(|(key, value)| { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The .map()
will produce a collection of Result<String, ParsedValue>
items, and .collect()
will transform that into a single Result<T, E>
(where T
would be Map<String, ParsedValue>
).
An implicit feature of Rust, no explicit transform of each Result
to a collection required.
I wasn't sure if it'd short-circuit on first error encountered, but due to the return type with Result
for collect()
it appears this is supported via FromIterator
trait, thus doesn't require try_collect()
.
@@ -27,12 +27,13 @@ async = ["async-trait"] | |||
[dependencies] | |||
lazy_static = "1.0" | |||
serde = "1.0.8" | |||
serde_with = "3" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Required for the Nil
variant fallback (scoped to it's own commit) if deserialize_any
fails to match anything prior.
The crate could be avoided if you wanted to implement the equivalent directly 🤷♂️
Any format that supports a `char` type would now be matched for conversion to the expected `String` type. Signed-off-by: Brennan Kinney <[email protected]>
@matthiasbeyer this is ready for review, I know you're quite busy so here's a TLDR:
|
For custom format support, the As it's all focused on using Serde with untagged enum, perhaps it shouldn't live in I also need to verify an assumption that other number types like I will also share the other formats I integrated locally, if desired they can be contributed via follow-up PR. |
So from what I read, we want this in. |
I still need to address my concerns from my prior comment here. I will return to After that's sorted locally, I'll rebase as requested and it'd be great to see this get merged :) Regarding custom format support, how would you approach verifying that? Mostly so that you can catch that |
Note to self, since this PR was raised the proposed yaml alternative (serde compatible) has been archived and a different yaml crate has been introduced instead, so this PR will need to drop that support (unfortunate, since the serde approach was preferrable I think). |
// NOTE: Order of variants is important. Serde will use whichever | ||
// the input successfully deserializes into first. | ||
#[derive(serde::Deserialize, Debug)] | ||
#[serde(untagged)] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
serde(untagged)
is a performance footgun in situations like this (easily noticeable in benchmarks when I made the same mistake in toml::Value
).
We'd likely want to use serde-untagged
.
Yes, json5 did this already but I care more if it spreads to all formats
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
serde(untagged)
is a performance footgun in situations like this (easily noticeable in benchmarks when I made the same mistake intoml::Value
).
Ah, I think when I contributed this I favored it for being more convenient maintenance wise and thinking of it from an application config viewpoint where you'd usually not have so much config to parse that it'd be a perf concern? 😅
We'd likely want to use
serde-untagged
.
Glancing over at the crate docs, I assume this means losing the convenience of serde(untagged)
and requires a bit more explicit mapping to each variant?
I don't have a problem with that, but it was nice to have for simple mappings of types to their variant.
Yes, json5 did this already but I care more if it spreads to all formats
This wasn't clear to me at first where you were referencing as I thought you meant upstream json5
crate (that only uses serde(untagged)
in their tests) 😅 but I see you meant the config-rs
format implementation for json5
uses serde(untagged)
(and is removed via this PR too) 👍 (EDIT: I see it's also the first thing my PR description mentioned, I forgot as it's been a while 😓)
So as a bonus perhaps, this would address your concern about the json5 format being deferred to this common serde handler for supported formats?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Unless there is a significant problem we run into, I stand by using serde-untagged
as I don't want the other formats to regress by adding this performance hit.
I don't have a problem with that, but it was nice to have for simple mappings of types to their variant.
I tried to push for serde_derive to have attributes to force type-variant mappings to avoid the performance overhead of untagged
but the maintainers weren't interested.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I stand by using
serde-untagged
as I don't want the other formats to regress by adding this performance hit.
That's fair.
It could be useful as an alternative format helper to implement additional formats, which although might not be used by any official format, if a user could extend config-rs
support separately without requiring upstreaming of a format into the crate?
I'll look into adapting to serde-untagged
when I can find the time, but if anyone else wants to tackle that it'd be appreciated 😅
I tried to push for
serde_derive
to have attributes to force type-variant mappings to avoid the performance overhead ofuntagged
but the maintainers weren't interested.
That's unfortunate, sounds like a very reasonable request 🤔
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It could be useful as an alternative format helper to implement additional formats, which although might not be used by any official format, if a user could extend config-rs support separately without requiring upstreaming of a format into the crate?
They can with https://docs.rs/config/latest/config/trait.Format.html
They won't get our extension-agnostic loader but that isn't always necessary and can easily be implemented on its own.
// Equivalent to ValueKind, except Table + Array store the same enum | ||
// Useful for serde to serialize values into, then convert to Value. | ||
// NOTE: Order of variants is important. Serde will use whichever | ||
// the input successfully deserializes into first. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As this just replaces our existing type we are deserializing to, I'll give this a pass. Overall, I'd rather we deserialize directly into our needed type.
That would either require
- Putting a dummy value in for the origin
- Building up my
serde-basepath
idea I've been toying with, see Designing for layered configs clap-rs/clap#2763 (reply in thread)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As this just replaces our existing type we are deserializing to, I'll give this a pass. Overall, I'd rather we deserialize directly into our needed type.
I believe that I wanted to deserialize directly too, but I think the comment here highlighted the reasoning that did not happen was due to config::Value
being a struct with a value: ValueKind
field while config::ValueKind
enum has the Table
+ Array
variants reference config::Value
instead of Self
to self-reference the enum, which complicated adapting to that?
If you look at what this PR does, it effectively replaces the format specific methods (like from_yaml_value()
/ from_toml_value()
) with a single from_parsed_value()
method instead.
Most of those format from_*_value()
methods (that are serde reliant) took in a deserialized format value type as a parameter, delegating away error handling involved (except partially for Ron and the non-serde Yaml format).
In both cases the before/after approach of these methods are only really performing the task mapping to config::Value
. That logic doesn't belong in value.rs
when it's format/type specific though? You can see at the end of format.rs
in this PR how the additional serde deserializers are added for format specific support concerns, with types/crates not really relevant to bundle into value.rs
?
Hopefully that clarifies why ParsedValue
was introduced as an intermediary convenience for the deserializing + mapping step. With from_parsed_value()
it effectively consolidates the associated formats it can support, making this logic in config-rs
much easier to grok and maintain?
Last I recall the various from_*_value()
implementations were (or at least at some point when I was contributing) not particularly consistent in style/quality despite having roughly the same logic. One could tidy that up, or just make it DRY (which this PR favored), such that any new serde compatible formats would be much simpler to add/support, and thus maintain without as much boilerplate prone to diverging.
Existing (and new) serde compatible formats would have this common call going forward:
let value = format::from_parsed_value(uri, serde_json::from_str(text)?);
If format.rs
can be adapted to make needing ParsedValue
redundant (by adapting the same improvement to config::Value
/ ValueKind
directly), that'd be great too! This PR was just focused on that initial step to take a DRY approach without being too invasive of a change to the codebase 😅
That said some of the PR value was lost since as the switch to serde-yaml
didn't age well. It would have been nice to shift the added error handling there to serde.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Putting a dummy value in for the origin
NOTE: I used uri
in my responses since it's frequently the variable name used instead of origin
and was what came to mind when writing out my thoughts 😅
from_parsed_value()
(like how other format methods before it did) provides the uri
value to use when mapping from a ParsedValue
to a config::Value
(with it's associated uri
+ ValueKind
variant).
Adding a placeholder uri
(effectively None
?) would then need a step that updates the uri
for each Value
(along with it's nested Value
items depending on type), and I'm not sure how you'd like to approach that in the context of a method like from_parsed_value()
where the uri
is already given?
ParsedValue
enum leveraged the serde untagged convenience to map the formats own Value type, but to have that done for config::Value
instead just to add the uri
would be verbose of an implementation?
I understand if using an intermediary this way instead of skipping that may seem as a codesmell, and I think I may have originally intended to look into how I could improve that after this PR was resolved. Partly because of a lack of familiarity with the project as a whole and not wanting to introduce breaking changes or deal with extra churn / considerations that may cause vs this reduced scope that was easier for me to reason about and get through review 😅
As such, I still think the current approach with this PR using the intermediary enum would be valid and keeps the PR focused on the goal of unifying the serde compatible formats to a common parse method.
Refactoring that to then make ParsedValue
redundant and work directly with config::Value
+ ValueKind
seems better suited to a follow-up PR, and would be more pleasant in commit/merge history as context for these distinct changes to any interested party, as opposed to munged together (but that could just be more perspective as I tend to encourage more focused PRs with all commits squashed via the PR merge strategy).
All the uri
value represents is the source, so for these formats the config::Value
returned and any nested value produced, that uri
is the same throughout.
As such it's not that meaningful during the format mapping to config::Value
, but collections of config::Value
need each item annotated with that same uri
value/ref, so that merging/layering has that traceability of source.
If that were something better handled to avoid the need for ValueKind::Array
and ValueKind::Table
to require config::Value
items just to track the associated uri
, then this whole thing could be simplified as desired 😅
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Building up my
serde-basepath
idea I've been toying with
I glanced through the linked ref but that still didn't click for me sorry.
In the current state of this PR, I could have from_parsed_value()
assign None
as the uri
(in lieu of a serde-untagged
implementation effectively transforming to Value::new(uri, kind)
for deserializing to config::ValueKind
on the Array
/Table
variants, or kind.into()
if deserializing straight to config::Value
?), and then I could call a separate helper to update all the uri
origin
to the desired one before returning the final config::Value
result?
// In value.rs:
impl Value {
pub fn set_origin_recursively(&mut self, origin: Option<&String>) {
self.origin = origin.cloned();
// If kind is a collection, update the `origin` of their `Value`s too:
match self.kind {
ValueKind::Array(ref mut collection) => {
for v in collection.iter_mut() {
v.set_origin_recursively(origin)
}
}
ValueKind::Table(ref mut collection) => {
for (_, v) in collection.iter_mut() {
v.set_origin_recursively(origin)
}
}
// This is presumably unacceptable for handling all other variants
// of `ValueKind` that don't contain nested `Value`s, but short of listing
// all remaining variants here I'm not sure how you'd want to approach it
_ => {},
};
}
}
// format.rs:
// Nothing to map if already deserialized into Value:
pub fn from_parsed_value(uri: Option<&String>, mut value: Value) -> Value {
value.set_origin_recursively(uri);
value
}
// Alternatively for ValueKind (implies Array/Table variants already have `v.origin = None`):
pub fn from_parsed_value(uri: Option<&String>, value: ValueKind) -> Value {
let mut v = Value::from(value);
v.set_origin_recursively(uri);
v
}
Both would make from_parsed_value()
largely redundant then 😅
The above set_origin_recursively()
utility method could be added via a separate PR in the meantime? (origin
isn't public, so updating the origin
after it's set isn't supported I think?)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I suppose with the more verbose addition of serde-untagged
, handling the Array
+ Table
types of ValueKind
would be a non-issue as .map
/.seq
calls would could leverage .into()
similar to how you could construct ValueKind
like this:
// ValueKind::Table(T) storing `Vec<T> => ValueKind::Array(T) => Value`
let map_of_vecs: ValueKind = HashMap::<String, Value>::from([
("abc".to_string(), vec!["a", "b", "c"].into()),
("xyz".to_string(), vec!["x", "y", "z"].into()),
]).into();
// ValueKind::Table(T) => Value
let v = Value::from(map_of_vecs);
Value::from
will provide the None
placeholder for origin
and the added recursive method can patch that afterwards 👍
I think these two collection variants of ValueKind
were only problematic with serde(untagged)
when paired with the from_parsed_value()
recursively calling itself to map the enum value, since they were not wrapping Self
😅
Other than that, there's the Option
value type from Ron which I'm not sure how you'd approach that with serde-untagged
, for serde(untagged)
with ParsedValue
the recursive call through from_parsed_value()
worked with the type Option<Box<Self>>
.
Likewise, while serde-untagged
and formats like Ron support char
, ValueKind
lacks it and casts to ValueKind::String
. A similar concern exists for Toml support when parsing it's date value as there is no equivalent in ValueKind
it is mapped to the string variant too. Both cases were handled in this PR with a separate ParsedString
enum delegated to serde(untagged)
, all wrapped in a custom deserializer method for the ParsedValue::String
variant and the deserialize_with
annotation (that I've not looked into if that's compatible with serde-untagged
, I assume it is?).
So a few gaps on how to proceed there with serde-untagged
, but I think it might still be better deferred to a follow-up PR replacing serde(untagged)
. It's diff might then be more straight-forward?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Apologies for the verbosity, limited time to condense all my comments but I'll try 💪
The gist of my response is as follows:
ParsedValue
+from_parsed_value()
unifies the equivalent code it replaces. Deserializing directly toconfig::Value
would be a separate improvement, this PR just consolidates / streamlines existing mapping. It might not be ideal, but I think it's still worthwhile?- I don't think performance is the deciding factor for choosing
config-rs
over the competition. I'm open to adoptingserde-untagged
here if someone could fill the gaps in proceeding, but it might be better deferred to a separate PR at a later date?
ParsedValue
is not replacing ValueKind
As this just replaces our existing type we are deserializing to, I'll give this a pass.
The PR DRYs up existing format implementations (which are inconsistent in style/approach for blocks that accomplish the same task).
Assuming that your comment was referring to the ValueKind
type being replaced (but I could be mistaken and you actually meant each individual format value type):
ParsedValue
substitutes each formats own Value type, whilefrom_parsed_value()
unifies the existing equivalent format methods mapping of a format specificValue
type toconfig::ValueKind
(and subsequent conversion toconfig::Value
).- The
from_*_value()
format methods tend to be recursively called for mapping to the finalconfig::Value
type (optionally with nestedconfig::Value
whenValueKind
is a collection type) and associating the fixedorigin
source to each mappedconfig::Value
. ValueKind
contains collection variants that store items ofconfig::Value
notSelf
, which is problematic for recursive calls offrom_*_value()
. You would instead need to deserialize directly intoconfig::Value
AFAIK and then recursively updateorigin
.
This transition of existing formats to a common ParsedValue
enum + from_parsed_value()
method is AFAIK an improvement for maintainability and anyone referencing the format support to contribute new formats.
When I authored this PR I was able to easily add a few other formats without copy/pasting boilerplate of one of the existing implementations. That seems worthwhile?
serde-untagged
/ performance
serde(untagged)
is a performance footgun in situations like this (easily noticeable in benchmarks when I made the same mistake intoml::Value
).We'd likely want to use
serde-untagged
.
Personally that may not be as digestible of a change within this PR?
- The
serde(untagged)
vsserde-untagged
concern on performance is valid, but IMO is better served as a follow-up PR to this one if it will result in additions akin to this support forconfig::Value
(and this much more verbose related serde support). - No one has tackled the concern this PR addresses since I raised it, so I'm not sure if blocking on requiring additional complexity will be beneficial 😓
That said, should anyone find the performance an issue when parsing configs in their real-world projects, they'd likely be motivated to contribute that suggested improvement (which is either reverting to the current state we have now prior to this PR with it's drawbacks, or adapting from the changes of this PR to use serde-untagged
and skip ParsedValue
?), you're an example of that with your contributions to the toml crates, so I completely understand the reluctance 😅
Config parsing benchmarks are presumably a rather niche concern for the bulk of projects though (which might prefer a config crate that is easier to grok and maintain, as existing issues tend to be focused on).
As you're effectively the core maintainer of this crate now, that is totally up to you. I am in support of your requests if you're familiar enough with how to approach the concerns I raised in my feedback that are unclear to me, but I get the impression it'd still be better delegated to a subsequent PR? (it has already been suggested that this PR alone in it's current state would benefit from splitting)
// NOTE: Order of variants is important. Serde will use whichever | ||
// the input successfully deserializes into first. | ||
#[derive(serde::Deserialize, Debug)] | ||
#[serde(untagged)] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
serde(untagged)
is a performance footgun in situations like this (easily noticeable in benchmarks when I made the same mistake intoml::Value
).
Ah, I think when I contributed this I favored it for being more convenient maintenance wise and thinking of it from an application config viewpoint where you'd usually not have so much config to parse that it'd be a perf concern? 😅
We'd likely want to use
serde-untagged
.
Glancing over at the crate docs, I assume this means losing the convenience of serde(untagged)
and requires a bit more explicit mapping to each variant?
I don't have a problem with that, but it was nice to have for simple mappings of types to their variant.
Yes, json5 did this already but I care more if it spreads to all formats
This wasn't clear to me at first where you were referencing as I thought you meant upstream json5
crate (that only uses serde(untagged)
in their tests) 😅 but I see you meant the config-rs
format implementation for json5
uses serde(untagged)
(and is removed via this PR too) 👍 (EDIT: I see it's also the first thing my PR description mentioned, I forgot as it's been a while 😓)
So as a bonus perhaps, this would address your concern about the json5 format being deferred to this common serde handler for supported formats?
// Equivalent to ValueKind, except Table + Array store the same enum | ||
// Useful for serde to serialize values into, then convert to Value. | ||
// NOTE: Order of variants is important. Serde will use whichever | ||
// the input successfully deserializes into first. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As this just replaces our existing type we are deserializing to, I'll give this a pass. Overall, I'd rather we deserialize directly into our needed type.
I believe that I wanted to deserialize directly too, but I think the comment here highlighted the reasoning that did not happen was due to config::Value
being a struct with a value: ValueKind
field while config::ValueKind
enum has the Table
+ Array
variants reference config::Value
instead of Self
to self-reference the enum, which complicated adapting to that?
If you look at what this PR does, it effectively replaces the format specific methods (like from_yaml_value()
/ from_toml_value()
) with a single from_parsed_value()
method instead.
Most of those format from_*_value()
methods (that are serde reliant) took in a deserialized format value type as a parameter, delegating away error handling involved (except partially for Ron and the non-serde Yaml format).
In both cases the before/after approach of these methods are only really performing the task mapping to config::Value
. That logic doesn't belong in value.rs
when it's format/type specific though? You can see at the end of format.rs
in this PR how the additional serde deserializers are added for format specific support concerns, with types/crates not really relevant to bundle into value.rs
?
Hopefully that clarifies why ParsedValue
was introduced as an intermediary convenience for the deserializing + mapping step. With from_parsed_value()
it effectively consolidates the associated formats it can support, making this logic in config-rs
much easier to grok and maintain?
Last I recall the various from_*_value()
implementations were (or at least at some point when I was contributing) not particularly consistent in style/quality despite having roughly the same logic. One could tidy that up, or just make it DRY (which this PR favored), such that any new serde compatible formats would be much simpler to add/support, and thus maintain without as much boilerplate prone to diverging.
Existing (and new) serde compatible formats would have this common call going forward:
let value = format::from_parsed_value(uri, serde_json::from_str(text)?);
If format.rs
can be adapted to make needing ParsedValue
redundant (by adapting the same improvement to config::Value
/ ValueKind
directly), that'd be great too! This PR was just focused on that initial step to take a DRY approach without being too invasive of a change to the codebase 😅
That said some of the PR value was lost since as the switch to serde-yaml
didn't age well. It would have been nice to shift the added error handling there to serde.
// Equivalent to ValueKind, except Table + Array store the same enum | ||
// Useful for serde to serialize values into, then convert to Value. | ||
// NOTE: Order of variants is important. Serde will use whichever | ||
// the input successfully deserializes into first. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Putting a dummy value in for the origin
NOTE: I used uri
in my responses since it's frequently the variable name used instead of origin
and was what came to mind when writing out my thoughts 😅
from_parsed_value()
(like how other format methods before it did) provides the uri
value to use when mapping from a ParsedValue
to a config::Value
(with it's associated uri
+ ValueKind
variant).
Adding a placeholder uri
(effectively None
?) would then need a step that updates the uri
for each Value
(along with it's nested Value
items depending on type), and I'm not sure how you'd like to approach that in the context of a method like from_parsed_value()
where the uri
is already given?
ParsedValue
enum leveraged the serde untagged convenience to map the formats own Value type, but to have that done for config::Value
instead just to add the uri
would be verbose of an implementation?
I understand if using an intermediary this way instead of skipping that may seem as a codesmell, and I think I may have originally intended to look into how I could improve that after this PR was resolved. Partly because of a lack of familiarity with the project as a whole and not wanting to introduce breaking changes or deal with extra churn / considerations that may cause vs this reduced scope that was easier for me to reason about and get through review 😅
As such, I still think the current approach with this PR using the intermediary enum would be valid and keeps the PR focused on the goal of unifying the serde compatible formats to a common parse method.
Refactoring that to then make ParsedValue
redundant and work directly with config::Value
+ ValueKind
seems better suited to a follow-up PR, and would be more pleasant in commit/merge history as context for these distinct changes to any interested party, as opposed to munged together (but that could just be more perspective as I tend to encourage more focused PRs with all commits squashed via the PR merge strategy).
All the uri
value represents is the source, so for these formats the config::Value
returned and any nested value produced, that uri
is the same throughout.
As such it's not that meaningful during the format mapping to config::Value
, but collections of config::Value
need each item annotated with that same uri
value/ref, so that merging/layering has that traceability of source.
If that were something better handled to avoid the need for ValueKind::Array
and ValueKind::Table
to require config::Value
items just to track the associated uri
, then this whole thing could be simplified as desired 😅
// Equivalent to ValueKind, except Table + Array store the same enum | ||
// Useful for serde to serialize values into, then convert to Value. | ||
// NOTE: Order of variants is important. Serde will use whichever | ||
// the input successfully deserializes into first. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Building up my
serde-basepath
idea I've been toying with
I glanced through the linked ref but that still didn't click for me sorry.
In the current state of this PR, I could have from_parsed_value()
assign None
as the uri
(in lieu of a serde-untagged
implementation effectively transforming to Value::new(uri, kind)
for deserializing to config::ValueKind
on the Array
/Table
variants, or kind.into()
if deserializing straight to config::Value
?), and then I could call a separate helper to update all the uri
origin
to the desired one before returning the final config::Value
result?
// In value.rs:
impl Value {
pub fn set_origin_recursively(&mut self, origin: Option<&String>) {
self.origin = origin.cloned();
// If kind is a collection, update the `origin` of their `Value`s too:
match self.kind {
ValueKind::Array(ref mut collection) => {
for v in collection.iter_mut() {
v.set_origin_recursively(origin)
}
}
ValueKind::Table(ref mut collection) => {
for (_, v) in collection.iter_mut() {
v.set_origin_recursively(origin)
}
}
// This is presumably unacceptable for handling all other variants
// of `ValueKind` that don't contain nested `Value`s, but short of listing
// all remaining variants here I'm not sure how you'd want to approach it
_ => {},
};
}
}
// format.rs:
// Nothing to map if already deserialized into Value:
pub fn from_parsed_value(uri: Option<&String>, mut value: Value) -> Value {
value.set_origin_recursively(uri);
value
}
// Alternatively for ValueKind (implies Array/Table variants already have `v.origin = None`):
pub fn from_parsed_value(uri: Option<&String>, value: ValueKind) -> Value {
let mut v = Value::from(value);
v.set_origin_recursively(uri);
v
}
Both would make from_parsed_value()
largely redundant then 😅
The above set_origin_recursively()
utility method could be added via a separate PR in the meantime? (origin
isn't public, so updating the origin
after it's set isn't supported I think?)
// Equivalent to ValueKind, except Table + Array store the same enum | ||
// Useful for serde to serialize values into, then convert to Value. | ||
// NOTE: Order of variants is important. Serde will use whichever | ||
// the input successfully deserializes into first. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I suppose with the more verbose addition of serde-untagged
, handling the Array
+ Table
types of ValueKind
would be a non-issue as .map
/.seq
calls would could leverage .into()
similar to how you could construct ValueKind
like this:
// ValueKind::Table(T) storing `Vec<T> => ValueKind::Array(T) => Value`
let map_of_vecs: ValueKind = HashMap::<String, Value>::from([
("abc".to_string(), vec!["a", "b", "c"].into()),
("xyz".to_string(), vec!["x", "y", "z"].into()),
]).into();
// ValueKind::Table(T) => Value
let v = Value::from(map_of_vecs);
Value::from
will provide the None
placeholder for origin
and the added recursive method can patch that afterwards 👍
I think these two collection variants of ValueKind
were only problematic with serde(untagged)
when paired with the from_parsed_value()
recursively calling itself to map the enum value, since they were not wrapping Self
😅
Other than that, there's the Option
value type from Ron which I'm not sure how you'd approach that with serde-untagged
, for serde(untagged)
with ParsedValue
the recursive call through from_parsed_value()
worked with the type Option<Box<Self>>
.
Likewise, while serde-untagged
and formats like Ron support char
, ValueKind
lacks it and casts to ValueKind::String
. A similar concern exists for Toml support when parsing it's date value as there is no equivalent in ValueKind
it is mapped to the string variant too. Both cases were handled in this PR with a separate ParsedString
enum delegated to serde(untagged)
, all wrapped in a custom deserializer method for the ParsedValue::String
variant and the deserialize_with
annotation (that I've not looked into if that's compatible with serde-untagged
, I assume it is?).
So a few gaps on how to proceed there with serde-untagged
, but I think it might still be better deferred to a follow-up PR replacing serde(untagged)
. It's diff might then be more straight-forward?
I'm detailing this as a refresher for future me or anyone else who may be interested in pushing this PR forward. My time is often sparse, so with the additional requirements to adopt Remaining tasks:
|
Note that I said this was non-blocking but we will eventually want to do this. |
It is non-blocking to update all formats to this |
Use of |
Prep for #472 `serde(untagged)` translates to an `if`/`else if` ladder that ends up being pretty expensive, including the error generation for each branch that gets skipped. This switches us to dispatch on type instead, speeding things up. This is also prep for speeding up builds by removing the need for `serde_derive`. Once `serde_core` is published, we can depend on that and not wait for `syn`, `quote`, etc.
Inspired from the
json5
feature (Original PR July 2020, revised PR May 2021) with the refactoring in the revision by @matthiasbeyerIt looked useful for other formats that use
serde
to simplify their logic (so long as test coverage is informative of no issues 🙏 )DRY: The
json
,json5
andtoml
parsers all leverageserde
and can share a common enum to deserialize data into, instead of individual methods performing roughly the same transformations.ron
improves their support for serde untagged enums with v0.9,it is still not compatible with this approach (Their README details why).(UPDATE: At least for the currentconfig-rs
test coverage, this appears to be passing now)yaml
support doesn't leverageserde
thus is not compatible. (UPDATE: Since there is talk about the unmaintained status, it could be switched forserde-yaml
, I've verified it can pass the tests with an extra deserializer method)from_parsed_value()
is based on the approached used byformat/json5.rs
:ValueKind
enum, which could not directly be used due to theTable
andArray
types usingValue
as their value storage type instead of self-referencing the enum.impl From
, but supports the complimentaryuri
parameter for eachValue
derived.Resolves: #394