Replies: 27 comments
-
My 2 cents: I don't see the need for a new It's also confusing in that it reads as "unsigned string", given the precedent for the numeric keywords.
-
🍝 if
-
#bikeshedding. I would just have it be
-
@CyrusNajmabadi I agree on the bikeshedding part as far as the class name goes. However, I do think the bar for adding language keywords should be much, much higher than for adding a new class to the framework, and keywords should only be added if they enable new functionality not possible otherwise or substantially reduce verbosity in an extremely common use case.
-
I agree. And I think this is such a case.
-
Since I'm not seeing it mentioned, would there at least be a way to explicitly make a
-
Could just use a well-known implicit conversion from string, as long as it's not necessary to make it work with var. Edit: however, that would make it impossible/ugly to overload on string and utf8 string at the same time.
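To make the overloading concern concrete, here is a minimal sketch in today's C#. `Utf8Text` is a hypothetical stand-in for the proposed type (it is not a framework API), and `Describe` is an illustrative method name. Under the existing overload resolution rules, a string literal binds to the `string` overload because an identity conversion beats a user-defined one, so the UTF-8 overload is only reachable with an explicit cast.

```csharp
using System;
using System.Text;

// Hypothetical stand-in for the proposed type; not a real framework API.
public readonly struct Utf8Text
{
    private readonly byte[] _bytes;
    private Utf8Text(byte[] bytes) => _bytes = bytes;

    // Number of UTF-8 bytes held (0 for a default-initialized value).
    public int ByteLength => _bytes?.Length ?? 0;

    // User-defined implicit conversion from string; transcodes at run time.
    public static implicit operator Utf8Text(string s) =>
        new Utf8Text(Encoding.UTF8.GetBytes(s));
}

public static class OverloadDemo
{
    public static string Describe(string s) => "string overload";
    public static string Describe(Utf8Text s) => "utf8 overload";

    // The literal has type string, so the identity conversion wins.
    public static string FromLiteral() => Describe("hello");

    // An explicit cast is needed to reach the other overload.
    public static string FromCast() => Describe((Utf8Text)"hello");

    // The implicit conversion still works when the target type is stated.
    public static int ByteLengthOf(string s)
    {
        Utf8Text t = s;
        return t.ByteLength;
    }
}
```

With this sketch, `OverloadDemo.FromLiteral()` picks the `string` overload and `OverloadDemo.FromCast()` picks the `Utf8Text` one, which is the "ugly" disambiguation the comment refers to.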
-
Yeah, I'm on the fence here. I think I'd prefer a hybrid approach. Allow explicit It's similar if you had this:
You can write Similar stuff would happen with string literals and the different destination type. You could disambiguate in the cases where it was necessary.
-
Alternatively, we could just decide that utf8 is a "better" overload, if that's always preferable when the framework offers both for binary compat. I don't want to worry about utf when I pass a string to a framework method. If it can work with utf8 then great, prefer the new method. (Note: this can still be considered a breaking change for the framework if it depends on utf16, but it's limited to direct literals.)
-
@alrz Definitely an interesting idea. I think that could make a lot of sense. There would be no binary breaks. There would be source breaks for how literals were treated. But it would only be for APIs effectively stating they supported both, where the presumption would likely be that utf8 would be preferred. So that sgtm.
-
This is similar to the params Span proposal, where it's preferred over a params array, so a mere recompilation would resolve all method calls to the non-allocating overload.
-
Yup. I like the idea that the language can add things about new language/API features and say that if an API adds support for both, they'll prefer the new/faster thing.
-
Same. That was the thing that really annoyed me about I hope the team doesn't make the same decision with Utf8String.
-
Just to mention that not all UTF-16 strings can be converted to UTF-8 strings (unpaired surrogates have no valid UTF-8 encoding), which leads to the non-standard WTF-8 encoding.
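As a small illustration of the point, using only the existing `System.Text` encoders: a .NET `string` may hold an unpaired surrogate, and the stock UTF-8 encoder either substitutes U+FFFD or throws, depending on how it is configured. The class and member names below are just for this sketch.

```csharp
using System;
using System.Text;

public static class SurrogateDemo
{
    // A single unpaired high surrogate: allowed in a .NET string,
    // but not well-formed UTF-16.
    public static readonly string Lone = new string('\uD800', 1);

    // Encoding.UTF8 uses a replacement fallback, so the lone surrogate
    // is encoded as U+FFFD (bytes EF BF BD) and the original is lost.
    public static byte[] Lossy() => Encoding.UTF8.GetBytes(Lone);

    // A UTF8Encoding created with throwOnInvalidBytes: true refuses instead.
    public static bool StrictThrows()
    {
        var strict = new UTF8Encoding(
            encoderShouldEmitUTF8Identifier: false,
            throwOnInvalidBytes: true);
        try
        {
            strict.GetBytes(Lone);
            return false;
        }
        catch (EncoderFallbackException)
        {
            return true;
        }
    }
}
```

Neither behavior round-trips the original string, which is the gap that WTF-8 fills by encoding the surrogate directly.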
-
Literals like I also think a default indexer or enumerator on Utf8String is a bad idea. A good idea would be to bake in a special error code so instead of
-
That could be done by implementing a throwing implementation of the enumerator/indexer and marking it as obsolete.
-
@YairHalberstadt Wouldn't that result in a warning instead of an error?
-
Not if it's
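For reference on this exchange: `ObsoleteAttribute` has a constructor that takes a `bool error` argument, and when it is `true` the compiler reports any use of the member as an error rather than a warning. A minimal sketch follows, assuming that is the mechanism meant here. `Utf8TextSketch` and its indexer are hypothetical, and the attribute is read back through reflection because calling the indexer would not compile.

```csharp
using System;
using System.Reflection;

// Hypothetical stand-in type; the member is illustrative only.
public sealed class Utf8TextSketch
{
    // error: true turns any use of this indexer into a compile-time error.
    [Obsolete("Indexing by position is not supported; enumerate explicitly instead.", error: true)]
    public byte this[int index] => throw new NotSupportedException();
}

public static class ObsoleteDemo
{
    // Inspects the attribute via reflection instead of calling the indexer.
    public static bool IndexerIsCompileError()
    {
        // C# indexers are emitted as a property named "Item" by default.
        PropertyInfo indexer = typeof(Utf8TextSketch).GetProperty("Item");
        ObsoleteAttribute attr = indexer.GetCustomAttribute<ObsoleteAttribute>();
        return attr != null && attr.IsError;
    }
}
```

The message still has to carry the guidance, since the diagnostic is the generic obsolete-member error rather than a dedicated error code.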
-
On an unrelated note, mixing declarations and variables in deconstruction is something that I would really love to see in the next release.
-
I agree,
-
I would avoid special literals like u8"blah" and u16"blah", given the sad history of C++ string literals catching up with encoding changes over decades. (Utf8String)"blah" is not much longer. No new syntax either. And this pattern can be applied to all other encodings in future, e.g.
-
Is there any experimentation to do utf8 work in the runtime? From a gitter discussion, this might result in fragmentation for quite some time, until perhaps everyone has moved to utf8 strings. And that's besides the fact that corefx has to support both, doubling the number of overloads. I do agree with that concern, as string is too fundamental to be fragmented into two types, and this goes beyond mere adaptation: we'll be stuck with multiple variations of string forever. Rust has the same problem with String and str, which is a source of confusion for newcomers.
-
Rust has it easy.
-
And that is justified by the different capabilities each type provides. My point is that as long as the only difference between string and Utf8String is the internal representation, there shouldn't be a separate type for each encoding. Even if we have to have utf8 literals, it makes more sense to use the same string type.
-
A possibility might be that string could become a sort of DU over a normal string and a utf8string. When it stores a utf8string, it can store the location of the last indexed char to make sequential indexing (the most common sort) O(1) rather than O(n). I imagine the devil's in the details, and this will actually turn out to be impossible/ridiculously complex in practice.
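A rough sketch of the cached-position idea, to show what it would involve. Everything here is hypothetical (`CursorUtf8`, `OffsetOf`, `BytesScanned`), and it indexes by Unicode scalar value for simplicity, whereas real compatibility with `string` would need UTF-16 code-unit indices. It also assumes well-formed UTF-8 and ignores thread safety.

```csharp
using System;
using System.Text;

// Hypothetical sketch; not a real framework type.
public sealed class CursorUtf8
{
    private readonly byte[] _bytes;
    private int _lastIndex;   // scalar index of the last successful lookup
    private int _lastOffset;  // byte offset where that scalar starts

    public CursorUtf8(string s) => _bytes = Encoding.UTF8.GetBytes(s);

    // Total bytes stepped over so far, to make the cost visible.
    public int BytesScanned { get; private set; }

    // Returns the byte offset at which the scalar at `index` starts.
    public int OffsetOf(int index)
    {
        if (index < 0) throw new ArgumentOutOfRangeException(nameof(index));

        // Moving backwards loses the benefit: restart from the beginning.
        if (index < _lastIndex) { _lastIndex = 0; _lastOffset = 0; }

        int i = _lastIndex, offset = _lastOffset;
        while (i < index)
        {
            if (offset >= _bytes.Length) throw new ArgumentOutOfRangeException(nameof(index));
            offset++; BytesScanned++;                     // step over the lead byte
            while (offset < _bytes.Length && (_bytes[offset] & 0xC0) == 0x80)
            {
                offset++; BytesScanned++;                 // step over continuation bytes (10xxxxxx)
            }
            i++;
        }
        if (offset >= _bytes.Length) throw new ArgumentOutOfRangeException(nameof(index));

        _lastIndex = i;
        _lastOffset = offset;
        return offset;
    }
}
```

Walking indices 0, 1, 2, ... in order visits each byte once overall, so each step is amortized O(1). Random or backward access falls back to O(n), and the mutable cursor inside an otherwise immutable value is one of the details that would need care.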
-
Except the different internal representation would be quite visible in terms of performance and also confusing, to maintain compatibility with the current
Regarding performance:
string s16 = "…";
char c = s16[i]; // O(1)
string s8 = u8"…";
c = s8[i]; // O(n)
As for confusion, I'm talking about the fact that
-
If a codebase is using indexing extensively, with an opt-in approach you could just decide to not move to utf8. That could also make it possible to alter See #184 (comment)
-
There hasn't been a topic for the design notes in a while. Opening this one so people can discuss the most recent ones.
https://github.com/dotnet/csharplang/blob/master/meetings/2019/LDM-2019-09-16.md
https://github.com/dotnet/csharplang/blob/master/meetings/2019/LDM-2019-09-18.md