Skip to content

Add Rune links to String class #4206

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 2 commits into from
May 8, 2020
Merged

Add Rune links to String class #4206

merged 2 commits into from
May 8, 2020

Conversation

tdykstra
Copy link
Contributor

@tdykstra tdykstra commented May 7, 2020

Follow-up to #4189

@tdykstra tdykstra requested a review from BillWagner May 7, 2020 22:37
@dotnet-bot dotnet-bot added this to the May 2020 milestone May 7, 2020
@@ -98,6 +98,8 @@

A string is a sequential collection of characters that is used to represent text. A <xref:System.String> object is a sequential collection of <xref:System.Char?displayProperty=nameWithType> objects that represent a string; a <xref:System.Char?displayProperty=nameWithType> object corresponds to a UTF-16 code unit. The value of the <xref:System.String> object is the content of the sequential collection of <xref:System.Char?displayProperty=nameWithType> objects, and that value is immutable (that is, it is read-only). For more information about the immutability of strings, see the [Immutability and the StringBuilder class](#Immutability) section later in this topic. The maximum size of a <xref:System.String> object in memory is 2GB, or about 1 billion characters.

For more information about Unicode, UTF-16, code units, code points, and the <xref:System.Char> and <xref:System.Text.Rune> types, see [Introduction to character encoding in .NET](/dotnet/standard/base-types/character-encoding-introduction).
Copy link
Contributor

@gewarren gewarren May 8, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it better to have the link in the ~ format? I see other links in this file that use the ~ format.

Suggested change
For more information about Unicode, UTF-16, code units, code points, and the <xref:System.Char> and <xref:System.Text.Rune> types, see [Introduction to character encoding in .NET](/dotnet/standard/base-types/character-encoding-introduction).
For more information about Unicode, UTF-16, code units, code points, and the <xref:System.Char> and <xref:System.Text.Rune> types, see [Introduction to character encoding in .NET](~/docs/standard/base-types/character-encoding-introduction.md).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Have to include .md with that format.

@@ -214,15 +216,15 @@
[!code-csharp-interactive[System.String.Class#5](~/samples/snippets/csharp/VS_Snippets_CLR_System/system.String.Class/cs/index2.cs#5)]
[!code-vb[System.String.Class#5](~/samples/snippets/visualbasic/VS_Snippets_CLR_System/system.String.Class/vb/index2.vb#5)]

Consecutive index values might not correspond to consecutive Unicode characters, because a Unicode character might be encoded as more than one <xref:System.Char> object. In particular, a string may contain multi-character units of text that are formed by a base character followed by one or more combining characters or by surrogate pairs. To work with Unicode characters instead of <xref:System.Char> objects, use the <xref:System.Globalization.StringInfo?displayProperty=nameWithType> and <xref:System.Globalization.TextElementEnumerator> classes. The following example illustrates the difference between code that works with <xref:System.Char> objects and code that works with Unicode characters. It compares the number of characters or text elements in each word of a sentence. The string includes two sequences of a base character followed by a combining character.
Consecutive index values might not correspond to consecutive Unicode characters, because a Unicode character might be encoded as more than one <xref:System.Char> object. In particular, a string may contain multi-character units of text that are formed by a base character followed by one or more combining characters or by surrogate pairs. To work with Unicode characters instead of <xref:System.Char> objects, use the <xref:System.Globalization.StringInfo?displayProperty=nameWithType> and <xref:System.Globalization.TextElementEnumerator> classes, or the <xref:System.String.EnumerateRunes%2A?displayProperty=nameWithType> method and the <xref:System.Text.Rune> type. The following example illustrates the difference between code that works with <xref:System.Char> objects and code that works with Unicode characters. It compares the number of characters or text elements in each word of a sentence. The string includes two sequences of a base character followed by a combining character.
Copy link
Contributor

@gewarren gewarren May 8, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
Consecutive index values might not correspond to consecutive Unicode characters, because a Unicode character might be encoded as more than one <xref:System.Char> object. In particular, a string may contain multi-character units of text that are formed by a base character followed by one or more combining characters or by surrogate pairs. To work with Unicode characters instead of <xref:System.Char> objects, use the <xref:System.Globalization.StringInfo?displayProperty=nameWithType> and <xref:System.Globalization.TextElementEnumerator> classes, or the <xref:System.String.EnumerateRunes%2A?displayProperty=nameWithType> method and the <xref:System.Text.Rune> type. The following example illustrates the difference between code that works with <xref:System.Char> objects and code that works with Unicode characters. It compares the number of characters or text elements in each word of a sentence. The string includes two sequences of a base character followed by a combining character.
Consecutive index values might not correspond to consecutive Unicode characters, because a Unicode character might be encoded as more than one <xref:System.Char> object. In particular, a string may contain multi-character units of text that are formed by a base character followed by one or more combining characters or by surrogate pairs. To work with Unicode characters instead of <xref:System.Char> objects, use the <xref:System.Globalization.StringInfo?displayProperty=nameWithType> and <xref:System.Globalization.TextElementEnumerator> classes, or the <xref:System.String.EnumerateRunes%2A?displayProperty=nameWithType> method and the <xref:System.Text.Rune> struct. The following example illustrates the difference between code that works with <xref:System.Char> objects and code that works with Unicode characters. It compares the number of characters or text elements in each word of a sentence. The string includes two sequences of a base character followed by a combining character.

type -> class for consistency with the previous sentence. Not sure which is better.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Rune is actually a struct, so I'll use that here.


[!code-cpp[System.String.Class#6](~/samples/snippets/cpp/VS_Snippets_CLR_System/system.String.Class/cpp/string.index3.cpp#6)]
[!code-csharp-interactive[System.String.Class#6](~/samples/snippets/csharp/VS_Snippets_CLR_System/system.String.Class/cs/index3.cs#6)]
[!code-vb[System.String.Class#6](~/samples/snippets/visualbasic/VS_Snippets_CLR_System/system.String.Class/vb/index3.vb#6)]

This example works with text elements by using the <xref:System.Globalization.StringInfo.GetTextElementEnumerator%2A?displayProperty=nameWithType> method and the <xref:System.Globalization.TextElementEnumerator> class to enumerate all the text elements in a string. You can also retrieve an array that contains the starting index of each text element by calling the <xref:System.Globalization.StringInfo.ParseCombiningCharacters%2A?displayProperty=nameWithType> method.

For more information about working with units of text rather than individual <xref:System.Char> values, see the <xref:System.Globalization.StringInfo> class.
For more information about working with units of text rather than individual <xref:System.Char> values, see [Introduction to character encoding in .NET](/dotnet/standard/base-types/character-encoding-introduction).
Copy link
Contributor

@gewarren gewarren May 8, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
For more information about working with units of text rather than individual <xref:System.Char> values, see [Introduction to character encoding in .NET](/dotnet/standard/base-types/character-encoding-introduction).
For more information about working with units of text rather than individual <xref:System.Char> values, see [Introduction to character encoding in .NET](~/docs/standard/base-types/character-encoding-introduction.md).

@tdykstra
Copy link
Contributor Author

tdykstra commented May 8, 2020

Thanks for the review @gewarren.

@@ -214,15 +216,15 @@
[!code-csharp-interactive[System.String.Class#5](~/samples/snippets/csharp/VS_Snippets_CLR_System/system.String.Class/cs/index2.cs#5)]
[!code-vb[System.String.Class#5](~/samples/snippets/visualbasic/VS_Snippets_CLR_System/system.String.Class/vb/index2.vb#5)]

Consecutive index values might not correspond to consecutive Unicode characters, because a Unicode character might be encoded as more than one <xref:System.Char> object. In particular, a string may contain multi-character units of text that are formed by a base character followed by one or more combining characters or by surrogate pairs. To work with Unicode characters instead of <xref:System.Char> objects, use the <xref:System.Globalization.StringInfo?displayProperty=nameWithType> and <xref:System.Globalization.TextElementEnumerator> classes. The following example illustrates the difference between code that works with <xref:System.Char> objects and code that works with Unicode characters. It compares the number of characters or text elements in each word of a sentence. The string includes two sequences of a base character followed by a combining character.
Consecutive index values might not correspond to consecutive Unicode characters, because a Unicode character might be encoded as more than one <xref:System.Char> object. In particular, a string may contain multi-character units of text that are formed by a base character followed by one or more combining characters or by surrogate pairs. To work with Unicode characters instead of <xref:System.Char> objects, use the <xref:System.Globalization.StringInfo?displayProperty=nameWithType> and <xref:System.Globalization.TextElementEnumerator> classes, or the <xref:System.String.EnumerateRunes%2A?displayProperty=nameWithType> method and the <xref:System.Text.Rune> type. The following example illustrates the difference between code that works with <xref:System.Char> objects and code that works with Unicode characters. It compares the number of characters or text elements in each word of a sentence. The string includes two sequences of a base character followed by a combining character.
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
Consecutive index values might not correspond to consecutive Unicode characters, because a Unicode character might be encoded as more than one <xref:System.Char> object. In particular, a string may contain multi-character units of text that are formed by a base character followed by one or more combining characters or by surrogate pairs. To work with Unicode characters instead of <xref:System.Char> objects, use the <xref:System.Globalization.StringInfo?displayProperty=nameWithType> and <xref:System.Globalization.TextElementEnumerator> classes, or the <xref:System.String.EnumerateRunes%2A?displayProperty=nameWithType> method and the <xref:System.Text.Rune> type. The following example illustrates the difference between code that works with <xref:System.Char> objects and code that works with Unicode characters. It compares the number of characters or text elements in each word of a sentence. The string includes two sequences of a base character followed by a combining character.
Consecutive index values might not correspond to consecutive Unicode characters, because a Unicode character might be encoded as more than one <xref:System.Char> object. In particular, a string may contain multi-character units of text that are formed by a base character followed by one or more combining characters or by surrogate pairs. To work with Unicode characters instead of <xref:System.Char> objects, use the <xref:System.Globalization.StringInfo?displayProperty=nameWithType> and <xref:System.Globalization.TextElementEnumerator> classes, or the <xref:System.String.EnumerateRunes%2A?displayProperty=nameWithType> method and the <xref:System.Text.Rune> struct. The following example illustrates the difference between code that works with <xref:System.Char> objects and code that works with Unicode characters. It compares the number of characters or text elements in each word of a sentence. The string includes two sequences of a base character followed by a combining character.

@tdykstra tdykstra removed the request for review from BillWagner May 8, 2020 23:46
@tdykstra tdykstra merged commit 0f4839d into dotnet:master May 8, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants