Skip to content

Add Rune links to String class #4206

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 2 commits into from
May 8, 2020
Merged
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
6 changes: 4 additions & 2 deletions xml/System/String.xml
Original file line number Diff line number Diff line change
Expand Up @@ -98,6 +98,8 @@

A string is a sequential collection of characters that is used to represent text. A <xref:System.String> object is a sequential collection of <xref:System.Char?displayProperty=nameWithType> objects that represent a string; a <xref:System.Char?displayProperty=nameWithType> object corresponds to a UTF-16 code unit. The value of the <xref:System.String> object is the content of the sequential collection of <xref:System.Char?displayProperty=nameWithType> objects, and that value is immutable (that is, it is read-only). For more information about the immutability of strings, see the [Immutability and the StringBuilder class](#Immutability) section later in this topic. The maximum size of a <xref:System.String> object in memory is 2GB, or about 1 billion characters.

For more information about Unicode, UTF-16, code units, code points, and the <xref:System.Char> and <xref:System.Text.Rune> types, see [Introduction to character encoding in .NET](/dotnet/standard/base-types/character-encoding-introduction).
Copy link
Contributor

@gewarren gewarren May 8, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it better to have the link in the ~ format? I see other links in this file that use the ~ format.

Suggested change
For more information about Unicode, UTF-16, code units, code points, and the <xref:System.Char> and <xref:System.Text.Rune> types, see [Introduction to character encoding in .NET](/dotnet/standard/base-types/character-encoding-introduction).
For more information about Unicode, UTF-16, code units, code points, and the <xref:System.Char> and <xref:System.Text.Rune> types, see [Introduction to character encoding in .NET](~/docs/standard/base-types/character-encoding-introduction.md).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Have to include .md with that format.


In this section:

[Instantiate a String object](#Instantiation)\
Expand Down Expand Up @@ -214,15 +216,15 @@
[!code-csharp-interactive[System.String.Class#5](~/samples/snippets/csharp/VS_Snippets_CLR_System/system.String.Class/cs/index2.cs#5)]
[!code-vb[System.String.Class#5](~/samples/snippets/visualbasic/VS_Snippets_CLR_System/system.String.Class/vb/index2.vb#5)]

Consecutive index values might not correspond to consecutive Unicode characters, because a Unicode character might be encoded as more than one <xref:System.Char> object. In particular, a string may contain multi-character units of text that are formed by a base character followed by one or more combining characters or by surrogate pairs. To work with Unicode characters instead of <xref:System.Char> objects, use the <xref:System.Globalization.StringInfo?displayProperty=nameWithType> and <xref:System.Globalization.TextElementEnumerator> classes. The following example illustrates the difference between code that works with <xref:System.Char> objects and code that works with Unicode characters. It compares the number of characters or text elements in each word of a sentence. The string includes two sequences of a base character followed by a combining character.
Consecutive index values might not correspond to consecutive Unicode characters, because a Unicode character might be encoded as more than one <xref:System.Char> object. In particular, a string may contain multi-character units of text that are formed by a base character followed by one or more combining characters or by surrogate pairs. To work with Unicode characters instead of <xref:System.Char> objects, use the <xref:System.Globalization.StringInfo?displayProperty=nameWithType> and <xref:System.Globalization.TextElementEnumerator> classes, or the <xref:System.String.EnumerateRunes%2A?displayProperty=nameWithType> method and the <xref:System.Text.Rune> type. The following example illustrates the difference between code that works with <xref:System.Char> objects and code that works with Unicode characters. It compares the number of characters or text elements in each word of a sentence. The string includes two sequences of a base character followed by a combining character.
Copy link
Contributor

@gewarren gewarren May 8, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
Consecutive index values might not correspond to consecutive Unicode characters, because a Unicode character might be encoded as more than one <xref:System.Char> object. In particular, a string may contain multi-character units of text that are formed by a base character followed by one or more combining characters or by surrogate pairs. To work with Unicode characters instead of <xref:System.Char> objects, use the <xref:System.Globalization.StringInfo?displayProperty=nameWithType> and <xref:System.Globalization.TextElementEnumerator> classes, or the <xref:System.String.EnumerateRunes%2A?displayProperty=nameWithType> method and the <xref:System.Text.Rune> type. The following example illustrates the difference between code that works with <xref:System.Char> objects and code that works with Unicode characters. It compares the number of characters or text elements in each word of a sentence. The string includes two sequences of a base character followed by a combining character.
Consecutive index values might not correspond to consecutive Unicode characters, because a Unicode character might be encoded as more than one <xref:System.Char> object. In particular, a string may contain multi-character units of text that are formed by a base character followed by one or more combining characters or by surrogate pairs. To work with Unicode characters instead of <xref:System.Char> objects, use the <xref:System.Globalization.StringInfo?displayProperty=nameWithType> and <xref:System.Globalization.TextElementEnumerator> classes, or the <xref:System.String.EnumerateRunes%2A?displayProperty=nameWithType> method and the <xref:System.Text.Rune> struct. The following example illustrates the difference between code that works with <xref:System.Char> objects and code that works with Unicode characters. It compares the number of characters or text elements in each word of a sentence. The string includes two sequences of a base character followed by a combining character.

type -> class for consistency with the previous sentence. Not sure which is better.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Rune is actually a struct, so I'll use that here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
Consecutive index values might not correspond to consecutive Unicode characters, because a Unicode character might be encoded as more than one <xref:System.Char> object. In particular, a string may contain multi-character units of text that are formed by a base character followed by one or more combining characters or by surrogate pairs. To work with Unicode characters instead of <xref:System.Char> objects, use the <xref:System.Globalization.StringInfo?displayProperty=nameWithType> and <xref:System.Globalization.TextElementEnumerator> classes, or the <xref:System.String.EnumerateRunes%2A?displayProperty=nameWithType> method and the <xref:System.Text.Rune> type. The following example illustrates the difference between code that works with <xref:System.Char> objects and code that works with Unicode characters. It compares the number of characters or text elements in each word of a sentence. The string includes two sequences of a base character followed by a combining character.
Consecutive index values might not correspond to consecutive Unicode characters, because a Unicode character might be encoded as more than one <xref:System.Char> object. In particular, a string may contain multi-character units of text that are formed by a base character followed by one or more combining characters or by surrogate pairs. To work with Unicode characters instead of <xref:System.Char> objects, use the <xref:System.Globalization.StringInfo?displayProperty=nameWithType> and <xref:System.Globalization.TextElementEnumerator> classes, or the <xref:System.String.EnumerateRunes%2A?displayProperty=nameWithType> method and the <xref:System.Text.Rune> struct. The following example illustrates the difference between code that works with <xref:System.Char> objects and code that works with Unicode characters. It compares the number of characters or text elements in each word of a sentence. The string includes two sequences of a base character followed by a combining character.


[!code-cpp[System.String.Class#6](~/samples/snippets/cpp/VS_Snippets_CLR_System/system.String.Class/cpp/string.index3.cpp#6)]
[!code-csharp-interactive[System.String.Class#6](~/samples/snippets/csharp/VS_Snippets_CLR_System/system.String.Class/cs/index3.cs#6)]
[!code-vb[System.String.Class#6](~/samples/snippets/visualbasic/VS_Snippets_CLR_System/system.String.Class/vb/index3.vb#6)]

This example works with text elements by using the <xref:System.Globalization.StringInfo.GetTextElementEnumerator%2A?displayProperty=nameWithType> method and the <xref:System.Globalization.TextElementEnumerator> class to enumerate all the text elements in a string. You can also retrieve an array that contains the starting index of each text element by calling the <xref:System.Globalization.StringInfo.ParseCombiningCharacters%2A?displayProperty=nameWithType> method.

For more information about working with units of text rather than individual <xref:System.Char> values, see the <xref:System.Globalization.StringInfo> class.
For more information about working with units of text rather than individual <xref:System.Char> values, see [Introduction to character encoding in .NET](/dotnet/standard/base-types/character-encoding-introduction).
Copy link
Contributor

@gewarren gewarren May 8, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
For more information about working with units of text rather than individual <xref:System.Char> values, see [Introduction to character encoding in .NET](/dotnet/standard/base-types/character-encoding-introduction).
For more information about working with units of text rather than individual <xref:System.Char> values, see [Introduction to character encoding in .NET](~/docs/standard/base-types/character-encoding-introduction.md).


<a name="Nulls"></a>
## Null strings and empty strings
Expand Down