Commit 3391a13

Merge pull request #11712 from dotnet/copilot/fix-11711
Clarify Encoding.GetEncoding(0) and Encoding.Default behavior differences between .NET Framework and .NET Core+
2 parents 23135c3 + a47f994 commit 3391a13

2 files changed: +69 -13 lines changed

xml/System.Text/CodePagesEncodingProvider.xml

Lines changed: 25 additions & 1 deletion
@@ -75,6 +75,15 @@
 
 After an <xref:System.Text.EncodingProvider> object is registered, the encodings that it supports are available by calling the overloads of <xref:System.Text.Encoding.GetEncoding%2A?displayProperty=nameWithType>; you should not call the <xref:System.Text.EncodingProvider.GetEncoding%2A?displayProperty=nameWithType> overloads.
 
+### Impact on default encoding behavior
+
+Registering <xref:System.Text.CodePagesEncodingProvider> also affects the behavior of <xref:System.Text.Encoding.GetEncoding(System.Int32)> when called with a `codepage` argument of `0` in .NET (Core):
+
+- On Windows, `GetEncoding(0)` returns the encoding that matches the system's active code page, which is the same behavior as in .NET Framework.
+- On non-Windows platforms, `GetEncoding(0)` returns UTF-8, maintaining cross-platform consistency.
+
+When no encoding provider is registered, `GetEncoding(0)` returns UTF-8 on all platforms in .NET Core and later versions.
+
 ]]></format>
 </remarks>
 </Docs>
@@ -154,7 +163,22 @@ The .NET Framework supports a large number of character encodings and code pages
 <param name="codepage">The code page identifier of the preferred encoding that the encoding provider might support.</param>
 <summary>Returns the encoding associated with the specified code page identifier.</summary>
 <returns>The encoding associated with the specified code page identifier, or <see langword="null" /> if the provider does not support the requested codepage encoding.</returns>
-<remarks>To be added.</remarks>
+<remarks>
+<format type="text/markdown"><![CDATA[
+
+## Remarks
+
+This method provides access to code page encodings that are available in .NET Framework but not natively supported in .NET Core and later versions.
+
+When `codepage` is `0`, this method has special behavior that affects the default encoding returned by <xref:System.Text.Encoding.GetEncoding(System.Int32)>:
+
+- **On Windows**: Returns the encoding that matches the system's active code page, providing the same behavior as .NET Framework.
+- **On non-Windows platforms**: Returns `null`, allowing <xref:System.Text.Encoding.GetEncoding(System.Int32)> to fall back to its default UTF-8 behavior.
+
+For all other supported code page identifiers, this method returns the corresponding encoding if it's available from the code pages encoding provider, or `null` if the code page is not supported.
+
+]]></format>
+</remarks>
 </Docs>
 </Member>
 <Member MemberName="GetEncoding">

xml/System.Text/Encoding.xml

Lines changed: 44 additions & 12 deletions
@@ -985,7 +985,23 @@ The returned <xref:System.IO.Stream>'s <xref:System.IO.Stream.CanRead> and <xref
 <Docs>
 <summary>Gets the default encoding for this .NET implementation.</summary>
 <value>The default encoding for this .NET implementation.</value>
-<remarks>For more information about this API, see <see href="/dotnet/fundamentals/runtime-libraries/system-text-encoding-default">Supplemental API remarks for Encoding.Default</see>.</remarks>
+<remarks>
+<format type="text/markdown"><![CDATA[
+
+## Remarks
+
+The behavior of the <xref:System.Text.Encoding.Default%2A> property varies between different .NET implementations:
+
+- **In .NET Framework**: Returns the encoding that corresponds to the system's active code page. This is the same encoding returned by <xref:System.Text.Encoding.GetEncoding(System.Int32)> when called with a `codepage` argument of `0`.
+
+- **In .NET Core and later versions**: Always returns a <xref:System.Text.UTF8Encoding> object. This behavior was changed to encourage the use of Unicode encodings for better cross-platform compatibility and data integrity.
+
+For the most consistent results across different platforms and .NET implementations, consider using a specific Unicode encoding such as UTF-8 directly instead of relying on the default encoding. You can obtain UTF-8 encoding by calling <xref:System.Text.Encoding.UTF8?displayProperty=nameWithType> or <xref:System.Text.Encoding.GetEncoding(System.String)?displayProperty=nameWithType> with "utf-8".
+
+For more information about this API, see <see href="/dotnet/fundamentals/runtime-libraries/system-text-encoding-default">Supplemental API remarks for Encoding.Default</see>.
+
+]]></format>
+</remarks>
 </Docs>
 </Member>
 <Member MemberName="EncoderFallback">
@@ -3531,13 +3547,19 @@ The returned <xref:System.IO.Stream>'s <xref:System.IO.Stream.CanRead> and <xref
 
 In addition to the encodings that are natively available on .NET Core or that are intrinsically supported on a specific platform version of .NET Framework, the <xref:System.Text.Encoding.GetEncoding%2A> method returns any additional encodings that are made available by registering an <xref:System.Text.EncodingProvider> object. If the same encoding has been registered by multiple <xref:System.Text.EncodingProvider> objects, this method returns the last one registered.
 
-You can also supply a value of 0 for the `codepage` argument. Its precise behavior depends on whether any encodings have been made available by registering an <xref:System.Text.EncodingProvider> object:
+You can also supply a value of 0 for the `codepage` argument. The behavior varies between .NET Framework and .NET Core and later versions:
+
+**In .NET Framework**: Always returns the encoding that corresponds to the system's active code page in Windows. This is the same encoding returned by the <xref:System.Text.Encoding.Default?displayProperty=nameWithType> property.
 
-- If one or more encoding providers have been registered, it returns the encoding of the last registered provider that has chosen to return a encoding when the <xref:System.Text.Encoding.GetEncoding%2A> method is passed a `codepage` argument of 0.
+**In .NET Core and later versions**: The behavior depends on the encoding configuration of the application:
 
-- On .NET Framework, if no encoding provider has been registered, if the <xref:System.Text.CodePagesEncodingProvider> is the registered encoding provider, or if no registered encoding provider handles a `codepage` value of 0, it returns the operating system's active code page. To determine the active code page on Windows systems, call the Windows [GetACP](/windows/win32/api/winnls/nf-winnls-getacp) function from .NET Framework.
+- **No encoding provider registered**: Returns a <xref:System.Text.UTF8Encoding>, same as <xref:System.Text.Encoding.Default?displayProperty=nameWithType>.
 
-- On .NET Core, if no encoding provider has been registered or if no registered encoding provider handles a `codepage` value of 0, it returns the <xref:System.Text.UTF8Encoding>.
+- **<xref:System.Text.CodePagesEncodingProvider> registered**:
+- On **Windows**, returns the encoding that matches the system's active code page (same as .NET Framework behavior).
+- On **non-Windows platforms**, always returns a <xref:System.Text.UTF8Encoding>.
+
+- **A different provider registered**: The behavior is determined by that provider. Consult its documentation for details. If multiple providers are registered, the method returns the encoding from the last registered provider that handles a `codepage` argument of 0.
 
 > [!NOTE]
 > - Some unsupported code pages cause an <xref:System.ArgumentException> to be thrown, whereas others cause a <xref:System.NotSupportedException>. Therefore, your code must catch all exceptions indicated in the Exceptions section.
@@ -3731,13 +3753,19 @@ In .NET Framework, the <xref:System.Text.Encoding.GetEncoding%2A> method relies
 
 In addition to the encodings that are natively available on .NET Core or that are intrinsically supported on a specific platform version of .NET Framework, the <xref:System.Text.Encoding.GetEncoding%2A> method returns any additional encodings that are made available by registering an <xref:System.Text.EncodingProvider> object. If the same encoding has been registered by multiple <xref:System.Text.EncodingProvider> objects, this method returns the last one registered.
 
-You can also supply a value of 0 for the `codepage` argument. Its precise behavior depends on whether any encodings have been made available by registering an <xref:System.Text.EncodingProvider> object:
+You can also supply a value of 0 for the `codepage` argument. The behavior varies between .NET Framework and .NET Core and later versions:
+
+**In .NET Framework**: Always returns the encoding that corresponds to the system's active code page in Windows. This is the same encoding returned by the <xref:System.Text.Encoding.Default?displayProperty=nameWithType> property.
 
-- If one or more encoding providers have been registered, it returns the encoding of the last registered provider that has chosen to return a encoding when the <xref:System.Text.Encoding.GetEncoding%2A> method is passed a `codepage` argument of 0.
+**In .NET Core and later versions**: The behavior depends on the encoding configuration of the application:
 
-- On .NET Framework, if no encoding provider has been registered, if the <xref:System.Text.CodePagesEncodingProvider> is the registered encoding provider, or if no registered encoding provider handles a `codepage` value of 0, it returns the active code page.
+- **No encoding provider registered**: Returns a <xref:System.Text.UTF8Encoding>, same as <xref:System.Text.Encoding.Default?displayProperty=nameWithType>.
 
-- On .NET Core, if no encoding provider has been registered or if no registered encoding provider handles a `codepage` value of 0, it returns the <xref:System.Text.UTF8Encoding> encoding.
+- **<xref:System.Text.CodePagesEncodingProvider> registered**:
+- On **Windows**, returns the encoding that matches the system's active code page (same as .NET Framework behavior).
+- On **non-Windows platforms**, always returns a <xref:System.Text.UTF8Encoding>.
+
+- **A different provider registered**: The behavior is determined by that provider. Consult its documentation for details. If multiple providers are registered, the method returns the encoding from the last registered provider that handles a `codepage` argument of 0.
 
 > [!NOTE]
 > The ANSI code pages can be different on different computers and can change on a single computer, leading to data corruption. For this reason, if the active code page is an ANSI code page, encoding and decoding data using the default code page returned by `Encoding.GetEncoding(0)` is not recommended. For the most consistent results, you should use Unicode, such as UTF-8 (code page 65001) or UTF-16, instead of a specific code page.
@@ -5296,11 +5324,15 @@ The goal is to save this file, then open and decode it as a binary stream.
 ## Remarks
 The <xref:System.Text.Encoding.RegisterProvider%2A> method allows you to register a class derived from <xref:System.Text.EncodingProvider> that makes character encodings available on a platform that does not otherwise support them. Once the encoding provider is registered, the encodings that it supports can be retrieved by calling any <xref:System.Text.Encoding.GetEncoding%2A?displayProperty=nameWithType> overload. If there are multiple encoding providers, the <xref:System.Text.Encoding.GetEncoding%2A?displayProperty=nameWithType> method attempts to retrieve a specified encoding from each provider starting with the one most recently registered.
 
-Registering an encoding provider by using the <xref:System.Text.Encoding.RegisterProvider%2A> method also modifies the behavior of the [Encoding.GetEncoding(Int32)](<xref:System.Text.Encoding.GetEncoding(System.Int32)>) and [EncodingProvider.GetEncoding(Int32, EncoderFallback, DecoderFallback)](xref:System.Text.Encoding.GetEncoding(System.Int32,System.Text.EncoderFallback,System.Text.DecoderFallback)) methods when passed an argument of `0`:
+Registering an encoding provider by using the <xref:System.Text.Encoding.RegisterProvider%2A> method also affects the behavior of <xref:System.Text.Encoding.GetEncoding(System.Int32)> when passed an argument of `0`. This is particularly important in .NET Core and later versions where the default behavior for <xref:System.Text.Encoding.GetEncoding(System.Int32)> with `codepage` 0 is to return UTF-8:
+
+- **If the registered provider is <xref:System.Text.CodePagesEncodingProvider>**:
+- On **Windows**, <xref:System.Text.Encoding.GetEncoding(System.Int32)> with `codepage` 0 returns the encoding that matches the system's active code page (same as .NET Framework behavior).
+- On **non-Windows platforms**, it still returns UTF-8.
 
-- If the registered provider is the <xref:System.Text.CodePagesEncodingProvider>, the method returns the encoding that matches the system active code page when running on the Windows operating system.
+- **If a custom encoding provider is registered**: The provider can choose which encoding to return when <xref:System.Text.Encoding.GetEncoding(System.Int32)> is passed an argument of `0`. The provider can also choose to not handle this case by returning `null` from its <xref:System.Text.EncodingProvider.GetEncoding%2A?displayProperty=nameWithType> method, in which case the default UTF-8 behavior is used.
 
-- A custom encoding provider can choose which encoding to return when either of these <xref:System.Text.Encoding.GetEncoding%2A> method overloads is passed an argument of `0`. The provider can also choose to not return an encoding by having the <xref:System.Text.EncodingProvider.GetEncoding%2A?displayProperty=nameWithType> method return `null`.
+If multiple providers are registered, <xref:System.Text.Encoding.GetEncoding(System.Int32)> attempts to retrieve the encoding from the most recently registered provider first.
 
 Starting with .NET Framework 4.6, .NET Framework includes one encoding provider, <xref:System.Text.CodePagesEncodingProvider>, that makes the encodings available that are present in the full .NET Framework but are not available in the Universal Windows Platform. By default, the Universal Windows Platform only supports the Unicode encodings, ASCII, and code page 28591.
 
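
The `GetEncoding(0)` behavior documented in this change can be observed with a small console program. The following is a minimal sketch and is not part of the commit. It assumes a .NET Core 3.0 or later console app, where `CodePagesEncodingProvider` ships in the box; on other targets the `System.Text.Encoding.CodePages` package would likely be needed. The `Program` class and the output wording are illustrative only.

```csharp
using System;
using System.Runtime.InteropServices;
using System.Text;

class Program
{
    static void Main()
    {
        // With no provider registered, the updated docs say GetEncoding(0)
        // returns UTF-8 on .NET Core and later, on every platform.
        Encoding before = Encoding.GetEncoding(0);
        Console.WriteLine($"Before registration: {before.WebName} (code page {before.CodePage})");

        // Register the code pages provider described in the diff.
        Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);

        // Per the docs: on Windows this should now match the system's active
        // code page; on other platforms it should still be UTF-8.
        Encoding after = Encoding.GetEncoding(0);
        Console.WriteLine($"After registration:  {after.WebName} (code page {after.CodePage})");

        // Encoding.Default is documented as always UTF-8 on .NET Core and later,
        // regardless of which providers are registered.
        Console.WriteLine($"Encoding.Default:    {Encoding.Default.WebName}");

        bool onWindows = RuntimeInformation.IsOSPlatform(OSPlatform.Windows);
        Console.WriteLine($"Running on Windows:  {onWindows}");
    }
}
```

The actual output depends on the operating system and its configured active code page, so no expected values are shown here. Running the same program on .NET Framework should report the active code page in all three lines, which is the difference the commit sets out to clarify.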
