From c2547a387ce34a3ea13c56a15a99aa64ef408699 Mon Sep 17 00:00:00 2001
From: "copilot-swe-agent[bot]" <198982749+Copilot@users.noreply.github.com>
Date: Tue, 19 Aug 2025 20:35:18 +0000
Subject: [PATCH 1/3] Initial plan

From 13c709f07670339cb3dc02a99743b8e372d6e28f Mon Sep 17 00:00:00 2001
From: "copilot-swe-agent[bot]" <198982749+Copilot@users.noreply.github.com>
Date: Tue, 19 Aug 2025 20:44:52 +0000
Subject: [PATCH 2/3] Update Encoding documentation to clarify GetEncoding(0) and Default behavior differences

Co-authored-by: danmoseley <6385855+danmoseley@users.noreply.github.com>
---
 xml/System.Text/CodePagesEncodingProvider.xml | 26 ++++++++-
 xml/System.Text/Encoding.xml                  | 56 +++++++++++++++----
 2 files changed, 69 insertions(+), 13 deletions(-)

diff --git a/xml/System.Text/CodePagesEncodingProvider.xml b/xml/System.Text/CodePagesEncodingProvider.xml
index debf149416f..0c8a549b51d 100644
--- a/xml/System.Text/CodePagesEncodingProvider.xml
+++ b/xml/System.Text/CodePagesEncodingProvider.xml
@@ -75,6 +75,15 @@ After an `EncodingProvider` object is registered, the encodings that it supports are available by calling the overloads of `Encoding.GetEncoding`; you should not call the `CodePagesEncodingProvider.GetEncoding` overloads.
+## Impact on Default Encoding Behavior
+
+Registering `CodePagesEncodingProvider` also affects the behavior of `Encoding.GetEncoding(Int32)` when called with a `codepage` argument of `0`:
+
+- **On Windows**: Returns the encoding that matches the system's active code page, which is the same behavior as in .NET Framework.
+- **On non-Windows platforms**: Still returns UTF-8, maintaining cross-platform consistency.
+
+Without any encoding provider registered, `Encoding.GetEncoding(Int32)` with `codepage` 0 returns UTF-8 on all platforms in .NET Core and later versions.
+
 ]]>
@@ -154,7 +163,22 @@ The .NET Framework supports a large number of character encodings and code pages
 The code page identifier of the preferred encoding that the encoding provider might support.
 Returns the encoding associated with the specified code page identifier.
 The encoding associated with the specified code page identifier, or `null` if the provider does not support the requested codepage encoding.
- To be added.
+When the `codepage` argument is `0`, this method behaves as follows:
+
+- **On Windows**: Returns the encoding that matches the system's active code page, providing the same behavior as .NET Framework.
+- **On non-Windows platforms**: Returns `null`, allowing `Encoding.GetEncoding(Int32)` to fall back to its default UTF-8 behavior.
+
+For any other code page identifier, this method returns the corresponding encoding if the code pages encoding provider supports it; otherwise, it returns `null`.
+
+ ]]>
diff --git a/xml/System.Text/Encoding.xml b/xml/System.Text/Encoding.xml
index 0d9357ed830..1d1c7935e07 100644
--- a/xml/System.Text/Encoding.xml
+++ b/xml/System.Text/Encoding.xml
@@ -985,7 +985,23 @@ The returned 's and
 Gets the default encoding for this .NET implementation.
 The default encoding for this .NET implementation.
- For more information about this API, see Supplemental API remarks for Encoding.Default.
+The behavior of the `Default` property varies between different .NET implementations:
+
+- **In .NET Framework**: Returns the encoding that corresponds to the system's active code page. This is the same encoding returned by `GetEncoding(Int32)` when called with a `codepage` argument of `0`.
+
+- **In .NET Core and later versions**: Always returns a `UTF8Encoding` object. This behavior was changed to encourage the use of Unicode encodings for better cross-platform compatibility and data integrity.
+
+For the most consistent results across different platforms and .NET implementations, consider using a specific Unicode encoding such as UTF-8 directly instead of relying on the default encoding. You can obtain a UTF-8 encoding from the `Encoding.UTF8` property or by calling `GetEncoding(String)` with `"utf-8"`.
+
+For more information about this API, see Supplemental API remarks for Encoding.Default.
+
+ ]]>
@@ -3531,13 +3547,19 @@ The returned 's and
 In addition to the encodings that are natively available on .NET Core or that are intrinsically supported on a specific platform version of .NET Framework, the `GetEncoding` method returns any additional encodings that are made available by registering an `EncodingProvider` object. If the same encoding has been registered by multiple `EncodingProvider` objects, this method returns the last one registered.

-You can also supply a value of 0 for the `codepage` argument. Its precise behavior depends on whether any encodings have been made available by registering an `EncodingProvider` object:
+You can also supply a value of 0 for the `codepage` argument. The behavior varies between .NET Framework and .NET Core and later versions:
+
+**In .NET Framework**: Returns the encoding that corresponds to the system's active code page in Windows, unless a registered custom encoding provider returns an encoding for a `codepage` argument of 0. This is the same encoding returned by the `Default` property.

-- If one or more encoding providers have been registered, it returns the encoding of the last registered provider that has chosen to return an encoding when the method is passed a `codepage` argument of 0.
+**In .NET Core and later versions**: The behavior depends on the encoding configuration of the application:

-- On .NET Framework, if no encoding provider has been registered, if the `CodePagesEncodingProvider` is the registered encoding provider, or if no registered encoding provider handles a `codepage` value of 0, it returns the operating system's active code page. To determine the active code page on Windows systems, call the Windows [GetACP](/windows/win32/api/winnls/nf-winnls-getacp) function from .NET Framework.
+- **No encoding provider registered**: Returns a `UTF8Encoding`, same as `Default`.

-- On .NET Core, if no encoding provider has been registered or if no registered encoding provider handles a `codepage` value of 0, it returns the `UTF8Encoding`.
+- **`CodePagesEncodingProvider` registered**:
+  - On **Windows**, returns the encoding that matches the system's active code page (same as .NET Framework behavior).
+  - On **non-Windows platforms**, always returns a `UTF8Encoding`.
+
+- **A different provider registered**: The behavior is determined by that provider. Consult its documentation for details. If multiple providers are registered, the method returns the encoding from the last registered provider that handles a `codepage` argument of 0.

 > [!NOTE]
 > - Some unsupported code pages cause an `ArgumentException` to be thrown, whereas others cause a `NotSupportedException`. Therefore, your code must catch all exceptions indicated in the Exceptions section.
@@ -3731,13 +3753,19 @@ In .NET Framework, the `GetEncoding` method relies
 In addition to the encodings that are natively available on .NET Core or that are intrinsically supported on a specific platform version of .NET Framework, the `GetEncoding` method returns any additional encodings that are made available by registering an `EncodingProvider` object. If the same encoding has been registered by multiple `EncodingProvider` objects, this method returns the last one registered.

-You can also supply a value of 0 for the `codepage` argument. Its precise behavior depends on whether any encodings have been made available by registering an `EncodingProvider` object:
+You can also supply a value of 0 for the `codepage` argument. The behavior varies between .NET Framework and .NET Core and later versions:
+
+**In .NET Framework**: Returns the encoding that corresponds to the system's active code page in Windows, unless a registered custom encoding provider returns an encoding for a `codepage` argument of 0. This is the same encoding returned by the `Default` property.

-- If one or more encoding providers have been registered, it returns the encoding of the last registered provider that has chosen to return an encoding when the method is passed a `codepage` argument of 0.
+**In .NET Core and later versions**: The behavior depends on the encoding configuration of the application:

-- On .NET Framework, if no encoding provider has been registered, if the `CodePagesEncodingProvider` is the registered encoding provider, or if no registered encoding provider handles a `codepage` value of 0, it returns the active code page.
+- **No encoding provider registered**: Returns a `UTF8Encoding`, same as `Default`.

-- On .NET Core, if no encoding provider has been registered or if no registered encoding provider handles a `codepage` value of 0, it returns the `UTF8Encoding` encoding.
+- **`CodePagesEncodingProvider` registered**:
+  - On **Windows**, returns the encoding that matches the system's active code page (same as .NET Framework behavior).
+  - On **non-Windows platforms**, always returns a `UTF8Encoding`.
+
+- **A different provider registered**: The behavior is determined by that provider. Consult its documentation for details. If multiple providers are registered, the method returns the encoding from the last registered provider that handles a `codepage` argument of 0.

 > [!NOTE]
 > The ANSI code pages can be different on different computers and can change on a single computer, leading to data corruption. For this reason, if the active code page is an ANSI code page, encoding and decoding data using the default code page returned by `Encoding.GetEncoding(0)` is not recommended. For the most consistent results, you should use Unicode, such as UTF-8 (code page 65001) or UTF-16, instead of a specific code page.
@@ -5296,11 +5324,15 @@ The goal is to save this file, then open and decode it as a binary stream.
 ## Remarks
 The `RegisterProvider` method allows you to register a class derived from `EncodingProvider` that makes character encodings available on a platform that does not otherwise support them. Once the encoding provider is registered, the encodings that it supports can be retrieved by calling any `Encoding.GetEncoding` overload. If there are multiple encoding providers, the `Encoding.GetEncoding` method attempts to retrieve a specified encoding from each provider starting with the one most recently registered.

-Registering an encoding provider by using the `RegisterProvider` method also modifies the behavior of the [Encoding.GetEncoding(Int32)](xref:System.Text.Encoding.GetEncoding(System.Int32)) and [EncodingProvider.GetEncoding(Int32, EncoderFallback, DecoderFallback)](xref:System.Text.Encoding.GetEncoding(System.Int32,System.Text.EncoderFallback,System.Text.DecoderFallback)) methods when passed an argument of `0`:
+Registering an encoding provider by using the `RegisterProvider` method also affects the behavior of `Encoding.GetEncoding(Int32)` when passed an argument of `0`. This is particularly important in .NET Core and later versions, where the default behavior of `GetEncoding(Int32)` with `codepage` 0 is to return UTF-8:
+
+- **If the registered provider is `CodePagesEncodingProvider`**:
+  - On **Windows**, `GetEncoding(Int32)` with `codepage` 0 returns the encoding that matches the system's active code page (same as .NET Framework behavior).
+  - On **non-Windows platforms**, it still returns UTF-8.

-- If the registered provider is the `CodePagesEncodingProvider`, the method returns the encoding that matches the system active code page when running on the Windows operating system.
+- **If a custom encoding provider is registered**: The provider can choose which encoding to return when `GetEncoding(Int32)` is passed an argument of `0`. The provider can also decline to handle this case by returning `null` from its `GetEncoding(Int32)` method, in which case the default UTF-8 behavior is used.

-- A custom encoding provider can choose which encoding to return when either of these method overloads is passed an argument of `0`. The provider can also choose to not return an encoding by having the `EncodingProvider.GetEncoding(Int32)` method return `null`.
+If multiple providers are registered, `GetEncoding(Int32)` attempts to retrieve the encoding from the most recently registered provider first.

 Starting with .NET Framework 4.6, .NET Framework includes one encoding provider, `CodePagesEncodingProvider`, that makes the encodings available that are present in the full .NET Framework but are not available in the Universal Windows Platform. By default, the Universal Windows Platform only supports the Unicode encodings, ASCII, and code page 28591.
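
The `RegisterProvider` remarks describe how registering a provider makes extra code pages available. Below is a minimal C# sketch of that effect. It has several assumptions:

- It assumes a .NET 5 or later console app that uses top-level statements.
- It assumes `CodePagesEncodingProvider` can be referenced without extra packages. Older targets, such as .NET Framework, need the System.Text.Encoding.CodePages NuGet package.
- Code page 1252 is only an example of a code page that .NET (Core) does not provide by default.
- The exception filter follows the note that unsupported code pages can throw either `ArgumentException` or `NotSupportedException`.

```csharp
using System;
using System.Text;

// Try to load Windows-1252 before any provider is registered.
// .NET (Core) has no built-in data for this code page, so the lookup fails.
bool availableBeforeRegistration;
try
{
    Encoding.GetEncoding(1252);
    availableBeforeRegistration = true;
}
catch (Exception ex) when (ex is ArgumentException or NotSupportedException)
{
    // Unsupported code pages can surface as either exception type.
    availableBeforeRegistration = false;
}

// Register the provider once, typically at application startup.
Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);

// The same lookup now succeeds because the provider supplies the code page.
Encoding windows1252 = Encoding.GetEncoding(1252);

Console.WriteLine($"Code page 1252 available before registration: {availableBeforeRegistration}");
Console.WriteLine($"After registration: {windows1252.WebName} (code page {windows1252.CodePage})");
```

Registration is process-wide and affects every later `Encoding.GetEncoding` call. For that reason, applications usually register providers once, at startup.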
From a47f9940f935186ec270e0bba9f2eb746fc6d896 Mon Sep 17 00:00:00 2001
From: Tarek Mahmoud Sayed <10833894+tarekgh@users.noreply.github.com>
Date: Tue, 19 Aug 2025 16:04:54 -0700
Subject: [PATCH 3/3] Apply suggestions from code review

Co-authored-by: Genevieve Warren <24882762+gewarren@users.noreply.github.com>
---
 xml/System.Text/CodePagesEncodingProvider.xml | 10 +++++-----
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/xml/System.Text/CodePagesEncodingProvider.xml b/xml/System.Text/CodePagesEncodingProvider.xml
index 0c8a549b51d..4100c51f09d 100644
--- a/xml/System.Text/CodePagesEncodingProvider.xml
+++ b/xml/System.Text/CodePagesEncodingProvider.xml
@@ -75,14 +75,14 @@ After an `EncodingProvider` object is registered, the encodings that it supports are available by calling the overloads of `Encoding.GetEncoding`; you should not call the `CodePagesEncodingProvider.GetEncoding` overloads.
-## Impact on Default Encoding Behavior
+### Impact on default encoding behavior

-Registering `CodePagesEncodingProvider` also affects the behavior of `Encoding.GetEncoding(Int32)` when called with a `codepage` argument of `0`:
+Registering `CodePagesEncodingProvider` also affects the behavior of `Encoding.GetEncoding(Int32)` when called with a `codepage` argument of `0` in .NET (Core):

-- **On Windows**: Returns the encoding that matches the system's active code page, which is the same behavior as in .NET Framework.
-- **On non-Windows platforms**: Still returns UTF-8, maintaining cross-platform consistency.
+- On Windows, `GetEncoding(0)` returns the encoding that matches the system's active code page, which is the same behavior as in .NET Framework.
+- On non-Windows platforms, `GetEncoding(0)` returns UTF-8, maintaining cross-platform consistency.

-Without any encoding provider registered, `Encoding.GetEncoding(Int32)` with `codepage` 0 returns UTF-8 on all platforms in .NET Core and later versions.
+When no encoding provider is registered, `GetEncoding(0)` returns UTF-8 on all platforms in .NET Core and later versions.

 ]]>
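
The "Impact on default encoding behavior" section can be checked with a short C# sketch. It assumes a .NET 5 or later console app with top-level statements, because `OperatingSystem.IsWindows` requires .NET 5.

The sketch compares `Encoding.GetEncoding(0)` before and after registering `CodePagesEncodingProvider`, and also prints `Encoding.Default`. The code page reported after registration on Windows depends on the machine's active code page, so the sketch doesn't assume a specific value for it.

```csharp
using System;
using System.Text;

// No provider registered yet: .NET (Core) maps code page 0 to Encoding.Default (UTF-8).
Encoding before = Encoding.GetEncoding(0);

Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);

// With CodePagesEncodingProvider registered, code page 0 resolves to the active
// code page on Windows; on other platforms the provider declines and UTF-8 is used.
Encoding after = Encoding.GetEncoding(0);

Console.WriteLine($"Running on Windows: {OperatingSystem.IsWindows()}");
Console.WriteLine($"GetEncoding(0) before registration: {before.WebName} ({before.CodePage})");
Console.WriteLine($"GetEncoding(0) after registration:  {after.WebName} ({after.CodePage})");

// Encoding.Default is UTF-8 in .NET (Core) regardless of registered providers.
Console.WriteLine($"Encoding.Default: {Encoding.Default.WebName} ({Encoding.Default.CodePage})");
```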