Self-Hosted Runtime String Marshalling #94416
-
I'm currently writing a library that provides an API for hosting a .NET runtime using HostFXR, and I'm a little confused about the rules for passing value types between C++ and C#. Specifically, I'm running into an issue with passing strings between C++ and C#. I wanted a neat API for this, so I created a C# struct called NativeString:

[StructLayout(LayoutKind.Sequential)]
public struct NativeString : IDisposable
{
    internal IntPtr m_NativeString;
    private Bool32 m_IsDisposed;

    public void Dispose()
    {
        if (!m_IsDisposed)
        {
            if (m_NativeString != IntPtr.Zero)
            {
                Marshal.FreeCoTaskMem(m_NativeString);
                m_NativeString = IntPtr.Zero;
            }
            m_IsDisposed = true;
        }
        GC.SuppressFinalize(this);
    }

    public override string? ToString() => this;
    public static NativeString Null() => new NativeString(){ m_NativeString = IntPtr.Zero };
    public static implicit operator NativeString(string? InString) => new(){ m_NativeString = Marshal.StringToCoTaskMemAuto(InString) };
    public static implicit operator string?(NativeString InString) => Marshal.PtrToStringAuto(InString.m_NativeString);
}

And in C++ I've mimicked this struct:

class NativeString
{
public:
    NativeString() = default;
    NativeString(std::string_view InString);
    NativeString(const std::string& InString);
    NativeString(const char* InString);
    NativeString(const NativeString& InOther);
    NativeString(NativeString&& InOther) noexcept;
    ~NativeString();

    NativeString& operator=(const NativeString& InOther);
    NativeString& operator=(NativeString&& InOther) noexcept;

    void Assign(std::string_view InString);
    operator std::string() const;
    bool operator==(const NativeString& InOther) const;

    CharType* Data() { return m_String; }
    const CharType* Data() const { return m_String; }

private:
    CharType* m_String = nullptr;
    Bool32 m_IsDisposed = false;
};

I then try to pass it to this C# method:

[UnmanagedCallersOnly]
private static unsafe int LoadAssembly(int InContextId, NativeString InAssemblyFilePathPtr);

The strange thing is that this seems to work in my Windows build but not on Linux, and someone else who used my API had issues with it on Windows as well. So I'm wondering whether this is possible at all, or whether I have to take a different approach.

Edit: Accidentally pressed enter without finishing the post, updated with the rest of the question now 😅
Replies: 3 comments 1 reply
-
Not that it helps your problem, but: an implicit conversion operator that causes a native memory allocation which isn't automatically released doesn't strike me as the best of designs. You really want an explicit gesture for creating a NativeString from a string. Otherwise, a call like the following allocates native memory with nothing at the call site to show it:

private void SomeFunc(NativeString str) { ... }
...
SomeFunc("hello");
-
@AaronRobinsonMSFT - the interop expert.
-
Let me address the string angle and then talk about the value type and language rules.

Strings always seem to be a special case, and I find they tend to get overengineered at the margins. In this case I'm unsure of the utility of passing a value type across the interop boundary for a string. My suggestion here would be to default to the C definition of a string, a sequence of non-zero bytes with a zero-value terminator, and rely upon that as you marshal across the interop boundary. That string definition should be updated appropriately based on encoding, which brings me to...

Please be considerate of string encodings (that is, UTF-8/UTF-16/UTF-32). Your application may not care, and that is fine, but make it an intentional and documented decision point. In C#, this would look like:

// Assume null terminated
[UnmanagedCallersOnly]
private static unsafe int LoadAssembly(int InContextId, IntPtr str);

The above is clearer to me but if passing a

As far as value types are concerned, there are no differences between passing a

#ifdef _MSC_VER
typedef wchar_t char_t; // UTF-16
#else
typedef char char_t; // UTF-8
#endif // _MSC_VER
typedef int32_t BOOL;
typedef struct
{
    char_t* m_String;
    BOOL m_IsDisposed;
} NativeString;
int32_t LoadAssembly(int32_t id, NativeString str);
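
As a minimal sketch, the C declarations above can be checked at compile time so that a layout mismatch with the C# side shows up as a build error rather than a crash. This assumes that Bool32 on the C# side is a 4-byte value and that the target uses a typical 32-bit or 64-bit ABI; neither is confirmed in the thread, and the assertion messages are illustrative only.

```cpp
#include <cstddef>      // offsetof
#include <cstdint>      // int32_t
#include <type_traits>  // std::is_standard_layout

#ifdef _MSC_VER
typedef wchar_t char_t; // UTF-16
#else
typedef char char_t;    // UTF-8
#endif // _MSC_VER

typedef int32_t BOOL;   // assumption: matches a 4-byte Bool32 in C#

typedef struct
{
    char_t* m_String;
    BOOL m_IsDisposed;
} NativeString;

// The struct must use C-compatible layout rules for offsetof to be meaningful.
static_assert(std::is_standard_layout<NativeString>::value,
              "NativeString must be a standard-layout type");

// The pointer comes first, mirroring the IntPtr field in the C# struct.
static_assert(offsetof(NativeString, m_String) == 0,
              "m_String must be the first field");

// The flag follows immediately after the pointer.
static_assert(offsetof(NativeString, m_IsDisposed) == sizeof(void*),
              "m_IsDisposed must directly follow the pointer");

// Pointer + 4-byte flag, padded to pointer alignment (assumes a typical ABI).
static_assert(sizeof(NativeString) == 2 * sizeof(void*),
              "unexpected padding in NativeString");
```

If any of these assertions fires on a given compiler or platform, the native struct no longer mirrors the sequential layout declared in C#, and passing it by value across the boundary should not be expected to work there.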
.NET supports interop with C or C++/CLI only, not ISO C++. ISO C++ has slightly different rules for some types, and that can get complicated quickly and lead to undefined behavior when using different compilers (for example, MSVC vs Clang vs GCC).
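
One way the ISO C++ rules can diverge from C, sketched below: a class with a user-provided copy constructor or destructor is no longer trivially copyable, and ABIs such as the Itanium C++ ABI (used by GCC and Clang on Linux) generally pass such types by hidden reference instead of by value like a plain C struct. Whether that is what breaks the Linux build in the question is an assumption, not something confirmed in this thread. PlainNativeString and OwningNativeString are hypothetical stand-ins, not the types from the question.

```cpp
#include <cstdint>
#include <type_traits>

// C-compatible shape: no user-provided special member functions.
struct PlainNativeString
{
    char* m_String;
    int32_t m_IsDisposed;
};

// Hypothetical stand-in for a C++ wrapper class: same fields, but with a
// user-provided copy constructor and destructor.
class OwningNativeString
{
public:
    OwningNativeString() = default;
    OwningNativeString(const OwningNativeString& other)
        : m_String(other.m_String), m_IsDisposed(other.m_IsDisposed) {}
    ~OwningNativeString() {}

private:
    char* m_String = nullptr;
    int32_t m_IsDisposed = 0;
};

// Same fields, same size...
static_assert(sizeof(PlainNativeString) == sizeof(OwningNativeString),
              "both types hold a pointer and a 32-bit flag");

// ...but only the plain struct is trivially copyable, which is the property
// a by-value argument needs in order to be passed the way C would pass it.
static_assert(std::is_trivially_copyable<PlainNativeString>::value,
              "plain struct is trivially copyable");
static_assert(!std::is_trivially_copyable<OwningNativeString>::value,
              "user-provided copy constructor makes the class non-trivial");
static_assert(!std::is_trivially_destructible<OwningNativeString>::value,
              "user-provided destructor makes the class non-trivial");
```

A check like `std::is_trivially_copyable` on every type that crosses the boundary by value is a cheap guard; keeping the owning C++ wrapper on the native side and handing only the plain struct or a raw pointer to managed code sidesteps the question entirely.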