CoreCLR - 114 projects with mammoth # of cyclic dependencies into projects' global namespace "dumping grounds" #81413

craigajohnson · 2023-01-31T07:31:04Z

craigajohnson
Jan 31, 2023

Just wrapping my head around the CoreCLR source and the various interdependencies. I am awestruck.

Patient step-debugging (native and managed) along with reading ECMA-335 has yielded at least an initial picture of how the major components/features hang together.

Next step - I used CppDepend (like NDepend - glorious tool) which highlighted a few things:

Some important explicit namespaces carving out big swaths of functionality - this is good
Enormous global namespace dumping grounds within each of the larger core projects
Cyclic dependencies across nearly every project (See attachment "Bad1")

Observations:

The lack of namespaces and qualifiers makes it an enormous challenge to understand the dependency flow of the code unless you are actually step-debugging or wearing out Go To Definition to get necessary context
If the code was namespaced out, we could do static code analysis and cycle elimination at the namespace level instead of relying solely on project boundaries. Attachment "Bad2" shows the current ambiguity if we look solely at namespaces without project context.

Ideally, there would be a "left to right" flow of higher-order namespaces dependent on lower-order namespaces without the lower-order flowing back to any higher-order dependencies. Based on the big ball of mud / Singleton ball of goo as it stands, this refactor would be somewhat of a gnarly beast.
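For illustration only, here is a minimal sketch of what "cycle elimination at the namespace level" could check, assuming the namespace dependencies have already been exported from a tool such as CppDepend into a simple adjacency map. `DepGraph`, `find_cycle` and any namespace names fed to it are hypothetical and are not part of CoreCLR or CppDepend.

```cpp
#include <algorithm>
#include <map>
#include <string>
#include <vector>

// Hypothetical namespace-level dependency graph: each key depends on the
// namespaces listed in its value. Nothing here comes from the CoreCLR sources.
using DepGraph = std::map<std::string, std::vector<std::string>>;

enum class Mark { InProgress, Done };

// Depth-first search. Returns true if a cycle is reachable from `node`.
// `path` holds the chain of namespaces on the current DFS stack.
static bool visit(const DepGraph& g, const std::string& node,
                  std::map<std::string, Mark>& marks,
                  std::vector<std::string>& path)
{
    auto seen = marks.find(node);
    if (seen != marks.end()) {
        if (seen->second == Mark::InProgress) {
            path.push_back(node);  // close the loop so the caller can report it
            return true;
        }
        return false;  // already fully explored, no cycle through here
    }
    marks[node] = Mark::InProgress;
    path.push_back(node);
    auto deps = g.find(node);
    if (deps != g.end()) {
        for (const auto& d : deps->second) {
            if (visit(g, d, marks, path)) return true;
        }
    }
    path.pop_back();
    marks[node] = Mark::Done;
    return false;
}

// Returns one dependency cycle (first and last element are the same
// namespace), or an empty vector if the graph is layered, i.e. acyclic.
std::vector<std::string> find_cycle(const DepGraph& g)
{
    std::map<std::string, Mark> marks;
    for (const auto& entry : g) {
        std::vector<std::string> path;
        if (visit(g, entry.first, marks, path)) {
            // Drop any lead-in so that only the loop itself is returned.
            auto start = std::find(path.begin(), path.end(), path.back());
            return std::vector<std::string>(start, path.end());
        }
    }
    return {};
}
```

An empty result corresponds to the "left to right" flow described above; a non-empty result names one loop that would have to be broken first.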

Are there restrictions on PR contributions on these types of more structural mods? I assume yes.

AaronRobinsonMSFT · 2023-01-31T16:09:58Z

AaronRobinsonMSFT
Jan 31, 2023
Collaborator

Are there restrictions on PR contributions on these types of more structural mods? I assume yes.

@craigajohnson There aren't any restrictions per se. We always appreciate community contributions that help improve the code base. The unmanaged portion of the code is very old in parts and has disparate conventions that can make understanding very difficult. The big breakdown is three areas: JIT, GC, and the VM. The JIT and GC are largely internally consistent and have clear contracts with the VM. The VM itself is indeed a "big ball of mud".

Namespaces are a great tool, but can create chaos with massive refactors for little practical benefit - readability is important and so is ensuring non-runtime developers can contribute but the majority of development is happening in C#. We are continually pushing to share more code with the other runtimes - NativeAOT and mono, so C# is the preferred sharing solution. Sharing is also possible in pure C but that requires a lot more justification and finesse to implement correctly.

That said, understanding the short, medium and long term goal for a refactor is key. Once that is understood it should be incremental and ideally tracked in a larger issue with nice checkboxes - example 1 example 2. As a suggestion, I would start with a small project and experiment with the impact and design. It is much easier to champion a prototype/experiment than a general question of "am I allowed to make this code more readable?". We look forward to all contributions.

/cc @jkotas @mangod9

4 replies

jkotas Jan 31, 2023
Collaborator

In addition to what @AaronRobinsonMSFT said, we strongly prefer refactorings that add some additional value and do not just move code around. For example, a refactoring that enables sharing more code between components - #78852 is an example of such a refactoring that is in progress.

craigajohnson Jan 31, 2023
Author

We are continually pushing to share more code with the other runtimes - NativeAOT and mono, so C# is the preferred sharing solution

@AaronRobinsonMSFT - are there boundary/entry points for this handoff from managed to unmanaged? There was talk of GC moving to managed code, etc. Are you saying, better to spend time on managed code (e.g., the new thread pool) rather than monkeying in the native stuff since the intention is to reduce the unmanaged footprint and migrate to managed-first?

jkotas Feb 1, 2023
Collaborator

better to spend time on managed code (e.g., the new thread pool)

Yes, that's better. Note that these are very non-trivial projects. For example, the rewrite of threadpool in C# took several man-years.

As Aaron said, it is best to have a discussion about a concrete proposal first.

craigajohnson Feb 1, 2023
Author

^^ I understand these are all huge efforts. I'll work through some POC concepts below and see if anything is worth proposing given your preference for leaving the unmanaged side alone. Thank you.

craigajohnson · 2023-01-31T17:57:06Z

craigajohnson
Jan 31, 2023
Author

@AaronRobinsonMSFT and @jkotas - that makes rational sense, thank you. And #78852 makes sense why you would value that as a refactor.

The sheer scope of unpicking the cyclic dependencies (EDIT - At the namespace level) so there is a one-way flow may make it challenging to justify the effort. My first exploration would be to just experiment with finding some dependency leaf nodes and seeing what the impact might be to "layerize" them. If things look promising, I would build out a larger comprehensive checklist, just wanting to get a sense of whether this type of effort would be useful.

I think the value add in any of this effort would not be functionality per se but rather improved intelligibility of some very complex processes, ease of maintenance, ease of layer replacement, etc. For instance, even with the VM <-> JIT and VM <-> GC interfaces in place, there are still static calls back into the VM and other higher order layers. I imagine that would make efforts like migrating various bits from unmanaged to managed more of a challenge?

Here's a fun example (see diagram) - the Thread class takes a dependency on EEContract, which makes logical sense. However, EEContract -also- takes a dependency on Thread. Ideally, EEContract would be near the end of the dependency chain and not be reaching back to Thread?
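One common way to break a two-way dependency like this is to have the lower layer depend on a small interface rather than on the concrete higher-level type. The sketch below assumes, purely for illustration, that the contract checker only needs to ask "is the caller in cooperative mode?". `IModeSource`, `SketchContract` and `SketchThread` are hypothetical stand-ins, not the real `EEContract` or `Thread`.

```cpp
#include <cassert>

// Hypothetical lowest-layer interface: the only thing the contract layer
// is allowed to know about whoever is running.
struct IModeSource {
    virtual ~IModeSource() = default;
    virtual bool InCooperativeMode() const = 0;
};

// Hypothetical stand-in for a contract checker. It sees only IModeSource,
// so it no longer reaches back to a concrete thread type.
struct SketchContract {
    static bool HoldsCooperative(const IModeSource& who) {
        return who.InCooperativeMode();
    }
};

// Hypothetical stand-in for a thread type. The dependency now flows one way:
// SketchThread -> SketchContract -> IModeSource.
class SketchThread : public IModeSource {
public:
    bool InCooperativeMode() const override { return cooperative_; }
    void EnterCooperative() { cooperative_ = true; }
    void LeaveCooperative() { cooperative_ = false; }

    // An operation guarded by a contract check before it does its work.
    int ReadManagedField(int value) const {
        assert(SketchContract::HoldsCooperative(*this));
        return value;
    }

private:
    bool cooperative_ = false;
};
```

The trade-off is a virtual call on the contract path, which may or may not be acceptable in code this hot.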

If I take ECMA-335 itself as a guide, there seem to be some natural cut-points with the primitives there and it would be excellent if the code represented all of that in a super clean way. Mapping the extremely well-defined concepts there back to the enmeshed soup is a bit of a grok.

6 replies

craigajohnson Jan 31, 2023
Author

Incidentally the VM <-> GC is the first reason I started looking, as I am exploring a non-blocking concurrent compacting GC scenario using a local GC. What I didn't know until I looked at the source is that you can build the current GC as if it were a local isolated GC. Nice!

I think it is very good that there are explicit interfaces between VM <-> JIT and VM <-> GC.

However, on initial pass of the source it looked like there are also dependencies outside of the interfaces that both the JIT and the GC reference which go back to cee_wks_core. If I am mistaken, I withdraw my concern.

It looks like it will be very challenging to locate and isolate the leaf node dependencies and then build back from there. I think ETW/Diag could be another useful place to build an interface and get those concerns isolated also. My brain wants to think of the host as the root context and then have the config and the various subsystems constructed/initialized and attached to it, with any interactions between the subsystems and the host via interfaces. Would also be good for discoverability.

However maybe there is a virtualization cost on the C++ side to do this, and it would make no sense to go backwards perf-wise just to say the code is organized differently.

jkotas Jan 31, 2023
Collaborator

dependencies outside of the interfaces that both the JIT and the GC

Both JIT and GC are compiled into separate .dlls and the only way to talk to clrjit.dll and clrgc.dll is via the interfaces. This guarantees by construction that there are no side channels.

craigajohnson Feb 1, 2023
Author

^^ That makes sense. So I guess there are 2 concepts at work then - the shared code at compile-time that is present in cee_wks_core and other VM sources which are available to JIT and GC and then become part of those .dlls, and then at runtime, the interaction between VM and JIT/GC is via the interfaces. So in a way, the VM code is a combination of the VM itself along with additional shared source.

Very helpful context, thank you for the information.

jkotas Feb 1, 2023
Collaborator

other VM sources which are available to JIT and GC and then become part of those .dlls,

No VM sources are compiled into JIT and GC .dlls today.

craigajohnson Feb 1, 2023
Author

^^ OK I will make a point to study this in more detail. I was basing the premise off of the bidirectional code dependencies between cee_wks_* and the clrjit* projects and that ceemain.cpp is in cee_wks*, which I assume is the entry point for the VM itself, hence the code in cee_wks_* that is referenced by clrjit as well as being the entry point for the VM. I will study further.

craigajohnson · 2023-01-31T19:41:26Z

craigajohnson
Jan 31, 2023
Author

Another example - gcHelpers->AllocateObject. This appears to be super-important but it breaks -so- many rules :)

The larger effort would probably be to build out a context and have all of the internal services attached to it with all of the encapsulation guarantees one would expect, eliminating as much as possible the global static helpers/utils/cyclic dependency intricacies. But that would be a super massive undertaking and everyone would hate the merge churn. As it stands though there must be perpetual concern of insidious bugs happening in these lower levels. Spooky stuff :-/

4 replies

jkotas Jan 31, 2023
Collaborator

As it stands though there must be perpetual concern of insidious bugs happening in these lower levels.

I do not recall any bugs caused by these cycles. Do you have examples of issues in this repo that would be avoided by refactoring these cycles?

craigajohnson Jan 31, 2023
Author

^^ I cannot answer that yet. In the case of AllocateObject, it looks like there are explicit guard conditions the developer must remember to do, or else introduce leaks? I wonder if this were a function directly available via something like VM.GC.AllocateWithProtect(), then it returns a stack wrapper where you could use the reference but then explicitly unprotect when ready? Basically start to drain off the static helpers little bit by little bit, with low risk, like climbing up a cliff one step at a time :)
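A rough sketch of the "stack wrapper" idea, using a toy heap and a toy root set rather than anything from the runtime. `ToyHeap`, `ToyRootSet` and `ProtectedRef` are hypothetical names, and `VM.GC.AllocateWithProtect()` above is likewise only a proposed shape, not an existing API. The sketch shows only the protect/unprotect lifetime, tied to scope through RAII.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <memory>
#include <vector>

struct Object { int payload = 0; };

// Toy stand-in for a heap that owns every object it hands out.
class ToyHeap {
public:
    Object* Allocate() {
        objects_.push_back(std::unique_ptr<Object>(new Object()));
        return objects_.back().get();
    }
private:
    std::vector<std::unique_ptr<Object>> objects_;
};

// Toy stand-in for the set of stack slots a collector would scan.
class ToyRootSet {
public:
    void Add(Object** slot) { slots_.push_back(slot); }
    void Remove(Object** slot) {
        auto it = std::find(slots_.begin(), slots_.end(), slot);
        assert(it != slots_.end());  // removing an unregistered slot is a bug
        slots_.erase(it);
    }
    std::size_t Count() const { return slots_.size(); }
private:
    std::vector<Object**> slots_;
};

// Hypothetical stack wrapper: registers its slot on construction and
// unregisters it on destruction, so the "remember to unprotect" step
// cannot be forgotten on any exit path from the scope.
class ProtectedRef {
public:
    ProtectedRef(ToyRootSet& roots, Object* obj) : roots_(roots), obj_(obj) {
        roots_.Add(&obj_);
    }
    ~ProtectedRef() { roots_.Remove(&obj_); }

    // The registered slot address must stay stable, so copying is disabled.
    ProtectedRef(const ProtectedRef&) = delete;
    ProtectedRef& operator=(const ProtectedRef&) = delete;

    Object* operator->() const { return obj_; }
    Object* get() const { return obj_; }

private:
    ToyRootSet& roots_;
    Object* obj_;  // a moving collector would update this slot in place
};
```

Usage would look like `ProtectedRef r(roots, heap.Allocate());` inside a scope, with the slot dropped automatically when `r` goes out of scope.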

jkotas Jan 31, 2023
Collaborator

Yes, there are special rules to follow for writing the "manually managed code": https://github.com/dotnet/runtime/blob/main/docs/coding-guidelines/clr-code-guide.md#2

craigajohnson Jan 31, 2023
Author

Very nice. OK I think I am going to monkey with a POC of "Isolated VM" which instantiates a new context concept and flows dependencies downward to wrapped subsystems of instances of JIT, GC, ETW, etc., with the intent of steering away from static state and starting to unpick the ball of mud. It will live side-by-side with everything else. I will post back learnings and I'm sure it will be a useful exercise even if just to grok all the things and the POC itself perishes quietly :)

craigajohnson · 2023-02-01T20:07:50Z

craigajohnson
Feb 1, 2023
Author

@jkotas am I hallucinating, or, given the NativeAOT stuff, [UnmanagedCallersOnly], and the "Isolated VM" POC concept above, is there no reason I couldn't write my own corerunner in C# itself, host an isolated VM "instance", and from there begin to hybridize out the C++ stuff and move more and more into a managed environment?

Meta dogfooding to eat its own dogfood to make more dogfood to then eat the new dogfood.

IsolatedVM POC instance
|
------ Host / HostPolicyContext
------ Configuration
------ ETW / Diag
------ GC
------ JIT
------ Whatever else I am naive about still

The IsolatedVM would be the root context: instead of living in the static soup, the subsystems would be instantiated, isolated, and attached to it.
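To make the shape of that idea concrete, here is a small sketch of a root object that owns its subsystems and exposes them only through interfaces. Every name in it (`IsolatedVm`, `IGcSubsystem`, `IJitSubsystem`, the counting stubs, `MakeToyVm`) is hypothetical, and the stubs only count calls; it says nothing about how the real GC or JIT interfaces look.

```cpp
#include <cstddef>
#include <memory>
#include <utility>

// Hypothetical subsystem interfaces; the root talks to subsystems only via these.
struct IGcSubsystem {
    virtual ~IGcSubsystem() = default;
    virtual void RecordAllocation(std::size_t bytes) = 0;
    virtual std::size_t BytesAllocated() const = 0;
};

struct IJitSubsystem {
    virtual ~IJitSubsystem() = default;
    virtual void RecordCompiledMethod() = 0;
    virtual std::size_t MethodsCompiled() const = 0;
};

// Toy implementations whose state lives in the instance, not in statics.
class CountingGc final : public IGcSubsystem {
public:
    void RecordAllocation(std::size_t bytes) override { bytes_ += bytes; }
    std::size_t BytesAllocated() const override { return bytes_; }
private:
    std::size_t bytes_ = 0;
};

class CountingJit final : public IJitSubsystem {
public:
    void RecordCompiledMethod() override { ++methods_; }
    std::size_t MethodsCompiled() const override { return methods_; }
private:
    std::size_t methods_ = 0;
};

// Hypothetical root context: owns its subsystems, so two roots share nothing.
class IsolatedVm {
public:
    IsolatedVm(std::unique_ptr<IGcSubsystem> gc, std::unique_ptr<IJitSubsystem> jit)
        : gc_(std::move(gc)), jit_(std::move(jit)) {}

    IGcSubsystem& Gc() { return *gc_; }
    IJitSubsystem& Jit() { return *jit_; }

private:
    std::unique_ptr<IGcSubsystem> gc_;
    std::unique_ptr<IJitSubsystem> jit_;
};

// Convenience factory wiring the toy subsystems to a fresh root.
inline IsolatedVm MakeToyVm() {
    return IsolatedVm(std::unique_ptr<IGcSubsystem>(new CountingGc()),
                      std::unique_ptr<IJitSubsystem>(new CountingJit()));
}
```

With two roots created by `MakeToyVm()`, work recorded on one is invisible to the other, which is the isolation property being asked for.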

Monkeying in process...

2 replies

jkotas Feb 2, 2023
Collaborator

Sure, you can use one VM to bootstrap another VM. It comes with fairly non-trivial overhead. It is a fine academic exercise, but I do not think that it is something we would accept into the repo.

craigajohnson Feb 2, 2023
Author

Of course. Monkeying does not belong in the repo :) I do think eventually there will be value in being able to isolate the VM solely to an instantiated root with attached instantiated subsystems where the sole dependencies are via interfaces from the single root. The large amount of interdependent static state in the current source makes that a big challenge and we would have to patiently unpick this ball of mud but it could definitely be done. I think the GC and JIT are a good start, but things like the JIT having its own config provider and, not even 3 weeks ago, some new interface to VM host get/set property are all kind of on-the-side dependency add-ons. Getting all of those concerns in place and out of independent static state as described would, I think, be hugely beneficial. Also, the large amount of new redundancy between NativeAOT (the EE types, the modified object type, etc.) and the C++ stuff feels like a waypoint, but maybe there are options to eventually coalesce that into a single authoritative source. Who knows :) I guess I was thinking NativeAOT might be the start of some amazing things moving towards the eventual goal of managed-first, with only occasional look-backs to unmanaged helpers as needed. Kind of like the first day when Roslyn could compile Roslyn, then the world moved :)

craigajohnson · 2023-02-03T00:17:45Z

craigajohnson
Feb 3, 2023
Author

*** Isolated VM - Throwaway Monkeying ***

Naive Flailing

Find and fix minor CoreCLR debug doc error
Build clr+libs in debug. Step-debug from corerun main, breakpoints at various steps (e.g., config, assembly entry point, JIT, GC, ETW)
Run CppDepend - Takes 47 minutes on a 24-thread i9-12900K - CppCheck may be able to be disabled, unsure. First time using this. I know NDepend but not this. Massive static soup, various anonymous namespaces carved out so as to avoid collisions. Stuff just bolted on, still being bolted on, adding to the pile. Isolation would be a massive challenge
If build fails, have to blast artifacts folder before retrying due to partial state. Added clean.cmd to do this
Run all tests. Documentation a bit unclear but back-traced previous steps and it all works:

.\build clr+libs
.\src\tests\build generatelayoutonly /p:LibrariesConfiguration=Debug
.\src\tests\build skipmanaged /p:LibrariesConfiguration=Debug
.\src\tests\build.cmd /p:LibrariesConfiguration=Debug
.\src\tests\run

Naive Monkeying - "pal" is static global state with conditional typedefs, heaven help us all

Appears that corerun.sln and corehost.sln take dependencies on pal. Used for hostpolicy context stuff (TODO - learn)
Attempt to refactor/rename "pal" to "monkeylicious" - VS does good job with rename in both solutions.
Rebuild - success. Retest - success.

Naive Flailing - Templated IsolatedVM class, move pal here as first attempt

Interfaces would be Level 0 dependency - TODO - where to root this?
Ugly singleton to begin with - "v" namespace - eventually flow this dependency to all the things
SxS v.pal with Monkeylicious namespace
auto to instantiate some stuff from v.pal, type safety, no weirdness
Use corerun as first 2-headed beast, migrate a few refs, rebuild, retest full suite
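As one possible reading of "templated class in place of conditional typedefs", the sketch below passes the platform choice as a template parameter instead of selecting it with the preprocessor. This is not the code from the experiment described above; `NarrowPal`, `WidePal` and `HostContext` are hypothetical, and only the idea of per-platform `char_t`/`string_t` aliases is borrowed from the notes.

```cpp
#include <string>
#include <utility>

// Hypothetical platform policies. Each bundles the character type, the
// string type and the directory separator that an #ifdef would otherwise pick.
struct NarrowPal {
    using char_t = char;
    using string_t = std::basic_string<char_t>;
    static constexpr char_t dir_separator() { return '/'; }
};

struct WidePal {
    using char_t = wchar_t;
    using string_t = std::basic_string<char_t>;
    static constexpr char_t dir_separator() { return L'\\'; }
};

// Hypothetical host context parameterized on the policy. The choice is
// resolved at compile time, so there is no virtual dispatch, and both
// variants can be instantiated side by side in one binary for testing.
template <typename Pal>
class HostContext {
public:
    using string_t = typename Pal::string_t;

    explicit HostContext(string_t root) : root_(std::move(root)) {}

    // Joins the root directory and a leaf name with the policy's separator.
    string_t Combine(const string_t& leaf) const {
        string_t result = root_;
        result.push_back(Pal::dir_separator());
        result += leaf;
        return result;
    }

private:
    string_t root_;
};
```

Because the policy is a type rather than global state, it speaks to the earlier worry about a virtualization cost on the C++ side, at the price of more template code in headers.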

0 replies

craigajohnson · 2023-02-03T23:37:42Z

craigajohnson
Feb 3, 2023
Author

@jkotas is this good perf for a dev workstation for tests in debug configuration?

1 reply

jkotas Feb 4, 2023
Collaborator

Looks reasonable for debug runtime config.

We typically run the tests in checked runtime config that should be significantly faster.

craigajohnson · 2023-02-05T20:18:03Z

craigajohnson
Feb 5, 2023
Author

From https://github.com/dotnet/designs/blob/main/accepted/2020/form-factors.md:

At one point, the goal of .NET Native and CoreRT projects was to replace the established .NET runtime implementation in its entirety. We even had a project for that called Rover -- "Runtime over RedHawk". This goal was proven to be unrealistic. Re-architecting half of the .NET features built over 20 years (with a large team) to run on the nice clean runtime is prohibitively expensive. Executing this endeavor would require slowing down the investment into the mainstream .NET runtime to a trickle. The vast majority of customers would not see any material improvements for number of years. We consider that direction unacceptable.

Gotta say - From an initial perusal of this repo, I completely and totally disagree with this and would consider the above almost strangely defeatist. I think there are reasonable ways to build this out, but there would be important prerequisites, the most important being (1) isolation in place of statics, and (2) continuing the good work that started with GC and JIT, building out all interop between subsystems arising from an instantiated root. From there, the runtime can be hybridized safely, retaining the massive implementations of key subsystems where they are right now, but allowing for migrations/cleanups/ports/replacements as needed. This could go on as long as necessary, with little disruption.

1 reply

jkotas Feb 5, 2023
Collaborator

the most important being (1) isolation in place of statics,

Statics are only a problem for subsystems that are not singletons. For example, GC is behind an interface, and it still has a ton of statics.

This could go on as long as necessary, with little disruption.

Yes, it can. It does not change the fact that it is prohibitively expensive. You are talking about refactoring millions of lines of code. If you do the math and estimate how long it would take, you will get a large number of man-years.

AustinWise · 2023-02-10T00:13:22Z

AustinWise
Feb 10, 2023

@craigajohnson If your goal is to understand the components of a .NET runtime, I've found the CoreCLR NativeAOT runtime to be much more approachable. It shares the GC and the JIT components with CoreCLR (albeit the JIT is used as an ahead-of-time compiler). But the native runtime part is much smaller and simpler. The layering of the runtime is easier to understand as well.

Components are roughly, from lowest level to higher level:

Native Runtime - C++ - contains things like thread suspension and stack walking
Runtime.Base - C# - contains things like exception dispatch and type casting
System.Private.CoreLib - C# - this version of the CoreLib has C# versions of things that are C++ in regular CoreCLR, like running class constructors and Monitor.Enter.

1 reply

craigajohnson Feb 10, 2023
Author

Cool stuff @AustinWise. NativeAOT source did make a lot of logical sense and felt like it was the result of all the things learned to this point so far. I think I asked whether that may someday become the authoritative source for large portions of the overall CLR.

At this point I want to understand every line both unmanaged and managed (not all of the BCL due to massive size but definitely CLR stuff and fundamentals in System.Private.CoreLib). I probably should have read the BOTR all the way through before making my first comments. I've done that now a couple of times and have step-debugged most of the VM, JIT and the GC (well, on x64 Windows). I made a templated version of corerun and isolated the host pal out of statics. That was a low-hanging fruit but just enough to get familiar with certain aspects. The JIT is extremely well-organized code and very deliberately thought through and is a marvel. The GC is a monument to human achievement and a seriously complicated beast. It will take more time to be able to reason about the intricacies of the GC. Re: VM, the stub infrastructure is awesome. The more I dive in (via step-debugging and then tying back concepts in BOTR) the more things make reasonable sense. But just looking at C++ source with the anonymous soup and cycles and statics was a bit... challenging :)

I wonder if the team has considered ECMA-izing the JIT GenTrees (or whatever is the best "IR" prior to register allocation/other platform-specific concerns). Fascinating many-to-many going from many languages -> IL -> JIT -> IR -> many platforms. Maybe it's just considered an implementation detail. But there is likely something there that is more generally useful.
