Skip to content

Conversation

@ablaom
Copy link
Member

@ablaom ablaom commented Jul 22, 2025

The new package MLJModelRegistryTools.jl is a rewrite of the developer tools currently embedded in MLJModels to manage the MLJ Model Registry. These tools are less brittle and better documented, and remove the burden of MLJModels compiling code the MLJ user never needs. This PR removes the old tools.

Merge after:

Notes:

  • The registry itself (Project.toml and Metadata.toml) is still hosted by MLJModels, which provides the user tools to search the registry and load model code, and the location "/src/registry" remains the same. (I had toyed with hosting the registry as pkg Artifact, but, after some online discussions, dismissed the plan as overly complicating maintenance.)

  • This PR should not break anything for MLJ users, but it might inadvertently breaks a few things for developers (one is noted next).

  • (breakage) One short developer method, registry_project(), has been kept, but now lives at "/src/registry_project.jl". This returns the lines of the registry Project.toml file and is used in integration testing at MLJ. It is now accessed via MLJModels.registry_path instead of MLJModels.Registry.registry_path which means this line at MLJ integration tests will need to be changed.

  • (breaking??) Previously the model registry file Metadata.toml, essentially a dictionary, was keyed on the package providing core algorithms (e.g, "DecisionTree") but it is now keyed on the package implementing the MLJ interface (e.g., "MLJDecisionTreeInterface"). This was far more natural because the Project.toml file used to generate the metadata has the API packages as its dependencies. This PR makes a small change in "/src/metadata.jl" to adapt MLJModel's interpretation of Metadata.toml appropriately.

  • The previous registry maintenance tools used a method called info_dict which is not needed elsewhere. Unfortunately its tests were integrated with the testing of some of the models currently provided by MLJModels. In a follow up PR, most of these models are migrating out, as they have been ported to the new package MLJTransforms.jl. So here I have just removed those tests altogether, or in a few cases modified them.

  • The "serialisation" of the metadata now happens at MLJModelRegistryTools.jl, so encode_dic is gone. The testing of the deserialisation method decode_dic now also happens at MLJModelRegistryTools.jl, so the testing here has been removed.

  • Previously we checked the model registry with a GitHub action "/.github/workflows/check_registry.yml" that attempts to load the code for every model in the registry. I have moved this testing into runtests.jl, and improved them to check @load instead of directly checking load_path. These tests now run when a new ENV variable MLJ_TEST_REGISTRATION is set to "true", which I've added to ci.yml. These take a while, so it's kind of a pain to run them locally every time. Note also, that every time MLJModelRegistryTools.jl is used to update the registry, similar checks happen automatically.

  • I removed MLJTransforms from the registry as a temporary measure to address some load_path issues still being resolved. This doesn't break anything because those additions have not yet been tagged in a new MLJModels release

  • I deleted some long dead orphaned code for GaussianProcesses.jl

@codecov
Copy link

codecov bot commented Jul 22, 2025

Codecov Report

Attention: Patch coverage is 85.71429% with 2 lines in your changes missing coverage. Please review.

Project coverage is 85.33%. Comparing base (38cc0c9) to head (53892ae).
Report is 13 commits behind head on dev.

Files with missing lines Patch % Lines
src/registry_project.jl 0.00% 2 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##              dev     #590      +/-   ##
==========================================
+ Coverage   75.58%   85.33%   +9.75%     
==========================================
  Files          17        9       -8     
  Lines        1147      914     -233     
==========================================
- Hits          867      780      -87     
+ Misses        280      134     -146     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@ablaom ablaom mentioned this pull request Jul 22, 2025
8 tasks
Copy link
Member

@OkonSamuel OkonSamuel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks really good. You deleted lot of unused code.

@ablaom
Copy link
Member Author

ablaom commented Jul 25, 2025

@OkonSamuel Thank you for the review, it is much appreciated.

@ablaom ablaom merged commit c35207e into dev Jul 25, 2025
4 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants