Skip to content

Conversation

@multiplemonomials
Copy link
Collaborator

@multiplemonomials multiplemonomials commented Nov 16, 2025

Summary of changes

Ever since I started using Mbed, one of the biggest pain points has been working with JSON configuration files. Mbed uses these files as its configuration system, and they are used to store information such as the list of known targets and their properties, the options that can be configured for each macro and library, and the memory bank info for each target. This system isn't bad in concept, and basically any project as large and configurable as Mbed OS needs something like it.

However, the implementation has never been super solid: Mbed relies on python scripts that load tons of these JSON files, combine them together and munge the data in various ways into a single dict with every concievable setting in it, and then pass that dict off to the code that generates build flags and other configuration (which used to also be in python, but is now in CMake). This implementation meant that the valid properties for each JSON file, and what effect that they'd have within the configuration system, was in large part a mystery to everyone who did not directly work on the Mbed build system.

ARM did at least make some attempt to document the valid settings, but looking at the real JSON files shows that there were many more undocumented ones that are not mentioned in those pages. When you factor in that the way the config system merges JSONs means that target JSON attributes can be put in mbed_app.json and will "work" (though they might override other attributes!), and the fact that it has never issued warnings for unknown/unrecognized attributes, AND the fact that all of this stuff was JSON until recently and didn't allow comments, you get... a system made almost radioactive by a decade of cruft that no one wants to change for fear of breaking something. Even to relatively seasoned users, like me a few years ago, there were plenty of JSON settings that seemed like borderline magic incantations -- you put em in and something happens but you have no idea why,

Well, I am here to tell you that this ends today. Well, mostly. I've had the pleasure of using pydantic at my day job, and it's a super cool library -- it basically lets you define a schema for structured data as a python class, and then use that schema to parse, validate, and dump data. Me, and @VictorWTang as well, thought that this would be a great use for it.

For this PR, I read through many, many existing JSON files as well as nearly all the existing config code, and reverse engineered a schema for all of Mbed's JSON files. This schema should cover the large, large majority of existing use cases for these files, but removes all the legacy attributes which haven't had meaning since Mbed CLI 1 was being used. It also provides, at last, real documentation for every single legal field of each JSON file (right now it's in the form of a Python class, but we also have options to convert this to a JSON schema and, from there, markdown docs if we'd like to). Right now, the schema is being enforced for all mbed_lib.json and targets/custom_targets.json files, as these are mostly used only within Mbed, mbed_app.json, meanwhile, is validated against the schema but validation errors only are treated as a warning. This way, compatibility will not be broken for projects using mbed_app.json in unexpected ways.

For this PR, I did some refactoring of how the configuration is processed internally, mostly to keep things in the pydantic-model format instead of a dict where it makes sense. However, I did not change the fundamental method used to generate the final configuration (stuffing everything into a single god dict). This was both out of fear of making breaking changes, and because I didn't want to break mbed_app.json compatibility (since, as I said, this file was the total Wild West that could override any configuration setting). I called this PR "part 1" because eventually (years down the road), I'd like to go through and replace the god dictionary (config.Config) with a proper data model that stores each thing individually. But that will have to wait until users have gotten used to the new schema rules (and we've made any changes to the schema that we end up wanting!).

Oh, also, since I was in the guts of this code anyway, I took the chance to conquer one of the smaller evils of Mbed programming: the lack of any naming standard for config settings. I am defining here and now that all settings shall be in lowercase-skewer-case, and may not contain uppercase letters or underscores. Mbed will now print a warning if it sees an underscore in a setting name, and transform it into a hyphen. This means it's no longer possible to screw up your configuration by writing target.application_profile instead of target.application-profile and the like. This removes one of the easiest ways to make config mistakes and the biggest thing that kept me from remembering these names without having to check each time.

Impact of changes

  • mbed_lib.json5 and custom_targets.json5 files (if they exist in your project) are now required to validate against their schemas
  • mbed_app.json5 is also validated against the schema, but only a warning is printed if validation fails. If you are getting this warning and think that your mbed_app.json should be OK, please contact us ASAP as this warning will become an error in a future version.
  • Bug fixed where declaring overrides and target_overrides in the same JSON file would conflict with each other. This could cause settings that were previously not being applied to now be applied.
  • Config settings with underscores are now converted to use hyphens. This means that if you had misspelled settings in your JSON files that weren't getting applied, they may now be applied
    • Also, if you defined two different config settings which differed only by hyphen vs underscore in name (in which case I hate you!), that will break completely as they will be considered the same option.

Migration actions required

Documentation


Pull request type

[] Patch update (Bug fix / Target update / Docs update / Test update / Refactor)
[] Feature update (New feature / Functionality change / New API)
[X] Major update (Breaking change E.g. Return code change / API behaviour change)

Test results

[] No Tests required for this change (E.g docs only update)
[X] Covered by existing mbed-os tests (Greentea or Unittest)
[] Tests / results supplied as part of this PR

"target.c_lib": "small",

"target.application-profile": "bare-metal",

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Based on my reverse engineering of the code, using overrides and target_overrides at the same time used to be unsafe (though this should be fixed now)

{
"name": "rtos",
"config": {
"present": 1,
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removing these legacy config defines everywhere.

@@ -1,4 +1,3 @@
{
"name": "cordio-stm32wb",
"requires": ["cordio", "ble"]
Copy link
Collaborator Author

@multiplemonomials multiplemonomials Nov 16, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am removing all remaining vestiges of this "requires" system in this PR. This dates back from Mbed CLI 1 and could be used to specify dependencies between libraries. Now, library dependencies should be, and are, specified via CMake, and this system was serving only to exclude non-required libraries' mbed_lib.json5 files from processing if the user added "requires" to their mbed_app.json5 file.

This does mean that users no longer have the option to speed up JSON processing / reduce the number of MBED_CONF_xxx defines via using requires, but this option was not used by default and I am quite doubtful that anyone knew it even worked in Mbed CLI 2.

"color-theme": {
"help": "Set color theme. 0 for readable, 1 for unobtrusive.",
"options": [0, 1],
"accepted_values": [0, 1],
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As far as I could tell, there was no accepted way to specify the range of legal values for an option until now. So, people added stuff like this as "wishful thinking" / for documentation only (remember this was json so you couldn't have comments).

This PR turns this wish into a reality with the new accepted_values, value_max, and value_min options.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have no idea what this file was for as it's in an empty folder, I removed it.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I made heavy changes to this file in this PR in order to clean up over a decade worth of deprecated stuff that has been happily ignored by the modern configuration system (until now!). I will leave comments on each thing explaining why it has been removed.

"extra_labels": [],
"supported_form_factors": [],
"components": [],
"is_disk_virtual": false,
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@@ -1,37 +1,22 @@
{
"Target": {
"core": null,
"trustzone": false,
"default_toolchain": "ARM",
"supported_toolchains": null,
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Mbed CE has only 1 supported toolchain at present, GCC_ARM

"macros": [],
"device_has": [],
"features": [],
"detect_code": [],
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"public": false,
"c_lib": "std",
"bootloader_supported": false,
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Mbed CE only supports MCUBoot-based bootloaders that the user builds themselves, not the old precompiled bootloaders (which is what I think this is for). Anyway, whether the bootloader is supported depends on mcuboot support for it and whether the linker script has support, not anything related to targets.json5.

"public": false,
"c_lib": "std",
"bootloader_supported": false,
"static_memory_defines": true,
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No idea what this was even for...

"tfm_bootloader_supported": "",
"tfm_default_toolchain": "ARMCLANG",
"tfm_supported_toolchains": null,
"tfm_delivery_dir": "",
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Lots of TFM targets have these attributes, but I did some digging and could not find anything in the python config scripts that uses them, so they must all have been moved to CMake.

message(FATAL_ERROR
"The full profile is not supported for this Mbed target")
endif()
endfunction()
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't really see the value in having this check, personally. I don't understand how there could be a target that would support only RTOS and not baremetal, because they basically share the same init process, just that when using the RTOS we start the RTOS from the bare-metal main function. And if there is a target that doesn't have enough RAM/flash to cleanly support RTOS, then that's something the user can discover fairly easily through linker script errors and/or the post-build output (and they can probably squeeze RTOS on there if they want via stuff like the small C library, toolchain updates, etc). So I don't want to artificially prevent anyone from using baremetal or RTOS if they want to.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I remember something, try to look at this - ARMmbed#13099

get_property(FINALIZE_BUILD_CALLED GLOBAL PROPERTY MBED_FINALIZE_BUILD_CALLED SET)
if("${FINALIZE_BUILD_CALLED}")
message(WARNING "Mbed: Deprecated: mbed_finalize_build() is now automatically called, so you don't need to call it in CMakeLists.txt")
return()
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixing the issue that @zhiyong-ft ran into where this would make the build error due to generating the same file twice.

app_data = source.from_file(
mbed_app_file, default_name="app", target_filters=FileFilterData.from_config(config).labels
)
_get_app_filter_labels(app_data, config)
Copy link
Collaborator Author

@multiplemonomials multiplemonomials Nov 16, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This logic gets a fair bit simpler because of removing requires: :D

return parsed_file
except ValueError as json5_ex:
logger.error(f"Failed to decode JSON5 data in the file located at '{path}': {json5_ex!s}")
raise json5_ex from None
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is based on an idea that @VictorWTang had some time ago. pyjson5's error messages suck, but it's way faster. So, we can use pyjson5 to parse, but if it fails, we fall back to reparsing with json5. This gets the best possible user experience at the cost of a little more complexity.

for attr_name in NON_INHERITED_ATTRIBUTES:
if attr_name != "inherits":
if attr_name in target_data_as_dict:
target_attributes[attr_name] = target_data_as_dict[attr_name]
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Making a minor behavior change here. Previously, non-inherited attributes are not copied to the result at all, and this causes some minor issues for my target website generator (which wants to know if a target is public or not). Adding this code fixes the issue.

for existing_element, element in combinations_to_check
if _element_matches(element, existing_element)
if (attribute_name == "macros" and _macros_element_matches(element, existing_element))
or (element == existing_element)
Copy link
Collaborator Author

@multiplemonomials multiplemonomials Nov 17, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Making a minor fix here. Previously the special logic for the "macros" field was used for all removals. This seems like it's generally undesired, so I made it only apply to this field.

@multiplemonomials multiplemonomials changed the title [draft] Use Pydantic schemas to validate Mbed's JSON files (part 1) Use Pydantic schemas to validate Mbed's JSON files (part 1) Nov 17, 2025
self.assertEqual(_extract_target_attributes(all_targets_data, "Target_1", True), {"attribute1": "something"})


class TestGetTargetAttributes(TestCase):
Copy link
Collaborator Author

@multiplemonomials multiplemonomials Nov 17, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I really detest (heh) tests like this, as it basically is just checking that one function calls a bunch of other functions, without looking at the actual logic or data. This kind of test is extremely fragile (as it can easily be broken by benign changes like updating a data type or refactoring code into a new function), and yet also is pretty unlikely (IMO) to catch bugs as it relies on the test writer having a perfect understanding of what each function being mocked does. In a language that's as easy to read and write as Python, a test as simple as "does X function call Y function and Z function" should just be done via code review.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants