-
Notifications
You must be signed in to change notification settings - Fork 28
Use Pydantic schemas to validate Mbed's JSON files (part 1) #516
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
| "target.c_lib": "small", | ||
|
|
||
| "target.application-profile": "bare-metal", | ||
|
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Based on my reverse engineering of the code, using overrides and target_overrides at the same time used to be unsafe (though this should be fixed now)
| { | ||
| "name": "rtos", | ||
| "config": { | ||
| "present": 1, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Removing these legacy config defines everywhere.
| @@ -1,4 +1,3 @@ | |||
| { | |||
| "name": "cordio-stm32wb", | |||
| "requires": ["cordio", "ble"] | |||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am removing all remaining vestiges of this "requires" system in this PR. This dates back from Mbed CLI 1 and could be used to specify dependencies between libraries. Now, library dependencies should be, and are, specified via CMake, and this system was serving only to exclude non-required libraries' mbed_lib.json5 files from processing if the user added "requires" to their mbed_app.json5 file.
This does mean that users no longer have the option to speed up JSON processing / reduce the number of MBED_CONF_xxx defines via using requires, but this option was not used by default and I am quite doubtful that anyone knew it even worked in Mbed CLI 2.
| "color-theme": { | ||
| "help": "Set color theme. 0 for readable, 1 for unobtrusive.", | ||
| "options": [0, 1], | ||
| "accepted_values": [0, 1], |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As far as I could tell, there was no accepted way to specify the range of legal values for an option until now. So, people added stuff like this as "wishful thinking" / for documentation only (remember this was json so you couldn't have comments).
This PR turns this wish into a reality with the new accepted_values, value_max, and value_min options.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have no idea what this file was for as it's in an empty folder, I removed it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I made heavy changes to this file in this PR in order to clean up over a decade worth of deprecated stuff that has been happily ignored by the modern configuration system (until now!). I will leave comments on each thing explaining why it has been removed.
| "extra_labels": [], | ||
| "supported_form_factors": [], | ||
| "components": [], | ||
| "is_disk_virtual": false, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not used anymore (even in Mbed CLI 1) per https://os.mbed.com/docs/mbed-os/v6.16/program-setup/adding-and-configuring-targets.html
| @@ -1,37 +1,22 @@ | |||
| { | |||
| "Target": { | |||
| "core": null, | |||
| "trustzone": false, | |||
| "default_toolchain": "ARM", | |||
| "supported_toolchains": null, | |||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Mbed CE has only 1 supported toolchain at present, GCC_ARM
| "macros": [], | ||
| "device_has": [], | ||
| "features": [], | ||
| "detect_code": [], |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| "public": false, | ||
| "c_lib": "std", | ||
| "bootloader_supported": false, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Mbed CE only supports MCUBoot-based bootloaders that the user builds themselves, not the old precompiled bootloaders (which is what I think this is for). Anyway, whether the bootloader is supported depends on mcuboot support for it and whether the linker script has support, not anything related to targets.json5.
| "public": false, | ||
| "c_lib": "std", | ||
| "bootloader_supported": false, | ||
| "static_memory_defines": true, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No idea what this was even for...
| "tfm_bootloader_supported": "", | ||
| "tfm_default_toolchain": "ARMCLANG", | ||
| "tfm_supported_toolchains": null, | ||
| "tfm_delivery_dir": "", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Lots of TFM targets have these attributes, but I did some digging and could not find anything in the python config scripts that uses them, so they must all have been moved to CMake.
| message(FATAL_ERROR | ||
| "The full profile is not supported for this Mbed target") | ||
| endif() | ||
| endfunction() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't really see the value in having this check, personally. I don't understand how there could be a target that would support only RTOS and not baremetal, because they basically share the same init process, just that when using the RTOS we start the RTOS from the bare-metal main function. And if there is a target that doesn't have enough RAM/flash to cleanly support RTOS, then that's something the user can discover fairly easily through linker script errors and/or the post-build output (and they can probably squeeze RTOS on there if they want via stuff like the small C library, toolchain updates, etc). So I don't want to artificially prevent anyone from using baremetal or RTOS if they want to.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I remember something, try to look at this - ARMmbed#13099
| get_property(FINALIZE_BUILD_CALLED GLOBAL PROPERTY MBED_FINALIZE_BUILD_CALLED SET) | ||
| if("${FINALIZE_BUILD_CALLED}") | ||
| message(WARNING "Mbed: Deprecated: mbed_finalize_build() is now automatically called, so you don't need to call it in CMakeLists.txt") | ||
| return() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixing the issue that @zhiyong-ft ran into where this would make the build error due to generating the same file twice.
| app_data = source.from_file( | ||
| mbed_app_file, default_name="app", target_filters=FileFilterData.from_config(config).labels | ||
| ) | ||
| _get_app_filter_labels(app_data, config) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This logic gets a fair bit simpler because of removing requires: :D
| return parsed_file | ||
| except ValueError as json5_ex: | ||
| logger.error(f"Failed to decode JSON5 data in the file located at '{path}': {json5_ex!s}") | ||
| raise json5_ex from None |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is based on an idea that @VictorWTang had some time ago. pyjson5's error messages suck, but it's way faster. So, we can use pyjson5 to parse, but if it fails, we fall back to reparsing with json5. This gets the best possible user experience at the cost of a little more complexity.
| for attr_name in NON_INHERITED_ATTRIBUTES: | ||
| if attr_name != "inherits": | ||
| if attr_name in target_data_as_dict: | ||
| target_attributes[attr_name] = target_data_as_dict[attr_name] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Making a minor behavior change here. Previously, non-inherited attributes are not copied to the result at all, and this causes some minor issues for my target website generator (which wants to know if a target is public or not). Adding this code fixes the issue.
| for existing_element, element in combinations_to_check | ||
| if _element_matches(element, existing_element) | ||
| if (attribute_name == "macros" and _macros_element_matches(element, existing_element)) | ||
| or (element == existing_element) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Making a minor fix here. Previously the special logic for the "macros" field was used for all removals. This seems like it's generally undesired, so I made it only apply to this field.
| self.assertEqual(_extract_target_attributes(all_targets_data, "Target_1", True), {"attribute1": "something"}) | ||
|
|
||
|
|
||
| class TestGetTargetAttributes(TestCase): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I really detest (heh) tests like this, as it basically is just checking that one function calls a bunch of other functions, without looking at the actual logic or data. This kind of test is extremely fragile (as it can easily be broken by benign changes like updating a data type or refactoring code into a new function), and yet also is pretty unlikely (IMO) to catch bugs as it relies on the test writer having a perfect understanding of what each function being mocked does. In a language that's as easy to read and write as Python, a test as simple as "does X function call Y function and Z function" should just be done via code review.
…s now being used)
Summary of changes
Ever since I started using Mbed, one of the biggest pain points has been working with JSON configuration files. Mbed uses these files as its configuration system, and they are used to store information such as the list of known targets and their properties, the options that can be configured for each macro and library, and the memory bank info for each target. This system isn't bad in concept, and basically any project as large and configurable as Mbed OS needs something like it.
However, the implementation has never been super solid: Mbed relies on python scripts that load tons of these JSON files, combine them together and munge the data in various ways into a single dict with every concievable setting in it, and then pass that dict off to the code that generates build flags and other configuration (which used to also be in python, but is now in CMake). This implementation meant that the valid properties for each JSON file, and what effect that they'd have within the configuration system, was in large part a mystery to everyone who did not directly work on the Mbed build system.
ARM did at least make some attempt to document the valid settings, but looking at the real JSON files shows that there were many more undocumented ones that are not mentioned in those pages. When you factor in that the way the config system merges JSONs means that target JSON attributes can be put in mbed_app.json and will "work" (though they might override other attributes!), and the fact that it has never issued warnings for unknown/unrecognized attributes, AND the fact that all of this stuff was JSON until recently and didn't allow comments, you get... a system made almost radioactive by a decade of cruft that no one wants to change for fear of breaking something. Even to relatively seasoned users, like me a few years ago, there were plenty of JSON settings that seemed like borderline magic incantations -- you put em in and something happens but you have no idea why,
Well, I am here to tell you that this ends today. Well, mostly. I've had the pleasure of using pydantic at my day job, and it's a super cool library -- it basically lets you define a schema for structured data as a python class, and then use that schema to parse, validate, and dump data. Me, and @VictorWTang as well, thought that this would be a great use for it.
For this PR, I read through many, many existing JSON files as well as nearly all the existing config code, and reverse engineered a schema for all of Mbed's JSON files. This schema should cover the large, large majority of existing use cases for these files, but removes all the legacy attributes which haven't had meaning since Mbed CLI 1 was being used. It also provides, at last, real documentation for every single legal field of each JSON file (right now it's in the form of a Python class, but we also have options to convert this to a JSON schema and, from there, markdown docs if we'd like to). Right now, the schema is being enforced for all mbed_lib.json and targets/custom_targets.json files, as these are mostly used only within Mbed, mbed_app.json, meanwhile, is validated against the schema but validation errors only are treated as a warning. This way, compatibility will not be broken for projects using mbed_app.json in unexpected ways.
For this PR, I did some refactoring of how the configuration is processed internally, mostly to keep things in the pydantic-model format instead of a dict where it makes sense. However, I did not change the fundamental method used to generate the final configuration (stuffing everything into a single god dict). This was both out of fear of making breaking changes, and because I didn't want to break mbed_app.json compatibility (since, as I said, this file was the total Wild West that could override any configuration setting). I called this PR "part 1" because eventually (years down the road), I'd like to go through and replace the god dictionary (
config.Config) with a proper data model that stores each thing individually. But that will have to wait until users have gotten used to the new schema rules (and we've made any changes to the schema that we end up wanting!).Oh, also, since I was in the guts of this code anyway, I took the chance to conquer one of the smaller evils of Mbed programming: the lack of any naming standard for config settings. I am defining here and now that all settings shall be in lowercase-skewer-case, and may not contain uppercase letters or underscores. Mbed will now print a warning if it sees an underscore in a setting name, and transform it into a hyphen. This means it's no longer possible to screw up your configuration by writing
target.application_profileinstead oftarget.application-profileand the like. This removes one of the easiest ways to make config mistakes and the biggest thing that kept me from remembering these names without having to check each time.Impact of changes
overridesandtarget_overridesin the same JSON file would conflict with each other. This could cause settings that were previously not being applied to now be applied.Migration actions required
Documentation
Pull request type
Test results