-
-
Notifications
You must be signed in to change notification settings - Fork 5.7k
Implement mechanism for strict mode
#60180
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: kf/syntaxevolution
Are you sure you want to change the base?
Conversation
# Motivation There are several corner cases in the Julia syntax that are essentially bugs or mistakes that we'd like to possibly remove, but can't due to backwards compatibility concerns. Similarly, when adding new syntax features, there are often cases that overlap with valid (but often nonsensical) existing syntax. In the past, we've mostly done judegement calls of these being "minor changes", but as the package ecosystem grows, so does the chance of someone accidentally using these anyway and our "minor changes" have (subjectively) resulted in more breakages recently. Fortunately, all the recent work on making the parser replacable, combined with the fact that JuliaSyntax already supports parsing multiple revisions of Julia syntax provides a solution here: Just let packages declare what version of the Julia syntax they are using. That way, packages would not break if we make changes to the syntax and they can be upgraded at their own pace the next time the author of that particular package upgrades to a new julia version. # Core mechanism The way this works is simple. Right now, the parser function is always looked up in `Core._parse`. With this PR, it is instead looked up as `mod._internal_julia_parse` (slightly longer name to avoid conflicting with existing bindings of the name in downstream packages), or `Core._parse` if no such binding exists. Similar for `_lower`. There is a macro `@Base.Experimental.set_syntax_version v"1.xx"` that will set the `_internal_julia_parse` (and inte the future the _lower version) to one that propagates the version to the parser, so users are not expected to manipulate the binding directly. # Versioned package loading The loading system is extended to look at a new `syntax.julia_version` key in Project.toml (and Manifest for explicit environments). If no such key exists, it defaults to the minimum allowed version of the Julia compat. If no compat is defined, it defaults to the current Julia version. This is technically slightly less backwards compatible than defaulting this to Julia 1.13, but I think it will be less suprising in the future for the default syntax to match what is in the REPL. Most julia packages do already define a julia compat. Note that as a result of this, the code for parse compat ranges moves from Pkg to Base. # Syntax changes This introduces two parser changes: 1. `@VERSION` (and similar macrocall forms of a macro named `VERSION`) are now special and trigger the parser to push its version information into the source location field of the macrocall. Note that because this is in the parser, this affects all macros with the name. However, there is also logic on the macrocall side that discards this again if the macro cannot accept it. This special mechanism is used by the `Base.Experimental.@VERSION` macro to let users detect the parse version. 2. The `module` syntax form gains a syntax version argument that is automatically populated with the parser's current version. This is the mechanism to propagate syntax information from the parser to the core mechanism above. Note that these are only active if a module has opted into 1.14 syntax, so macros that process `:module` exprs will not see these changes unless and until the calling module opts into 1.14 syntax via the above mentioned mechanisms (which is the primary advantage of this scheme). # Final words I should emphasize that I'm not proposing using this for any big syntax revolutions or anything. I would just like to start cleaning up a few corners of the syntax that I think are universally agreed to be bad but that we've kept for backwards compatibility. This way, by the time we get around to making a breaking revision, our entire ecosystem will have already upgraded to the new syntax.
This implements support for the `strict mode` mechanism proposed in #54903. There was a desire to see an implementation of this along side #60018 since that are conceptually adjacent. However, other than some shared infrastructure, this version is farther from #60018 than the original proposal for reasons mentioned below. The motivation and basic idea are largely unchanged. # Implemented semantics The user facing interface to this mechanism is `@strict` (currently in `Base.Experimental`, but expected to be moved out in the course of the 1.14 release process. Rather than paraphrase, let me just provide the first part of the docstring that describes the semantics: ``` Set the `strict` mode for the current module to all flags specified in `flags`. Each `strict` mode flag is an independent option that disallows certain syntax or semantics that may be undesirable in *some* contexts. Package authors should consider which flags are appropriate for their package and set them at the top of the applicable module. # General semantics and philosophy Note that additional strict mode flags are not necessarily *safer* or *better* in any way; they are a reflection of of the reality that different users have different tradeoffs for their codebases and what may be a sensible restriction in a mature package could be very annoying in the REPL. As designed, there are several general guidelines that apply to strict mode flags. To the extent possible, they should be kept for future flag additions. 1. Code that evaluates without error in strict mode should also evaluate without error under the ordinary julia execution semantics. 2. Strict mode should not affect parsing. If it is desirable to disallow a particular syntax pattern, it should be recognized at the lowering stage. If this is currently not possible, the parser should be modified to emit an appropriate marker that can be checked at lowering time. 3. Strict mode is not intended for for issues that are clearly bugs. Those should instead use the syntax versioning mechanism (see [`Base.Experimental.@set_syntax_version`](@ref)). However, `strict` mode flags that gain widespread adoption may eventually be considered as candidates for syntax evolution. Strict mode flags are automatically inherited by submodules, but can be overriden by an explicit `@strict` invocation in the submodule. Strict mode flags are partitioned by world age. # Specifying flags The `flags` expression is runtime-evaluated and should evaluate to a collection of `Symbols` as specified below. In addition, nested collections of symbols are allowed and will be flattened. This is intended to support specifying strict mode flags in a central location and enforcing them across multiple dependents. ``` # Module vs Project.toml opt-ins Of particular note is that I ended up deciding on a per-module opt-in rather than a Project.toml opt in (like was originally proposed in #54903, and is implemented for syntax evolution in #60018). This is for the following reasons: 1. The #60018 experience has shown that project.toml opt ins are semantically somewhat awkward and need to be implemented both in the language and the package manager. This was fine for the syntax version, but strict mode is richer (and potentially much richer in the future) and adding this complexity into code loading seems undesirable. 2. One of the design objectives is to allow user-defined collections of strict mode flags enforced centrally across multiple packages. In this design this is easy by having a MyOrganizationBase package that defines a variable with the set of flags to enable. Doing something like this in Project.toml opens a whole can of worms on how to represent that. 3. I believe the concern about wanting to enable parse-time strict mode can be adequately addressed by having the parser emit a special marker that can then get picked up and checked against the strict mode by lowering (such marker addition possibly making use of the syntax evolution mechanism). If this is not how it works, the parser would need additional input state specifying the strict mode flags. #60018 has shown that changing parser state flags dynamically is undesirable, because people don't have a good sense of what the parse unit is. As such, I don't want the parser to look at strict mode flags at all. 4. As implemented here `@strict` inherits binding-world-age semantics. Since these are now well defined as of 1.12, this addresses a lot of the ordering concerns that were brought up in the discussion of #54903. 5. I think it may be useful to opt into certain strict mode flags for some modules in a package only (unlike the syntax version, where I don't expect this to be common). E.g. packages may define modules that define their core API or segregate their core algorithms from support code and may want more strict coding styles for such core modules. There remains a bit of a concern that this is less friendly to IDEs. I'm sympathetic to that, but the analysis required to compute the strict mode is a lot simpler than other analyses (so a language server should easily be able to do that) and I think it's outweighed particularly by the desire for user-definable collections (which requires the IDE to do some sort of analysis of however that is specified anyway). Given that, I think this mechanism is as IDE friendly as it gets, since the required capability is simply to compute the value of a constant obtained from a macro expansion (so no special strict-mode specific analysis required). # Implementation details The core of the implementation is simple, to determine the active strict mode flags, we simply look up the `_internal_module_strict_flags` binding in the appropriate module and see which flags are set. The exact types and values of this binding are explicitly and intentionally implementation details and `@strict` decides how to set it. This is inteded to allow flexibility of implementation in the future here. To faciliate the above described semantics of `@strict`, this binding has a couple of special features: 1. It gets automatically imported from a module's parent module upon module creation. 2. Unlike bindings created through syntax, invalidations from imports to `const` is permitted. Otherwise the mechanism behaves as an ordinary binding, including obeying world age semantics, and being Revise-able, etc. In particular, if you Revise the `@strict` setting in a top-level module it will automatically (through binding invalidation) be updated in all submodules. It was important to me that doing this would not leave the settings inconsistent. # Implemented strict mode flags Two strict mode flags are implemented in this PR, but they should largely be considered straw-man implementations to show how to access the flags set from within either the runtime or lowering. Everything works end-to-end, but we may want to do some extra work refining the precise semantics of these flags once we've merged the core mechanism. The reason for choosing these flags is simply that they were easy to implement. Several of the other proposed flags would require additional analysis in lowering, which should go in their own PRs. Implemented flags are as follows: ``` * `typeimports` This flag turns the 1.12 warning for implicit import of types into an error. Note that the implicit import default may be removed in a future Julia syntax iteration, in which case this flag will become a no-op for such versions. ```jldoctest julia> @Base.Experimental.strict :typeimports julia> String(x) = 1 ERROR: `@strict :typeimports` disallows extending types without explicit import in TypeImports: function Base.String must be explicitly imported to be extended ``` * `:nointliteraliterators` Disallows (at the lowering stages) literal integers as iterators in `for` loops. This protects against expressions like `for i in 10` which are commonly intended to be `for i in 1:10`. ```jldoctest julia> for i in 10 println(i) end 10 julia> @Base.Experimental.strict :nointliteraliterators julia> for i in 10 println(i) end ERROR: syntax: `@strict :nointliteraliterators` disallows integer literal iterators here around none:1 Stacktrace: [1] top-level scope @ none:1 ```
|
This was discussed extensively in triage. Summary of my best recollection: @StefanKarpinski expressed skepticism at fine grained opt-ins, preferring a single opt in into the "good version" of the language. My thoughts on the matter are that things are not that simple and different user groups do have differing requirements. @mlechu objected to too many fine-grained opt-ins on interpretability grounds. It seemed universally agreed that we should prefer to pick correct defaults and use syntax evolution to enforce them rather than making opt-ins for too many syntax cases. However, some opt-ins proposed in #54903 (like Based on this, there was a significant discussion on the difference between a lint and a mandatory policy enforcement mechanism like this. The general conclusion was that there is significant overlap. In the end, it seemed like there was general agreement that it didn't make sense to both have a single level strict mode as well as a fine grained lint (anything clearly bad should just be syntax evolution, anything else too subjective to be grouped). Major concerns were about the accuracy of a lint (in order to lint correctly, you essentially need to evaluate, but JETLS is doing this anyway) and that casual contributors to a package would not be running the lint by default causing these issues to only be caught in CI. There was a general discussion of which sorts of enforcements are appropriate for this mechanism. The general conclusion was that this mechanism is appropriate for information lexically determinable at the module level, but not for things that would require any sort of dynamic introspection of other modules. Some flexibility on the term "lexically determinable" there. There was a question of whether this must affect the runtime or could be confined to lowering. As implemented in this PR, the runtime is querying the module flags. However, we think that it's feasible to confine this to lowering (possibly by adding additional flags to runtime intrinsics and having lowering emit those). As a follow up then, the suggestion was made to use the #60018 lowering-evolution mechanism to allow a package to substitute in a different lowering entry point to perform this enforcement (either by calling ordinary JuliaLowering with a flag or by performing a pre- or post-processing pass). This mechanism was thought to be more scalable, since it could potentially extend to analyses too complicated to live in base. A related concern was that similar to concerns in #59995 the existence of this mechanism in base would suggest to package developers that they are expected to use it as opposed to being a tool that makes the language more annoying (but potentially saves some bugs). The proposed path forward is thus:
|
ade1dc4 to
2b907dd
Compare
This implements support for the
strict modemechanism proposed in #54903. There was a desire to see an implementation of this along side #60018 since that are conceptually adjacent. However, other than some shared infrastructure, this version is farther from #60018 than the original proposal for reasons mentioned below. The motivation and basic idea are largely unchanged.Implemented semantics
The user facing interface to this mechanism is
@strict(currently inBase.Experimental, but expected to be moved out in the course of the 1.14 release process. Rather than paraphrase, let me just provide the first part of the docstring that describes the semantics:Module vs Project.toml opt-ins
Of particular note is that I ended up deciding on a per-module opt-in rather than a Project.toml opt in (like was originally proposed in #54903, and is implemented for syntax evolution in #60018). This is for the following reasons:
The Provide mechanism for Julia syntax evolution #60018 experience has shown that project.toml opt ins are semantically somewhat awkward and need to be implemented both in the language and the package manager. This was fine for the syntax version, but strict mode is richer (and potentially much richer in the future) and adding this complexity into code loading seems undesirable.
One of the design objectives is to allow user-defined collections of strict mode flags enforced centrally across multiple packages. In this design this is easy by having a MyOrganizationBase package that defines a variable with the set of flags to enable. Doing something like this in Project.toml opens a whole can of worms on how to represent that.
I believe the concern about wanting to enable parse-time strict mode can be adequately addressed by having the parser emit a special marker that can then get picked up and checked against the strict mode by lowering (such marker addition possibly making use of the syntax evolution mechanism). If this is not how it works, the parser would need additional input state specifying the strict mode flags. Provide mechanism for Julia syntax evolution #60018 has shown that changing parser state flags dynamically is undesirable, because people don't have a good sense of what the parse unit is. As such, I don't want the parser to look at strict mode flags at all.
As implemented here
@strictinherits binding-world-age semantics. Since these are now well defined as of 1.12, this addresses a lot of the ordering concerns that were brought up in the discussion of Addstrictmechanism for opting into stricter subsets of the language #54903.I think it may be useful to opt into certain strict mode flags for some modules in a package only (unlike the syntax version, where I don't expect this to be common). E.g. packages may define modules that define their core API or segregate their core algorithms from support code and may want more strict coding styles for such core modules.
There remains a bit of a concern that this is less friendly to IDEs. I'm sympathetic to that, but the analysis required to compute the strict mode is a lot simpler than other analyses (so a language server should easily be able to do that) and I think it's outweighed particularly by the desire for user-definable collections (which requires the IDE to do some sort of analysis of however that is specified anyway). Given that, I think this mechanism is as IDE friendly as it gets, since the required capability is simply to compute the value of a constant obtained from a macro expansion (so no special strict-mode specific analysis required).
Implementation details
The core of the implementation is simple, to determine the active strict mode flags, we simply look up the
_internal_module_strict_flagsbinding in the appropriate module and see which flags are set. The exact types and values of this binding are explicitly and intentionally implementation details and@strictdecides how to set it. This is inteded to allow flexibility of implementation in the future here.To faciliate the above described semantics of
@strict, this binding has a couple of special features:constis permitted.Otherwise the mechanism behaves as an ordinary binding, including obeying world age semantics, and being Revise-able, etc. In particular, if you Revise the
@strictsetting in a top-level module it will automatically (through binding invalidation) be updated in all submodules. It was important to me that doing this would not leave the settings inconsistent.Implemented strict mode flags
Two strict mode flags are implemented in this PR, but they should largely be considered straw-man implementations to show how to access the flags set from within either the runtime or lowering. Everything works end-to-end, but we may want to do some extra work refining the precise semantics of these flags once we've merged the core mechanism. The reason for choosing these flags is simply that they were easy to implement. Several of the other proposed flags would require additional analysis in lowering, which should go in their own PRs. Implemented flags are as follows: