diff --git a/proposals/p6333.md b/proposals/p6333.md new file mode 100644 index 0000000000000..91defc24a719b --- /dev/null +++ b/proposals/p6333.md @@ -0,0 +1,488 @@ +# CLI and separate compilation + + + +[Pull request](https://github.com/carbon-language/carbon-lang/pull/6333) + + + +## Table of contents + +- [Abstract](#abstract) +- [Problem](#problem) +- [Background](#background) + - [Look-and-feel](#look-and-feel) + - [Bazel rule design](#bazel-rule-design) +- [Proposal](#proposal) +- [Details](#details) + - [Command changes](#command-changes) + - [Compile command](#compile-command) + - [Build command](#build-command) + - [Link command](#link-command) + - [Mapping packaging directives to filenames](#mapping-packaging-directives-to-filenames) + - [Support for other packages](#support-for-other-packages) + - [Disallow ambiguous library names](#disallow-ambiguous-library-names) +- [Example interaction with Bazel](#example-interaction-with-bazel) + - [carbon_library and carbon_binary](#carbon_library-and-carbon_binary) + - [Indirect API exposure](#indirect-api-exposure) + - [Core package rules](#core-package-rules) +- [Future work](#future-work) + - [Caching checked IR, C++ AST, and other possible compile artifacts](#caching-checked-ir-c-ast-and-other-possible-compile-artifacts) +- [Rationale](#rationale) +- [Alternatives considered](#alternatives-considered) + - [Naming of commands and rules](#naming-of-commands-and-rules) + - [Support a full-fledged build system](#support-a-full-fledged-build-system) + - [Don't support packaging directive to filename mappings](#dont-support-packaging-directive-to-filename-mappings) + - [Distribute pre-compiled versions of Core files](#distribute-pre-compiled-versions-of-core-files) + - [Create an explicit mapping from packaging directives to files](#create-an-explicit-mapping-from-packaging-directives-to-files) + + + +## Abstract + +- Change the look-and-feel of the `carbon` compilation command set to use + `compile`, `link`, and 
`build`. +- Build library-to-file discovery for `Core`, but support it in a general + manner. + +## Problem + +The current command line is still a prototype, and lacks support for regular +use. For example: + +- `carbon compile` produces one object file per input file. When + `--output-file` is specified and there are multiple inputs, the output is + repeatedly overwritten. +- `carbon compile` doesn't provide a trivial way to produce object files for + the prelude. The `carbon_binary` rule is, behind the scenes, separately + compiling all the prelude files individually and doing its own custom + linking with those. +- When writing a small test program (for example "hello world") it would be + nice to have a single command to run to produce a program. Right now, + `carbon compile` and `carbon link` must be used in combination. + +Essentially, we have a decent setup for testing, but not one that's easy to use +in real-world situations. + +## Background + +In C++, `clang++ main.cpp -o program` is a way to produce `program`. This is +trying to reach a similar goal to make it easy to build and test small programs. + +Key commands related to this proposal are `carbon compile`, `carbon clang`, and +`carbon link`. The end result will likely compose multiple command elements in +order to build the output. + +### Look-and-feel + +Note the goal here is to align on look-and-feel of separate compilation. +Although the `carbon` CLI is important to the language, most details aren't +necessary to address through the proposal process. For example, we want to get +flag names right here, but also we wouldn't expect a proposal for flag name +changes. + +### Bazel rule design + +This is a proposal for the command line. Bazel rules are mentioned because it +can help illustrate interactions with build systems. However, this proposal is +not intended to decide Bazel design, and the existing Bazel rules have not been +through the proposal process. 
+ +## Proposal + +Restructure compilation into: + +- `carbon compile`: Take a single input to build, and produce a single output + `.o`. +- `carbon build`: Take multiple inputs in order to produce a linked binary. + - Overlaps with `carbon compile` and `carbon link`. + +These are intended to accept flexible inputs: + +- Support passing in standard C++ file extensions to any of these for + compilation. +- For `carbon build` in particular, it should not be necessary to pass in + `Core` files that are required. + - We will require a correlation between library names inside `Core` and + directory structure. For example, `prelude/types` + [maps to](#mapping-packaging-directives-to-filenames) + `core/prelude/types.carbon`. + - The same strict correlation will be supported for other packages. + +At the end, it should be possible to: + +- Run `carbon build main.carbon` (or even just `carbon build`) with + non-prelude `Core` imports, and get an executable program. +- Have Bazel rules that mix C++ code and Carbon code. For example: + + ```bazel + carbon_library( + name = "foo", + srcs = ["foo.cpp", "foo.impl.carbon"], + apis = ["foo.carbon"], + ) + carbon_binary( + name = "bar", + srcs = ["main.cpp"], + deps = [":foo"], + ) + ``` + +## Details + +### Command changes + +#### Compile command + +The `carbon compile` command is intended to be a straightforward single input, +single output command. Dependencies will be provided through a combination of: + +- Given a package name to directory mapping, a + [filename mapping](#mapping-packaging-directives-to-filenames) based on the + library name. +- Explicitly provided input files. + +As part of supporting a mix of C++ and Carbon files, we will support +`carbon compile foo.cpp` with results similar to `carbon clang -- -c foo.cpp`. + +#### Build command + +The `carbon build` command will be the new, simple way to compile, as a +replacement for `carbon compile`. It will: + +- Load provided files. 
+- For packages with directory mappings, particularly `Core`, figure out the + right files to load. +- Do something similar to the appropriate series of `carbon compile` + invocations. + - A key divergence is that we should avoid re-checking files that would be + used across multiple `carbon compile` invocations. +- Run the equivalent of `carbon link` over the produced object files. + +While the build command will default to providing an executable program, we may +also want it to be capable of producing `.a` and `.so` files. However, we can +decide whether `carbon build` should be required for these kinds of outputs as +an implementation detail. + +#### Link command + +The `carbon link` command will change to make the following work: + +```sh +carbon compile foo.carbon -o foo.o +carbon link foo.o -o program +``` + +This requires implicit `compile`-equivalent invocations for `Core` files (not +just prelude), and including their object files in the output. It should be possible +to opt out of this, for example so that the Bazel `carbon_binary` rule can use +`carbon link` while also providing its own `Core` object files. However, it +should be on-by-default. + +### Mapping packaging directives to filenames + +When we need a file for a packaging directive: + +- The package name will correspond to a root directory. For example, + `package Core ...` could correspond to `lib/carbon/core/...`. +- The library name will correspond to a path under that, suffixed by + `.carbon`. For example, `package Core library "prelude/types";` could + correspond to `lib/carbon/core/prelude/types.carbon`. + - The default library will use the name `default.carbon`. For example, + `package Core;` could correspond to `lib/carbon/core/default.carbon`. + +Suppose we have some command line `carbon compile a.carbon`, and in `a.carbon`, +it does `import Core library "map";`. This needs to load `core/map.carbon`, +without parsing every file matching `core/**/*.carbon`. 
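
The mapping rules above can be sketched in a few lines. This is a minimal Python illustration, not part of the `carbon` toolchain: `PACKAGE_ROOTS` and `api_file_for` are hypothetical names, and the `lib/carbon/core` root is only the example root used in the text.

```python
from pathlib import PurePosixPath
from typing import Optional

# Hypothetical package-name-to-root-directory table. The proposal only gives
# `lib/carbon/core` as an example root for `Core`.
PACKAGE_ROOTS = {"Core": PurePosixPath("lib/carbon/core")}


def api_file_for(package: str, library: Optional[str] = None) -> PurePosixPath:
    """Map a packaging directive to the API file it names.

    `library` is None for the default library, which uses `default.carbon`.
    Slashes in the library name become subdirectories under the package root.
    """
    root = PACKAGE_ROOTS[package]
    name = "default" if library is None else library
    return root / f"{name}.carbon"


# `package Core library "prelude/types";`
print(api_file_for("Core", "prelude/types"))
# `package Core;`
print(api_file_for("Core"))
```

Because the path is computed directly from the directive, resolving `import Core library "map";` needs a single file lookup rather than a scan of the whole package directory.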
+ +In order to achieve this: + +- The `compile` command will have a built-in directory mapping for the `Core` + package, for example to `/usr/lib/carbon/core` (when installed to the `/usr` + prefix). +- The `map` library name will need to match the filename, so + `/usr/lib/carbon/core/map.carbon`. + - Slashes may be provided in the library name, for subdirectories. +- If `map.carbon` has other `Core` imports, they will be recursively loaded + once parsed. + - Note that checking isn't required to process imports from a file. + +#### Support for other packages + +Because we'll build this for Core, it would probably be straightforward to +expose this for other packages, too. So for example, we could support +`--package-path=MyPackage:/my/package` for getting API files. However, that is +secondary to the `Core` behavior, so any support may become more of an +implementation detail for what makes sense. + +#### Disallow ambiguous library names + +For imports which rely on the implicit mapping (not in general), we will +disallow ambiguous library names. This includes: + +- Any name with a period in it. + - As an example of the ambiguity, given a file `foo.bar.impl.carbon`, it + would be unclear if this is an `impl` file for library "foo.bar" or one + of multiple `impl` files for library "foo". Under this rule, the library + name "foo.bar" is invalid, so the second interpretation is the + only valid interpretation. +- An explicit `library "default"` string name, which would be ambiguous with the + implicit `default` library. + +## Example interaction with Bazel + +### carbon_library and carbon_binary + +The Bazel build rules will expose `carbon compile` and `carbon link` behaviors +in a slightly more Bazel-idiomatic way. 
For example, given: + +```bazel +carbon_library( + name = "lib", + srcs = ["a.impl.carbon", "b.impl.carbon", "b.carbon"], + apis = ["a.carbon"], +) +carbon_binary( + name = "bin", + srcs = ["main.carbon"], + deps = [":lib"], +) +``` + +The way this will approximately work is: + +- `carbon_library` will have an implicit dependency on a set of `Core` + libraries (such as a build target `//carbon/lang:core`). + - This will have a network of `carbon_library` rules, some of which may + look like `lib`. +- For `lib`: + - Invoke `carbon compile` four times, producing a `.o` file for each + input. + - The API files will be additional inputs to the `impl` file compilations. +- For `bin`: + - Source files will be compiled similarly to `lib`. + - The `deps` means `a.carbon` and `b.carbon` will be additional + inputs, but it should ideally be an error if `b.carbon` is imported + directly. This is required because `a.carbon` can expose `b.carbon` + on the import boundary, meaning an indirect import of `b.carbon` + must work. + - Link object files into an executable. + +It's possible that we may use `carbon build` where `carbon compile` is +mentioned, but if so, it should not make a significant difference in the +user-visible behavior. + +For both, there should be an implicit dependency on the full Core package, not +just the prelude. This is because we want the Core package to be easy to access. + +#### Indirect API exposure + +The `apis` attribute is suggested to support only _direct_ dependencies. For +example: + +```bazel +carbon_library( + name = "a", + apis = ["a.carbon"], +) +carbon_library( + name = "b", + apis = ["b.carbon"], + deps = [":a"], +) +carbon_library( + name = "c", + srcs = ["c.carbon"], + deps = [":b"], +) +``` + +If `c.carbon` imports `a.carbon`, the build should error that `a.carbon` +requires a direct dependency. We should allow forwarding, so that the same could +compile without requiring `c` to have a direct dependency on `a`. 
This should +look like `exports = [":a"]`, added to `b` (and superseding the need to list +`:a` in `deps`). + +This feature may see frequent use, for example in `Core` to allow writing it as +multiple libraries instead of one large glob. But it's probably also something +that can be delayed a little, because we can just use a big glob and force +direct dependencies. + +#### Core package rules + +In the `core/` directory, we will set up corresponding `carbon_library` rules. +These will need to pass flags to opt-out of normal behaviors, in particular the +dependency on the prelude library. + +## Future work + +### Caching checked IR, C++ AST, and other possible compile artifacts + +As designed, every time any of the `build`, `compile`, or `link` commands are +used, all prelude files and possibly more of the `Core` package will be +re-checked, along with C++ ASTs being reproduced. + +Instead, Carbon could serialize checked IR, store produced C++ ASTs, and so on. +C++ ASTs in particular could be substantially constructed based on parsed Carbon +state, rather than checked Carbon state, allowing more build parallelism. In +distributed or cached build systems, being able to reuse portions of the build +may increase performance. + +The specific build outputs we want to store may substantially affect how we +would set up a build process. The absence of a decision may lead to the +implementation diverging from what's actually needed, meaning parts will be +reimplemented later. This isn't expected to be too high cost. + +There are also ways to improve build performance without taking these steps. +[Clang modules](https://clang.llvm.org/docs/Modules.html) might be used for +improving Clang compile performance without significant support from Carbon. + +For now we will rely on whatever caching Bazel does for the `.a` output of a +`carbon_library`. No other outputs will be made available. 
That may change, but +leads want to spend our limited development and review time on other features +for the 0.1 milestone. + +## Rationale + +- [Language tools and ecosystem](/docs/project/goals.md#language-tools-and-ecosystem) + - `carbon build` should support easy experimentation with Carbon, and also + small projects. + - Other build support is intended to scale up for larger codebases. +- [Interoperability with and migration from existing C++ code](/docs/project/goals.md#interoperability-with-and-migration-from-existing-c-code) + - The intent is to be able to migrate a CMake, Makefile, or other build at + relatively low cost. An invocation to `clang` can typically be replaced + with `carbon clang`, linking a binary becomes `carbon link`, and so on. + - Similarly, `carbon_library` and `carbon_binary` are important to us for + Bazel support and a migration from `cc_library` and `cc_binary`. + +## Alternatives considered + +### Naming of commands and rules + +For `carbon compile` and `carbon build`, this is trying to split apart concepts. +Some considered alternatives are: + +- Merge `compile`, and possibly also `link`, into `build`. Flags could be used + to differentiate between the versions desired, rather than subcommand names. + - We expect that splitting these apart makes it easier to turn them into + replacements in C++ builds, and easier to understand even in + Carbon-specific builds. +- Have `carbon build` produce `a.out` + - `a.out` is the default output of most C++ compilers, but it reflects a + legacy executable file format. Using the legacy name may reflect + backwards compatibility that Carbon doesn't plan. + - Changing the default output name is probably low-cost, and people will + get used to it. + +### Support a full-fledged build system + +The `build` command as proposed here is intended to be sufficient for quick +testing and simple tools. However, it's not intended to be flexible with custom +rules, plugins, and so on. 
These are features offered by systems such as CMake +or Bazel. + +Instead, we could provide a full build system. Multiple other languages have +gone in that direction: + +- In Rust, `cargo` combines a + [build system](https://doc.rust-lang.org/cargo/commands/cargo-build.html) + and package manager. +- In Swift, + [SwiftPM](https://www.swift.org/documentation/server/guides/building.html) + provides a similar offering to `cargo`. +- In Zig, there are + [multiple build system](https://ziglang.org/learn/build-system/) commands. + +Carbon's +[project goal](/docs/project/goals.md#interoperability-with-and-migration-from-existing-c-code) +is migration of existing C++ developers, particularly "This means integrating +into the existing C++ ecosystem by supporting incremental migration from C++ to +Carbon." + +The expectation is that C++ users will already be using a fully featured build +system, such as CMake. Migration should be easier if users can retain their +existing build system, particularly since a typical migration can be expected to +mix both Carbon and C++ code. + +While Carbon could provide _both_ a separate compilation system _and_ a fully +featured build system, a build system is a substantial undertaking and we expect +C++ developers to already have one. + +### Don't support packaging directive to filename mappings + +Instead of making a mapping from packaging directives to filenames, we could +generate a list specific to the `Core` package, and not expose that for other +packages. + +We shouldn't manually maintain a mapping for the `Core` package; it should be +automated. It's likely that whatever we do in this space, however we would +support a mapping, would be of interest to small projects. It will probably be +low cost for us to build support for things other than `Core`, so we should just +do that. 
+ +### Distribute pre-compiled versions of Core files + +Instead of building object files for `Core` on demand, we could distribute them +as part of Carbon. The upside of this is it would make builds a little faster; +the downside is that we'd likely end up enumerating supported target +platforms, or perhaps building special platforms +on demand in a bespoke manner. + +We can probably add limited caching where it'd help, and support all platforms +using similar logic that way with little performance penalty. + +### Create an explicit mapping from packaging directives to files + +The current `package` and `library` directive design means a given `api` file +may have 0 or more `impl` files. + +We could make it clear from the declaration in an `api` file what `impl` files +exist. This would require a split to describe the possible situations. For +example: + +- `library "foo";`: The common case of 1 `impl` file. +- `library "foo" api_only;`: Add a single keyword that indicates this is a + library with no `impl` file. +- `library "foo" multi_impl 3;`: Indicates this is an unusual library with 3 + `impl` files. + - Multiple impl files are expected to be rare. + - We could require numbered filenames (such as `a.impl.carbon`, + `a.1.impl.carbon`, `a.2.impl.carbon`), but even knowing how many exist + would allow compiles to do validation. If we didn't do this, then it may + be equivalent to not require specifying the number of `impl` files (in + the example, `multi_impl;` instead of `multi_impl 3;`). + +Some advantages are: + +- In the common cases of API-only or 1 impl file, we could avoid scanning the + file system for more files. In other words, it reduces file I/O for better + performance. +- Changes most "missing definition" failures from linker errors to + compile-time errors. 
+ - For example at present, if a forward declaration is in an `api` file, + then even if we find an `impl` file that is missing the definition we + don't know if there's another `impl` file that contains the definition. + With this feature, we could diagnose while compiling the common 0 or 1 + `impl` file cases. +- Allows diagnosing unexpected or missing `impl` files, which can indicate a + developer mistake in the build. +- If multi-`impl` filenames were constrained to be numbered, we could: + - When building, look for specific filenames, instead of doing a file + system glob for `impl` filenames. + - Loosen the ambiguity constraint on library names to only disallow + library names ending with `\.\d+`. + +Some disadvantages are: + +- Adds more keywords to the packaging declaration. +- Requires updating the API file's declaration in order to modify the number + of `impl` files. + +This has been discussed in the past, but does not seem to be outlined in any +proposals as a considered alternative, and this proposal adds new trade-offs for +file mappings. Leads have declined this option in order to keep packaging +directives simple.
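
For illustration only, since this alternative was declined: the numbered-filename scheme and the loosened name constraint described above could look roughly like the following Python sketch. The helper names `impl_filenames` and `is_allowed_library_name` are hypothetical, and the numbering simply follows the `a.impl.carbon`, `a.1.impl.carbon`, `a.2.impl.carbon` example.

```python
import re
from typing import List

# Under the loosened rule, only library names ending in ".<digits>" would be
# rejected, because they would collide with numbered `impl` filenames.
_NUMBERED_SUFFIX = re.compile(r"\.\d+\Z")


def is_allowed_library_name(name: str) -> bool:
    """Return True unless the name ends with a period followed by digits."""
    return _NUMBERED_SUFFIX.search(name) is None


def impl_filenames(library: str, count: int) -> List[str]:
    """List the exact `impl` filenames expected for `multi_impl <count>`.

    The first file is unnumbered; later files are numbered from 1.
    """
    names = []
    for i in range(count):
        suffix = "" if i == 0 else f".{i}"
        names.append(f"{library}{suffix}.impl.carbon")
    return names


print(impl_filenames("a", 3))
print(is_allowed_library_name("foo.bar"), is_allowed_library_name("foo.1"))
```

With the count known up front, a build could look for these exact filenames instead of globbing, and a name such as "foo.bar" would no longer need to be rejected.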