Next Steps 2023.05.15 #4
-
Thanks for summing it up, @bretbrownjr! Nicely done! Here is the paper we wrote with Yannic, for reference: package and modules registry.pdf. Adding @pysco68 from @tipi-build to the discussion.
-
It is essential that, even in the release project model, we don't limit ourselves to only released sources, unless "source" here is not meant literally as "program or library source". There ought to be a way, in the "release project model", to describe a package (not yet installed) that is a mixture of pre-built and source files.
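As a sketch of what such a mixed description could look like (all field names here are hypothetical, invented for illustration and not taken from any existing spec), a release entry could tag each artifact as prebuilt or source:

```python
# Hypothetical release descriptor mixing prebuilt and source artifacts.
# Field names are illustrative only; no existing spec is implied.
release = {
    "name": "acme-imaging",
    "version": "2.4.0",
    "artifacts": [
        {"path": "lib/libacme-core.a", "kind": "prebuilt"},  # shipped binary
        {"path": "src/acme/codecs/", "kind": "source"},      # built by consumer
        {"path": "include/acme/", "kind": "header"},         # used as-is
    ],
}

# A consumer only needs to compile the artifacts marked "source".
to_build = [a["path"] for a in release["artifacts"] if a["kind"] == "source"]
print(to_build)  # ['src/acme/codecs/']
```

The point is only that "release" need not mean "all sources" or "all binaries"; the descriptor can carry both per artifact.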
-
We are currently working on this at tipi. For us the compiler, the linker, and everything involved is a dependency and should be declared/declarable as a dependency; even CMake is a dependency that we build. Widespread examples of this are Yocto and Buildroot, which build everything from source starting from a bootstrapping host compiler, and which differentiate between build-time dependencies (static libraries for the target, or host-specific tools) and runtime dependencies that are supposed to be shipped on the target for the program to run, like shared libraries.

Similarly, some of your dependencies might need another build system, like GN from Google; one would like to specify that it is a dependency of the build so it can be used to build the packages you need. Another typical example is anything that requires code generators, like Protocol Buffers: when you depend on gRPC as a library, you also want protoc as a tool for your host system, to be used to generate the C++ files from the .proto files. Hence the package that provides .proto sources should have the protoc compiler executable as a dependency; otherwise one can't build the sources from this package.

So you need to be able to specify that a given package is a dependency of yours, and whether it is a build-time dependency or a target dependency. For example, when cross-compiling for arm32 I still require a host x86 "protoc" tool to be able to generate the files that will be used in the build process, but I don't want to build or ship protoc for the arm32 target.

Our solution so far:

```json
{
  "ninja-build/ninja": {
    "@": "v1.11.1",
    "provides_host_tools": { "PATH": ["bin/"] },
    "as_target_dep": false
  }
}
```

That means that ninja will be built for the host system when building the packages from source, but won't be linked into the final dependent binary nor shipped on the target system.
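One way a dependency resolver could act on flags like these is to partition declared dependencies into host tools and target libraries. A minimal sketch, assuming the tipi-style `as_target_dep` and `provides_host_tools` fields from the comment above (the gRPC entry is invented for illustration):

```python
# Sketch: partition declared dependencies into host-tool builds and
# target-link dependencies, based on tipi-style flags.
deps = {
    "ninja-build/ninja": {
        "@": "v1.11.1",
        "provides_host_tools": {"PATH": ["bin/"]},
        "as_target_dep": False,
    },
    # Hypothetical second entry: a library that IS shipped on the target.
    "grpc/grpc": {"@": "v1.54.0", "as_target_dep": True},
}

host_tools = {}   # built for the build machine, extend PATH, never shipped
target_deps = {}  # cross-compiled and linked/shipped with the final binary

for name, spec in deps.items():
    if spec.get("provides_host_tools"):
        host_tools[name] = spec["provides_host_tools"]["PATH"]
    if spec.get("as_target_dep", True):
        target_deps[name] = spec["@"]

print(sorted(host_tools))   # ['ninja-build/ninja']
print(sorted(target_deps))  # ['grpc/grpc']
```

When cross-compiling, the `host_tools` set would be built for the build machine's architecture and only its `PATH` entries injected into the build environment.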
-
Thank you for the summary. I wish I was there to chat. We can chat at some other conference as to why I don't go to C++Now :)
I think we are missing the other side of interoperability with this. What you describe here only answers one question: "How does one use an existing package?" Which is great if you only want to consume packages. But it doesn't answer the other side: "How does one build a package?" In other words, how does a package manager invoke the build system of a library to generate a package that it (and others) can publish and use? Having said that, it is fine if having an answer for this other side is not a goal for this effort. Hence I'm wondering whether supporting both routes is a goal or not.
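To make the "how does one build a package?" side concrete, one could imagine (purely as an illustration, not a proposal from this thread) a small declarative build-invocation record that a source release carries and that a package manager runs verbatim, so the manager never needs to understand the underlying build system:

```python
import subprocess

# Hypothetical "how to build me" record a source release could carry.
# The commands and the {prefix} placeholder are illustrative; a real
# design would also cover toolchain selection, options, and cross-builds.
build_recipe = {
    "configure": ["cmake", "-S", ".", "-B", "build",
                  "-DCMAKE_INSTALL_PREFIX={prefix}"],
    "build":     ["cmake", "--build", "build"],
    "install":   ["cmake", "--install", "build"],
}

def run_recipe(recipe, prefix, dry_run=True):
    """Expand the {prefix} placeholder and run each step in order."""
    steps = []
    for step in ("configure", "build", "install"):
        cmd = [arg.format(prefix=prefix) for arg in recipe[step]]
        steps.append(cmd)
        if not dry_run:
            subprocess.run(cmd, check=True)
    return steps

steps = run_recipe(build_recipe, "/opt/pkg")
print(steps[0][-1])  # -DCMAKE_INSTALL_PREFIX=/opt/pkg
```

The same record would work whether the library underneath uses CMake, Meson, or a plain Makefile; only the command lists differ.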
-
Lots of great comments here so far! Rather than respond in a couple subthreads at once, I'll make another top-level comment to clarify my thinking on this.

Motivation

To clarify my motivation for this work: in the 2023 Annual C++ Developer Survey "Lite", five of the top six most common frustrations about C++ (question 6) are directly or indirectly about dependency management:
First Steps

Again, I think initially we can start with data models (entity-component diagrams, perhaps), though I do think we'll want more coherent design documents of some sort. Relevant portions of those documents would be appropriate to present to the ISO Tooling WG for inclusion in a C++ Ecosystem Standard document. To clarify the two avenues we identified in Aspen at C++Now:
Future Work

There are certainly other avenues to consider. In my mind, they can be tackled later, and so the priority list in my head has them categorized as important goals for future investment. For instance (still not an exhaustive list!):
How do we consistently identify a prebuilt project installed on disk?

Basically, we need build systems to communicate at a distance through dependency management systems. See CPS by @mwoehlke. A library built by Meson or even a plain Makefile should be consumable by CMake or Bazel. A "header-only" or "module-interface-only" library should be able to clearly describe its dependencies. There are plenty of reasons we have always needed this kind of metadata. For instance, a future "link what you include" tool cannot really work without some information about what dependencies are possible and what interfaces they provide. Moreover, as I look at how modules can be correctly delivered and consumed, it's becoming clear to me that C++ modules really want a healthy packaging ecosystem to exist in. Modular interfaces need to come with structured metadata describing module mappings, parse instructions, transitive dependencies, and so on.

How do we consistently identify a project in the abstract?

As we consider built package metadata, we quickly run into identity problems.
Note that all these use cases also exist for closed-source or otherwise prebuilt libraries. And the use cases apply across languages in the sort of polyglot codebases that C++ seems to consistently target (i.e., C, Python, Fortran, and C++ all in the same shared library). So it seems we need a way to attribute identity and ownership with a flexibility that most existing package managers tend to avoid. Some of us are coming to the conclusion that we need at least a project name registry with some identity management and authentication features. @ruoso thinks an organization like IANA might be a good fit for governance (and hosting?) of these sorts of operations. I'm not familiar with how to engage IANA or similar organizations, but it sounds like a place to start. Without waiting on that sort of research, though, we can come up with a data model describing how one would identify a project and its ownership. Possibly we could also add interesting metadata to those projects, like mandatory dependencies, supported environments, and so on.
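To make the identity question concrete, here is one minimal sketch of what a registry entry tying a project name to ownership might carry. Every field name here is hypothetical, invented for illustration; this is not a description of any existing registry or of CPS:

```python
from dataclasses import dataclass

# Hypothetical registry record for project identity. Field names are
# illustrative only and do not reflect any existing registry design.
@dataclass(frozen=True)
class ProjectIdentity:
    canonical_name: str              # globally unique, registry-assigned
    owners: tuple                    # parties accountable for the name
    aliases: tuple = ()              # historical / vendor-specific names
    supported_environments: tuple = ()

registry = {
    "org.example.acme-imaging": ProjectIdentity(
        canonical_name="org.example.acme-imaging",
        owners=("Acme Corp",),
        aliases=("acme_imaging", "libacme-imaging"),
        supported_environments=("linux-x86_64", "linux-arm32"),
    ),
}

def resolve(name):
    """Map any known alias back to the canonical identity, if registered."""
    for entry in registry.values():
        if name == entry.canonical_name or name in entry.aliases:
            return entry
    return None

print(resolve("libacme-imaging").canonical_name)  # org.example.acme-imaging
```

The alias mapping is the interesting part: different packaging ecosystems already call the same project by different names, and a shared identity record is what lets them agree they mean the same thing.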
-
This should include exact versions of all dependencies that were used to build a specific version of an "installed" package. Something similar to Conan's lockfile or npm's package-lock.json.
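A sketch of that idea (the schema is invented here for illustration, loosely in the spirit of npm's package-lock.json rather than copied from it): record the exact resolved version of every dependency used for a build, then let a consumer verify an installed environment against the lock:

```python
# Illustrative lockfile: exact versions used to build one package release.
lockfile = {
    "package": "acme-imaging@2.4.0",
    "resolved": {
        "zlib": "1.2.13",
        "libpng": "1.6.39",
    },
}

# What the installed environment actually provides.
installed = {"zlib": "1.2.13", "libpng": "1.6.40"}

def mismatches(lock, env):
    """Return deps whose installed version differs from the locked one."""
    return {
        dep: (want, env.get(dep))
        for dep, want in lock["resolved"].items()
        if env.get(dep) != want
    }

print(mismatches(lockfile, installed))  # {'libpng': ('1.6.39', '1.6.40')}
```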
-
At C++Now, there were some discussions about next steps for packaging standards involving the following people: @bretbrownjr, @ruoso, @daminetreg, and @billhoffman
To my recollection, the current thinking is that there are two parallel avenues of work that can be pursued as a next step, both involving clarification of data models for the purposes of packaging interoperability.
One data model is for describing releases of projects. The data included would need to be able to describe:
Thinking among the group at C++Now is that some sort of federated or federation-friendly arrangement would be interesting given the amount of closed-source and otherwise vendor-provided projects that would need to operate in this environment. It seems that clear identification of source releases will be essential to ensure that various packaging systems, monorepos, and build systems all communicate clearly.
The other data model would describe projects as installed on a system. The data included would need to be able to describe:
There seems to be a desire to also describe built packages that are not libraries, such as tools to be used in build processes. If someone wants to elaborate on use cases for those, I would appreciate it.
The metadata about installed projects would provide context to build and packaging systems in consistent ways. Ideally, this metadata would replace pkg-config metadata and the hardcoded information in CMake scripts used in find_package workflows.

Given that C++ is used consistently in polyglot environments, the above systems would be polyglot systems, especially providing support for C, Fortran, C++, etc., but not necessarily limited to those. To the extent that there are compelling use cases, providing support for projects in language-specific packaging ecosystems (PyPI, crates, etc.) would be ideal.
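As a rough illustration of metadata replacing pkg-config output and hardcoded find_package scripts (the field names below are invented for this sketch; the real candidate format under discussion is something like CPS), a build system could derive usage requirements mechanically from a per-package description:

```python
# Hypothetical installed-package description. Field names are invented
# for illustration; a real format would be something like CPS.
installed_pkg = {
    "name": "acme-imaging",
    "include_dirs": ["/opt/acme/include"],
    "link_libraries": ["/opt/acme/lib/libacme.a"],
    "defines": ["ACME_STATIC"],
    "requires": [],  # transitive deps would be resolved the same way
}

def usage_flags(pkg):
    """Translate the description into compile/link flags,
    pkg-config-style (--cflags / --libs)."""
    cflags = [f"-I{d}" for d in pkg["include_dirs"]]
    cflags += [f"-D{d}" for d in pkg["defines"]]
    ldflags = list(pkg["link_libraries"])
    return cflags, ldflags

cflags, ldflags = usage_flags(installed_pkg)
print(cflags)  # ['-I/opt/acme/include', '-DACME_STATIC']
```

Because the description is declarative, a Meson-built, Bazel-built, or Makefile-built library could all emit the same shape of metadata and be consumed identically.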