descriptor: add a license for each descriptor #506
Currently the media type method of saying that one layer is distributable or not doesn't have enough detail for practical usage. In particular, it is still not clear what the license of a layer is (there is a wide variety of licenses that allow distribution but have other restrictions). This will cause many issues with distributions attempting to package images without being able to verify that the image author did actually intend for layer XYZ to be under license ABC.
Signed-off-by: Aleksa Sarai <[email protected]>
|
If people are okay with this, we could even consider dropping the two different |
|
Can we just use an annotation for this? |
|
My concern with using annotations is that it means that you cannot specify the license of different layers separately. While it might seem trivial, if you take a GPLv2 distribution (such as openSUSE) and then plop an Apache 2.0 project in a new layer, it is not correct to call the entire thing GPLv2 or Apache licensed -- it's more complicated than that. Annotations would be nice, but we'd need to add annotations to descriptors. And I don't really want to do that, because then we'd have 4 different places where annotations are defined. |
|
That is what I was talking about. #440 |
|
@jonboulle Ah, okay. Yeah, adding annotations to descriptors might work. Though I'm not sure if it's overkill or whether there are other valid usecases for such annotations. |
|
On Mon, Dec 19, 2016 at 06:06:15AM -0800, Aleksa Sarai wrote:
Though I'm not sure if it's overkill or whether there are other
valid usecases for such annotations.
Another valid use case for descriptor annotations is naming and
discovery once refs/ gets dropped in favor of a manifest-list index
[1].
[1]: #438 (comment)
And the next few comments in that PR.
|
|
Descriptors really aren't the right place for this. This needs to be handled by an out of band metadata system. The issue is that an entire image is unlikely to represent a single-licensed entity. Images are going to be made of many files from many projects. Their licenses need to be inside the container, pointing at the requisite files, or outside the container, summing up the content of the container. |
|
On Mon, Dec 19, 2016 at 01:35:45PM -0800, Stephen Day wrote:
The issue is that an entire image is unlikely to represent a
single-licensed entity. Images are going to be made of many files
from many projects. Their licenses need to be inside the container,
pointing at the requisite files, or outside the container, summing
up the content of the container.
With license (and source: #71, #498) in descriptor annotations (some
future reroll of #438), you could attach them to individual layers,
which gives you the option to be as granular as you like. If you want
to support objects with multiple licenses, we could have the license
field (wherever it lives) support arrays (of SPDX identifiers [1]?).
But I expect there will still be cases too complicated to fit, so an
unset licensing annotation should always mean “the referenced object
has complicated/unspecified licensing”.
[1]: #216 (comment)
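A rough sketch of how a consumer might read such per-layer annotations, under stated assumptions: descriptors carry an `annotations` map of strings (the #438 proposal, not something that exists yet), the key `org.example.licenses` is hypothetical, and multiple SPDX identifiers are encoded as a comma-separated string. None of these names come from the spec. An absent annotation is reported as `None`, meaning "complicated/unspecified".

```python
from typing import List, Optional

# Hypothetical annotation key; not defined by any specification.
LICENSE_KEY = "org.example.licenses"


def licenses_for(descriptor: dict) -> Optional[List[str]]:
    """Return the SPDX identifiers attached to a descriptor.

    None means the annotation is unset, which is read as
    "complicated/unspecified licensing" rather than "no license".
    """
    raw = descriptor.get("annotations", {}).get(LICENSE_KEY)
    if raw is None:
        return None
    ids = [part.strip() for part in raw.split(",") if part.strip()]
    return ids or None


# Three illustrative layer descriptors (digests and sizes omitted).
base_layer = {"annotations": {LICENSE_KEY: "GPL-2.0"}}
app_layer = {"annotations": {LICENSE_KEY: "Apache-2.0, MIT"}}
vendor_layer = {}  # no annotation at all

for name, layer in [("base", base_layer), ("app", app_layer), ("vendor", vendor_layer)]:
    print(name, licenses_for(layer))
```

Each layer is reported separately, so a GPL-2.0 base with an Apache-2.0 layer on top is never collapsed into a single image-wide license.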
|
|
@wking Please let the conversation breathe. By immediately commenting after I've said something you are making my point less effective unless I comment again. The problem is not putting the license here or somewhere else. The issue here is trying to coerce the container image system into a packaging system, which it is not. Without a single place for this metadata and a declared relationship with competing metadata, adding this field will only contribute to the spread of the problem rather than to a solution. Content descriptors are for machines. They declare things that machines can understand and this particular field proposal doesn't fall into that, nor does it describe how a machine should consume it. |
You're right. My main concern is that given the history of packaging ( Does this apply to container images? IMO yes, because images are collective works (works that combine multiple works) much like distributions (that also have licenses). So it's definitely something we need to consider -- though I understand the argument that some external source should track things (even though I'd prefer the information be stored in the image). @stevvooe What if we reserve |
|
@cyphar While the analogy with packaging systems is a common one, it generally falls apart very quickly. Most container images will ship an entire underlying OS, which will have tons of different licenses. To properly represent this, you'd need something that could specify exactly which files are under which license. Either way, placing this on the descriptor is wildly incorrect. Descriptors tell you what to fetch and maybe how. Their role does not include metadata. |
|
Is there some Linux Foundation Collaborative Project working on defining well-known locations for machine-friendly license MetaData? |
|
This is a good discussion, but I don't think this needs a field. I'd rather keep this in annotations. At best a reference to any mnemonic codes for license short names. |
|
I think using an SPDX identifier (https://spdx.org/licenses/) like, say, GPL-2.0 would be a good idea. They have done a lot of work in defining these things, and the identifiers are used in many places (even on GitHub already, when it detects the license in a repo). I'll loop in LF Legal and the SPDX team, who have been interested in this topic in the past. |
|
(I'm pro baking this in as a field) |
|
@caniszczyk This field makes no sense on this object. It is completely out of context and completely antithetical to the purpose of this object. The property of a descriptor is that you can take the binary content, hash it, take its size, apply a type and get the descriptor. It is a simple process that doesn't require any prior knowledge about the targeted content (other than intended type) and can be generated repeatedly. By adding the license field, we get rid of that property. I can see an argument for |
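The property described above can be sketched in a few lines. This is only an illustration, not the image-spec's reference code: the function name `make_descriptor` is hypothetical, SHA-256 is assumed as the digest algorithm, and the field names follow the descriptor layout (`mediaType`, `digest`, `size`).

```python
import hashlib
import json


def make_descriptor(content: bytes, media_type: str) -> dict:
    """Derive a descriptor from the raw bytes plus an intended type.

    Nothing else about the content needs to be known in advance.
    """
    return {
        "mediaType": media_type,
        "digest": "sha256:" + hashlib.sha256(content).hexdigest(),
        "size": len(content),
    }


blob = b"example layer bytes"
layer_type = "application/vnd.oci.image.layer.v1.tar"

# Generating the descriptor twice yields the same result.
d1 = make_descriptor(blob, layer_type)
d2 = make_descriptor(blob, layer_type)
print(json.dumps(d1, indent=2))
print("repeatable:", d1 == d2)
```

A license cannot be computed from the bytes in the same way; it would have to be passed in as extra knowledge about the content, which is the objection being raised here.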
|
@stevvooe Point noted. I believe that it's important to be able to specify the license of different layers separately. Whether that is accomplished via annotations or some other mechanism is fine, but it matters for the point that @cyphar alluded to around fulfilling "legality of distributing free software and license requirements". |
|
@caniszczyk I understand that necessity, but descriptors aren't really the right place to do this nor does this proposal provide the right level of granularity to appropriately identify the licenses for content in a layer. #501 is a better proposal in this direction but we really need to get a handle on how this will work in practice. There are also implementation concerns about extracting licensing so late in the process. By the time a descriptor is generated, it is unlikely to have access to any information about licensing or what is inside. We should close this in favor of #501. |
|
I'd be willing to close this in favour of #501, but @caniszczyk wanted to loop in legal to see what they think about free-form strings defining licenses for different objects. I'm 👍 on making it an annotation, as long as it means you can define a license for every referenced blob -- because different blobs can have different licenses. |
@cyphar I am not even sure if that is sufficient. A single blob could be made up of differently licensed components. Placing licensing at this granularity will make it dependent on an implementation detail (layers) rather than on actual content. Much of this data could actually be lost at build time. The |
|
I think this should be closed. An individual "license" for a whole layer is a very complex thing. |
|
@gregkh In the case of distributions, distributions actually have their own licenses (openSUSE for example is GPL) but that's a separate issue to be fair. 😉 |
|
@cyphar if only all distros were as sane as openSUSE with their license, this wouldn't be an issue :) |
|
@stevvooe No problem! 😸 |