@cyphar
Member

@cyphar cyphar commented Dec 19, 2016

Currently the media type method of saying that one layer is
distributable or not doesn't have enough detail for practical usage. In
particular, it is still not clear what the license of a layer is (there
is a wide variety of licenses that allow distribution but have other
restrictions). This will cause many issues with distributions attempting
to package images without being able to verify that the image author did
actually intend for layer XYZ to be under license ABC.

Signed-off-by: Aleksa Sarai [email protected]

@cyphar
Member Author

cyphar commented Dec 19, 2016

If people are okay with this, we could even consider dropping the two different MediaTypes for layers. But I recognise that some people might not be happy with that idea.

@jonboulle
Contributor

Can we just use an annotation for this?

@cyphar
Member Author

cyphar commented Dec 19, 2016

My concern with using annotations is that it means that you cannot specify the license of different layers separately. While it might seem trivial, if you take a GPLv2 distribution (such as openSUSE) and then plop an Apache 2.0 project in a new layer, it is not correct to call the entire thing GPLv2 or Apache licensed -- it's more complicated than that.

Annotations would be nice, but we'd need to add annotations to descriptors. And I don't really want to do that, because then we'd have 4 different places where annotations are defined.

@jonboulle
Contributor

That is what I was talking about. #440

@cyphar
Member Author

cyphar commented Dec 19, 2016

@jonboulle Ah, okay. Yeah, adding annotations to descriptors might work. Though I'm not sure whether it's overkill or whether there are other valid use cases for such annotations.

@wking
Contributor

wking commented Dec 19, 2016 via email

@stevvooe
Contributor

Descriptors really aren't the right place for this. This needs to be handled by an out of band metadata system.

The issue is that an entire image is unlikely to represent a single-licensed entity. Images are going to be made of many files from many projects. Their licenses need to be inside the container, pointing at the requisite files, or outside the container, summing up the content of the container.

@wking
Contributor

wking commented Dec 19, 2016 via email

@stevvooe
Contributor

@wking Please let the conversation breathe. By commenting immediately after I've said something, you make my point less effective unless I comment again.

The problem is not putting the license here or somewhere else. The issue here is trying to coerce the container image system into a packaging system, which it is not.

Without a single place for this metadata and a declared relationship with competing metadata, adding this field will only spread the problem rather than solve it.

Content descriptors are for machines. They declare things that machines can understand and this particular field proposal doesn't fall into that, nor does it describe how a machine should consume it.

@cyphar
Member Author

cyphar commented Dec 21, 2016

> Content descriptors are for machines. They declare things that machines can understand and this particular field proposal doesn't fall into that, nor does it describe how a machine should consume it.

You're right. My main concern is that, given the history of packaging (rpm, deb, etc.), all package managers have a field for the license of a particular package. The reason for this is related to the legality of distributing free software and license requirements, as well as making sure that users are aware of what licenses apply to the software on their machines.

Does this apply to container images? IMO yes, because images are collective works (works that combine multiple works) much like distributions (that also have licenses). So it's definitely something we need to consider -- though I understand the argument that some external source should track things (even though I'd prefer the information be stored in the image).

@stevvooe What if we reserve org.opencontainers.license which allows for a URI to the license for the image (and it could be defined on Manifests and ManifestLists)? It would work in most cases and wouldn't require modifying descriptors.
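For illustration, the manifest-level annotation suggested above could look roughly like the following. This is only a sketch: `org.opencontainers.license` is the key proposed in this comment, not a confirmed part of the specification, and the helper name `with_license`, the trimmed manifest, and the example URI are assumptions made up for the example.

```python
import json

# Key proposed in this thread; treat it as hypothetical, not as a reserved name.
LICENSE_ANNOTATION = "org.opencontainers.license"


def with_license(manifest: dict, license_uri: str) -> dict:
    """Return a copy of the manifest with a license URI annotation added."""
    annotations = dict(manifest.get("annotations", {}))
    annotations[LICENSE_ANNOTATION] = license_uri
    return {**manifest, "annotations": annotations}


# Heavily trimmed manifest; a real one also carries a config and layers.
manifest = {"schemaVersion": 2, "layers": []}

# Placeholder URI, used only to show the shape of the value.
annotated = with_license(manifest, "https://example.com/LICENSE")
print(json.dumps(annotated, indent=2))
```

Because the annotation sits on the manifest, it describes the image as a whole and cannot distinguish between layers, which is the limitation discussed earlier in the thread.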

@stevvooe
Contributor

stevvooe commented Jan 6, 2017

@cyphar While the analogy with packaging systems is a common one, it generally falls apart very quickly. Most container images will ship an entire underlying OS, which will have tons of different licenses. To properly represent this, you'd need something that could specify exactly which files are under which license.

Either way, placing this on the descriptor is wildly incorrect. Descriptors tell you what to fetch and maybe how. Their role does not include metadata.

@RobDolinMS
Collaborator

Is there some Linux Foundation Collaborative Project working on defining well-known locations for machine-friendly license MetaData?

@vbatts
Member

vbatts commented Jan 18, 2017

This is a good discussion, but I don't think this needs a field. I'd rather keep this in annotations. At best, it would be a reference to mnemonic codes for license short names.

@caniszczyk
Contributor

I think using an SPDX identifier (https://spdx.org/licenses/), such as GPL-2.0, would be a good idea. The SPDX project has done a lot of work defining these identifiers, and they are already used in many places (GitHub, for example, uses them when it detects the license in a repo).

I'll loop in LF Legal and the SPDX team who have been interested in this topic in the past.

@caniszczyk
Contributor

(I'm pro baking this in as a field)

@stevvooe
Contributor

@caniszczyk This field makes no sense on this object. It is completely out of context and completely antithetical to the purpose of this object.

The property of a descriptor is that you can take the binary content, hash it, take its size, apply a type, and get the descriptor. It is a simple process that doesn't require any prior knowledge about the targeted content (other than the intended type) and can be repeated with the same result. By adding the license field, we lose that property.

I can see an argument for annotations on descriptors, but I think we need to be clear about what those mean for the consumer of the annotated object. Right now, it is very unclear.
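The descriptor property described above can be sketched as follows. This is a minimal illustration, assuming SHA-256 as the digest algorithm; the function name `describe` and the sample bytes are made up for the example.

```python
import hashlib


def describe(content: bytes, media_type: str) -> dict:
    """Derive a descriptor from nothing but the content and its intended type."""
    digest = "sha256:" + hashlib.sha256(content).hexdigest()
    return {"mediaType": media_type, "digest": digest, "size": len(content)}


blob = b"example layer bytes"  # stand-in for a real layer tarball
layer_type = "application/vnd.oci.image.layer.v1.tar"

first = describe(blob, layer_type)
second = describe(blob, layer_type)

# Anyone holding the same bytes and the same intended type gets the same result.
assert first == second
```

A license cannot be computed this way, because nothing in the bytes themselves says which license applies; it would have to be supplied from outside the content.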

@caniszczyk
Contributor

@stevvooe Point noted. I believe it's important to be able to specify the license of different layers separately. Whether that is accomplished via annotations or some other mechanism is fine, but it matters for the point @cyphar alluded to about fulfilling the "legality of distributing free software and license requirements".

@stevvooe
Contributor

@caniszczyk I understand that necessity, but descriptors aren't really the right place to do this nor does this proposal provide the right level of granularity to appropriately identify the licenses for content in a layer.

#501 is a better proposal in this direction, but we really need to get a handle on how this will work in practice. There are also implementation concerns about extracting licensing so late in the process. By the time a descriptor is generated, the tool generating it is unlikely to have access to any information about licensing or about what is inside the content.

We should close this in favor of #501.

@cyphar cyphar mentioned this pull request Jan 18, 2017
@cyphar
Member Author

cyphar commented Jan 18, 2017

I'd be willing to close this in favour of #501, but @caniszczyk wanted to loop in legal to see what they think about free-form strings defining licenses for different objects. I'm 👍 on making it an annotation, as long as it means you can define a license for every referenced blob -- because different blobs can have different licenses.
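As a rough sketch of what a license for every referenced blob might look like with annotations on descriptors: the annotation key, the helper `layer_licenses`, and the digests below are all placeholders, and descriptor annotations are assumed here rather than taken from the specification as it stood in this thread.

```python
# Hypothetical annotation key; nothing in this thread reserves it for descriptors.
LICENSE_KEY = "org.opencontainers.license"


def layer_licenses(manifest: dict) -> dict:
    """Map each layer digest to its declared license, or None if undeclared."""
    return {
        layer["digest"]: layer.get("annotations", {}).get(LICENSE_KEY)
        for layer in manifest.get("layers", [])
    }


# Placeholder digests; the two layers mirror the GPLv2 base plus Apache 2.0
# project case raised earlier in the conversation.
base_digest = "sha256:" + "a" * 64
app_digest = "sha256:" + "b" * 64

manifest = {
    "schemaVersion": 2,
    "layers": [
        {"digest": base_digest, "annotations": {LICENSE_KEY: "GPL-2.0"}},
        {"digest": app_digest, "annotations": {LICENSE_KEY: "Apache-2.0"}},
    ],
}

licenses = layer_licenses(manifest)
print(licenses)
```

Note that this still attaches one license string to a whole blob, so it does not address content within a single layer that falls under several licenses.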

@stevvooe
Contributor

> I'm 👍 on making it an annotation, as long as it means you can define a license for every referenced blob -- because different blobs can have different licenses.

@cyphar I am not even sure that is sufficient. A single blob could be made up of differently licensed components. Placing licensing at this granularity will make it dependent on an implementation detail (layers) rather than on actual content. Much of this data could actually be lost at build time.

The config is a much better place to deal with licensing. It deals with the aggregate rootfs and could specify licensing details independent of the serialization format of the rootfs.

@gregkh
Member

gregkh commented Jan 20, 2017

I think this should be closed. An individual "license" for a whole layer is a very complex thing.

@cyphar
Member Author

cyphar commented Jan 20, 2017

@gregkh In the case of distributions, distributions actually have their own licenses (openSUSE for example is GPL) but that's a separate issue to be fair. 😉

@gregkh
Member

gregkh commented Jan 20, 2017

@cyphar if only all distros were as sane as openSUSE with their license, this wouldn't be an issue :)

@stevvooe
Contributor

@cyphar I'm going to close this one in favor of #501. Let me know if that isn't okay.

@stevvooe stevvooe closed this Jan 20, 2017
@cyphar cyphar deleted the descriptor-license-uris branch January 20, 2017 20:32
@cyphar
Member Author

cyphar commented Jan 20, 2017

@stevvooe No problem! 😸
