
AbstractPackage (was: Software Component) #1044

Open
zvr wants to merge 13 commits into spdx:develop from zvr:sw-component


@zvr (Member) commented Jul 5, 2025

This introduces the idea of "Software Components", an abstract view of pieces of software.

The existing Package class records information about specific software packages, such as "OpenSSL v3.0.1 distributed by Ubuntu" or "OpenSSL v3.1.1 distributed by Debian." However, when storing data, this approach can lead to redundancy and inefficiency, particularly for licensing information and other metadata that is common across multiple versions and distributions of the same software. This PR introduces the concept of a Component as an abstract reference to a piece of software, distinct from a Package, which, as mentioned above, represents a specific version of the software distributed by a particular supplier.

By adding this distinction, there is now a way to record relationships between different parts of the software ecosystem. For example, both "OpenSSL v3.0.1 distributed by Ubuntu" and "OpenSSL v3.1.1 distributed by Debian" Packages can be linked to the abstract Component "OpenSSL". This relationship-based approach not only enhances the clarity and organization of SPDX data but also leads to significant storage savings. Common information, such as the licensing terms of a component, can be stored once and referenced across multiple packages, eliminating redundancy.

For more information and some real-world numbers on the efficiency gains, one can see a presentation in this year's FOSDEM SBOM devroom.

This PR adds a new class named Component in the Software profile and a new RelationshipType to be used for expressing these relationships. No new properties are added; the new class re-uses some properties already present.

zvr added 2 commits July 5, 2025 23:24
Signed-off-by: Alexios Zavras (zvr) <zvr+git@zvr.gr>
Signed-off-by: Alexios Zavras (zvr) <zvr+git@zvr.gr>
@zvr zvr requested a review from goneall July 5, 2025 22:37
@zvr zvr added the Profile:Software Software profile and related matters label Jul 5, 2025
@zvr zvr added this to the 3.1 milestone Jul 5, 2025
Signed-off-by: Alexios Zavras (zvr) <zvr+git@zvr.gr>
@goneall (Member) left a comment

I was expecting to see several properties in the Component class that would be shared with the Packages which were instances of the Component. I'll need a bit more context on how this would work in practice - I'll go back and re-look at the presentation referenced.

@goneall (Member) commented Jul 6, 2025

I noticed there is an isPackagedBy relationship type - how would that be used in "relation to" the instanceOf relationship type?

@goneall (Member) commented Jul 6, 2025

> I noticed there is an isPackagedBy relationship type - how would that be used in "relation to" the instanceOf relationship type?

In looking back at the presentation, most of the examples are relationships (e.g. license information), the exception being the copyrightText and attributionText. Should we add these properties to the Component? We may even want to consider making the component a subclass of SoftwareArtifact since many of the properties apply.

If we do add these as properties, we should specify the precedence if both the Component and Package specify the same property.

@zvr (Member Author) commented Jul 7, 2025

Thanks for the comments, @goneall. I'll try to answer everything in a single comment.

> a component to be an instance of another component

Yes, this is needed, because a Component has an optional version. So one might be talking about "Curl" in general, or more specifically about "Curl 7.8.1". We want a relationship showing that "Curl 7.8.1" is an instance of "Curl". In this example, a Package would be "Curl 7.8.1 as distributed by Ubuntu".

This is even more important in cases where some of the meta-information changes according to the version. The typical example is OpenSSL, where all 1.x versions are under the OpenSSL license, while all 3.x versions are under Apache-2.0.

Let me upload a couple of diagrams I have, showing examples. They show different classes by shapes and different relationships by arrow line style.

> most of the examples are relationships (e.g. license information), the exception being the copyrightText and attributionText. Should we add these properties to the Component?

Ah, you are correct. I was under the impression that everything was expressed as relationships and forgot that we ended up with properties for these texts. I did not see them in the Package definition, so I missed them entirely. I will add them as properties.

> We may even want to consider making the component a subclass of SoftwareArtifact

No, I don't think this would be a good idea, since the whole point of the Component is that it should be considered an abstract, idealized view of some software. Definitely not an artifact.

> If we do add these as properties, we should specify the precedence if both the Component and Package specify the same property.

You are absolutely correct. The precedence rule is that every attribute value of a more specific entity overrides the value of the same attribute on a more general entity. This way, the property values of a Package always apply; where a value does not exist and the Package is an instanceOf a Component, the value from that Component is used. This continues up the chain as long as there are "parent" Components and no value has been specified.
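The precedence rule can be sketched as a lookup that walks up the instance chain until it finds a value. This is a minimal sketch, assuming a hypothetical in-memory `Node` type with an `instance_of` link; it is not part of any SPDX library, and the property values are placeholders. Only the property names `copyrightText` and `attributionText` come from the discussion.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, Optional


# Hypothetical in-memory model (not an SPDX API): a Package or Component
# with its own property values and an optional link to a more general Component.
@dataclass
class Node:
    name: str
    properties: Dict[str, Any] = field(default_factory=dict)
    instance_of: Optional["Node"] = None


def resolve(node: Node, prop: str) -> Optional[Any]:
    """Return the most specific value of `prop` along the instanceOf chain."""
    seen = set()
    current = node
    while current is not None and id(current) not in seen:
        seen.add(id(current))  # guard against accidental cycles
        if prop in current.properties:
            return current.properties[prop]  # the more specific value wins
        current = current.instance_of  # fall back to the "parent" Component
    return None  # no value anywhere in the chain


# Placeholder values for illustration only.
curl = Node("curl", {"copyrightText": "generic curl copyright text"})
curl_781 = Node("curl 7.81", instance_of=curl)
ubuntu_curl = Node(
    "Ubuntu curl 7.81",
    {"attributionText": "Ubuntu-specific attribution"},
    instance_of=curl_781,
)

print(resolve(ubuntu_curl, "copyrightText"))    # generic curl copyright text
print(resolve(ubuntu_curl, "attributionText"))  # Ubuntu-specific attribution
```

The Package's own attribution wins, while the copyright text is inherited two levels up from the generic "curl" entry.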

@zvr (Member Author) commented Jul 7, 2025

Here are a couple of example diagrams. They show different classes by shapes and different relationships by arrow line style.

They also illustrate two different approaches, one using a single RelationshipType and one using two different ones.

The first one is about Curl: two versions, released by two suppliers (so four Packages), all under the same license. Note that isInstanceOf is used between Components, but isPackageOf for Component-Package. Yes, I know that SPDX has the reverse relationship type isPackagedBy, but I had the diagram ready and did not have time to update it. I am not proposing introducing this new type.

The diagram shows that the licensing information, for example, can be related to the abstract Component "Curl".

[Diagram: curl-ispackage-example]

Looking at OpenSSL, the situation is more complicated, since the license is not the same for all versions. So two more Components are created, one representing all of "OpenSSL v1.x" and one representing all of "OpenSSL v3.x", and the licenses are related to these.

In this example, only the isInstanceOf relationship type is used, for all relationships between Components and Packages.

[Diagram: openssl-isinstance-example]

There are pros and cons for both approaches (using one or two RelationshipTypes). My current preference is for two distinct types, since this simplifies the handling code somewhat: one automatically knows the type of the elements on each side of the relationship and does not have to query for it.
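The handling-code argument can be illustrated with a small sketch. It assumes the relationship names drawn in the diagrams (`isInstanceOf`, `isPackageOf`) and assumes both point from the more specific element to the more general one; the function and the element-type map are hypothetical. With two types, the endpoint classes follow from the type name alone; with a single shared type, each endpoint has to be looked up.

```python
from typing import Dict, Tuple

# Assumed endpoint classes per relationship type in the two-type approach
# (direction assumed: specific -> general, as in the diagrams).
TWO_TYPE_ENDPOINTS: Dict[str, Tuple[str, str]] = {
    "isInstanceOf": ("Component", "Component"),
    "isPackageOf": ("Package", "Component"),
}


def endpoint_types(
    rel_type: str,
    from_id: str,
    to_id: str,
    element_types: Dict[str, str],
    two_types: bool = True,
) -> Tuple[str, str]:
    """Return the (from class, to class) pair for a relationship."""
    if two_types and rel_type in TWO_TYPE_ENDPOINTS:
        # Two distinct types: no lookup needed, the type name says it all.
        return TWO_TYPE_ENDPOINTS[rel_type]
    # Single shared type: every endpoint must be queried.
    return element_types[from_id], element_types[to_id]


types = {"curl": "Component", "curl-7.8.1": "Component", "ubuntu-curl": "Package"}
print(endpoint_types("isPackageOf", "ubuntu-curl", "curl-7.8.1", {}))
print(endpoint_types("isInstanceOf", "ubuntu-curl", "curl-7.8.1", types, two_types=False))
```

Passing an empty map in the first call shows that no lookup happens in the two-type case.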

Apologies for re-using old diagrams here due to lack of time.

@goneall (Member) commented Jul 7, 2025

Thanks @zvr for the diagrams and descriptions - I have no more questions and the proposal makes sense to me.

The challenge will be documenting this in a way readers of the spec can easily understand. Perhaps we can add some more text about the expected relationship types in the Package and Component descriptions. We could also add some example JSON-LD text.
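As a starting point for such an example, the sketch below emits an abbreviated SPDX 3 JSON-LD graph for the Curl case. Assumptions, stated loudly: the class name `software_AbstractPackage` and the `hasInstance` relationship type are only the names proposed in this thread, not published vocabulary; the `spdxId` IRIs are made up; and required fields such as `creationInfo` are omitted for brevity.

```python
import json

# Made-up identifier base for illustration only.
BASE = "https://example.org/spdx/"


def element(kind, local_id, name, **extra):
    """Build a minimal element dict (required fields like creationInfo omitted)."""
    return {"type": kind, "spdxId": BASE + local_id, "name": name, **extra}


def relationship(local_id, from_id, to_ids, rel_type):
    """Build a Relationship with one `from` and a list of `to` IRIs."""
    return {
        "type": "Relationship",
        "spdxId": BASE + local_id,
        "from": BASE + from_id,
        "to": [BASE + t for t in to_ids],
        "relationshipType": rel_type,
    }


graph = [
    # Hypothetical class name taken from this PR's proposal.
    element("software_AbstractPackage", "curl", "curl"),
    element("software_AbstractPackage", "curl-8.9.1", "curl 8.9.1"),
    element("software_Package", "ubuntu-curl-8.9.1", "curl",
            software_packageVersion="8.9.1"),
    # "curl" --hasInstance--> "curl 8.9.1" (proposed relationship type)
    relationship("rel-1", "curl", ["curl-8.9.1"], "hasInstance"),
    # "curl 8.9.1" --packagedBy--> "Ubuntu curl 8.9.1"
    relationship("rel-2", "curl-8.9.1", ["ubuntu-curl-8.9.1"], "packagedBy"),
]

doc = {"@context": "https://spdx.org/rdf/3.0.1/spdx-context.jsonld", "@graph": graph}
print(json.dumps(doc, indent=2))
```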

@zvr (Member Author) commented Jul 7, 2025

Thanks, @goneall.

And now that I think about it, we might add some more text stating that this is not typically expected to be in SBOMs. SBOMs are always about specific Packages; this Component abstraction is a way of helping your SPDX data store consolidate duplicate information.

Unless your SBOM contains many "Curl" packages, for example, it is not to your advantage to complicate things.

zvr added 2 commits July 8, 2025 14:55
Signed-off-by: Alexios Zavras (zvr) <zvr+git@zvr.gr>
Signed-off-by: Alexios Zavras (zvr) <zvr+git@zvr.gr>
@goneall (Member) commented Aug 5, 2025

From the 5 August 2025 tech call, there were 3 high level issues to be followed-up on:

  • Direction of the relationship
  • Balance of exchanged document size vs. consumer implementation complexity
  • Is the relationship between packages and components the same as the relationship between components?

@goneall (Member) commented Aug 5, 2025

> Balance of exchanged document size vs. consumer implementation complexity

Three possible solutions were raised during the 5 August 2025 tech call:

  1. Not allow components within documents - Note: there was a strong desire to allow components in the documents since it will save significant space - reference Yocto SBOMs
  2. Allow components, but don't allow component hierarchies in the document
  3. Create a component profile - this doesn't solve the complexity when components are used, but it does allow producers to communicate the requirement to implement components to the consumer. The consumer application can then "fail fast" if it receives a document containing components and doesn't implement support for components.
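Options 2 and 3 both imply a consumer-side check. The sketch below shows one way a consumer might fail fast on a JSON-LD-style graph. It is hypothetical throughout: `software_AbstractPackage` and `hasInstance` are only the names proposed in this PR, and a real option-3 consumer would more likely inspect the document's declared profile conformance rather than scan the content as done here.

```python
# Hypothetical names from this PR's proposal; adjust if the final spec differs.
ABSTRACT_TYPE = "software_AbstractPackage"
INSTANCE_REL = "hasInstance"


class UnsupportedDocument(Exception):
    """Raised so a consumer can fail fast on content it cannot handle."""


def check_document(graph, supports_abstract=False, allow_hierarchy=False):
    """Reject documents a (hypothetical) consumer is not prepared to handle."""
    abstract_ids = {e["spdxId"] for e in graph if e.get("type") == ABSTRACT_TYPE}

    # Option 1, or the fail-fast behaviour of option 3: no support at all.
    if abstract_ids and not supports_abstract:
        raise UnsupportedDocument("document uses abstract packages")

    # Option 2: abstract packages are allowed, but not chained to each other.
    if not allow_hierarchy:
        for e in graph:
            if (
                e.get("type") == "Relationship"
                and e.get("relationshipType") == INSTANCE_REL
                and e["from"] in abstract_ids
                and any(t in abstract_ids for t in e["to"])
            ):
                raise UnsupportedDocument(f"abstract package hierarchy in {e['spdxId']}")


flat = [{"type": ABSTRACT_TYPE, "spdxId": "ex:curl"}]
check_document(flat, supports_abstract=True)  # passes: no hierarchy present
```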

@zvr zvr modified the milestones: 3.1, 3.0.2 Aug 19, 2025
@goneall (Member) commented Sep 2, 2025

From tech call on 2 Sept 2025:

  • Proposal to merge this in - the outstanding issues can be resolved in a how-to guide
  • Could merge this into the 3.0 ISO version
  • Some concerns about whether this should be in core and some concerns about naming
  • We'll continue discussion in this PR as comments and make a decision by EOD Friday 5 Sept 2025

@stevenc-stb (Collaborator) commented

My only thought is that the class name Component might be better as SoftwareComponent. Even if having "Software" in the name is redundant, it will make the class easier to spot on the model image and easier to understand.

@zvr (Member Author) commented Sep 2, 2025

@stevenc-stb I also find it redundant, but from the discussion today it seems that other areas refer to components.

I'll change it -- this will not be the only case where we have redundancy in naming...

Signed-off-by: Alexios Zavras (zvr) <github@zvr.gr>
@goneall (Member) commented Sep 2, 2025

@zvr - did you want to consider / respond to @JPEWdev's comment on the relationship direction and naming:

> I'd recommend flipping the relationship around, since it's more likely that multiple Packages will map to a single Component. Perhaps abstractDefinitionOf ?

@zvr (Member Author) commented Sep 2, 2025

@JPEWdev you are correct that, because our Relationships have one from but may have multiple to, having a relationship going "upwards" might seem to make more sense.
[I am using "upwards" and "downwards" with references to the diagrams above only; of course one can diagram it in any way they want.]

However, since our Elements (and therefore the Relationships) are immutable, think of what will happen in a typical flow:

  • you already have a SoftwareComponent "curl" that is connected (somehow) to "curl v7.81" and "curl v8.9.1".
  • a new version "curl v8.15.0" appears
  • you want to represent the connection between the existing "curl" and this new object

If the links go upwards, you would need to add a new to to the Relationship. But our objects are immutable, so you either have to (a) create a new one (with all the tos and the new one) and retire the old one; or (b) create a new Relationship with the single new to. I think almost always (b) would be the right choice.

If the links go downwards, you know you have to create a new Relationship from the new curl version to the existing "curl".

In both ways ((b) in the upwards arrows case or in downwards arrows) you end up with a new Relationship with a single to. And whenever you search you will always do "find all Relationships..." instead of "find all tos from a single Relationship".
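The immutability argument can be sketched with a frozen record type. This is a minimal sketch, assuming a hypothetical `Relationship` dataclass and a plain list as the store; neither is an SPDX library API. It shows that the existing Relationship cannot be extended in place, that option (b) simply appends a single-`to` Relationship, and that the query is "find all Relationships" rather than "read the `to` list of one Relationship".

```python
from dataclasses import FrozenInstanceError, dataclass
from typing import List, Tuple


# Hypothetical immutable Relationship record (not an SPDX library class).
@dataclass(frozen=True)
class Relationship:
    spdx_id: str
    from_: str  # "from" is a Python keyword, hence the underscore
    to: Tuple[str, ...]
    relationship_type: str


store: List[Relationship] = [
    Relationship("rel-1", "curl", ("curl-7.81", "curl-8.9.1"), "hasInstance"),
]

# Relationships are immutable: trying to add a new `to` in place fails.
try:
    store[0].to = store[0].to + ("curl-8.15.0",)
except FrozenInstanceError:
    print("cannot modify rel-1 in place")

# Option (b): create a new Relationship with the single new `to`.
store.append(Relationship("rel-2", "curl", ("curl-8.15.0",), "hasInstance"))


def instances_of(component: str, rels: List[Relationship]) -> List[str]:
    """Find all Relationships from `component`, then collect every `to`."""
    return [
        t
        for r in rels
        if r.from_ == component and r.relationship_type == "hasInstance"
        for t in r.to
    ]


print(instances_of("curl", store))  # ['curl-7.81', 'curl-8.9.1', 'curl-8.15.0']
```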

Does this make sense? @goneall , your thoughts?
I am not adamant that the arrows should be "downwards" -- it's just that I think there will be no difference anyway.

@JPEWdev (Contributor) commented Sep 2, 2025


Yes. And that is exactly our use case. We (yocto) would never actually use the "fan-out" of having multiple to because of the way we construct our SBoM where we don't want them grouped together regardless of which way they point. However, it does at least seem prudent to consider other use cases and make the "fan-out" match the commonly expected direction, in the event that someone does find it useful.

@JPEWdev (Contributor) commented Sep 2, 2025

And also there are plenty of examples of the relationship not necessarily matching the "conceptual" direction of the arrow

@zvr (Member Author) commented Sep 2, 2025

OK, I don't mind reversing the direction of the Relationship.

As I wrote above, we already have the RelationshipType for "curl 8.9.1" --packagedBy--> "Ubuntu curl 8.9.1" (the last level, where from are Packages).

What should we name the RelationshipType for "curl" --???--> "curl 8.9.1" (between SoftwareComponents)?
Does hasInstance sound OK?

(if we were talking about arbitrary groupings, we might even say hasMember, but this might be useful in the more general case, not here.)

Signed-off-by: Alexios Zavras (zvr) <github@zvr.gr>
@goneall (Member) left a comment

Overall LGTM

@swinslow (Member) commented Sep 4, 2025

@zvr I don't think I have a particular concern about the idea of having "Component" as a generic concept of a collection of non-versioned / undifferentiated related Packages.

That said, I guess I'm not clear about which Package-related fields or relationships would be appropriate to link to Components. Looking at licenses (whether declared or concluded) and using OpenSSL as you mentioned, for example: I'm not sure it really makes sense to have metadata saying "what is the license of the general concept of OpenSSL?" Depending on which specific version of OpenSSL you mean, it could be OpenSSL or it could be Apache-2.0.

Given that, I would recommend that non-versioned, non-specific Components should not be used in connection with the declared license / concluded license values (again, assuming I'm understanding it correctly).

@bact (Collaborator) commented Sep 4, 2025

If we want to have a "has license" relationship between SoftwareComponent and a license, do we need to update the from class in these relationship types as well?

  • hasConcludedLicense: The from SoftwareArtifact is concluded by the SPDX data creator to be governed by each to license.
  • hasDeclaredLicense: The from SoftwareArtifact was discovered to actually contain each to license, for example as detected by use of automated tooling.

Currently the proposed SoftwareComponent is a subclass of Element.

@zvr (Member Author) commented Sep 5, 2025

Ah, but @swinslow, did you see the second diagram I posted in the comment above?

As you say, there is no way to associate a single license with "OpenSSL". But one can associate the OpenSSL license with a SoftwareComponent "OpenSSL 1.x" and the Apache-2.0 license with a different SoftwareComponent "OpenSSL 3.x". Keep in mind that a SoftwareComponent is an abstract representation of "something", a grouping that you define however you want.

As we know, license changes on version changes do not happen very frequently, so the general case (SoftwareComponent "Curl" associated with the curl license) would cover 99% of the cases.
I mean, in my own data, I also have historical things, like "gcc versions before 4.2.1 are under GPL-2.0-or-later while every newer one is under GPL-3.0-or-later", but I don't see much practical use in current SPDX data for versions before mid-2007. But one can record this information, if one thinks it's valuable.
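The OpenSSL grouping can be sketched as a mapping from a concrete version to the abstract grouping it is an instance of, with licenses attached only to the groupings. This is a minimal sketch under stated assumptions: the group names and the major-version rule are hypothetical (real version schemes may need more care), while `OpenSSL` and `Apache-2.0` are the SPDX license identifiers for the two licenses mentioned in the thread.

```python
from typing import Optional

# Licenses attached to the (hypothetical) grouping entries, per the OpenSSL example.
GROUP_LICENSE = {
    "OpenSSL 1.x": "OpenSSL",
    "OpenSSL 3.x": "Apache-2.0",
}


def openssl_group(version: str) -> str:
    """Pick the grouping entry a concrete version is an instance of (by major version)."""
    major = version.split(".", 1)[0]
    if major == "1":
        return "OpenSSL 1.x"
    if major == "3":
        return "OpenSSL 3.x"
    return "OpenSSL"  # generic entry: no single license is attached here


def license_for(version: str) -> Optional[str]:
    return GROUP_LICENSE.get(openssl_group(version))


print(license_for("1.1.1w"))  # OpenSSL
print(license_for("3.0.13"))  # Apache-2.0
```

Versions outside both groupings fall back to the generic "OpenSSL" entry, which deliberately carries no license, matching the point that no single license fits "OpenSSL" as a whole.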

@zvr (Member Author) commented Sep 5, 2025

@bact you are correct; the descriptions of these RelationshipTypes will have to be updated.

@goneall (Member) commented Oct 10, 2025

@zvr - Can you update the descriptions per your above comment?

@kestewart (Contributor) commented Oct 10, 2025

@zvr - can you help me understand when we should be using component vs. package in SBOMs? Possibly the package definition should be updated to make it clear if you want to go forward with this.

@zvr zvr modified the milestones: 3.0.2, 3.1 Nov 11, 2025
@zvr (Member Author) commented Nov 14, 2025

@kestewart the main use case is not in SBOMs; in this case, in a specific software release you have a specific curl (for example, version 8.12.1-3ubuntu1 supplied by Canonical).

But in the (graph) data that you keep for all your software, you have many curl packages (the one above, five other versions from the same supplier, a dozen other versions by other suppliers, ...). You save a lot of space if you keep, for example, the licensing information not for all these 50 packages but only once, for a curl package (no version, no supplier).
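A back-of-the-envelope count makes the trade-off concrete. This sketch assumes each shared fact (a license relationship, a copyright text, and so on) costs one record, that each Package needs exactly one extra link to the abstract entry, and that there is a single level of abstraction; the number of shared facts is a hypothetical parameter, and only the 50 packages come from the comment above.

```python
def per_package_cost(n_packages: int, shared_facts: int) -> int:
    # Every Package carries its own copy of each shared fact.
    return n_packages * shared_facts


def abstract_cost(n_packages: int, shared_facts: int) -> int:
    # Shared facts are stored once; each Package needs one link to the abstract entry.
    return shared_facts + n_packages


for facts in (1, 3):
    print(facts, per_package_cost(50, facts), abstract_cost(50, facts))
# 1 50 51
# 3 150 53
```

Under these assumptions, a single shared fact gives no saving at all, and the benefit grows with the amount of metadata shared across the packages.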

Specifically for SBOMs, @JPEWdev mentioned in a tech call that this would save a lot of space in theirs (IIRC).

@kestewart (Contributor) commented

@JPEWdev - have your comments been addressed? do you approve?

@kestewart (Contributor) commented Jan 19, 2026

ping @JPEWdev, you good with this now?

Alexios, I'd rather this go into 3.1-rc2 after we have more eyes on the differences between Package and Component. I'm still worried we'll have people getting confused.

I also want to make sure that we have considered Hardware Components (as well as the other profiles) and how they should interact. Hardware Component is a common term, and restricting it to just software could become problematic.

Signed-off-by: Arthit Suriyawongkul <arthit@gmail.com>
@gregshue commented

During today's spdx-tech meeting @kestewart raised this PR for awareness. There was some preliminary discussion around this concept of "component" conflicting with definitions and concepts in existing standards. Here are some definitions that SPDX must contend with:

From SEVOCAB standards search for 'component':

component.
(1) entity with discrete structure, such as an assembly or software module, within a system considered at a particular level of analysis (ISO/IEC 25019:2023, Systems and software engineering -- Systems and software Quality Requirements and Evaluation (SQuaRE) -- Quality-in-use model, 3.1.3)
(2) one part that makes up a system (IEEE 1012-2024 IEEE Standard for System, Software, and Hardware Verification and Validation, 3.1)
(3) object that encapsulates its own template, so that the template can be interrogated by interaction with the component (ISO/IEC 10746-2:2009 Information technology -- Open Distributed Processing -- Reference Model: Foundations, 9.26)
(4) object with a discrete information type that is stored in a component content management system, such as a topic, prerequisite, section, image, or video (ISO/IEC/IEEE 26531:2023 Systems and software engineering -- Content management for product lifecycle, user and service management information for users, 3.1.3)
(5) product used as a constituent in an assembled product, system or plant (IEC/IEEE 82079-1:2019 Preparation of information for use (instructions for use) of products: Part 1: Principles and general requirements, 3.4)
Note: A component can be hardware or software and can be subdivided into other components. Component refers to a part of a whole, such as a component of a software product or a component of a software identification tag. The terms module, component, and unit are often used interchangeably or defined to be subelements of one another in different ways depending upon the context. The relationship of these terms is not standardized. A component can be independently managed or not from the end-user or administrator's point of view.
See Also: element, unit

It seems to me the definitions above equate "component" to "composable unit". These units will always have versions with them, have specified interfaces to them, and are verifiable.

On a quick read over the top-level comments of this PR it seems like the goal is to support extracting/refactoring subsets of the SPDX information (data, metadata, etc.) to reduce the size of the SPDX file. It is a worthwhile goal that applies to HW as well as SW. Please notice how many times a resistor (electrical component) of the same specification appears on a circuit board. ;-)

AFAICT, we need to:

  1. use terms and solutions that scale well across the Systems problem space (and "component" is already standardized and widely used in that problem space);
  2. provide a structure that supports each composition unit being a separate package;
  3. avoid conflicting with, or making ambiguous, the information that must be tracked for the package Supply Chain content;
  4. avoid conflicting with the Systems Engineering best practice of recursive (hierarchical) decomposition.

Perhaps this only requires using a different term?

@zvr (Member Author) commented Jan 23, 2026

> It seems to me the definitions above equate "component" to "composable unit". These units will always have versions with them, have specified interfaces to them, and are verifiable.

I disagree that they "will always have versions". As the definition that was pasted says, they are to be considered at a particular level of analysis. There are definitely use cases to analyze systems without caring about the exact version of all the software pieces (license analysis being the most common one).

@gregshue commented

> I disagree that they "will always have versions". As the definition that was pasted says, they are to be considered at a particular level of analysis. There are definitely use cases to analyze systems without caring about the exact version of all the software pieces (license analysis being the most common one).

That's an interesting observation. Please consider that an analysis like that covers a range of implementations (or rather, a set of versions) that may be valid for integration. I believe each implementation will have a specific version (even if it is only identified by a SHA). This unique identification is part of the information that must be included in the SBOM.

@zvr (Member Author) commented Jan 24, 2026

@gregshue I completely agree. The proposed SoftwareComponent objects should never appear in an SBOM. They are "abstract" generalized views, while an SBOM must always have concrete Package objects (with specific version and supplier).

@zvr zvr changed the title from "Software Component" to "AbstractPackage (was: Software Component)" Mar 6, 2026
@goneall (Member) left a comment

LGTM

The chain may continue further to more AbstractPackages,
as long as there are "parent" AbstractPackage and no values have been specified.

Every Package should be an instance of no more than one AbstractPackages.
A Collaborator commented:

Suggested change
Every Package should be an instance of no more than one AbstractPackages.
Every Package shall be an instance of no more than one AbstractPackage.

If we want this to be a requirement, use "shall".
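If the "shall" wording is adopted, a validator could check it mechanically. This is a minimal sketch over JSON-LD-style relationship dicts, assuming (as in the thread) that the link from an AbstractPackage down to a Package uses `packagedBy` with the AbstractPackage as `from`; the function name and sample IDs are hypothetical.

```python
# Assumed relationship name for the AbstractPackage -> Package link; adjust to the final spec.
PACKAGE_REL = "packagedBy"


def packages_with_multiple_parents(relationships):
    """Return Package IDs that are instances of more than one AbstractPackage."""
    parents = {}
    for r in relationships:
        if r["relationshipType"] != PACKAGE_REL:
            continue
        for pkg in r["to"]:
            # A set, so repeated links from the same parent count only once.
            parents.setdefault(pkg, set()).add(r["from"])
    return sorted(pkg for pkg, ps in parents.items() if len(ps) > 1)


rels = [
    {"from": "curl-8.9.1", "to": ["ubuntu-curl"], "relationshipType": "packagedBy"},
    {"from": "curl-8.x", "to": ["ubuntu-curl"], "relationshipType": "packagedBy"},
    {"from": "curl-8.9.1", "to": ["debian-curl"], "relationshipType": "packagedBy"},
]
print(packages_with_multiple_parents(rels))  # ['ubuntu-curl']
```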
