Skip to content

Conversation

@harrysarson
Copy link
Contributor

The configuration allows applying filers based on include, exclude and include-orphans whilst staging dependencies rather than when assembling the artifact. This is useful to avoid overlapping files errors.

The configuration allows applying filers based on `include`, `exclude`
and `include-orphans` whilst staging dependencies rather than when
assembling the artifact. This is useful to avoid overlapping files
errors.
@harrysarson harrysarson force-pushed the harry/compose-filter-whilst-staging branch from 8b8f97a to 4e0500e Compare May 22, 2025 10:52
@gtristan
Copy link
Contributor

gtristan commented May 22, 2025

I think this needs some deeper thought.

Right now compose does:

         stage dependencies in /
                  |
                  V
        run integration commands
                  |
                  V
         apply splits to resulting
filesystem after integration commands
                  |
                  V
  place selected files into %{install-root}

The problem you are trying to solve, is that this approach conflicts with overlaps, in the case that initially staged artifact dependency chains have multiple artifacts which provide the same groups of files.

In what ways can we address this ?

  • It seems to me that we would have this overlap problem even if it was a build element and not a compose element, so the question of why are these files overlapping is relevant
  • If we address this in the way that you propose
    • We render the integration command file capturing possibly useless in cases with the proposed option added
    • We are applying splits for different reasons
    • It may be that what we want is to have a different set of include/exclude elements in the staging process, that molds the input in advance of integration commands and before the original set of include/exclude rules are employed to curate the output of the compose element.
  • In the case of handling overlap errors raised in compose's Element.stage() implementation, I wonder if it makes sense to carve out an escape hatch of sorts ?
    • I.e. if it is valid to have overlap errors enabled in a project, does it still make sense to not enforce overlap whitelisting ?
    • If we can deem that it is indeed a case worthy of a special case escape hatch for the specific case of compose elements, would it make sense to (optionally) catch the exception raised by Element.stage_dependency_artifacts() and cause it to be non-fatal in this case ?
  • Would it change something if, for example, we handled the case where integration commands are not going to run differently, and just do the Element.stage_dependency_artifacts() with the include/exclude/orphans options, directly into %{install-root}, and circumvent the issue in this case ?

If we're going to add configuration here, we should consider what will be maximally useful for different possible use-cases.

Also, I think it would be good to avoid breaking expectations of setting overlap warnings as fatal warnings (as mentioned above, it seems telling that this error you are trying to circumvent would still be occurring if this were a build element and not a compose element).

@harrysarson
Copy link
Contributor Author

* It seems to me that we would have this overlap problem even if it was a build element and not a compose element, so the question of _why are these files overlapping_ is relevant
  
  * Does it make more sense for the overlapping element to use the [overlap whitelist](https://docs.buildstream.build/master/format_public.html#overlap-whitelist) ?

There are two elements that provide the same file, one from FDSDK (so I cannot edit it) and one in my project (so I can edit it). If I add overlap whitelist to my element that would allow the version of the file from my element to replace the one from FDSDK. However, I want to do the oposite. I want the version of the file from my element to be replaced by the one from FDSDK.


For context my work around is to add the following to the element in my project:


config:
  (>):
  # This file causes overlapping issues with trace from perf.bst
  - |
    rm %{install-root}/usr/bin/trace

@gtristan
Copy link
Contributor

[...]

For context my work around is to add the following to the element in my project:


config:
  (>):
  # This file causes overlapping issues with trace from perf.bst
  - |
    rm %{install-root}/usr/bin/trace

Not shipping the file which conflicts with perf seems to be a perfectly sound approach. This, or distributing your trace program under a different name if it is a different tool.

@harrysarson
Copy link
Contributor Author

Not shipping the file which conflicts with perf seems to be a perfectly sound approach. This, or distributing your trace program under a different name if it is a different tool.

With the split-rule + compose approach (that requires the changes in this PR) we give the element that integrates the elements the choice on whether to remove the offending file, if perf isn't going into the disk image then it would make sense to be able to keep the other version of trace.

@abderrahim
Copy link
Contributor

FWIW, I've been thinking about this "filter while staging" idea. I feel that it makes more sense as a dependency configuration that is more generically applicable rather than being specific to compose.

This would make it usable for other use cases as well.

@gtristan
Copy link
Contributor

@harrysarson:

With the split-rule + compose approach (that requires the changes in this PR) we give the element that integrates the elements the choice on whether to remove the offending file, if perf isn't going into the disk image then it would make sense to be able to keep the other version of trace.

Sure. I think that the approach you currently taking is sensible.

That said, I'm not happy with the implementation you propose, and I've asked for more thinking in my above comment #2012 (comment) to find an API that is more all around generally useful.

One approach I've suggested there, would be to instead have a separate splitting at staging time (e.g. it could be stage-include/stage-exclude/stage-orphans kind of API to be applied at staging, before integration occurs).

However, I rather like @abderrahim's line of thinking too:

FWIW, I've been thinking about this "filter while staging" idea. I feel that it makes more sense as a dependency configuration that is more generically applicable rather than being specific to compose.

This would make it usable for other use cases as well.

The way the code is structured, at least some duplication is needed in base Element classes to do the dependency configuration support, that said I agree it can make sense to do some kind of filtering at staging time for script, compose and build elements (probably with the same consistent API).

Expressing the dependency on partial artifacts is also reminiscent of past suggestions from @sstriker, the idea you propose makes me wonder if just having this data expressed might leave some window open to improve build avoidance in the future (i.e. "If the part of my dependency artifact which I use, has not changed, I don't need to rebuild"), while this is a bit far fetched given the nature of cache keys, it's at least worth noting in the context of this discussion.

Orthogonal to this (but somewhat relevant to this conflicting "/usr/bin/perf" program), something that I've been mulling over, is; why do we limit ourselves to filtering ? I think that elements like compose and filter which do these inclusions/exclusions, might also benefit from the ability to do relocations, i.e. renaming/relocating files on a per-split or per-file basis.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants