Skip to content

Conversation

parker-snyk
Copy link
Contributor

@parker-snyk parker-snyk commented Aug 8, 2025

We were using .wh. to look for whited out files, but we didn't escape the ., so it was a match-any character

We also weren't following the OpenContainer spec: https://github.com/opencontainers/image-spec/blob/main/layer.md#whiteouts

.wh. needs to be the prefix of the filename. So it either needs to prefix everything (no slashes, just filename) or it needs to prefix the filename after the last slash (full path)

Our current implementation just looks for .wh. to appear anywhere, so we are incorrectly whiting out files. This includes libraries, so all of the node 'which' packages were being skipped, for example

@parker-snyk parker-snyk force-pushed the CN-272-whiteout-regex-bug branch 3 times, most recently from 7f7dd67 to f59fe4a Compare August 8, 2025 19:30
@parker-snyk parker-snyk marked this pull request as ready for review August 8, 2025 19:34
@parker-snyk parker-snyk requested a review from a team as a code owner August 8, 2025 19:34
@parker-snyk parker-snyk enabled auto-merge (squash) August 8, 2025 19:35
@parker-snyk parker-snyk disabled auto-merge August 8, 2025 20:37
This was referenced Aug 11, 2025
cursor[bot]

This comment was marked as outdated.

cursor[bot]

This comment was marked as outdated.

@parker-snyk parker-snyk force-pushed the CN-272-whiteout-regex-bug branch from ea113ba to ef6a033 Compare August 18, 2025 18:18
@parker-snyk parker-snyk requested a review from bgardiner August 19, 2025 13:37
export function isWhitedOutFile(filename: string) {
return filename.match(/.wh./gm);
const lastSlashIndex = filename.lastIndexOf("/");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: would it be cleaner to just use the builtin path.basename. maybe not, just pointing out in case you didn't know about it

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Much cleaner, thank you! I did not know about this

export function removeWhiteoutPrefix(filename: string): string {
// Replace .wh. that appears at the start or after the last slash,
// and ensure no slashes come after .wh.
return filename.replace(/^(.*\/)?\.wh\.([^\/]*)$/, "$1$2");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do you think this should be "+" instead? I guess we expect 0 times to be a bug in the generator (i.e., impossible) but we should still remove it?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(The comment applied to the * in [^\/]*)

export function isWhitedOutFile(filename: string) {
return filename.match(/.wh./gm);
const lastSlashIndex = filename.lastIndexOf("/");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this assumes unix path separator -- do we need to handle windows too?

Copy link
Contributor Author

@parker-snyk parker-snyk Aug 27, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I just updated to use path.basename so it should account for this now - https://nodejs.org/api/path.html#windows-vs-posix
I didn't know about this function until you mentioned it, so let me know if I'm incorrect

* prefixing the filename with a .wh.
* https://www.madebymikal.com/interpreting-whiteout-files-in-docker-image-layers
* https://github.com/opencontainers/image-spec/blob/main/layer.md#whiteouts
*/
export function isWhitedOutFile(filename: string) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

does this function also need to account for opaque whiteouts: https://github.com/opencontainers/image-spec/blob/main/layer.md#opaque-whiteout?

In other words, check that the filename is not .wh..wh..opq? Looking above, it's not clear our extractor logic even handles opaque whiteouts at all 🤔

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think as long as it starts with .wh it should account for it, it looks like out logic checks if a file is or within any whited out file or directory, so if a directory starts with .wh, any files under it should be excluded. I can manually verify though

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay, I misunderstood how .wh..wh..opq worked, I'll fix this

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants