@william-billaud
Contributor

Use the _get_path API to allow using the plugin in direct access mode.

As the plugin uses multiple files, IMO the simplest way to use it is to provide the folder as input.

But this requires case-insensitive matching on files. I'm not sure whether this should be the plugin's problem (managing case-insensitive matching) or whether the direct loader VFS should be case insensitive.

closes #1418

@Schamper
Member

Schamper commented Dec 9, 2025

But this requires case-insensitive matching on files. I'm not sure whether this should be the plugin's problem (managing case-insensitive matching) or whether the direct loader VFS should be case insensitive.

@twiggler what are your thoughts?

@william-billaud
Contributor Author

william-billaud commented Dec 9, 2025

Another option is to leave the choice up to the user (with an option such as --direct / --insensitive-direct), which adds more complexity to the CLI.

@twiggler
Contributor

Expanding on an idea by @Schamper to create two virtual filesystems in the direct loader, one case sensitive and one case insensitive, perhaps we can augment the get_paths interface to take an additional boolean case_insensitive that selects the desired filesystem.

An open question is how to represent the case insensitive filesystem in the target.
Perhaps it could be a property such as ifs.
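
A minimal sketch of that idea, purely for illustration. `ToyFilesystem`, `ToyTarget` and this free-standing `get_paths` are hypothetical stand-ins and not the dissect.target API; only the `case_insensitive` parameter and the `ifs` property name come from the comment above.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class ToyFilesystem:
    """Hypothetical stand-in for a virtual filesystem: maps a path to file content."""

    case_sensitive: bool = True
    _entries: dict[str, bytes] = field(default_factory=dict)

    def _key(self, path: str) -> str:
        # The case-insensitive variant normalises every path before lookup.
        return path if self.case_sensitive else path.lower()

    def map_file(self, path: str, data: bytes) -> None:
        self._entries[self._key(path)] = data

    def exists(self, path: str) -> bool:
        return self._key(path) in self._entries


@dataclass
class ToyTarget:
    """Hypothetical target exposing both views: ``fs`` and the proposed ``ifs``."""

    fs: ToyFilesystem = field(default_factory=lambda: ToyFilesystem(case_sensitive=True))
    ifs: ToyFilesystem = field(default_factory=lambda: ToyFilesystem(case_sensitive=False))

    def map_file(self, path: str, data: bytes) -> None:
        # The loader maps every file into both filesystems.
        self.fs.map_file(path, data)
        self.ifs.map_file(path, data)


def get_paths(target: ToyTarget, candidates: list[str], case_insensitive: bool = False) -> list[str]:
    """Return the candidates that exist, using the filesystem selected by the flag."""
    fs = target.ifs if case_insensitive else target.fs
    return [path for path in candidates if fs.exists(path)]
```

With a file mapped as `/repo/OBJECTS.DATA`, a plugin asking for `/repo/objects.data` would only find it when it passes `case_insensitive=True`, so the choice stays entirely with the plugin.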

@william-billaud
Contributor Author

So it would be partly the plugin's responsibility to choose between a case-sensitive and a case-insensitive file system?
Each plugin would have a member variable such as DIRECT_MODE_REQUIRE_CASE_INSENSITIVE, and get_path would use this variable? This looks good to me.
Would you prefer to implement it in another PR (let me know if you need help) and close this PR afterwards?

@twiggler
Contributor

twiggler commented Dec 15, 2025

So it would be partly the plugin's responsibility to choose between a case-sensitive and a case-insensitive file system?

Yes, I think even fully.

Each plugin would have a member variable such as DIRECT_MODE_REQUIRE_CASE_INSENSITIVE, and get_path would use this variable? This looks good to me.

Maybe. I was thinking more along the lines of adding a parameter to get_paths to control case sensitivity, which is more flexible but perhaps less convenient. But that seems more of a detail.

Would you prefer to implement it in another PR (let me know if you need help) and close this PR afterwards?

Yeah, that would be best, if this PR can wait until next year. The additional target property with the case-insensitive file system (Target::ifs or something) might be considered something of a wart though. However, I don't think it is easy or desirable to let loader behavior depend on properties of the plugin, so arguably we need to create two virtual filesystems.

Open to better ideas :).

@william-billaud
Contributor Author

Yes, I think even fully.

By partially I mean that plugin logic won't have to manually perform a case-insensitive file search (using iglob or something equivalent), but I get your point.

@twiggler
Contributor

twiggler commented Dec 18, 2025

Yes, I think even fully.

By partially I mean that plugin logic won't have to manually perform a case-insensitive file search (using iglob or something equivalent), but I get your point.

An alternative to creating a case-insensitive filesystem in the target would be to create a case-insensitive filesystem implementation that adapts a case-sensitive Filesystem / FilesystemEntry and implements all queries case-insensitively.
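
To make the adapter idea concrete, here is a rough sketch under stated assumptions. `ToySensitiveFs` and `CaseInsensitiveAdapter` are hypothetical names, and the nested-dict filesystem only stands in for a real case-sensitive Filesystem; the actual Filesystem / FilesystemEntry interfaces are not reproduced here.

```python
from __future__ import annotations


class ToySensitiveFs:
    """Hypothetical case-sensitive filesystem: nested dicts are directories, bytes are files."""

    def __init__(self, tree: dict) -> None:
        self.tree = tree

    def listdir(self, parts: tuple[str, ...]) -> list[str]:
        # Walk down the already-resolved components, then list the children.
        node = self.tree
        for part in parts:
            node = node[part]
        return list(node) if isinstance(node, dict) else []


class CaseInsensitiveAdapter:
    """Answers path queries case-insensitively on top of a case-sensitive filesystem."""

    def __init__(self, fs: ToySensitiveFs) -> None:
        self.fs = fs

    def resolve(self, path: str) -> str | None:
        """Return the real, correctly cased path for ``path``, or None if it does not exist."""
        resolved: tuple[str, ...] = ()
        for part in filter(None, path.split("/")):
            matches = [name for name in self.fs.listdir(resolved) if name.lower() == part.lower()]
            if not matches:
                return None
            # If several names differ only by case, this sketch simply takes the first one.
            resolved += (matches[0],)
        return "/" + "/".join(resolved)
```

The underlying filesystem is left untouched; each query is translated component by component, which is also the place where a policy for colliding names would have to be decided.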

@william-billaud
Contributor Author

william-billaud commented Dec 18, 2025

Another thought is that having both a case-insensitive and a case-sensitive file system may be a bit of over-engineering.
For most plugins, --direct mode will only contain a list of files, so case sensitivity is not a big deal.
For some others, such as the CIM plugin or plugins working with e.g. SQLite3 databases, the plugin will need to iterate over files or find a specific file, but you would not expect two databases with the same name (but different case) in the same folder.

Furthermore, from a user (and DFIR) perspective:
Targets in --direct mode will mostly be a few files collected from another system, so case sensitivity may already have been lost during data transfer (unless you expect --direct mode to be used in live forensic investigations?).

Maybe it would be easier (for users and in terms of code complexity) to just use a case-insensitive VFS in direct mode by default and

  • Allow the user to override this setting (using config or a flag)
  • Raise a warning if there is an overlap (two files with the same name but different case), by checking whether the file/folder already exists in the Directloader before mapping the file/dir in the VFS?
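
The second bullet could look roughly like the sketch below. It is an assumption-laden illustration: `map_case_insensitive` and the logger name are made up, paths are plain strings, and a dict stands in for the VFS.

```python
from __future__ import annotations

import logging

log = logging.getLogger("direct_mode_sketch")  # hypothetical logger name


def map_case_insensitive(paths: list[str]) -> tuple[dict[str, str], list[tuple[str, str]]]:
    """Map paths by their lowercased form and report pairs that differ only by case."""
    mapped: dict[str, str] = {}
    overlaps: list[tuple[str, str]] = []
    for path in paths:
        key = path.lower()
        existing = mapped.get(key)
        if existing is not None and existing != path:
            # Same path when case is ignored: warn and keep the first one mapped.
            overlaps.append((existing, path))
            log.warning("%s overlaps with %s when case is ignored", path, existing)
            continue
        mapped[key] = path
    return mapped, overlaps
```

Because the check happens while mapping, no separate pass over the input is needed; a non-empty `overlaps` list is the signal to suggest the case-sensitive override to the user.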

@twiggler
Contributor

twiggler commented Dec 18, 2025

Yeah this more or less corresponds with the cli option you suggested earlier above.
I think @Miauwkeru also favors this approach.

What I like is that it seems to be the simplest approach, which we arguably should try first. If it causes unforeseen problems, we can always go with a more sophisticated solution. Since the direct mode is arguably architecturally a bit unsound, it is best not to turn it into a Jenga tower.

@Poeloe do you have any issues / concerns from a UX and technical perspective with making the virtual file system in direct mode case insensitive by default, and adding a CLI argument to override it (see discussion above)?

@twiggler
Contributor

I don't see any objections so let's go for the CLI switch.

Probably will look again in January.
Merry Christmas and a Happy New Year.

@twiggler
Contributor

twiggler commented Dec 19, 2025

Btw I have added a ticket to describe the idea to cut out parsers: #1471

@william-billaud
Contributor Author

I have modified this PR to add the --direct-sensitive flag. As this PR is a good use case for this feature, I felt it was easier to do it in the same PR, especially for testing.

options:
  --direct              treat TARGETS as paths to pass to plugins directly (default: False)
<TRUNC>

Advanced options:
  --direct-sensitive    Same as --direct, but paths will be case sensitive (default: False)

I think that using an "advanced options" argument group eases readability, as it separates common arguments from flags that you only need in really specific situations. This is useful for new users.

A check is made to ensure the case-insensitive VFS will not cause file overlap. This uses rglob, which may be costly, but I don't think we expect --direct mode to be used with thousands of files in subdirectories.

 uv run --python 3.12 --refresh --extra dev --extra full target-query --direct -f example_namespace tests/_data/loaders/direct/overlap
2025-12-22T10:14:47.286288Z [warning  ] Direct mode used in case insensitive mode, but this will cause files overlap, consider using --direct-sensitive [dissect.target.loaders.direct]
<example/descriptor hostname=None domain=None field_a='namespace_example' field_b='record'>

Let me know if you prefer to make these changes in another PR or to make them yourself; I can revert the commit.
Feel free to edit the PR, especially regarding the flag name/option group.

Merry Christmas and Happy New Year back to you too 🎄

@twiggler
Contributor

twiggler commented Jan 6, 2026

Nice, I will take a look later this week.

@codecov

codecov bot commented Jan 6, 2026

Codecov Report

❌ Patch coverage is 81.53846% with 12 lines in your changes missing coverage. Please review.
✅ Project coverage is 80.74%. Comparing base (3c63ec2) to head (48719bd).

Files with missing lines Patch % Lines
dissect/target/loaders/direct.py 86.11% 5 Missing ⚠️
dissect/target/plugins/os/windows/cim.py 77.27% 5 Missing ⚠️
dissect/target/tools/utils/cli.py 33.33% 2 Missing ⚠️
Additional details and impacted files
@@           Coverage Diff           @@
##             main    #1437   +/-   ##
=======================================
  Coverage   80.74%   80.74%           
=======================================
  Files         394      394           
  Lines       34614    34658   +44     
=======================================
+ Hits        27948    27986   +38     
- Misses       6666     6672    +6     
Flag Coverage Δ
unittests 80.74% <81.53%> (+<0.01%) ⬆️


@codspeed-hq

codspeed-hq bot commented Jan 6, 2026

Merging this PR will not alter performance

✅ 11 untouched benchmarks


Comparing william-billaud:cim_add_direct_support (48719bd) with main (3c63ec2)


def check_case_insensitive_overlap(self) -> bool:
"""Verify if two differents files will have the same path in a case-insensitive fs"""
all_files_list = list(
Contributor

I am considering if the case insensitivity check should not be an optional check in VirtualDirectory::add.

It should be more efficient and easier to implement there.

Contributor Author

I'm not familiar enough with these components, but probably.

Nevertheless, maybe this implementation could be considered good enough for the --direct use case, and improvements could be handled in a dedicated issue (as they require more modification of internals).

Contributor

Nevertheless, maybe this implementation could be considered good enough for the --direct use case

Sure, we can defer to another ticket.

I will add a suggestion to make the overlap check more efficient tomorrow, along with another review

Contributor

I've been looking at the implementation of yield_all_file_recursively and check_case_insensitive_overlap. I noticed the comment mentioning that rglob isn't case-sensitive until Python 3.12. While that's a valid point to consider, the actual case-insensitive check in the current code happens when the paths are converted to lowercase.

My proposed refactoring keeps this exact same logic but simplifies the first step of gathering the files. Instead of using a custom recursive function, we can use rglob to get all the file paths and then perform the same lowercase comparison. This works correctly on any OS, regardless of the filesystem's native case sensitivity, because the check is done in Python after collecting the files.

Here’s how we could refactor it:

from itertools import chain
from pathlib import Path

# No longer needed
# def yield_all_file_recursively(self, base_path: Path, max_depth: int = 7) -> Iterator[Path]:

def check_case_insensitive_overlap(self) -> bool:
    """Verify if two different files will have the same path in a case-insensitive fs."""

    def get_files(path: Path):
        if not path.exists():
            return []
        if path.is_file():
            return [path]
        # Recursively find all files in the directory
        return list(path.rglob("*"))

    # Create a flat list of all file paths from all input directories
    all_paths = chain.from_iterable(get_files(p) for p in self.paths)
    # Filter out directories, keeping only files
    all_files = [p for p in all_paths if p.is_file()]

    # Compare the count of all files with the count of unique, lowercased file paths
    return len({str(p).lower() for p in all_files}) != len(all_files)

(Untested)
What do you think of this approach?

Contributor Author

@william-billaud william-billaud Jan 13, 2026

It's close to my initial approach.
Unfortunately it does not work with Python < 3.12 when the target is on a case-sensitive FS on Windows. I was not able to catch it in tests using the virtual filesystem (it may be related to the CPython implementation).
My comment is misleading: this is in fact not related to the case_sensitive parameter, but probably to some internal changes made when this option was added.

Here is the output from a Windows VM, where Z: is a VirtualBox share and o has the following content:

❯ tree o
o
├── u
│   ├── a
│   │   └── 1
│   └── A
│       └── 1
└── U
    ├── a
    │   └── 1
    └── A
        └── 1

6 directories, 4 files                                                 

With your suggestion : Warning only raised in 3.12

PS C:\Users\RaptorSniper\dissect.target>  uv run --python 3.12 --refresh --extra dev --extra full target-query --direct -f example_namespace Z:\o
2026-01-13T16:09:49.617953Z [warning  ] Direct mode used in case insensitive mode, but this will cause files overlap, consider using --direct-sensitive [dissect.target.loaders.direct]
<example/descriptor hostname=None domain=None field_a='namespace_example' field_b='record'>
PS C:\Users\RaptorSniper\dissect.target>  uv run --python 3.10 --refresh --extra dev --extra full target-query --direct -f example_namespace Z:\o
Using CPython 3.10.19
<example/descriptor hostname=None domain=None field_a='namespace_example' field_b='record'>
PS C:\Users\RaptorSniper\dissect.target>  uv run --python 3.11 --refresh --extra dev --extra full target-query --direct -f example_namespace Z:\o
Using CPython 3.11.14
<example/descriptor hostname=None domain=None field_a='namespace_example' field_b='record'>
PS C:\Users\RaptorSniper\dissect.target>        

With previous implementation : Warning raised with all version

PS C:\Users\RaptorSniper\dissect.target>  uv run --python 3.10 --refresh --extra dev --extra full target-query --direct -f example_namespace Z:\o
Using CPython 3.10.19
2026-01-13T16:18:52.905258Z [warning  ] Direct mode used in case insensitive mode, but this will cause files overlap, consider using --direct-sensitive [dissect.target.loaders.direct]
<example/descriptor hostname=None domain=None field_a='namespace_example' field_b='record'>
PS C:\Users\RaptorSniper\dissect.target>  uv run --python 3.11 --refresh --extra dev --extra full target-query --direct -f example_namespace Z:\o
Using CPython 3.11.14
2026-01-13T16:19:26.408962Z [warning  ] Direct mode used in case insensitive mode, but this will cause files overlap, consider using --direct-sensitive [dissect.target.loaders.direct]
<example/descriptor hostname=None domain=None field_a='namespace_example' field_b='record'>
PS C:\Users\RaptorSniper\dissect.target>  uv run --python 3.12 --refresh --extra dev --extra full target-query --direct -f example_namespace Z:\o
Using CPython 3.12.10 interpreter at: C:\Users\RaptorSniper\AppData\Local\Microsoft\WindowsApps\PythonSoftwareFoundation.Python.3.12_qbz5n2kfra8p0\python.exe
2026-01-13T16:20:40.384049Z [warning  ] Direct mode used in case insensitive mode, but this will cause files overlap, consider using --direct-sensitive [dissect.target.loaders.direct]
<example/descriptor hostname=None domain=None field_a='namespace_example' field_b='record'>

Contributor Author

I have implemented logic to define a different function for this edge case.

This will make it easy to delete it in the (not so near) future, when support for Python 3.11 is dropped.
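
For readers unfamiliar with the pattern, a version-gated definition can be sketched as follows. This is not the code from the PR: `iter_files` is a hypothetical name, and the `os.walk` fallback is chosen only for illustration (the PR's own fallback is a custom recursive function that is not shown in this thread, and this sketch makes no claim about avoiding the Windows issue).

```python
import os
import sys
from pathlib import Path
from typing import Iterator

if sys.version_info >= (3, 12):

    def iter_files(base: Path) -> Iterator[Path]:
        """Python 3.12 and later: rely on Path.rglob to collect files."""
        return (path for path in base.rglob("*") if path.is_file())

else:

    def iter_files(base: Path) -> Iterator[Path]:
        """Older interpreters: separate fallback, easy to delete once they are unsupported."""
        for root, _dirs, files in os.walk(base):
            for name in files:
                yield Path(root) / name
```

Callers only ever see `iter_files`, so dropping the old interpreters later means deleting the `else` branch and nothing else.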

Contributor

Ah that's unfortunate, there indeed appears to be a bug in python 3.11 on Windows (python/cpython#94537)

I think it is nice we can remove the workaround when python 3.11 support is dropped.

@twiggler
Contributor

I will do a QA check tomorrow.

twiggler
twiggler previously approved these changes Jan 15, 2026
Contributor

@twiggler twiggler left a comment

QA LGTM

Thank you for your work @william-billaud.

One more question, you mentioned this in one of the comments:

Problems solved, this was related to a conversion from VFS to string by the direct loader. Thus, the tests iterated over my entire file system, which is quite large.

Can you elaborate on why the string conversion was a problem? According to the typings, both LayerFilesystem::map_dir and LayerFilesystem::map_file should be able to take realpath as a str

vfs.map_file(str(path), str(path))
vfs.map_file(str(path), path)
elif path.is_dir():
vfs.map_dir(str(path), str(path))
Contributor Author

@twiggler regarding

Can you elaborate on why the string conversion was a problem? According to the typings, both LayerFilesystem::map_dir and LayerFilesystem::map_file should be able to take realpath as a str

As the path variable (which contains the VFS) was converted to a string, the VFS object was lost and vfs.map_file tried to map the string '/' (and thus my root fs).
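
The effect can be reproduced with a toy model. Everything below is hypothetical (`ToyFs`, `ToyPath`, `describe`) and only mimics the general behaviour of a path object that is bound to a filesystem; it is not how dissect.target implements its paths.

```python
class ToyFs:
    """Hypothetical filesystem identified by a name."""

    def __init__(self, name: str) -> None:
        self.name = name


class ToyPath:
    """Hypothetical path object that remembers which filesystem it belongs to."""

    def __init__(self, fs: ToyFs, path: str) -> None:
        self.fs = fs
        self.path = path

    def __str__(self) -> str:
        # Only the textual path survives the conversion; the filesystem binding is dropped.
        return self.path


def describe(realpath) -> tuple:
    """Mimic an API that accepts either a plain string or a filesystem-bound path."""
    if isinstance(realpath, str):
        # A bare string carries no filesystem, so it can only refer to the host.
        return ("host", realpath)
    return (realpath.fs.name, realpath.path)


virtual_root = ToyPath(ToyFs("virtual"), "/")
```

Here `describe(virtual_root)` refers to `/` on the virtual filesystem, while `describe(str(virtual_root))` refers to `/` on the host, which matches the symptom of the tests walking the whole local disk.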

Contributor

@twiggler twiggler left a comment

Looks like the Windows unit tests are failing for Python < 3.12.

@william-billaud
Contributor Author

@twiggler, should be fixed with f1e01c4

twiggler
twiggler previously approved these changes Jan 19, 2026
Development

Successfully merging this pull request may close these issues.

CIM usage
