-
Notifications
You must be signed in to change notification settings - Fork 75
Add support for --direct access for cim plugins #1437
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Add support for --direct access for cim plugins #1437
Conversation
@twiggler what are your thoughts? |
|
Another option is to leave the choice up to the user (with an option such as --direct / --insentive-direct), which add more complexity to cli. |
|
Expanding on an idea by @Schamper to create two virtual filesystems in the direct loader, one case sensitive and one insensitive, perhaps we can augment the An open question is how to represent the case insensitive filesystem in the target. |
|
So it would be partly the plugins responsibility to choice between a case sensitive or a case insensitive file system? |
Yes, I think even fully.
Maybe, I was thinking more among the lines of adding a parameter to
Yeah that would be best, if this PR can wait until next year. The additional target property with the case insensitive file system ( Open to better ideas :). |
By partially I mean plugin logic won't have to manually make insensitive file search (using iglob or something equivalent), but I get your point. |
An alternative to creating an case-insensitive filesystem in target would be to create a case-insensitive file system implementation which adapts a case sensitive |
|
Another thought is that maybe having both case insensitive and case sensitive file system would be a bit over engineering. Furthermore from a user (and dfir) perspective: Maybe it would be easier (for users and in term of code complexity) to just use case insensitive vfs in direct mode by default and
|
Yeah this more or less corresponds with the cli option you suggested earlier above. What I like is that it seems to be the simplest approach, which we arguably should try first. If it causes unforeseen problems, we can always go with a more sophisticated solution. Since the direct mode is arguably architecturally a bit unsound, it is preferable best not to turn it into a jenga tower. @Poeloe do you have any issues / concerns from a UX and technical perspective with making the virtual file system in direct mode case insensitive by default, and adding an CLI argument to override (see discussion above). |
|
I don't see any objections so let's go for the CLI switch. Probably will look again in January. |
|
Btw I have added a ticket to describe the idea to cut out parsers: #1471 |
Add --direct-insensitive flag.
|
I have modified this PR to add the I think that using an "advanced option" argument group ease readability, as it allows to split common args and other flag that you only need in really specific situation. This is useful for new users. A check is made to ensure the case insensitive vfs will not cause file overlap. This use rglob, which may be costly but I think that we don't expect usage of Let me know if you prefer make these changes in another PR, or make it yourself, I can revert the commit. Merry Christmas and Happy New Year back to you too 🎄 |
Nice, I will take a look later this week. |
Codecov Report❌ Patch coverage is Additional details and impacted files@@ Coverage Diff @@
## main #1437 +/- ##
=======================================
Coverage 80.74% 80.74%
=======================================
Files 394 394
Lines 34614 34658 +44
=======================================
+ Hits 27948 27986 +38
- Misses 6666 6672 +6
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
dissect/target/loaders/direct.py
Outdated
|
|
||
| def check_case_insensitive_overlap(self) -> bool: | ||
| """Verify if two differents files will have the same path in a case-insensitive fs""" | ||
| all_files_list = list( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am considering if the case insensitivity check should not be an optional check in VirtualDirectory::add.
It should be more efficient and easier to implement there.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not familiar enough with these component but probably.
Nevertheless maybe this implementation could be considered a good enough for the --direct use case, and improvement could be managed in a dedicated issue (as it required more modification of internals).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nevertheless maybe this implementation could be considered a good enough for the --direct use case
Sure, we can defer to another ticket.
I will add a suggestion to make the overlap check more efficient tomorrow, along with another review
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've been looking at the implementation of yield_all_file_recursively and check_case_insensitive _overlap. I noticed the comment mentioning that rglob isn't case-sensitive until Python 3.12. While that's a valid point to consider, the actual case-insensitive check in the current code happens when the paths are converted to lowercase.
My proposed refactoring keeps this exact same logic but simplifies the first step of gathering the files. Instead of using a custom recursive function, we can use rglob to get all the file paths and then perform the same lowercase comparison. This works correctly on any OS, regardless of the filesystem's native case sensitivity, because the check is done in Python after collecting the files.
Here’s how we could refactor it:
# No longer needed
# def yield_all_file_recursively(self, base_path: Path, max_depth: int = 7) -> Iterator[Path]:
def check_case_insensitive_overlap(self) -> bool:
"""Verify if two different files will have the same path in a case-insensitive fs."""
def get_files(path: Path):
if not path.exists():
return []
if path.is_file():
return [path]
# Recursively find all files in the directory
return list(path.rglob("*"))
# Create a flat list of all file paths from all input directories
all_paths = chain.from_iterable(get_files(p) for p in self.paths)
# Filter out directories, keeping only files
all_files = [p for p in all_paths if p.is_file()]
# Compare the count of all files with the count of unique, lowercased file paths
return len({str(p).lower() for p in all_files}) != len(all_files)
(Untested)
What do you think of this approach?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's close to my initial approach.
unfortunately it does not work on with python <3.12 when target is on a case sensitive FS on windows. I was not able to catch it in test (it maybe related to the cpython implemention) using the Virtual Filesystem.
my comment is misleading as in fact this is not related to the case_sensitive parameter, but probably to some inner works when adding this option
Here is the output from a windows VM, where z: is virtualbox share with your suggestion and o has the following content
❯ tree o
o
├── u
│ ├── a
│ │ └── 1
│ └── A
│ └── 1
└── U
├── a
│ └── 1
└── A
└── 1
6 directories, 4 files
With your suggestion : Warning only raised in 3.12
PS C:\Users\RaptorSniper\dissect.target> uv run --python 3.12 --refresh --extra dev --extra full target-query --direct -f example_namespace Z:\o
2026-01-13T16:09:49.617953Z [warning ] Direct mode used in case insensitive mode, but this will cause files overlap, consider using --direct-sensitive [dissect.target.loaders.direct]
<example/descriptor hostname=None domain=None field_a='namespace_example' field_b='record'>
PS C:\Users\RaptorSniper\dissect.target> uv run --python 3.10 --refresh --extra dev --extra full target-query --direct -f example_namespace Z:\o
Using CPython 3.10.19
<example/descriptor hostname=None domain=None field_a='namespace_example' field_b='record'>
PS C:\Users\RaptorSniper\dissect.target> uv run --python 3.11 --refresh --extra dev --extra full target-query --direct -f example_namespace Z:\o
Using CPython 3.11.14
<example/descriptor hostname=None domain=None field_a='namespace_example' field_b='record'>
PS C:\Users\RaptorSniper\dissect.target>
With previous implementation : Warning raised with all version
PS C:\Users\RaptorSniper\dissect.target> uv run --python 3.10 --refresh --extra dev --extra full target-query --direct -f example_namespace Z:\o
Using CPython 3.10.19
2026-01-13T16:18:52.905258Z [warning ] Direct mode used in case insensitive mode, but this will cause files overlap, consider using --direct-sensitive [dissect.target.loaders.direct]
<example/descriptor hostname=None domain=None field_a='namespace_example' field_b='record'>
PS C:\Users\RaptorSniper\dissect.target> uv run --python 3.11 --refresh --extra dev --extra full target-query --direct -f example_namespace Z:\o
Using CPython 3.11.14
2026-01-13T16:19:26.408962Z [warning ] Direct mode used in case insensitive mode, but this will cause files overlap, consider using --direct-sensitive [dissect.target.loaders.direct]
<example/descriptor hostname=None domain=None field_a='namespace_example' field_b='record'>
PS C:\Users\RaptorSniper\dissect.target> uv run --python 3.12 --refresh --extra dev --extra full target-query --direct -f example_namespace Z:\o
Using CPython 3.12.10 interpreter at: C:\Users\RaptorSniper\AppData\Local\Microsoft\WindowsApps\PythonSoftwareFoundation.Python.3.12_qbz5n2kfra8p0\python.exe
2026-01-13T16:20:40.384049Z [warning ] Direct mode used in case insensitive mode, but this will cause files overlap, consider using --direct-sensitive [dissect.target.loaders.direct]
<example/descriptor hostname=None domain=None field_a='namespace_example' field_b='record'>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have implemented a logic to define a different function for this edge case.
This will allows to easily delete it in the (not near) future when support for python 3.11 will be dropped.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah that's unfortunate, there indeed appears to be a bug in python 3.11 on Windows (python/cpython#94537)
I think it is nice we can remove the workaround when python 3.11 support is dropped.
|
I will do a QA check tomorrow. |
twiggler
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
QA LGTM
Thank you for your work @william-billaud.
One more question, you mentioned this in of the comments:
Problems solved, this was related to a conversion from VFS to string by the direct loader. Thus, the tests iterated over my entire file system, which is quite large.
Can you elaborate on why the string conversion was a problem? According to the typings, both LayerFilesystem::map_dir and LayerFilesystem::map_file should be able to take realpath as a str
| vfs.map_file(str(path), str(path)) | ||
| vfs.map_file(str(path), path) | ||
| elif path.is_dir(): | ||
| vfs.map_dir(str(path), str(path)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@twiggler regarding
Can you elaborate on why the string conversion was a problem? According to the typings, both LayerFilesystem::map_dir and LayerFilesystem::map_file should be able to take realpath as a str
As the path variable (that containes the VFS) was converted to string, the vfs object was lost and vfs.map_file tried to map the string '/' (thus my root fs)
twiggler
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks like the windows unit test are failing for python < 3.12
Co-authored-by: Erik Schamper <[email protected]>
Co-authored-by: Erik Schamper <[email protected]>
Co-authored-by: Erik Schamper <[email protected]>
Co-authored-by: Erik Schamper <[email protected]>
Co-authored-by: Erik Schamper <[email protected]>
use the _get_path api to allows using the plugin in direct access mode.
As the plugins uses multiple files, IMO the more simple ways to uses the plugins is by providing the folder as input.
But this require case insensitive match on files. I'm not sure if this should be the plugins problem (managing case insensitive match) or direct loader vfs should be case insensitive.
closes #1418