
Pre linux.pagecache.recoverfs support #1561

Merged
ikelos merged 14 commits into volatilityfoundation:develop from Abyss-W4tcher:pre_linux_pagecache_recoverfs_support
Jan 19, 2025

Abyss-W4tcher (Contributor) commented Jan 19, 2025

Hi,

To prepare for the new linux.pagecache.recoverfs plugin, I figured it would be better to split the changes not directly related to it into a separate PR.

A few readability tweaks were made, and the inode content extraction and writing steps were split apart to allow for more flexibility.
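
The idea of separating extraction from writing can be illustrated with a small, self-contained sketch. It mirrors the write_inode_content_to_stream / write_inode_content_to_file split only in spirit: `extract_inode_content`, `write_content_to_stream` and `write_content_to_file` are hypothetical names, cached pages are modelled as a plain dict of page index to bytes, and a 4 KiB page size is assumed.

```python
import io
from typing import IO, Dict, Iterable, Iterator, Tuple

PAGE_SIZE = 4096  # assumption: 4 KiB pages


def extract_inode_content(
    cached_pages: Dict[int, bytes], inode_size: int
) -> Iterator[Tuple[int, bytes]]:
    """Extraction step: yield (file_offset, data) per cached page, trimmed to inode_size.

    cached_pages maps a page index within the file to that page's bytes.
    """
    for page_index in sorted(cached_pages):
        file_offset = page_index * PAGE_SIZE
        if file_offset >= inode_size:
            break  # indices are sorted, so every remaining page is past EOF
        yield file_offset, cached_pages[page_index][: inode_size - file_offset]


def write_content_to_stream(chunks: Iterable[Tuple[int, bytes]], stream: IO[bytes]) -> int:
    """Writing step: place each chunk at its offset and return the final size.

    Seeking past the current end before writing leaves zero-filled gaps where
    pages were not cached.
    """
    for file_offset, data in chunks:
        stream.seek(file_offset)
        stream.write(data)
    return stream.seek(0, io.SEEK_END)


def write_content_to_file(chunks: Iterable[Tuple[int, bytes]], path: str) -> int:
    """File-backed variant that simply reuses the stream writer."""
    with open(path, "wb") as f:
        return write_content_to_stream(chunks, f)


# A file of 2 * PAGE_SIZE + 100 bytes whose page 1 was never cached.
pages = {0: b"A" * PAGE_SIZE, 2: b"C" * PAGE_SIZE}
buf = io.BytesIO()
size = write_content_to_stream(extract_inode_content(pages, 2 * PAGE_SIZE + 100), buf)
print(size)  # -> 8292
```

Because the writer only needs a binary stream, the same extraction output can go to a file, an in-memory buffer or any other file-like object.
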
The InodeSize column was added to the Files output, as I think it is a valuable piece of information. It also complements the Recovered FileSize column of linux.pagecache.RecoverFs very well, as the following snippet demonstrates (scroll right to the last columns):

$ python3 vol.py -f sample.bin -r pretty linux.pagecache.RecoverFs
  | SuperblockAddr |               MountPoint | Device |   InodeNum |      InodeAddr | FileType |  InodePages | CachedPages |   FileMode |                     AccessTime |               ModificationTime |                     ChangeTime |                                                                                                                                                                         FilePath |       InodeSize | Recovered FileSize
* | 0x89388c64c800 |                        / |    8:1 |      72981 | 0x89388cd19378 |      REG |          51 |          24 | -rw-r----- | 2024-11-15 12:40:53.900001 UTC | 2024-11-15 14:59:14.121585 UTC | 2024-11-15 14:59:14.121585 UTC |                                                                                                                                                                  /var/log/syslog |          207810 |             207810
* | 0x89388c64c800 |                        / |    8:1 |      72605 | 0x89388cd185b0 |      REG |           2 |           2 | -rw-rw-r-- | 2024-10-02 14:31:24.629028 UTC | 2024-11-15 14:58:43.880001 UTC | 2024-11-15 14:58:43.880001 UTC |                                                                                                                                                                    /var/log/wtmp |            6144 |               6144
* | 0x89388c64c800 |                        / |    8:1 |      72606 | 0x89388cd18118 |      REG |           0 |           0 | -rw-rw---- | 2024-10-02 14:31:24.629028 UTC | 2024-10-02 14:33:48.019670 UTC | 2024-10-02 14:36:11.362212 UTC |                                                                                                                                                                    /var/log/btmp |               0 |                  0
* | 0x89388c64c800 |                        / |    8:1 |      80281 | 0x89388cdeeac0 |      DIR |           1 |           0 | drwx------ | 2024-11-15 12:40:45.328000 UTC | 2024-11-15 12:40:45.328000 UTC | 2024-11-15 12:40:45.328000 UTC |                                                                                                                                                                 /var/log/private |            4096 |                N/A
* | 0x89388c64c800 |                        / |    8:1 |      80249 | 0x89388cd04a98 |      DIR |           1 |           0 | drwxr-sr-x | 2024-11-15 12:40:58.716001 UTC | 2024-11-15 12:40:44.876000 UTC | 2024-11-15 12:40:44.876000 UTC |                                                                                                                                                                 /var/log/journal |            4096 |                N/A
* | 0x89388c64c800 |                        / |    8:1 |      80275 | 0x89388cd033a0 |      DIR |           1 |           0 | drwxr-sr-x | 2024-11-15 12:40:58.716001 UTC | 2024-11-15 12:40:55.952001 UTC | 2024-11-15 12:40:55.952001 UTC |                                                                                                                                /var/log/journal/c602156d161745d29b98f977eb66a96d |            4096 |                N/A
* | 0x89388c64c800 |                        / |    8:1 |      72930 | 0x89388cd073f0 |      REG |        2048 |         292 | -rw-r----- | 2024-11-15 14:58:37.228001 UTC | 2024-11-15 14:58:42.764001 UTC | 2024-11-15 14:58:42.764001 UTC |                                                                                                              /var/log/journal/c602156d161745d29b98f977eb66a96d/user-1000.journal |         8388608 |            1196032
* | 0x89388c64c800 |                        / |    8:1 |      72437 | 0x89388cd00118 |      REG |        2048 |         732 | -rw-r----- | 2024-11-15 14:58:37.228001 UTC | 2024-11-15 14:59:14.401585 UTC | 2024-11-15 14:59:14.401585 UTC |                                                                                                                 /var/log/journal/c602156d161745d29b98f977eb66a96d/system.journal |         8388608 |            2998272

As you can see, this makes it easy to compare the recovered file size with the size announced by the inode. It also justifies placing InodeSize at the very end. Happy to discuss it, however!
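
The same comparison can be done programmatically. The sketch below hard-codes a subset of the rows shown above (values copied verbatim, with `N/A` represented as `None`); `short_recoveries` is a hypothetical helper, not part of the plugin.

```python
from typing import Iterator, List, Optional, Tuple

# (FilePath, FileType, InodeSize, Recovered FileSize) copied from the
# linux.pagecache.RecoverFs output above; "N/A" is represented as None.
Row = Tuple[str, str, int, Optional[int]]
ROWS: List[Row] = [
    ("/var/log/syslog", "REG", 207810, 207810),
    ("/var/log/wtmp", "REG", 6144, 6144),
    ("/var/log/btmp", "REG", 0, 0),
    ("/var/log/private", "DIR", 4096, None),
    ("/var/log/journal", "DIR", 4096, None),
    ("/var/log/journal/c602156d161745d29b98f977eb66a96d/user-1000.journal", "REG", 8388608, 1196032),
    ("/var/log/journal/c602156d161745d29b98f977eb66a96d/system.journal", "REG", 8388608, 2998272),
]


def short_recoveries(rows: List[Row]) -> Iterator[Tuple[str, float]]:
    """Yield (path, recovered/announced ratio) for regular files recovered only partially."""
    for path, file_type, inode_size, recovered in rows:
        if file_type != "REG" or recovered is None or inode_size == 0:
            continue  # directories have no recovered size; empty files cannot be short
        if recovered < inode_size:
            yield path, recovered / inode_size


for path, ratio in short_recoveries(ROWS):
    print(f"{ratio:6.1%}  {path}")
```

On these rows only the two journal files are flagged, at roughly 14% and 36% of their announced size.
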

Versioning

linux.pagecache.Files

  • Minor bump:
    • get_inodes: add "follow_symlinks" argument
    • rendering: add "InodeSize" column

linux.pagecache.InodePages

  • Minor bump:
    • add "write_inode_content_to_stream" method

ikelos (Member) left a comment

Thanks, generally looks good. It's just that the broken-out function doesn't stick to the general convention we've got going for plugin API methods, and I'm trying to keep to it for as long as possible...

Also, there's an int cast that I think is unnecessary... ;)

access_time_dt = self.inode.get_access_time()
modification_time_dt = self.inode.get_modification_time()
change_time_dt = self.inode.get_change_time()
inode_size = int(self.inode.i_size)

Member

Why is this being cast to an int? Is it not already an int? Please check. I'm really trying to avoid people adding unnecessary casts because they don't expect something to be an int: other people then see it and think they need to cast to an int too, some people do it out of caution, and before you know it the whole codebase is bursting with pointless casts...

Contributor Author

This was explained by gcmoreira in the comment right above the added line:

Ensure all types are atomic immutable. Otherwise, astuple() will take a long time doing a deepcopy of the Volatility objects.

I was also able to verify that, as with the other properties, not casting it to int results in nested deepcopies.
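
The effect can be reproduced without Volatility. `dataclasses.astuple()` recurses into dataclasses, lists, tuples and dicts and passes every other value through `copy.deepcopy()`; for a plain `int` that is a no-op, but an `int` subclass carrying extra state is copied together with everything it references. In the sketch below, `FakeVolInteger` is a hypothetical stand-in, assuming (as I understand Volatility 3's integer objects to do) that it subclasses `int` while holding framework state; `InodeRow` is likewise illustrative.

```python
import copy
import dataclasses


class FakeVolInteger(int):
    """Hypothetical stand-in for a Volatility integer object: an int that also
    holds a reference to (potentially large) framework state."""

    deepcopy_calls = 0  # class-level counter, for demonstration only

    def __new__(cls, value: int, state: dict):
        obj = super().__new__(cls, value)
        obj.state = state
        return obj

    def __deepcopy__(self, memo):
        FakeVolInteger.deepcopy_calls += 1
        # Copying the object drags the referenced state along with it.
        return FakeVolInteger(int(self), copy.deepcopy(self.state, memo))


@dataclasses.dataclass
class InodeRow:
    inode_num: int
    inode_size: int


state = {"symbols": list(range(10_000))}  # stands in for context/layer data
raw_size = FakeVolInteger(207810, state)
assert isinstance(raw_size, int)  # it already "is" an int...

dataclasses.astuple(InodeRow(72981, raw_size))
print("without cast:", FakeVolInteger.deepcopy_calls)  # -> without cast: 1

FakeVolInteger.deepcopy_calls = 0
row = dataclasses.astuple(InodeRow(72981, int(raw_size)))
print("with cast:", FakeVolInteger.deepcopy_calls)  # -> with cast: 0
```

`int(raw_size)` returns a plain `int`, which is why the explicit cast keeps `astuple()` cheap even though `isinstance(raw_size, int)` is already true.
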


@staticmethod
def write_inode_content_to_stream(
inode: interfaces.objects.ObjectInterface,

Member

This is starting to make use of more complex data types (not too bad) and to break away from the standard format for methods (first parameter: context). I'd really prefer that this take a context and a layer_name rather than just a layer.

I don't have a very good reason for sticking with the convention. Originally it was because everything you'd do needed a context (and if you wanted to extend this in the future, you'd likely want the context there to do it). There was also the thought, in the back of my mind, of one day serializing such calls for some reason, although I no longer remember why (perhaps parallelization). Regardless, I think it'd be a good convention to maintain unless you have strong objections...
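
The convention being asked for can be sketched with minimal stand-ins: the helper receives the context and a layer name first, and looks the layer up itself. `FakeContext`, `FakeLayer` and `read_file_page` are hypothetical names for illustration only; a real Volatility 3 context also exposes its layers by name, but check the framework for the exact types and method signatures.

```python
from typing import Dict


class FakeLayer:
    """Hypothetical stand-in for a memory layer backed by a bytes buffer."""

    def __init__(self, data: bytes):
        self._data = data

    def read(self, offset: int, length: int) -> bytes:
        return self._data[offset : offset + length]


class FakeContext:
    """Hypothetical stand-in for a context that holds every layer by name."""

    def __init__(self, layers: Dict[str, FakeLayer]):
        self.layers = layers


def read_file_page(
    context: FakeContext, layer_name: str, physical_offset: int, length: int = 4096
) -> bytes:
    """Context first, then layer_name: the layer is resolved inside the call.

    Because the whole context travels with the call, a later version could reach
    other layers or symbol tables without changing the signature again.
    """
    layer = context.layers[layer_name]
    return layer.read(physical_offset, length)


ctx = FakeContext({"memory_layer": FakeLayer(b"\x00" * 4096 + b"page two")})
print(read_file_page(ctx, "memory_layer", 4096, 8))  # -> b'page two'
```

Passing a layer object directly would work just as well today; the context-first form trades a little verbosity for consistency and room to extend.
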

Abyss-W4tcher (Contributor Author), Jan 19, 2025

I basically followed the write_inode_content_to_file design, as it was the accepted way of doing it.

I also prefer passing a context and a layer name; happy to rework it that way!

Member

Yes please. I try not to be too nitpicky, but it's hard keeping everything in order when it comes in so quickly. So it'd be a MINOR bump if we're just adding write_inode_content_to_stream, and a MAJOR bump if we're changing the signature of write_inode_content_to_file.
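
The MINOR/MAJOR rule can be sketched with plain version tuples. The starting version `(1, 0, 0)` is hypothetical, and `satisfies()` encodes an assumed consumer rule (same MAJOR, MINOR at least as new), the usual semantic-versioning reading; it is not copied from Volatility's own requirement-checking code.

```python
from typing import Tuple

Version = Tuple[int, int, int]  # (MAJOR, MINOR, PATCH)


def bump(version: Version, changes_existing_signature: bool) -> Version:
    """MAJOR when an existing method's signature changes, MINOR when only adding."""
    major, minor, _patch = version
    if changes_existing_signature:
        return (major + 1, 0, 0)
    return (major, minor + 1, 0)


def satisfies(required: Version, available: Version) -> bool:
    """Assumed rule: same MAJOR, and a MINOR at least as new as required."""
    return available[0] == required[0] and available[1] >= required[1]


base = (1, 0, 0)  # hypothetical starting version
added_method = bump(base, changes_existing_signature=False)
changed_signature = bump(base, changes_existing_signature=True)
print(added_method, satisfies(base, added_method))            # -> (1, 1, 0) True
print(changed_signature, satisfies(base, changed_signature))  # -> (2, 0, 0) False
```

Under this rule, callers written against the old version keep working after a MINOR bump, while a MAJOR bump tells them the existing method they rely on may no longer accept the same arguments.
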

Contributor Author

I updated write_inode_content_to_file as well.

ikelos (Member) commented Jan 19, 2025

Ok, looks good, thanks!

ikelos merged commit 00a69c9 into volatilityfoundation:develop on Jan 19, 2025
13 checks passed
Abyss-W4tcher deleted the pre_linux_pagecache_recoverfs_support branch on January 19, 2025 17:00
gcmoreira (Contributor)

Hey @ikelos, the changes in this PR are good. However, the timing is challenging, as @atcuno and I are currently focused on stabilizing several core APIs (Page Cache, Mountinfo, VMA, tasks, etc.), with related PRs still under review.

Could we consider temporarily reverting this PR and revisiting it once the pending parity release PRs are merged and @atcuno confirms that the changes have passed the full regression tests? Additionally, it seems prudent, and I would greatly appreciate it, to delay merging any other pending or future PRs that affect this code until the parity release PRs have been finalized. Thanks


@staticmethod
def format_symlink(symlink_source: str, symlink_dest: str) -> str:
    return f"{symlink_source} -> {symlink_dest}"

Contributor

Not too fond of this format_symlink staticmethod. I'm not sure what the benefit is of including a method in the inode data class that simply formats two external strings as 's1 -> s2'.
I think it would make more sense (and this one is my mistake) for _follow_symlink to be a public method in the inode object extension.

Contributor Author

A future implementation will also need to format symlinks given an inode. Unifying this into a single function prevents code duplication, even if that might indeed seem futile for one line.

I put it under InodeUser because it is in fact only needed to make the output more user-readable after rendering 👍.
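
The two positions can coexist: the inode exposes its link target through a public method, and formatting for display lives in one shared helper. `FakeInode`, `follow_symlink()` and `render_path()` are hypothetical names, `format_symlink` mirrors the one-liner above, and the `/etc/localtime` path is only an example.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class FakeInode:
    """Hypothetical stand-in for an inode object extension."""

    path: str
    link_target: Optional[str] = None  # only set for symbolic links

    def follow_symlink(self) -> Optional[str]:
        """Public accessor, in the spirit of the review suggestion (name assumed)."""
        return self.link_target


def format_symlink(symlink_source: str, symlink_dest: str) -> str:
    return f"{symlink_source} -> {symlink_dest}"


def render_path(inode: FakeInode) -> str:
    """One shared rendering helper that several plugins could reuse."""
    target = inode.follow_symlink()
    if target is None:
        return inode.path
    return format_symlink(inode.path, target)


print(render_path(FakeInode("/var/log/syslog")))
print(render_path(FakeInode("/etc/localtime", "/usr/share/zoneinfo/UTC")))
```

Resolving the link stays an object-level concern, while the `source -> dest` string is produced only at render time.
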
