
Missing files when using the local loader #49

@twiggler

Introduction

When acquiring live Windows hosts, the registry plugin occasionally cannot find the registry hives in c:\windows\system32\config.

However, a VMDK image of one of the machines shows that the registry files are present. Because the system is still running during acquisition, one hypothesis is a "temporal smear", where, for example, the MFT or index data that was read is stale. This is corroborated by several .evtx files containing misplaced data.

Unfortunately, reproducing the issue is impossible because acquire does not capture non-resident index data. Although ASDF may give us more insight in the future, at this point we cannot fully rule out issues in the NTFS file lookup code.

Inspection of the code has uncovered the following areas of concern:

Collation mismatch

(found by studying other implementations)

In

def _cmp_filename(entry: IndexEntry, value: str) -> Match:
we convert the on-disk filename to uppercase and rely on standard Python string comparison. However, NTFS uses a file called $Upcase to map every Unicode character to its "upper case" equivalent for sorting. When special characters such as the German sharp S (ß) are involved, files might not be found, because .upper() maps ß to SS, while $Upcase is a one-to-one mapping.
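
To make the concern concrete, here is a minimal sketch of how the two collations can disagree. TOY_UPCASE, python_key and upcase_key are hypothetical names for illustration; the toy table only covers ASCII and is assumed to leave ß unchanged, it is not the real $Upcase data of any volume.

```python
# Toy stand-in for $Upcase: a one-to-one character mapping.
# Assumption: only ASCII letters are mapped, everything else (including ß) maps to itself.
TOY_UPCASE = {ord(c): ord(c.upper()) for c in "abcdefghijklmnopqrstuvwxyz"}


def python_key(name: str) -> str:
    """Sort key based on str.upper(), which may expand one character into several."""
    return name.upper()


def upcase_key(name: str, table: dict = TOY_UPCASE) -> str:
    """Sort key based on a one-to-one table lookup, so the length never changes."""
    return name.translate(table)


names = ["straße", "strat"]
by_python = sorted(names, key=python_key)   # compares "STRASSE" with "STRAT"
by_upcase = sorted(names, key=upcase_key)   # compares "STRAßE" with "STRAT"
```

Under these assumptions the two keys order the same pair of names differently, so a search that compares with one key against entries sorted by the other could descend into the wrong part of the index.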

Interpreting size of a data run as a signed integer

Imagine a data run length of 130 clusters (0x82), stored in 1 byte.

  1. fh.read(1) gets b'\x82'.
  2. varint sees that 0x82 & 0x80 is non-zero, i.e. the high bit is set.
  3. It pads with 0xff: b'\x82\xff\xff\xff\xff\xff\xff\xff'.
  4. struct.unpack('<q') interprets this as -126.

(I did not find any data runs with negative sizes, though.)
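
The steps above can be reproduced with a small sketch. decode_signed and decode_unsigned are hypothetical helpers written for this issue, not the actual varint function; decode_signed simply mirrors the sign-extension described in the steps.

```python
import struct


def decode_signed(buf: bytes) -> int:
    """Sign-extend a little-endian value to 8 bytes and unpack it as int64."""
    # The most significant byte is the last one in little-endian order.
    pad = b"\xff" if buf[-1] & 0x80 else b"\x00"
    return struct.unpack("<q", buf + pad * (8 - len(buf)))[0]


def decode_unsigned(buf: bytes) -> int:
    """Interpret the same bytes as an unsigned little-endian integer."""
    return int.from_bytes(buf, "little", signed=False)


run_length = b"\x82"                       # 130 clusters stored in one byte
as_signed = decode_signed(run_length)      # -126
as_unsigned = decode_unsigned(run_length)  # 130
```

A signed reading is appropriate for the run offset, which is a delta relative to the previous run, whereas a length of 130 clusters only decodes correctly with the unsigned reading.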

B- or B+

(More of a question)

I am under the impression that NTFS uses B+ trees for indexing, where only leaf nodes contain data. However, at this line in index.py we also seem to stop searching if we have a match but are not on a leaf node (not entry.is_end and cmp(entry, search_value) == Match.Equal). Are these intermediary nodes known to have authoritative metadata?
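
For reference, this is a toy model of the lookup behaviour the question is about. Entry, Node and search are hypothetical and not taken from index.py; the sketch only assumes that every node ends with a keyless end marker and that an entry's child holds the smaller keys.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Entry:
    key: Optional[str]               # None models the end-of-node marker entry
    child: Optional["Node"] = None   # subtree holding keys smaller than `key`


@dataclass
class Node:
    entries: list = field(default_factory=list)


def search(root: Node, key: str) -> Optional[int]:
    """Return the depth at which `key` is found, or None if it is absent."""
    node, depth = root, 0
    while node is not None:
        for entry in node.entries:
            if entry.key is not None and key == entry.key:
                return depth          # stops here even when the entry has a child
            if entry.key is None or key < entry.key:
                node = entry.child    # descend; None on a leaf ends the search
                break
        else:
            node = None               # node without an end marker: give up
        depth += 1
    return None


leaf_low = Node([Entry("A"), Entry(None)])
leaf_high = Node([Entry("Z"), Entry(None)])
root = Node([Entry("M", child=leaf_low), Entry(None, child=leaf_high)])
```

Here search(root, "M") returns at depth 0, on a non-leaf entry. That is correct if interior entries are full records (B-tree), but in a strict B+ tree "M" would only be a separator and the record itself would live in a leaf.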
