This repository was archived by the owner on Nov 20, 2025. It is now read-only.

Add error handling to decode_text function #48

Merged
Schamper merged 3 commits into fox-it:main from JSCU-CNI:fix-record-text-encoding on Jul 31, 2025

@JSCU-CNI
Contributor

@JSCU-CNI JSCU-CNI commented Jul 28, 2025

This PR adds a test demonstrating that the EseDB parser cannot parse the LongText value of column 4625-System_Search_AutoSummary in the record with WorkID=1017 of table SystemIndex_PropertyStore.

The value passed to dissect.esedb.c_esedb.decode_text appears to be only partially valid UTF-16-LE. Perhaps the record is not being parsed correctly?

Could you weigh in here @Schamper?

# breakpoint in decode_text
ipdb> hexdump(buf)
00000000  48 00 6f 00 6e 00 67 00  20 00 4b 00 6f 00 6e 00   H.o.n.g. .K.o.n.
00000010  67 00 20 00 53 00 43 00  53 00 20 00 41 00 64 00   g. .S.C.S. .A.d.
00000020  6f 00 62 00 65 00 4d 00  69 00 6e 00 67 00 53 00   o.b.e.M.i.n.g.S.
...
00000130  30 00 78 00 38 00 38 00  3a 00 20 00 c0 31 c1 31   0.x.8.8.:. ..1.1
00000140  c2 31 c3 31 c4 31 40 d8  0c dd c5 31 40 d8 d1 dc   .1.1.1@....1@...
00000150  40 d8 cd dc c6 31 c7 31  40 d8 cb dc 47 d8 e8 df   @....1.1@...G...
00000160  c8 31 40 d8 ca dc c9 31  ca 31 cb 31 cc 31 40 d8   .1@....1.1.1.1@.
00000170  0e dd cd 31 ce 31 00 01  c1 00 cd 01 c0 00 12 01   ...1.1..........
00000180  c9 00 1a 01 c8 00 4c 01  d3 00 d1 01 d2 00 25 f3   ......L.......%.
00000190  be 1e 27 f3 c0 1e ca 00  01 01 e1 00 ce 01 e0 00   ..'.............
000001a0  51 02 13 01 e9 00 1b 01  e8 00 2b 01 ed 00 d0 01   Q.........+.....
000001b0  ec 00 4d 01 f3 00 d2 01  f2 00 6b 01 fa 00 d4 01   ..M.......k.....
000001c0  f9 00 d6 01 d8 01 da 01  dc 01 fc 00 44 f3 bf 1e   ............D...
000001d0  46 f3 c1 1e ea 00 61 02  da 23 db 23 fd ff fd ff   F.....a..#.#....
000001e0  fd ff fd ff fd ff fd ff  fd ff fd ff fd ff fd ff   ................
000001f0  fd ff fd ff fd ff fd ff  fd ff fd ff fd ff fd ff   ................
00000200  fd ff fd ff fd ff fd ff  fd ff fd ff fd ff fd ff   ................
00000210  fd ff fd ff fd ff fd ff  fd ff fd ff fd ff fd ff   ................
00000220  fd ff fd ff fd ff fd ff  fd ff fd ff fd ff fd ff   ................
00000230  fd ff fd ff fd ff fd ff  fd ff fd ff fd ff fd ff   ................
00000240  fd ff fd ff fd ff fd ff  fd ff fd ff fd ff fd ff   ................
00000250  fd ff fd ff fd ff fd ff  fd ff fd ff fd ff fd ff   ................
...
00000790  7f 95 e8 95 63 d8 0f de  e6 97 75 98 ce 98 de 98   ....c.....u.....
000007a0  63 99 66 d8 10 dc 7c 9c  1f 9e c4 9e 6f 6b 07 f9   c.f...|.....ok..
000007b0  37 4e 40 d8 87 dc 1d 96  37 62 a2 94 fd ff 20 00   7N@.....7b.... .
000007c0  30 00 78 00 38 00 43 00  3a 00 20 00 3b 50 fe 6d   0.x.8.C.:. .;P.m
000007d0  67 d8 73 dc a6 9f c9 3d  8f 88 50 d8 4e dd 77 70   g.s....=..P.N.wp
000007e0  f5 5c 20 4b 54 d8 cd dd  59 35 57 d8 30 dd 22 61   .\ KT...Y5W.0."a
000007f0  62 d8 32 de a7 8f f6 91  91 71 19 67 ba 73 4c d8   b.2......q.g.sL.
ipdb> up 3
ipdb> !self
<Record WorkID=1017 4631F-System_Search_GatherTime=b'\xefM\xe1\x9b3?\xd9\x01' 13F-System_Size=b'U\xa8\x00\x00\x00\x00\x00\x00' 14F-System_FileAttributes=32 15F-System_DateModified=b'\x00\xf5\xee\xd8r\xbb\xc2\x01' 16F-System_DateCreated=b'\x00\xf5\xee\xd8r\xbb\xc2\x01' 17F-System_DateAccessed=b'\x00\xf5\xee\xd8r\xbb\xc2\x01' 0F-InvertedOnlyMD5=b'M\xbek\xbbIre3\xf1\xb91\xb6\xb3\xdb\xaa\xe5' 4434-System_IsFolder=False 4472-System_MIMEType='application/pdf' ...>

@JSCU-CNI JSCU-CNI changed the title "Add test" to "Unable to parse LongText mixed encoded content" Jul 28, 2025
@Schamper
Member

Schamper commented Jul 29, 2025

Just looks like "corrupt" values to me. I checked with Esent Workbench (which uses the Windows APIs):
[Screenshot: the value as shown in Esent Workbench]

When I decode with errors="replace", I get pretty much the same hanzi:

ipdb> p buf.decode(CODEPAGE_MAP[encoding], errors='replace')
'Hong Kong SCS AdobeMingStd-Light-Acro-HKscs-B5-H ASCII:  !"#$%&\'()*+,-./0123456789:;<=>?@ABCDEFGHIJKLMNOPQRSTUVWXYZ[\\]^_`abcdefghijklmnopqrstuvwxyz{|}~ 0x88: ㇀㇁㇂㇃㇄𠄌㇅𠃑𠃍㇆㇇𠃋𡿨㇈𠃊㇉㇊㇋㇌𠄎㇍㇎ĀÁǍÀĒÉĚÈŌÓǑÒ\uf325\uf327ỀÊāáǎàɑēéěèīíǐìōóǒòūúǔùǖǘǚǜü\uf344ế\uf346ềêɡ⏚⏛������������������������������������������������������������������������������������ 0x89: 𪎩𡅅�攊��丽滝鵎釟��𧜵撑会伨侨兖兴农凤务动医华发变团声处备夲头学实実岚庆总斉柾栄桥济炼电纤纬纺织经统缆缷艺苏药视设询车轧轮琑糼緍楆竉刧����醌碸酞肼�贋胶𠧧��肟黇䳍鷉鸌䰾𩷶𧀎鸊𪄳㗁�溚舾甙�䤑马骏龙禇𨑬𡷊𠗐𢫦两亁亀亇亿仫伷㑌侽㹈倃傈㑽㒓㒥円夅凛凼刅争剹劐匧㗇厩㕑厰㕓参吣㕭㕲㚁咓咣咴咹哐哯唘唣唨㖘唿㖥㖿嗗㗅 0x8A: 𧶄唥�𠱂𠴕𥄫喐𢳆㧬𠍁蹆𤶸𩓥䁓𨂾睺𢰸㨴䟕𨅝𦧲𤷪擝𠵼𠾴𠳕𡃴撍蹾𠺖𠰋𠽤𢲩𨉖𤓓�𠵆𩩍𨃩䟴𤺧𢳂骲㩧𩗴㿭㔆𥋇𩟔𧣈𢵄鵮頕�䏙𦂥撴哣𢵌𢯊𡁷㧻𡁯𦛚𦜖𧦠擪𥁒𠱃蹨𢆡𨭌𠜱�䠋𠆩㿺塳𢶍�𤗈𠓼𦂗𠽌𠶖啹䂻䎺�䪴𢩦𡂝膪飵𠶜捹㧾𢝵跀嚡摼㹃�𪘁𠸉𢫏𢳉�𡃈𣧂㦒㨆𨊛㕸𥹉𢃇噒𠼱𢲲𩜠㒼氽𤸻��𧕴𢺋𢈈𪙛𨳍𠹺𠰴𦠜羓𡃏𢠃𢤹㗻𥇣𠺌𠾍𠺪㾓𠼰𠵇𡅏𠹌�𠺫𠮩𠵈𡃀𡄽㿹𢚖搲𠾭 0x8B: 𣏴𧘹𢯎𠵾𠵿𢱑𢱕㨘𠺘𡃇𠼮𪘲𦭐𨳒𨶙𨳊閪哌苄喹�𩻃鰦骶𧝞𢷮煀腭胬尜𦕲脴㞗卟𨂽醶𠻺𠸏𠹷𠻻㗝𤷫㘉𠳖嚯𢞵𡃉𠸐𠹸𡁸𡅈𨈇𡑕𠹹𤹐𢶤婔𡀝𡀞𡃵𡃶垜𠸑𧚔𨋍𠾵𠹻𥅾㜃𠾶𡆀𥋘𪊽𤧚𡠺𤅷𨉼墙剨㘚𥜽箲孨䠀䬬鼧䧧鰟鮍𥭴𣄽嗻㗲嚉\uf538\uf539𡯁屮靑𠂆乛亻㔾尣彑忄㣺扌攵歺氵氺灬爫丬犭𤣩罒礻糹罓𦉪㓁�𦍋耂肀𦘒𦥑卝衤见𧢲讠贝钅镸长门𨸏韦页风飞饣𩠐鱼鸟黄歯龜丷𠂇阝户钢� 0x8C: 倻淾𩱳龦㷉袏𤅎灷峵䬠𥇍㕙𥴰愢𨨲辧釶熑朙玺�'
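For reference, the difference between strict decoding and `errors="replace"` can be reproduced with the standard library alone. In this sketch, `pair` is the surrogate pair at offset 0x146 of the hexdump (`40 d8 0c dd`), `stored_replacement` is one of the `fd ff` units from the long run starting at offset 0x1dc, and `broken` is a made-up buffer containing a lone high surrogate; the `try_decode` helper is hypothetical. Because `fd ff` is U+FFFD in UTF-16-LE, the long run of replacement characters in the output above is stored in the record itself rather than produced by `errors="replace"`.

```python
# Sketch: how Python's UTF-16-LE codec treats the kinds of code units seen
# in the hexdump. `broken` is constructed; the other two buffers are copied
# from the hexdump.

# "Hi" followed by a lone high surrogate (0xD840) and then a space (0x0020).
# A high surrogate must be followed by a low surrogate (0xDC00-0xDFFF).
broken = b"H\x00i\x00\x40\xd8\x20\x00"

# Valid surrogate pair 0xD840 0xDD0C (offset 0x146): one supplementary character.
pair = b"\x40\xd8\x0c\xdd"

# 0xFFFD (bytes fd ff) is the replacement character itself, so it decodes
# without error even in strict mode.
stored_replacement = b"\xfd\xff"


def try_decode(buf: bytes, errors: str) -> str:
    """Decode buf as UTF-16-LE, returning the error reason on failure."""
    try:
        return buf.decode("utf-16-le", errors=errors)
    except UnicodeDecodeError as exc:
        return f"UnicodeDecodeError: {exc.reason}"


print(try_decode(broken, "strict"))
print(ascii(try_decode(broken, "replace")))
print(ascii(pair.decode("utf-16-le")))
print(ascii(stored_replacement.decode("utf-16-le")))
```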

Maybe we should introduce an (optional?) errors argument?

@JSCU-CNI
Contributor Author

> Maybe we should introduce an (optional?) errors argument?

That makes sense. Implemented in a5e68d7.
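The commit is not reproduced here. As a rough illustration of the idea, this is a hypothetical sketch of a `decode_text` that forwards an `errors` argument to `bytes.decode`. The `CODEPAGE` enum and `CODEPAGE_MAP` are simplified stand-ins, and the `"strict"` default (which keeps the raise-on-error behavior reported above) is an assumption; the signature actually merged may differ.

```python
from enum import IntEnum


class CODEPAGE(IntEnum):
    """Hypothetical stand-in for the parser's codepage enum."""

    UNICODE = 1200  # UTF-16-LE
    WESTERN = 1252  # Windows-1252


# Hypothetical mapping from column codepage to a Python codec name.
CODEPAGE_MAP = {
    CODEPAGE.UNICODE: "utf-16-le",
    CODEPAGE.WESTERN: "cp1252",
}


def decode_text(buf: bytes, encoding: CODEPAGE, errors: str = "strict") -> str:
    """Decode a text column value.

    ``errors`` is passed straight through to ``bytes.decode``, so callers can
    pick "strict" (raise), "replace" (insert U+FFFD), "backslashreplace", etc.
    """
    return buf.decode(CODEPAGE_MAP[encoding], errors=errors)


# A value ending in a lone high surrogate, similar to a truncated record.
raw = b"o\x00k\x00\x40\xd8"
print(ascii(decode_text(raw, CODEPAGE.UNICODE, errors="replace")))
```

Defaulting to "strict" means existing callers still fail loudly on corrupt values, while callers that prefer best-effort output can opt into "replace" or "backslashreplace".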

@JSCU-CNI JSCU-CNI changed the title "Unable to parse LongText mixed encoded content" to "Add error handling to decode_text function" Jul 30, 2025
Member

@Schamper Schamper left a comment

Maybe make this accessible with some changes like:

@JSCU-CNI JSCU-CNI requested a review from Schamper July 31, 2025 08:25
@codecov

codecov bot commented Jul 31, 2025

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 80.35%. Comparing base (ca05543) to head (7b2d63f).
⚠️ Report is 1 commit behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main      #48      +/-   ##
==========================================
+ Coverage   78.53%   80.35%   +1.82%     
==========================================
  Files          16       16              
  Lines        1444     1410      -34     
==========================================
- Hits         1134     1133       -1     
+ Misses        310      277      -33     
Flag        Coverage Δ
unittests   80.35% <100.00%> (+1.82%) ⬆️

Flags with carried forward coverage won't be shown.

@Schamper Schamper merged commit d7ed551 into fox-it:main Jul 31, 2025
25 checks passed
@JSCU-CNI JSCU-CNI deleted the fix-record-text-encoding branch July 31, 2025 08:34