Skip to content

fix #1190 support to display packet content as UTF-8#1382

Open
lotabout wants to merge 1 commit intothe-tcpdump-group:masterfrom
lotabout:feat/support-utf8
Open

fix #1190 support to display packet content as UTF-8#1382
lotabout wants to merge 1 commit intothe-tcpdump-group:masterfrom
lotabout:feat/support-utf8

Conversation

@lotabout
Copy link

@lotabout lotabout commented Nov 2, 2025

This PR adds a new --utf8 option to enable UTF-8 character support.

When UTF-8 support is enabled, tcpdump will detect and display UTF-8 characters in the payload as-is when using the -A option.
Note that in -X mode, if a multi-byte character spans across two lines, it will appear on the first line, and a spaces will be padded on the next line.

Tests

  • Verified with several random PCAP files dumped using -A and -x, ensuring their MD5 checksums remain identical without --utf8.
  • Manually tested using a UTF-8 sample PCAP on macOS.

utf8.pcap.zip

@lotabout lotabout changed the title fix #1190 support to display packet content as UTF-8 [Draft] fix #1190 support to display packet content as UTF-8 Nov 2, 2025
@lotabout lotabout force-pushed the feat/support-utf8 branch 4 times, most recently from 2ecc6b2 to 354fa81 Compare November 2, 2025 16:17
@lotabout lotabout changed the title [Draft] fix #1190 support to display packet content as UTF-8 fix #1190 support to display packet content as UTF-8 Nov 2, 2025
@gvanem
Copy link
Contributor

gvanem commented Nov 4, 2025

I tried this on Windows using MSVC and clang-cl. But there is no wcwidth() here:

print-ascii.c(118,11): error: call to undeclared function 'wcwidth'; ISO C99 and later do not support implicit function declarations
      [-Wimplicit-function-declaration]
  118 |                 int w = wcwidth(wc);
      |                         ^

@infrastation
Copy link
Member

How did it build in Appveyor then?

@gvanem
Copy link
Contributor

gvanem commented Nov 4, 2025

How did it build in Appveyor then?

Since HAVE_WCHAR_T was not detected and used I presume. No diff for CMakeLists.txt AFAICS.

@lotabout
Copy link
Author

lotabout commented Nov 4, 2025

@gvanem Could you please help me give it another try on Windows? I’ve added Markus Kuhn’s implementation as a replacement on Windows.

@gvanem
Copy link
Contributor

gvanem commented Nov 4, 2025

@lotabout Tried it, but I see a lot of junk with with windump.exe --utf8 -Ar utf8.pcap. Like;
image

I assume your utf8.pcap file is based on this or this which displays better in my TCC shell:

image

@lotabout lotabout force-pushed the feat/support-utf8 branch 2 times, most recently from 6629672 to a13a471 Compare November 5, 2025 16:37
@lotabout
Copy link
Author

lotabout commented Nov 5, 2025

@gvanem Please help to try again with latest code, it turns out that locale should be set correctly for mbrtowc to work properly.

(left: tcpdump, right: cat. Compiled on Windows 11 with VS 2022, Shell: PowerShell with $OutputEncoding = [System.Text.Encoding]::UTF8 set)
image

image

@lotabout lotabout closed this Nov 5, 2025
@infrastation
Copy link
Member

tcpdump CI is failing because of my recent changes in libpcap. Please wait until this is fixed.

@lotabout lotabout reopened this Nov 5, 2025
@lotabout
Copy link
Author

lotabout commented Nov 5, 2025

tcpdump CI is failing because of my recent changes in libpcap. Please wait until this is fixed.

@infrastation Got it, Thanks~

@gvanem
Copy link
Contributor

gvanem commented Nov 5, 2025

Work very well now!

@lotabout
Copy link
Author

@fxlb Hi! Sorry for the ping (I saw you're very active here). This is my first PR to this project — is there anything else I should do to help get it merged, apart from passing the CI checks?

@gvanem
Copy link
Contributor

gvanem commented Dec 24, 2025

Why is nobody besides me and @lotabout interested in merging this?



/*
* The blow is_utf8_printable is taken from ngrep
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Presumably that should be "The below is_utf8_printable is taken from ngrep".

}

/* Check if the wide character is printable */
#if defined(_WIN32) || defined(_WIN64)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are there any compilers for Windows that define _WIN64, but don't define _WIN32, on "modern" Windows? ("modern" here means "not 16-bit Windows".)

utf8_len = ndo->ndo_utf8 ? is_utf8_printable(cp, length, NULL) : 0;

if (utf8_len > 0) {
/* Valid printable UTF-8 character */
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What if utf8_len is greater than length - i.e., you have a partial UTF-8 character?

@guyharris
Copy link
Member

The title of the pull request and commit message should probably be just "Add support to display packet content as UTF-8", with "Fix #1190" in the body of the commit message.

@fxlb
Copy link
Member

fxlb commented Dec 25, 2025

  1. The UTF8 support in ngrep is very new: First PR (40) on Oct 31, PR 41 merged on Nov 1, so very few tests.
    (Tested with one file, says the author.)
    Comment in PR 41:
    "given all the crazy things UTF-8 can do (e.g. with combining characters)" ...
    [https://github.com/jpr5/ngrep/]

  2. Currently, I think Unicode handling can wait. We have made and continue to make great efforts to make the code more robust. Now is not the time to weaken it. We could look at this change after the 5.0 release.

@infrastation
Copy link
Member

infrastation commented Dec 26, 2025

I agree that for this new feature it is more important to implement it safely than quickly. In particular, it should be impossible to weaponise it as is sometimes done using ANSI escape codes, see CVE-2025-46394 for one example. So this requires a bit more attention before it is declared safe and ready.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Development

Successfully merging this pull request may close these issues.

5 participants