-
Notifications
You must be signed in to change notification settings - Fork 721
Use file content heuristics to decide file reader. #1962
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: dev
Are you sure you want to change the base?
Conversation
…sed on the magic number.
…ics detection method.
Codecov Report❌ Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## dev #1962 +/- ##
========================================
Coverage 83.41% 83.41%
========================================
Files 311 311
Lines 55002 55153 +151
Branches 12098 11857 -241
========================================
+ Hits 45878 46005 +127
- Misses 7889 7919 +30
+ Partials 1235 1229 -6
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
enum class CaptureFileFormat | ||
{ | ||
Unknown, | ||
Pcap, | ||
PcapNG, | ||
Snoop, | ||
}; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ditto: this can be an enum inside of CaptureFileFormatDetector
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I suppose. It has internal linkage so it doesn't really matter.
But then we would end up with really long case names: CaptureFileFormatDetector::FileFormat::Pcap
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess it's fine? It's all internal anyway...
Tests/Pcap++Test/Tests/FileTests.cpp
Outdated
PTF_ASSERT_NOT_NULL(dynamic_cast<pcpp::PcapNgFileReaderDevice*>(genericReader)); | ||
PTF_ASSERT_TRUE(genericReader->open()); | ||
// ------- IFileReaderDevice::createReader() Factory | ||
// TODO: Move to a separate unit test. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should add the following to get more coverage:
- Open a snoop file
- Open a file that is not any of the options
- Open pcap files with different magic numbers
- Assuming we add a version check for snoop and pcap file: create temp files with bogus data that has the magic number but wrong versions
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
3d713ab adds the following tests:
- Pcap, PcapNG, Zst file with correct content + extension
- Pcap, PcanNG file with correct content + wrong extension
- Bogus content file with correct extension (pcap, pcapng, zst)
- Bogus content file with wrong extension (txt)
Haven't found a snoop file to add. Do we have any?
Open pcap files with different magic numbers
Do you mean Pcap content that has just its magic number changed? Because IMO it is reasonable to consider that invalid format and fail as regular bogus data.
Assuming we add a version check for snoop and pcap file: create temp files with bogus data that has the magic number but wrong versions
Pending on #1962 (comment) .
Move it out if it needs to be reused somewhere.
Libpcap supports reading this format since 0.9.1. The heuristics detection will identify such magic number as pcap and leave final support decision to the pcap backend infrastructure.
@Dimi1010 some CI tests fail... |
@seladb I think I found the issue. According to ChatGPT the Winpcap's The NPcap sdk and drivers provide their own wpcap.dll where they do support nanosec precision. The tests we have on nanosecond precision pcap files ("nanosecs.pcap") currently are flawed. We don't use a static base base to compare to. We write a packet with the Due to the above the current nanosecond test in the master branch passes with Winpcap, even though it doesn't actually write / read a nanosecond precision pcap. The new tests use a static copy of Edit: |
Maybe we can remove pcap files that have nanoseconds precision from this PR and replace them with other pcap files to make the tests pass? |
Then, we don't have coverage for the magic number for nanosecond pcap. See #1977. It is a fix for the underlying issue. |
…le-selection # Conflicts: # Pcap++/src/PcapFileDevice.cpp
Nanoseconds are skipped if they aren't supported.
…td compression is unsupported, the format detector will return Unknown.
|
||
// PcapNG backend can support ZstdCompressed Pcap files, so we assume an archive is compressed PcapNG. | ||
// If Zstd is not supported, we cannot read the file anyway, so we return Unknown. | ||
if (isPcapNgFile(content) || (checkZstdSupport() && isZstdArchive(content))) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Decided to return Unknown
instead of PcapNG
, if Zst is not supported in the binary.
This will in turn, return nullptr
for reader device. I think that is better than returning a PcapNGDevice
that can't be opened.
Maybe we can also do something similar for PcapReaderDevice's nanosecond format files, if the environment can't read them?
Any thoughts @seladb ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, I agree that it's the correct behavior 👍
|
||
// PcapNG backend can support ZstdCompressed Pcap files, so we assume an archive is compressed PcapNG. | ||
// If Zstd is not supported, we cannot read the file anyway, so we return Unknown. | ||
if (isPcapNgFile(content) || (checkZstdSupport() && isZstdArchive(content))) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, I agree that it's the correct behavior 👍
Tests/Pcap++Test/TestDefinition.h
Outdated
PTF_TEST_CASE(TestIFileReaderDeviceFactory_Pcap_MicroPrecision); | ||
PTF_TEST_CASE(TestIFileReaderDeviceFactory_Pcap_NanoPrecision); | ||
PTF_TEST_CASE(TestIFileReaderDeviceFactory_PcapNG); | ||
PTF_TEST_CASE(TestIFileReaderDeviceFactory_PcapNG_ZST); | ||
PTF_TEST_CASE(TestIFileReaderDeviceFactory_PcapNG_ZST_Unsupported); | ||
PTF_TEST_CASE(TestIFileReaderDeviceFactory_Invalid); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe add another test for the snoop format?
Tests/Pcap++Test/Tests/FileTests.cpp
Outdated
} | ||
}; | ||
|
||
PTF_TEST_CASE(TestIFileReaderDeviceFactory_Pcap_MicroPrecision) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In addition to test a real pcap file, maybe we can add syntethic files that have a different magic number to test all options?
We don't have to put them in PcapExample/file_heuristics
, instead we can create vectors with the content std::vector<uint8_t>
and save them to temp files
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm, what would be the purpose? Just to test that it returns nullptr
?
Doesn't TestIFileReaderDeviceFactory_Invalid
already handle that?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I mean the other possible magic numbers of a valid pcap file. Since it's not easy to find such pcap files, we can generate synthetic files that are not actually valid, but will look valid for the sake of the test
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So, you want a spoofed pcap sample for just these:
// Libpcap 0.9.1 and later support reading a modified pcap format that contains an extended header.
// Format reference: https://wiki.wireshark.org/Development/LibpcapFileFormat#modified-pcap
0xa1'b2'cd'34, // Alexey Kuznetzov's modified libpcap format
0x34'cd'b2'a1 // Alexey Kuznetzov's modified libpcap format (byte-swapped)
or for the byte swapped versions of micro and nano too?
Tests/Pcap++Test/Tests/FileTests.cpp
Outdated
// Garbage data, correct extension | ||
constexpr const char* PCAP_BOGUS_FILE_PATH = "PcapExamples/file_heuristics/bogus-content.pcap"; | ||
constexpr const char* PCAPNG_BOGUS_FILE_PATH = "PcapExamples/file_heuristics/bogus-content.pcapng"; | ||
constexpr const char* PCAPNG_ZST_BOGUS_FILE_PATH = "PcapExamples/file_heuristics/bogus-content.zst"; | ||
|
||
// Garbage data | ||
constexpr const char* BOGUS_FILE_PATH = "PcapExamples/file_heuristics/bogus-content.txt"; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ditto: since the extension doesn't matter, do we need all 4 files?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Technically this verifies that the extension doesn't matter? I guess this can be said about #1962 (comment) too.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think it's really needed...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm... maybe not all. Would be nice to keep at least one to validate that nullptr
is returned on garbage?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Updated 07804da. Left only the .txt case, since the others are duplicates.
The PR adds heuristics based on the file content that is more robust than deciding based on the file extension.
The new decision model scans the start of the file for its magic number signature. It then compares it to the signatures of supported file types [1] and constructs a reader instance based on the result.
A new function
createReader
has been added due to changes in the public API of the factory.nullptr
is returned. (previously returnedPcapFileDeviceReader
)std::runtime_error
is thrown. (previously returnedPcapFileDeviceReader
)std::unique_ptr<IFileDeviceReader>
instead ofIFileDeviceReader*
.