We want to upgrade pcap-parser to support richer device identification and behavioral analysis. This involves extracting additional protocol fields (HTTP, DHCP, mDNS/SSDP), resolving MAC OUIs, and flagging connections to advertising domains.
The extracted features will feed downstream classification tasks.
Features to Extract / Compute
user_agent_info → via http.user_agent
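dhcp_hostname → via bootp.option.hostname (named dhcp.option.hostname in Wireshark 3.0 and later, where the BOOTP dissector was renamed to DHCP)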
oui_friendly → map eth.src to vendor via oui.txt
base_domains → extract from hostnames using tldextract
talks_to_ads → flag if dst_main_domain is in tracker_domains.json
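
As a rough illustration of the last two lookups, here is a minimal sketch using only the standard library. It assumes oui.txt follows the IEEE registry layout (lines such as "00-00-0C   (hex)		Cisco Systems, Inc") and that tracker_domains.json is a flat JSON array of domain strings; neither format is confirmed here. The helper names parse_oui_lines, lookup_oui, load_tracker_domains and flag_talks_to_ads are hypothetical, and the base domains are assumed to have been extracted already (for example with tldextract).

```python
import json
import re

# Matches IEEE-style "(hex)" lines, e.g. "00-00-0C   (hex)\t\tCisco Systems, Inc".
# Assumption: oui.txt uses this layout; other lines in the file are ignored.
OUI_LINE = re.compile(
    r"^([0-9A-Fa-f]{2}-[0-9A-Fa-f]{2}-[0-9A-Fa-f]{2})\s+\(hex\)\s+(.+)$"
)


def parse_oui_lines(lines):
    """Build a {6-hex-digit prefix: vendor name} table from oui.txt lines."""
    table = {}
    for line in lines:
        match = OUI_LINE.match(line.strip())
        if match:
            prefix = match.group(1).replace("-", "").upper()
            table[prefix] = match.group(2).strip()
    return table


def lookup_oui(mac, table):
    """Return the vendor for a MAC such as eth.src, or None if unknown."""
    hex_digits = re.sub(r"[^0-9A-Fa-f]", "", mac).upper()
    if len(hex_digits) < 6:
        return None
    return table.get(hex_digits[:6])


def load_tracker_domains(json_text):
    """Parse tracker_domains.json, assumed to be a JSON array of domains."""
    return {d.lower().rstrip(".") for d in json.loads(json_text)}


def flag_talks_to_ads(base_domains, trackers):
    """True if any already-extracted base domain is in the tracker set."""
    return any(d.lower().rstrip(".") in trackers for d in base_domains)
```

In practice the table would be built once, e.g. parse_oui_lines(open("oui.txt", encoding="utf-8")), and reused for every flow rather than re-parsed per packet.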
Final output should include the following features per flow:
user_agent_info, oui_friendly, dhcp_hostname, netdisco_info, concatenated_user_labels (external), base_domains, talks_to_ads
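
One possible shape for the per-flow record, sketched as a dataclass. The field names come from the list above; the class name FlowFeatures, the field types, and the defaults are assumptions made for illustration, since the issue does not specify them.

```python
from dataclasses import asdict, dataclass, field
from typing import List, Optional


@dataclass
class FlowFeatures:
    """Per-flow feature record; types and defaults are assumed, not specified."""

    user_agent_info: Optional[str] = None
    oui_friendly: Optional[str] = None
    dhcp_hostname: Optional[str] = None
    netdisco_info: Optional[str] = None
    # Supplied externally rather than derived from the capture.
    concatenated_user_labels: Optional[str] = None
    base_domains: List[str] = field(default_factory=list)
    talks_to_ads: bool = False


def to_row(features: FlowFeatures) -> dict:
    """Convert a record to a plain dict, e.g. for CSV or JSON export."""
    return asdict(features)
```

Keeping missing values as None (rather than empty strings) lets downstream code distinguish "field absent from the capture" from "field present but empty".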