
[Feature] Add “family-uniform” class_distribution for data generation #389

@romanziske


Is there an existing feature already?

  • Yes, I have checked the existing features.

Description

When training family-level classifiers, the current configuration only supports per-class (signal-level) distributions. This leads to heavy oversampling of large families (e.g. FSK, OFDM) and undersampling of small ones (e.g. OOK, FM), skewing both training and evaluation. It would be very useful to introduce a “family_uniform” mode that automatically assigns equal total probability to each family, regardless of how many individual signals it contains.

| category | #instances |
|----------|-----------:|
| am       |       2892 |
| ask      |       3420 |
| chirp    |       2180 |
| fm       |        790 |
| fsk      |      10944 |
| ofdm     |       8468 |
| ook      |        748 |
| psk      |       4036 |
| qam      |       5718 |
| tone     |        742 |
| total    |      39938 |
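
The skew in the counts above is roughly what one would expect if each of the 57 classes were drawn with equal probability, because a family's share is then proportional to its number of member signals. A minimal sketch of that arithmetic follows. Uniform per-class sampling is an assumption here, not something confirmed in this issue, and the family sizes are taken from the class list used in the workaround.

```python
# Number of signals per family, counted from the class list in this issue.
family_sizes = {
    "ook": 1, "ask": 5, "fsk": 16, "psk": 6, "qam": 8,
    "ofdm": 12, "fm": 1, "am": 4, "chirp": 3, "tone": 1,
}

# Observed instance counts from the table above.
observed = {
    "am": 2892, "ask": 3420, "chirp": 2180, "fm": 790, "fsk": 10944,
    "ofdm": 8468, "ook": 748, "psk": 4036, "qam": 5718, "tone": 742,
}

num_classes = sum(family_sizes.values())
total = sum(observed.values())

# ASSUMPTION: every class is equally likely, so a family's expected share
# is (number of signals in the family) / (number of classes).
expected = {fam: total * n / num_classes for fam, n in family_sizes.items()}

for fam in sorted(family_sizes):
    print(f"{fam:>5}: expected ~{expected[fam]:7.0f}, observed {observed[fam]:5d}")
```

Under this assumption a 16-signal family such as fsk is expected to appear 16 times as often as a single-signal family such as ook, which is the imbalance a family-uniform mode would remove.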

I currently use the following workaround:

metadata["overrides"]["class_list"] = [
    "ook",
    "4ask", "8ask", "16ask", "32ask", "64ask",
    "2fsk", "2gfsk", "2msk", "2gmsk",
    "4fsk", "4gfsk", "4msk", "4gmsk",
    "8fsk", "8gfsk", "8msk", "8gmsk",
    "16fsk", "16gfsk", "16msk", "16gmsk",
    "bpsk", "qpsk", "8psk", "16psk", "32psk", "64psk",
    "16qam", "32qam", "32qam_cross", "64qam", "128qam_cross",
    "256qam", "512qam_cross", "1024qam",
    "ofdm-64", "ofdm-72", "ofdm-128", "ofdm-180",
    "ofdm-256", "ofdm-300", "ofdm-512", "ofdm-600",
    "ofdm-900", "ofdm-1024", "ofdm-1200", "ofdm-2048",
    "fm",
    "am-dsb-sc", "am-dsb", "am-lsb", "am-usb",
    "lfm_data", "lfm_radar", "chirpss",
    "tone",
]

# 10 families, so each family gets a total probability of 1/10,
# split evenly across its signals. Order matches class_list above.
metadata["overrides"]["class_distribution"] = [
    # ook (1 signal ⇒ 0.1)
    0.1,
    # ask (5 signals ⇒ each 1/(10*5)=0.02)
    0.02, 0.02, 0.02, 0.02, 0.02,
    # fsk (16 signals ⇒ each 1/(10*16)=0.00625)
    *[0.00625]*16,
    # psk (6 signals ⇒ each 1/(10*6)=0.016666666666666666)
    *[0.016666666666666666]*6,
    # qam (8 signals ⇒ each 1/(10*8)=0.0125)
    *[0.0125]*8,
    # ofdm (12 signals ⇒ each 1/(10*12)=0.008333333333333333)
    *[0.008333333333333333]*12,
    # fm (1 signal ⇒ 0.1)
    0.1,
    # am (4 signals ⇒ each 1/(10*4)=0.025)
    *[0.025]*4,
    # chirp (3 signals ⇒ each 1/(10*3)=0.03333333333333333)
    *[0.03333333333333333]*3,
    # tone (1 signal ⇒ 0.1)
    0.1,
]
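
The hand-written weights above could also be derived from a family-to-class mapping, which is roughly what a built-in family-uniform mode would have to compute. The sketch below is only an illustration: `family_uniform` and the `families` dictionary are hypothetical names introduced here, not part of any existing library API.

```python
def family_uniform(families):
    """Return (class_list, class_distribution) with equal total probability per family.

    `families` maps a family name to the list of its class names.
    Hypothetical helper for illustration; not an existing library function.
    """
    if not families or any(len(members) == 0 for members in families.values()):
        raise ValueError("every family needs at least one class")
    per_family = 1.0 / len(families)
    class_list, class_distribution = [], []
    for members in families.values():
        class_list.extend(members)
        # Split the family's share evenly across its member classes.
        class_distribution.extend([per_family / len(members)] * len(members))
    return class_list, class_distribution


# Same grouping and ordering as the manual workaround in this issue.
families = {
    "ook": ["ook"],
    "ask": ["4ask", "8ask", "16ask", "32ask", "64ask"],
    "fsk": [f"{order}{kind}" for order in (2, 4, 8, 16)
            for kind in ("fsk", "gfsk", "msk", "gmsk")],
    "psk": ["bpsk", "qpsk", "8psk", "16psk", "32psk", "64psk"],
    "qam": ["16qam", "32qam", "32qam_cross", "64qam", "128qam_cross",
            "256qam", "512qam_cross", "1024qam"],
    "ofdm": [f"ofdm-{n}" for n in (64, 72, 128, 180, 256, 300,
                                   512, 600, 900, 1024, 1200, 2048)],
    "fm": ["fm"],
    "am": ["am-dsb-sc", "am-dsb", "am-lsb", "am-usb"],
    "chirp": ["lfm_data", "lfm_radar", "chirpss"],
    "tone": ["tone"],
}

class_list, class_distribution = family_uniform(families)
```

The two returned lists can then be assigned to `metadata["overrides"]["class_list"]` and `metadata["overrides"]["class_distribution"]` in the same way as the manual version.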

Now each family is sampled approximately equally:

| category | #instances |
|----------|-----------:|
| am       |       3796 |
| ask      |       4098 |
| chirp    |       4248 |
| fm       |       4026 |
| fsk      |       4140 |
| ofdm     |       3744 |
| ook      |       3864 |
| psk      |       4080 |
| qam      |       4216 |
| tone     |       4026 |
| total    |      40238 |
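
One rough way to quantify the effect, using only the two count tables in this issue, is the ratio between the most and least frequent family before and after the workaround. The `imbalance_ratio` helper is a name introduced here for illustration.

```python
# Per-family instance counts copied from the two tables in this issue.
before = {
    "am": 2892, "ask": 3420, "chirp": 2180, "fm": 790, "fsk": 10944,
    "ofdm": 8468, "ook": 748, "psk": 4036, "qam": 5718, "tone": 742,
}
after = {
    "am": 3796, "ask": 4098, "chirp": 4248, "fm": 4026, "fsk": 4140,
    "ofdm": 3744, "ook": 3864, "psk": 4080, "qam": 4216, "tone": 4026,
}


def imbalance_ratio(counts):
    """Largest family count divided by the smallest (1.0 means perfectly balanced)."""
    return max(counts.values()) / min(counts.values())


print(f"before: {imbalance_ratio(before):.2f}")
print(f"after:  {imbalance_ratio(after):.2f}")
```

With these counts the ratio drops from about 14.7 (fsk vs. tone) to about 1.1 (chirp vs. ofdm).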
