Simple proof of concept that recreates Ghidra BSIM decompiled-function similarity from BSIM XML signature files (generated with Ghidra's bsim generatesigs command).
It produces a function similarity chord diagram and a similarity table.
pip install -r requirements.txt
python POC-graph-sig-files.py

The /sigs/ directory contains signature files from Mirai malware samples, which share many functions.
You can generate your own signature files with Ghidra's bsim generatesigs command.
============================================================================================================================================
DETAILED CROSS-BINARY CORRELATION TABLE
============================================================================================================================================
Score | Binary A | Func A (Feat) | Binary B | Func B (Feat)
--------------------------------------------------------------------------------------------------------------------------------------------
1.0000 | sigs_719d6c26275d3680c855 | FUN_00014f98 (34) | sigs_4c5a7be674be8ec71ac9 | __malloc_largebin_index (34)
1.0000 | sigs_719d6c26275d3680c855 | FUN_00015e48 (43) | sigs_4c5a7be674be8ec71ac9 | __malloc_trim (43)
1.0000 | sigs_32817e09143327d4552c | FUN_0041a810 (41) | sigs_2abe9d933d6821e118fa | FUN_00417214 (41)
1.0000 | sigs_32817e09143327d4552c | FUN_0041d6e0 (56) | sigs_2abe9d933d6821e118fa | FUN_00418ee0 (56)
1.0000 | sigs_32817e09143327d4552c | FUN_0041dd20 (51) | sigs_2abe9d933d6821e118fa | FUN_004190ec (51)
1.0000 | sigs_6c63810565d33b948c19 | FUN_0804d60f (36) | sigs_abccf21a652f8d012fe8 | readdir (36)
1.0000 | sigs_abccf21a652f8d012fe8 | socket_connect (31) | sigs_4c5a7be674be8ec71ac9 | socket_connect (31)
1.0000 | sigs_abccf21a652f8d012fe8 | xdr_bytes (41) | sigs_4c5a7be674be8ec71ac9 | xdr_bytes (41)
1.0000 | sigs_719d6c26275d3680c855 | FUN_00013b54 (132) | sigs_4c5a7be674be8ec71ac9 | __divsi3 (132)
1.0000 | sigs_719d6c26275d3680c855 | FUN_000142f4 (32) | sigs_4c5a7be674be8ec71ac9 | fd_to_DIR (32)
1.0000 | sigs_719d6c26275d3680c855 | FUN_00014538 (34) | sigs_4c5a7be674be8ec71ac9 | readdir (34)
1.0000 | sigs_719d6c26275d3680c855 | FUN_000146a0 (45) | sigs_4c5a7be674be8ec71ac9 | memset (45)
1.0000 | sigs_719d6c26275d3680c855 | FUN_00016728 (35) | sigs_4c5a7be674be8ec71ac9 | random_r (35)
1.0000 | sigs_719d6c26275d3680c855 | FUN_000167b8 (54) | sigs_4c5a7be674be8ec71ac9 | srandom_r (54)
1.0000 | sigs_719d6c26275d3680c855 | FUN_00016d4c (54) | sigs_4c5a7be674be8ec71ac9 | nprocessors_onln (54)
1.0000 | sigs_719d6c26275d3680c855 | FUN_00018e04 (36) | sigs_4c5a7be674be8ec71ac9 | readdir64 (36)
1.0000 | sigs_719d6c26275d3680c855 | FUN_0001a514 (69) | sigs_4c5a7be674be8ec71ac9 | fgetc_unlocked (69)
1.0000 | sigs_719d6c26275d3680c855 | FUN_0001a640 (47) | sigs_4c5a7be674be8ec71ac9 | fgets_unlocked (47)
1.0000 | sigs_719d6c26275d3680c855 | FUN_0001ab90 (50) | sigs_4c5a7be674be8ec71ac9 | strlen (50)
1.0000 | sigs_719d6c26275d3680c855 | FUN_0001abf0 (53) | sigs_4c5a7be674be8ec71ac9 | strchr (53)
1.0000 | sigs_719d6c26275d3680c855 | FUN_0001ace0 (52) | sigs_4c5a7be674be8ec71ac9 | strchrnul (52)
1.0000 | sigs_719d6c26275d3680c855 | FUN_0001b5c0 (30) | sigs_4c5a7be674be8ec71ac9 | __stdio_READ (30)
1.0000 | sigs_32817e09143327d4552c | FUN_0040be90 (38) | sigs_abccf21a652f8d012fe8 | pthread_once (38)
1.0000 | sigs_32817e09143327d4552c | FUN_00414988 (33) | sigs_2abe9d933d6821e118fa | FUN_004150b8 (33)
1.0000 | sigs_32817e09143327d4552c | FUN_0041d930 (65) | sigs_2abe9d933d6821e118fa | FUN_00418f50 (65)
1.0000 | sigs_32817e09143327d4552c | FUN_0041e8f0 (48) | sigs_4c5a7be674be8ec71ac9 | scan_getwc (48)
1.0000 | sigs_32817e09143327d4552c | FUN_00426790 (45) | sigs_2abe9d933d6821e118fa | FUN_0041771e (45)
1.0000 | sigs_32817e09143327d4552c | FUN_0042729c (32) | sigs_2abe9d933d6821e118fa | FUN_0041d51c (32)
1.0000 | sigs_2abe9d933d6821e118fa | FUN_004169d4 (43) | sigs_f56f5bba23702556c68b | memset (43)
1.0000 | sigs_2abe9d933d6821e118fa | FUN_004201c4 (51) | sigs_4c5a7be674be8ec71ac9 | __encode_dotted (51)
1.0000 | sigs_6c63810565d33b948c19 | FUN_0804e68d (43) | sigs_abccf21a652f8d012fe8 | __malloc_trim (43)
1.0000 | sigs_6c63810565d33b948c19 | FUN_0804ebbc (34) | sigs_abccf21a652f8d012fe8 | random_r (34)
1.0000 | sigs_abccf21a652f8d012fe8 | initConnection (38) | sigs_4c5a7be674be8ec71ac9 | initConnection (38)
1.0000 | sigs_abccf21a652f8d012fe8 | print (81) | sigs_f56f5bba23702556c68b | print (81)
1.0000 | sigs_abccf21a652f8d012fe8 | printi (59) | sigs_f56f5bba23702556c68b | printi (59)
1.0000 | sigs_abccf21a652f8d012fe8 | prints (53) | sigs_f56f5bba23702556c68b | prints (53)
1.0000 | sigs_abccf21a652f8d012fe8 | prints (53) | sigs_4c5a7be674be8ec71ac9 | prints (53)
1.0000 | sigs_f56f5bba23702556c68b | prints (53) | sigs_4c5a7be674be8ec71ac9 | prints (53)
1.0000 | sigs_f56f5bba23702556c68b | recvLine (56) | sigs_4c5a7be674be8ec71ac9 | recvLine (56)
1.0000 | sigs_719d6c26275d3680c855 | FUN_00013a40 (111) | sigs_4c5a7be674be8ec71ac9 | __aeabi_uidiv (111)
1.0000 | sigs_719d6c26275d3680c855 | FUN_0001af4c (60) | sigs_4c5a7be674be8ec71ac9 | inet_aton (60)
1.0000 | sigs_6c63810565d33b948c19 | FUN_0804f9c6 (64) | sigs_abccf21a652f8d012fe8 | inet_aton (64)
1.0000 | sigs_abccf21a652f8d012fe8 | sendKILLALL (70) | sigs_4c5a7be674be8ec71ac9 | sendKILLALL (70)
Graph flow is proportional to BSIM similarity, but only for pairs above a similarity threshold of 0.9. Vectors with fewer than 30 features are also ignored.
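
The filtering described above can be sketched roughly as follows. This is a minimal illustration, not the code from POC-graph-sig-files.py: it assumes each function's signature has already been loaded into a sparse vector of the form {feature_id: count}, it uses plain unweighted cosine similarity (a simplification of what BSIM computes), and it assumes the chord flow between two binaries is the sum of their retained pair scores. The names cosine_similarity, correlate, binary_flows, MIN_FEATURES and MIN_SIMILARITY are hypothetical.

```python
import math
from collections import defaultdict
from itertools import combinations

MIN_FEATURES = 30     # vectors with fewer features are ignored
MIN_SIMILARITY = 0.9  # only pairs scoring above this are kept


def cosine_similarity(a, b):
    """Cosine similarity of two sparse vectors given as {feature_id: count}."""
    dot = sum(count * b.get(feat, 0) for feat, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def correlate(functions):
    """functions: list of (binary, function_name, vector) tuples.

    Returns (score, binary_a, func_a, binary_b, func_b) rows for
    cross-binary pairs above the threshold, highest score first.
    """
    # Drop small vectors first (assumption: "features" = distinct feature ids).
    kept = [f for f in functions if len(f[2]) >= MIN_FEATURES]
    rows = []
    for (bin_a, fn_a, vec_a), (bin_b, fn_b, vec_b) in combinations(kept, 2):
        if bin_a == bin_b:
            continue  # only cross-binary pairs are of interest
        score = cosine_similarity(vec_a, vec_b)
        if score > MIN_SIMILARITY:
            rows.append((score, bin_a, fn_a, bin_b, fn_b))
    return sorted(rows, key=lambda r: r[0], reverse=True)


def binary_flows(rows):
    """Sum retained scores per binary pair (assumed chord-diagram flow)."""
    flows = defaultdict(float)
    for score, bin_a, _, bin_b, _ in rows:
        flows[tuple(sorted((bin_a, bin_b)))] += score
    return dict(flows)
```

Calling correlate() on the loaded vectors yields rows shaped like the table above, and binary_flows() reduces them to one weight per pair of binaries, which is the kind of input a chord diagram needs.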

