
Poor performance on function boundary detection. #1

@5c4lar


We evaluated the function boundary detection accuracy of DeepDi on our dataset with the following code:

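import ctypes
import os

# DD is DeepDi's Python binding; its import is not shown in this issue.

def process(file):
    output = ""
    with DD.Open(file.encode()) as file_data:
        # Disassemble every non-empty executable section
        for sec in file_data.sections.iter(DD.Section):
            if not sec.executable:
                continue
            if sec.end - sec.start > 0:
                text_result = file_data.disassemble(sec.start, sec.end, False)
                # Collect the function addresses DeepDi reports for this section
                for data in text_result.functions.iter(ctypes.c_int64):
                    output += str(data.value) + '\n'

    # Write one address per line to output/<binary name>
    os.makedirs('output', exist_ok=True)
    with open(os.path.join('output', os.path.basename(file)), "w") as f:
        f.write(output)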

The results are much worse than those reported in your paper: average precision is 0.19743204989853363 and average recall is 0.10910939236737424. We also tested ddisasm on our dataset, and it does much better, with both precision and recall close to 1.0.
Is there something wrong with my implementation? If so, could you give an example of the correct way to detect function boundaries with DeepDi? Or is this because the model overfits and does not generalize well?
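The issue does not show how precision and recall were computed. The following is a minimal sketch that makes the metric concrete. It assumes three things: the ground truth for each binary is a set of function-start addresses, a prediction counts only on an exact address match, and the reported averages are per-binary (macro) averages. The helper names `load_addresses`, `precision_recall`, and `macro_average` are hypothetical. `load_addresses` reads the one-decimal-address-per-line files written by `process()`.

```python
from typing import Iterable, Set, Tuple


def load_addresses(path: str) -> Set[int]:
    """Read one decimal address per line, as written by process() above."""
    with open(path) as f:
        return {int(line) for line in f if line.strip()}


def precision_recall(predicted: Set[int], truth: Set[int]) -> Tuple[float, float]:
    """Exact-match precision and recall of predicted function starts.

    Convention (an assumption): an empty set scores 0.0 rather than raising.
    """
    true_pos = len(predicted & truth)  # starts found by both
    precision = true_pos / len(predicted) if predicted else 0.0
    recall = true_pos / len(truth) if truth else 0.0
    return precision, recall


def macro_average(scores: Iterable[Tuple[float, float]]) -> Tuple[float, float]:
    """Average per-binary (precision, recall) pairs with equal weight per binary."""
    scores = list(scores)
    if not scores:
        raise ValueError("no binaries to average")
    n = len(scores)
    return (sum(p for p, _ in scores) / n, sum(r for _, r in scores) / n)
```

For example, `precision_recall({0x1000, 0x1040, 0x2000, 0x3000}, {0x1000, 0x1040, 0x1080})` returns `(0.5, 0.6666666666666666)`. Two of the four predictions are correct, and two of the three true starts are found. Exact matching also assumes both sets use the same address space. If one side held file offsets and the other held virtual addresses, almost nothing would match and both scores would collapse toward zero.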
