The label extractor module defines how a label is assigned to each AST. The selected label type also sets the granularity of label extraction for the whole pipeline. Three label types are currently supported, and only one of them can be specified.
Label extractor config classes are defined in LabelExtractorConfigs.kt.
granularity: files
Use the file name of the source file as a label.
name: file name

granularity: files
Use the name of the parent folder of the source file as a label. May be useful for code classification datasets, e.g., POJ-104.
name: folder name

granularity: functions
Use the name of each function as a label.
name: function name

If a function name is used as the label, the module additionally processes the AST to avoid data leaks. It looks for all recursive calls of this function and replaces the function name in the token value of the respective vertices with METHOD_NAME.
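
The masking step for function name labels can be illustrated with a minimal sketch. It is not the module's actual implementation: the `Node` class, the `"call"` node type, and the helpers `maskRecursiveCalls` and `collectTokens` are hypothetical names chosen for illustration, and the toy tree covers only a function body. The sketch assumes a call is represented by a vertex whose token holds the callee name.

```kotlin
// Hypothetical toy AST vertex; not the class used by the pipeline.
data class Node(
    val type: String,
    var token: String? = null,
    val children: List<Node> = emptyList()
)

// Placeholder that hides the label inside the tree.
const val METHOD_NAME = "METHOD_NAME"

// Walks the tree in pre-order and overwrites the token of every call vertex
// that refers to the labeled function itself (a recursive call).
// Assumption: calls are vertices of type "call" whose token is the callee name.
fun maskRecursiveCalls(root: Node, functionName: String) {
    if (root.type == "call" && root.token == functionName) {
        root.token = METHOD_NAME
    }
    root.children.forEach { maskRecursiveCalls(it, functionName) }
}

// Collects all non-null tokens in pre-order, to inspect the result.
fun collectTokens(root: Node): List<String> =
    listOfNotNull(root.token) + root.children.flatMap { collectTokens(it) }

fun main() {
    // Toy body of a function named "fact" that calls itself and println.
    val body = Node(
        "body",
        children = listOf(
            Node("call", "fact", listOf(Node("argument", "n"))),
            Node("call", "println", listOf(Node("argument", "n")))
        )
    )
    maskRecursiveCalls(body, functionName = "fact")
    println(collectTokens(body)) // prints [METHOD_NAME, n, println, n]
}
```

Without this step, a model trained to predict the function name could read the answer directly from the recursive call. Calls to other functions, such as `println` in the sketch, are left unchanged.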