Hi,
First, I appreciate your excellent ACL paper, and thank you for sharing this dataset with us!
I have a question and look forward to your reply. In the file CoDesc.json I see both the original code and the preprocessed code. How can I get the non-tokenized code, that is, code whose comments have been removed but which has not been tokenized, so that it is still syntactically valid Java and can be parsed into an AST successfully?
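For context, a naive fallback would be to strip comments from the original code myself, roughly as in the sketch below. This is only a minimal sketch under my own assumptions: plain Java source without text blocks (`"""`), and `strip_java_comments` is a hypothetical helper, not part of the dataset. It does not reproduce whatever preprocessing you applied, so I would still prefer the exact version produced by your pipeline if it is available.

```python
def strip_java_comments(source: str) -> str:
    """Remove // and /* */ comments from Java source while leaving string
    and char literals intact. Assumes no Java text blocks ("\"\"\"...\"\"\")."""
    out = []
    i, n = 0, len(source)
    while i < n:
        c = source[i]
        nxt = source[i + 1] if i + 1 < n else ""
        if c == "/" and nxt == "/":
            # Line comment: skip to the end of the line but keep the newline.
            j = source.find("\n", i)
            if j == -1:
                break  # comment runs to end of input
            i = j
        elif c == "/" and nxt == "*":
            # Block or Javadoc comment: skip past the closing */.
            j = source.find("*/", i + 2)
            if j == -1:
                break  # unterminated comment; drop the rest
            out.append(" ")  # keep the tokens on either side separated
            i = j + 2
        elif c in ('"', "'"):
            # String or char literal: copy verbatim, honoring backslash escapes,
            # so that "//" or "/*" inside a literal is not treated as a comment.
            quote = c
            out.append(c)
            i += 1
            while i < n:
                ch = source[i]
                out.append(ch)
                i += 1
                if ch == "\\" and i < n:
                    out.append(source[i])
                    i += 1
                elif ch == quote:
                    break
        else:
            out.append(c)
            i += 1
    return "".join(out)
```

For example, `strip_java_comments('int a = 1; // c\n/* b */int b = 2;')` keeps both statements and drops both comments, while a literal such as `"// not a comment"` is left untouched.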
Hope to hear from you, and thanks again!