I'm new to this field and want to test my fine tuned model's capability to generate the code, using this CONCODE dataset.
However, I found that there is no ground truth for test.json (https://github.com/microsoft/CodeXGLUE/blob/main/Text-Code/text-to-code/dataset/concode/test.json).
How can I obtain the model's score after inference?
Is there a hidden server for evaluation?