Question about testing on the CONCODE dataset in text-to-code (code generation)

I'm new to this field and want to test my fine-tuned model's ability to generate code on the CONCODE dataset.
However, I found that test.json contains no ground truth (https://github.com/microsoft/CodeXGLUE/blob/main/Text-Code/text-to-code/dataset/concode/test.json).
How can I obtain the model's score after inference? 
Is there a hidden server for evaluation?
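
For anyone who wants to reproduce the missing ground truth check, here is a minimal sketch. It assumes each line of the file is a JSON object with an `nl` key for the description and a `code` key for the reference; those field names, the helper `count_missing_references`, and the sample records are illustrative assumptions, not taken from the repository.

```python
import json


def count_missing_references(lines, key="code"):
    """Return (missing, total): records whose reference field is absent or empty."""
    missing = 0
    total = 0
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        record = json.loads(line)
        total += 1
        value = record.get(key)
        # Treat a missing key, a non-string value, or a blank string as "no reference".
        if not isinstance(value, str) or not value.strip():
            missing += 1
    return missing, total


# Hypothetical sample records standing in for lines read from a dataset file.
sample = [
    '{"nl": "return the name", "code": "String function ( ) { return name ; }"}',
    '{"nl": "set the name", "code": ""}',
    '{"nl": "clear the list"}',
]

missing, total = count_missing_references(sample)
print(f"{missing} of {total} records have no reference")
```

To run it on a real file, pass an open file handle instead of `sample`, e.g. `count_missing_references(open("test.json", encoding="utf-8"))`.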