Thanks for the good work. Algorithm 1 includes an attention mask computation step, and inference is then based on that attention mask. However, I cannot find the code for the attention mask computation. Could you help me locate the corresponding code? Thanks!
