Hi, I tried both the CTC model and the base model, and it seems that the bug I encountered previously hasn't been fixed yet. The post hoc results are still somewhat better than the original FSQ.
Additionally, the CTC loss model seems to produce clearer pronunciation and performs better overall. Have any specific evaluations or comparisons been done on this?