**Problem:** The auto_insert_sync pass generates incorrect synchronization logic.
**Root Cause:** In ascend_sync_insert.cc, the granularity of the inserted PIPE_ALL barrier is too coarse (over-conservative).
**Details:** The current implementation inserts a PIPE_ALL barrier for every BufferStoreNode, which leads to excessive synchronization and performance degradation.
**Involved Code:**
void VisitStmt_(const BufferStoreNode* op) override {
  // Unconditionally emits a PIPE_ALL barrier before every buffer store.
  result_.push_back(Evaluate(Call(DataType::Handle(),
                                  Op::Get("tl.ascend_auto_barrier"),
                                  {StringImm("PIPE_ALL")})));
  result_.push_back(GetRef<Stmt>(op));
}
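
To make the cost of the per-store policy concrete, here is a minimal standalone sketch that compares it with one possible finer-grained policy. It is a toy model, not code from ascend_sync_insert.cc and not a proposed patch: the `Store` struct, both counting functions, the buffer names, and the pipe labels are assumptions introduced for illustration. The finer policy emits a barrier only when a store reads or writes a buffer whose pending write came from a different pipe. It tracks only read-after-write and write-after-write hazards and ignores write-after-read, so a real pass would need more than this.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <unordered_map>
#include <vector>

// Toy model of one buffer store (hypothetical, not a TVM node).
struct Store {
  std::string pipe;               // pipe that issues the store (illustrative label)
  std::string dst;                // buffer written
  std::vector<std::string> srcs;  // buffers read
};

// Policy described in the report: one PIPE_ALL barrier per store.
std::size_t CountBarriersPerStore(const std::vector<Store>& stores) {
  return stores.size();
}

// Hypothetical finer policy: barrier only on a cross-pipe RAW/WAW hazard.
std::size_t CountBarriersOnHazard(const std::vector<Store>& stores) {
  // Maps each buffer to the pipe of its not-yet-synchronized write.
  std::unordered_map<std::string, std::string> pending;
  std::size_t barriers = 0;
  for (const Store& s : stores) {
    assert(!s.pipe.empty());  // every store must name its issuing pipe
    auto conflicts = [&](const std::string& buf) {
      auto it = pending.find(buf);
      return it != pending.end() && it->second != s.pipe;
    };
    bool hazard = conflicts(s.dst);
    for (const std::string& src : s.srcs) {
      hazard = hazard || conflicts(src);
    }
    if (hazard) {
      ++barriers;
      pending.clear();  // a full barrier retires every pending write
    }
    pending[s.dst] = s.pipe;
  }
  return barriers;
}

// Illustrative sequence: two loads, two vector ops, one write-back.
std::vector<Store> ExampleStores() {
  return {
      {"PIPE_MTE2", "a_ub", {"a_gm"}},
      {"PIPE_MTE2", "b_ub", {"b_gm"}},
      {"PIPE_V", "c_ub", {"a_ub", "b_ub"}},
      {"PIPE_V", "c_ub", {"c_ub"}},
      {"PIPE_MTE3", "c_gm", {"c_ub"}},
  };
}
```

On `ExampleStores()`, the per-store policy counts 5 barriers while the hazard-based policy counts 2: one before the first vector op, which consumes the loaded buffers, and one before the write-back. This sketch still uses a full barrier at each hazard. Narrowing the barrier to the specific pipes involved would be a separate refinement.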