# Check the last element of ccl_prefill and ccl_decode to make sure it's not less than ctx_len
-if ccl_prefill[-1] < ctx_len - 1:
-    ccl_prefill.append(ctx_len)
-if ccl_decode[-1] < ctx_len:
-    ccl_decode.append(ctx_len)
+# Check the last element of ccl_prefill and ccl_decode to make sure it's not less than ctx_len
+if ccl_decode[-1] < ctx_len:
+    ccl_decode.append(ctx_len)
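The bound check in this hunk can be illustrated in isolation. The following is a minimal sketch, not the project's code: the helper name `ensure_covers_ctx_len` is hypothetical, and it returns a copy instead of mutating the list in place as the diff does.

```python
def ensure_covers_ctx_len(ccl, ctx_len):
    """Return a copy of ccl whose last element is at least ctx_len.

    Hypothetical helper: mirrors the `if ccl_decode[-1] < ctx_len` check
    from the diff, but works on a copy and tolerates an empty list.
    """
    result = list(ccl)
    if not result or result[-1] < ctx_len:
        result.append(ctx_len)
    return result


if __name__ == "__main__":
    # The last value 1024 is below ctx_len, so ctx_len is appended.
    print(ensure_covers_ctx_len([512, 1024], 4096))
    # The last value already equals ctx_len, so the list is unchanged.
    print(ensure_covers_ctx_len([512, 4096], 4096))
```

The check assumes the list is sorted in ascending order, so that the last element is also the largest.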
if prefill_seq_len == 1:
    # both prefill and decode ccl can share the same specializations since prefill_seq_len=1. So, a sorted union of both lists can be used for both of them.
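The "sorted union" mentioned in the comment above could look roughly like the sketch below. The function name `shared_ccl` is hypothetical; the body of this branch is not shown in the diff, so this illustrates only the idea, not the actual implementation.

```python
def shared_ccl(ccl_prefill, ccl_decode):
    """Return one sorted, duplicate-free list usable for both phases.

    Hypothetical helper: a set union drops repeated values and
    sorted() restores ascending order.
    """
    return sorted(set(ccl_prefill) | set(ccl_decode))


if __name__ == "__main__":
    merged = shared_ccl([512, 2048], [1024, 2048, 4096])
    # The same list would be used for prefill and decode when prefill_seq_len == 1.
    print(merged)
```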
# Handling the common values between ccl_prefill and ccl_decode. The elements of these two lists should be unique (COMPILER)
-tmp_prefill = ccl_prefill
-ccl_prefill = []
-for val in tmp_prefill:
-    while val in ccl_decode or val in ccl_prefill:
-        val -= 1
-        if val < 0:
-            break  # Prevent negative values
-    if val >= 0:
-        ccl_prefill.append(val)
-ccl_prefill.sort()
+# This check is related to the disaggregated serving application, since it generates two separate QPCs for prefilling and decoding. So, ccl_prefill will be None in the decode QPC and ccl_decode will be None in the prefill QPC
+if ccl_prefill and ccl_decode:
+    # Handling the common values between ccl_prefill and ccl_decode. The elements of these two lists should be unique (COMPILER)
+    tmp_prefill = ccl_prefill
+    ccl_prefill = []
+    for val in tmp_prefill:
+        while val in ccl_decode or val in ccl_prefill:
+            # In case of common values between ccl_prefill and ccl_decode, change the value in ccl_prefill and set it to the closest value which is a multiple of CCL_UNIQNE_STEP to avoid repetition and also be hardware and compiler efficient
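The rest of this hunk is not visible here, so the following is only a sketch of how the guarded de-duplication might behave. The function name `dedupe_prefill` and the `step` parameter are hypothetical stand-ins (the real value of `CCL_UNIQNE_STEP` is not shown), and moving to the next lower multiple is an assumption about what "closest value" means. With `step=1` the sketch reduces to the removed decrement-by-one logic.

```python
def dedupe_prefill(ccl_prefill, ccl_decode, step=1):
    """Make ccl_prefill values unique with respect to ccl_decode.

    Hypothetical sketch: `step` stands in for CCL_UNIQNE_STEP, whose
    actual value is not visible in the diff.
    """
    # In disaggregated serving one of the lists may be None; nothing to do then.
    if not ccl_prefill or not ccl_decode:
        return ccl_prefill

    unique = []
    for val in ccl_prefill:
        # On a collision, move to the next lower multiple of step (assumption).
        while val >= 0 and (val in ccl_decode or val in unique):
            val = ((val - 1) // step) * step
        if val >= 0:  # Drop values that could not be placed without going negative.
            unique.append(val)
    unique.sort()
    return unique


if __name__ == "__main__":
    print(dedupe_prefill([1024, 2048], [2048, 4096]))            # step of 1
    print(dedupe_prefill([1024, 2048], [2048, 4096], step=512))  # coarser step
    print(dedupe_prefill(None, [2048, 4096]))                    # decode-only case
```

A coarser step keeps the adjusted value on a round boundary, which is the stated motivation in the comment (hardware and compiler efficiency); how the project actually chooses that boundary should be checked against the full diff.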