Adding the support of CCL to the Prefilling of Disaggregated Serving (quic#825)
This PR adds CCL support to the prefill stage of Disaggregated Serving.
Until now, CCL was supported only during decoding of DA, which results
in very high TTFT for larger context lengths (CL). With this change, the
model can be compiled with the largest CL and still achieve good TTFT
for smaller PL, because prefill uses the matching CCL value instead of
the full CL.
These changes affect only the prefill stage of Disaggregated Serving;
other applications are not affected.
---------
Signed-off-by: Vahid Janfaza <vjanfaza@qti.qualcomm.com>
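To make the motivation concrete, here is a minimal sketch of how a smaller CCL value could be chosen for a short prompt. It assumes the CCL values form a sorted list of buckets and that the smallest bucket covering the required length is used. The helper `pick_ccl` and the example values are hypothetical and are not taken from this repository.

```python
from bisect import bisect_left


def pick_ccl(ccl_values, needed_len):
    """Return the smallest CCL bucket that is >= needed_len.

    Hypothetical helper for illustration only; ccl_values must be sorted
    in ascending order.
    """
    idx = bisect_left(ccl_values, needed_len)
    if idx == len(ccl_values):
        raise ValueError(f"no CCL bucket covers length {needed_len}")
    return ccl_values[idx]


# Illustrative buckets; the model is compiled for the largest one (4096).
ccl_prefill = [512, 1024, 4096]

print(pick_ccl(ccl_prefill, 300))   # 512: a short prompt uses the small bucket
print(pick_ccl(ccl_prefill, 3000))  # 4096: a long prompt needs the full length
```

Under this assumption, a 300-token prompt runs against a 512-entry context instead of the full 4096, which is where the TTFT improvement would come from.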
  # Check the last element of ccl_prefill and ccl_decode to make sure it's not less than ctx_len
- if ccl_prefill[-1] < ctx_len - 1:
-     ccl_prefill.append(ctx_len)
- if ccl_decode[-1] < ctx_len:
-     ccl_decode.append(ctx_len)
+ # Check the last element of ccl_prefill and ccl_decode to make sure it's not less than ctx_len
+ if ccl_decode[-1] < ctx_len:
+     ccl_decode.append(ctx_len)
  
  if prefill_seq_len == 1:
      # both prefill and decode ccl can share the same specializations since prefill_seq_len=1. So, a sorted union of both lists can be used for both of them.
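The two steps in this hunk (topping up `ccl_decode` with `ctx_len`, and sharing one list when `prefill_seq_len == 1`) can be sketched in isolation as follows. This is a standalone illustration, not the repository's code: the function names `ensure_covers_ctx_len` and `shared_ccl` are hypothetical, and unlike the diff the sketch returns new lists instead of mutating its inputs.

```python
def ensure_covers_ctx_len(ccl_decode, ctx_len):
    """Append ctx_len if the last CCL value is smaller than it."""
    out = list(ccl_decode)  # copy so the caller's list is not modified
    if out[-1] < ctx_len:
        out.append(ctx_len)
    return out


def shared_ccl(ccl_prefill, ccl_decode):
    """Sorted union of both lists, usable for prefill and decode alike."""
    return sorted(set(ccl_prefill) | set(ccl_decode))


print(ensure_covers_ctx_len([1024, 2048], 4096))  # [1024, 2048, 4096]
print(shared_ccl([512, 1024], [1024, 2048]))      # [512, 1024, 2048]
```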
  # Handling the common values between ccl_prefill and ccl_decode. The elements of these two lists should be unique (COMPILER)
- tmp_prefill = ccl_prefill
- ccl_prefill = []
- for val in tmp_prefill:
-     while val in ccl_decode or val in ccl_prefill:
-         val -= 1
-         if val < 0:
-             break  # Prevent negative values
-     if val >= 0:
-         ccl_prefill.append(val)
- ccl_prefill.sort()
+ # This check is related to the disaggregated serving application since it generates two separate QPCs for prefilling and decoding. So, ccl_prefill will be None in decode QPC and ccl_decode will be None in prefill QPC
+ if ccl_prefill and ccl_decode:
+     # Handling the common values between ccl_prefill and ccl_decode. The elements of these two lists should be unique (COMPILER)
+     tmp_prefill = ccl_prefill
+     ccl_prefill = []
+     for val in tmp_prefill:
+         while val in ccl_decode or val in ccl_prefill:
+             # In case of common values between ccl_prefill and ccl_decode, change the value in ccl_prefill and set it to the closest value which is multiple of CCL_UNIQNE_STEP to avoid repetition and also be hardware and compiler efficient
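The diff is cut off before the body of the loop, so the exact replacement logic is not visible here. As a rough sketch of what the comment describes, the following moves a colliding prefill value down to the nearest lower multiple of a step. The step value of 32, the downward direction, and the function name `dedupe_prefill` are all assumptions made for illustration; the real value of `CCL_UNIQNE_STEP` and the real rounding rule may differ.

```python
STEP = 32  # hypothetical stand-in for CCL_UNIQNE_STEP; actual value not shown


def dedupe_prefill(ccl_prefill, ccl_decode, step=STEP):
    """Return prefill CCL values that do not collide with decode values."""
    # In disaggregated serving one of the lists may be None (separate
    # QPCs for prefill and decode), so there is nothing to reconcile.
    if not (ccl_prefill and ccl_decode):
        return ccl_prefill

    taken = set(ccl_decode)
    result = []
    for val in ccl_prefill:
        if val in taken:
            # Move to the nearest lower multiple of step, then keep
            # stepping down until a free value is found.
            val = (val - 1) // step * step
            while val > 0 and val in taken:
                val -= step
        if val > 0 and val not in taken:
            result.append(val)
            taken.add(val)
    return sorted(result)


print(dedupe_prefill([512, 1024], [1024, 2048]))  # [512, 992]
```

Compared with decrementing by 1, stepping in multiples of a fixed step keeps the adjusted values aligned, which is the hardware and compiler efficiency the comment refers to.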