
Commit 71d0a31

Adjust end_prompt_index calculation for samples
Change end_prompt_index from PROMPT_LENGTH to (PROMPT_LENGTH - 1). Not sure if this is right; let testing decide.
1 parent 78b5bb3 commit 71d0a31

File tree: 1 file changed (+1, -1)


generative-proof-of-concept-CPU-preprocessing-in-memory.py

Lines changed: 1 addition & 1 deletion
@@ -224,7 +224,7 @@ def prepare_data(data, max_seq_length: int = MAX_SEQ_LENGTH):
         end_prompt_index = sample_tokens.index(end_prompt_token_id)
     except ValueError:
         # If </prompt> not found, treat sample as a non-instruct sample
-        end_prompt_index = PROMPT_LENGTH  # int(np.ceil(len(sample_tokens) * (1/3)))  # 0  ## 1. Give it a fair starting place to predict the next word 2. reduce the number of expanded samples
+        end_prompt_index = (PROMPT_LENGTH - 1)  # int(np.ceil(len(sample_tokens) * (1/3)))  # 0  ## 1. Give it a fair starting place to predict the next word 2. reduce the number of expanded samples

     # Find first pad token after </prompt>
     first_pad_index = None
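
For context on the off-by-one, here is a minimal, hypothetical sketch of how the fallback could interact with a prompt/completion split, assuming end_prompt_index is used as the inclusive index of the last prompt token. The split_prompt helper and the constant value are illustrative assumptions; only PROMPT_LENGTH, sample_tokens, and end_prompt_token_id come from the actual script.

# Hypothetical sketch; not the repository's actual code.
PROMPT_LENGTH = 4  # assumed small value for illustration

def split_prompt(sample_tokens, end_prompt_token_id):
    """Split a token list into prompt and completion around </prompt>."""
    try:
        # Instruct sample: the prompt ends at the </prompt> token.
        end_prompt_index = sample_tokens.index(end_prompt_token_id)
    except ValueError:
        # Non-instruct fallback. If the prompt slice below is inclusive of
        # end_prompt_index, then PROMPT_LENGTH - 1 yields exactly
        # PROMPT_LENGTH prompt tokens, while PROMPT_LENGTH would yield
        # PROMPT_LENGTH + 1 -- the off-by-one this commit is probing.
        end_prompt_index = PROMPT_LENGTH - 1

    prompt = sample_tokens[:end_prompt_index + 1]      # tokens 0..end_prompt_index
    completion = sample_tokens[end_prompt_index + 1:]  # everything after the prompt
    return prompt, completion

# Example: a 10-token sample where the </prompt> token (id 99) is absent,
# so the fallback applies.
tokens = list(range(10))
prompt, completion = split_prompt(tokens, end_prompt_token_id=99)
assert len(prompt) == PROMPT_LENGTH  # [0, 1, 2, 3]; completion is [4, ..., 9]

Under this inclusive-index assumption, the change keeps the fallback prompt at exactly PROMPT_LENGTH tokens; whether the script actually slices that way is what the commit's own testing is meant to confirm.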

0 commit comments