Commit 9d1120b

Update generative-proof-of-concept-CPU-preprocessing-in-memory.py
Try using the temporary WEB bible data set.
1 parent 25d8277 commit 9d1120b

1 file changed: 5 additions, 3 deletions

generative-proof-of-concept-CPU-preprocessing-in-memory.py

Lines changed: 5 additions & 3 deletions
@@ -271,9 +271,11 @@ def prepare_data(data, max_seq_length: int = MAX_SEQ_LENGTH):
 
 ## Only add re, tokenizer already in script
 
-
-with open('king-james-bible.txt', 'r') as kjv:
-    bible = kjv.read()
+from vanilla-datasets.web-english-bible.py import samples as bible
+# with open('king-james-bible.txt', 'r') as kjv:
+# # bible = kjv.read()
+# #
+# bible = samples
 
 
 def package_non_instruct_text(text: str, desired_samples: int, max_length_tokens: int) -> list[str]:
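
Note on the added import: hyphens are not valid in Python module names, and a `.py` suffix is not part of an import path, so the `from vanilla-datasets.web-english-bible.py import ...` line raises a `SyntaxError` as written. One possible workaround, sketched below, loads the file by path with `importlib`. The helper name `load_module_from_path`, the module name `web_english_bible`, and the on-disk location `vanilla-datasets/web-english-bible.py` are assumptions inferred from the import statement, not something the commit confirms.

```python
import importlib.util
from pathlib import Path


def load_module_from_path(path, module_name):
    """Load a Python source file as a module, even when its path
    contains characters (such as hyphens) that are illegal in an
    ordinary import statement. Hypothetical helper, not part of the commit."""
    spec = importlib.util.spec_from_file_location(module_name, str(path))
    if spec is None or spec.loader is None:
        raise ImportError(f"Cannot load a module from {path!r}")
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)  # runs the file's top-level code
    return module


# Assumed location, inferred from the import line in the diff.
DATASET_PATH = Path("vanilla-datasets") / "web-english-bible.py"

if DATASET_PATH.exists():
    # `samples` is the name the diff imports; its type is not shown in the commit.
    bible = load_module_from_path(DATASET_PATH, "web_english_bible").samples
```

Renaming the directory and file to valid identifiers (for example `vanilla_datasets/web_english_bible.py`) would allow a plain `from vanilla_datasets.web_english_bible import samples as bible` instead, at the cost of touching the data set layout.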

0 commit comments
