Commit c8b1850

chore: README.md
1 parent 3f5d0fa commit c8b1850

2 files changed: 14 additions, 6 deletions

README.md

Lines changed: 13 additions & 6 deletions
@@ -226,14 +226,21 @@ print(dataset['text'][0])
 ```python
 dataset = gemma_template.load_dataset(
     "your_huggingface_dataset",
-    output_format='gpt', # enum: text, gpt, alpaca
-    instruction_template=INSTRUCTION_TEMPLATE, # Template for instruction the user prompt.
-    structure_template=STRUCTURE_TEMPLATE, # Template for structuring the user prompt.
-    max_hidden_ratio=10, # Percentage of documents that need to be word masked. Min: 0, Max: 1. Default: 0.
+    # enum: text, gpt, alpaca
+    output_format='gpt',
+    # Template for instruction the user prompt.
+    instruction_template=INSTRUCTION_TEMPLATE,
+    # Template for structuring the user prompt.
+    structure_template=STRUCTURE_TEMPLATE,
+    # Percentage of documents that need to be word masked.
+    # Min: 0, Max: 1. Default: 0.
+    max_hidden_ratio=.1,
     # Replace 10% of words in the input document with '_____'.
     # Use int to extract the correct number of words. The `max_hidden_ratio` parameter must be greater than 0.
     max_hidden_words=.1,
-    min_chars_length=2, # Minimum character of a word, used to create unigrams, bigrams, and trigrams. Default is 2.
-    max_chars_length=8 # Maximum character of a word, used to create unigrams, bigrams and trigrams. Default is 0.
+    # Minimum character of a word, used to create unigrams, bigrams, and trigrams. Default is 2.
+    min_chars_length=2,
+    # Maximum character of a word, used to create unigrams, bigrams and trigrams. Default is 0.
+    max_chars_length=8,
 )
 ```
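
The README comments for `max_hidden_words` describe replacing a share of a document's words with `'_____'`, where a float is read as a fraction and an int as an exact word count. The sketch below illustrates that reading only. `mask_words` and its `seed` argument are hypothetical names introduced here; this is not the `gemma_template` implementation, and it picks the words to hide at random, which the README does not specify.

```python
import random
from typing import Union


def mask_words(text: str, max_hidden_words: Union[int, float], seed: int = 0) -> str:
    """Replace some whitespace-separated words in `text` with '_____'.

    Hypothetical helper: a float is treated as a fraction of the words,
    an int as an exact number of words.
    """
    words = text.split()
    if isinstance(max_hidden_words, float):
        # Fraction of the document's words, truncated toward zero.
        count = int(len(words) * max_hidden_words)
    else:
        # Exact number of words to hide.
        count = max_hidden_words
    # Never try to hide fewer than zero or more than all of the words.
    count = max(0, min(count, len(words)))

    # Seeded generator so the same input always yields the same mask.
    rng = random.Random(seed)
    hidden = set(rng.sample(range(len(words)), count))
    return " ".join("_____" if i in hidden else w for i, w in enumerate(words))
```

With a ten-word input, `mask_words(text, 0.1)` hides one word and `mask_words(text, 3)` hides three; which words are chosen depends on the seed.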

gemma_template/models.py

Lines changed: 1 addition & 0 deletions
@@ -580,6 +580,7 @@ def _close():
 with tqdm(total=len(dataset)) as pbar:
     for field in dataset.column_names:
         items = []
+        max_hidden_count = int(len(dataset[field]) * max_hidden_ratio)
         _ = loop.run_until_complete(run_task(dataset[field]))
         mapping[field] = Dataset.from_list(items)
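
The added line turns `max_hidden_ratio` into a per-column row budget by truncating `len(column) * ratio`. The sketch below reproduces only that arithmetic. The function name `compute_max_hidden_count` and the range check are assumptions made for illustration; the hunk does not show how `max_hidden_count` is used afterwards or whether the library validates the ratio.

```python
def compute_max_hidden_count(num_rows: int, max_hidden_ratio: float) -> int:
    """Number of rows that may be word-masked, as in the added line.

    The range check is an assumption based on the README comment
    "Min: 0, Max: 1"; the diff itself does not show any validation.
    """
    if not 0 <= max_hidden_ratio <= 1:
        raise ValueError("max_hidden_ratio must be between 0 and 1")
    # int() truncates toward zero, so small columns can get a budget of 0.
    return int(num_rows * max_hidden_ratio)


if __name__ == "__main__":
    print(compute_max_hidden_count(1000, 0.1))  # 100 rows out of 1000
    print(compute_max_hidden_count(7, 0.1))     # 0, because 0.7 truncates to 0
```

This also shows why the README value changed from `10` to `.1`: with a ratio of 10, `int(len(column) * 10)` would be ten times the number of rows in the column, which contradicts the documented maximum of 1.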