
Commit 79bb1ba

Refactor prepare_data function with type hints
Updated type hints in the prepare_data function to use List and Tuple from typing. Improved return type annotations for clarity.
1 parent 0285ec9 commit 79bb1ba

File tree

1 file changed: 10 additions, 3 deletions


cerebrosllmutils/llm_utils.py

Lines changed: 10 additions & 3 deletions
```diff
@@ -6,8 +6,15 @@
 """
 
 
+from typing import List, Tuple, Any
 
-def prepare_data(data_0: list[str], tokenizer_0, max_seq_length: int = 1024, prompt_length: int=1):
+
+
+def prepare_data(
+        data_0: List[str],
+        tokenizer_0: Any,
+        max_seq_length: int = 1024,
+        prompt_length: int = 1) -> Tuple[List[List[int]], List[List[int]], int]:
     """
     Prepares tokenized input sequences and corresponding labels for training the Cerebros
     [not so] large language model.
@@ -38,9 +45,9 @@ def prepare_data(data_0: list[str], tokenizer_0, max_seq_length: int = 1024, pro
     Returns:
     --------
     tuple:
-    - all_input_ids (list of list of int): list[list[int]] Token IDs for each input sequence, shaped
+    - all_input_ids (2d list of int): Tuple[List[List[int]] Token IDs for each input sequence, shaped
       [num_samples, max_seq_length].
-    - all_labels (list of list of int): list[list[int]] One-hot encoded labels for next-token prediction,
+    - all_labels (2d list of int): Tuple[List[List[int]] One-hot encoded labels for next-token prediction,
       shaped [num_samples, vocab_size].
     - vocab_size (int): Size of the tokenizer's vocabulary, used for label dimensions.
```
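The diff above changes only the signature and docstring; the function body is not shown in this commit. As a minimal sketch of how the annotated signature could be satisfied, the stub below pairs it with a hypothetical `ToyTokenizer` (the real `tokenizer_0` in `cerebrosllmutils/llm_utils.py` is not defined here) and a toy body that pads token IDs and one-hot encodes a next-token label. It illustrates the declared return shape, not the actual implementation.

```python
from typing import List, Tuple, Any


class ToyTokenizer:
    """Hypothetical stand-in for the real tokenizer passed as tokenizer_0."""

    vocab = {"<pad>": 0, "hello": 1, "world": 2}

    @property
    def vocab_size(self) -> int:
        return len(self.vocab)

    def encode(self, text: str) -> List[int]:
        # Whitespace split; unknown tokens map to the pad ID.
        return [self.vocab.get(tok, 0) for tok in text.split()]


def prepare_data(
        data_0: List[str],
        tokenizer_0: Any,
        max_seq_length: int = 1024,
        prompt_length: int = 1) -> Tuple[List[List[int]], List[List[int]], int]:
    # Illustrative stub only: truncate/pad each sequence to max_seq_length
    # and one-hot encode the token following the prompt as its label.
    vocab_size = tokenizer_0.vocab_size
    all_input_ids: List[List[int]] = []
    all_labels: List[List[int]] = []
    for text in data_0:
        ids = tokenizer_0.encode(text)[:max_seq_length]
        ids += [0] * (max_seq_length - len(ids))  # pad to fixed length
        label = [0] * vocab_size
        label[ids[prompt_length]] = 1  # one-hot of the next token
        all_input_ids.append(ids)
        all_labels.append(label)
    return all_input_ids, all_labels, vocab_size


inputs, labels, vs = prepare_data(["hello world"], ToyTokenizer(), max_seq_length=4)
```

With this toy setup, `inputs` has shape `[num_samples, max_seq_length]` and `labels` has shape `[num_samples, vocab_size]`, matching the docstring in the diff.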
