```python
def parse_input(self,
                tokenizer,
                input_text=None,
                prompt_template=None,
                input_file=None,
                add_special_tokens=True,
                max_input_length=923,
                pad_id=None,
                num_prepend_vtokens=[],
                model_name=None,
                model_version=None):
    if pad_id is None:
        pad_id = tokenizer.pad_token_id

    batch_input_ids = []
    if input_file is None:
        for curr_text in input_text:
            if prompt_template is not None:
                curr_text = prompt_template.format(input_text=curr_text)
            input_ids = tokenizer.encode(curr_text,
                                         add_special_tokens=add_special_tokens,
                                         truncation=True,
                                         max_length=max_input_length)
            batch_input_ids.append(input_ids)

    batch_input_ids = [
        torch.tensor(x, dtype=torch.int32) for x in batch_input_ids
    ]
    return batch_input_ids
```
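For context, the flow in that function (apply the prompt template, encode, truncate, collect a batch) can be sketched without torch. The `ToyTokenizer` below is hypothetical, standing in for the real Hugging Face tokenizer:

```python
# Hypothetical toy tokenizer, for illustration only; the real code calls
# tokenizer.encode on a Hugging Face tokenizer loaded for the model.
class ToyTokenizer:
    def __init__(self):
        self.vocab = {}
        self.pad_token_id = 0

    def encode(self, text, add_special_tokens=True, truncation=True,
               max_length=923):
        # Assign each new whitespace token the next free integer id.
        ids = [self.vocab.setdefault(tok, len(self.vocab) + 1)
               for tok in text.split()]
        return ids[:max_length] if truncation else ids


def parse_input(tokenizer, input_text, prompt_template=None,
                max_input_length=923):
    batch_input_ids = []
    for curr_text in input_text:
        if prompt_template is not None:
            curr_text = prompt_template.format(input_text=curr_text)
        batch_input_ids.append(
            tokenizer.encode(curr_text, truncation=True,
                             max_length=max_input_length))
    # Token ids are plain Python ints here; the real code then wraps each
    # list in torch.tensor(x, dtype=torch.int32).
    return batch_input_ids


batch = parse_input(ToyTokenizer(), ["hello world", "one two three"],
                    prompt_template="Q: {input_text}")
print(batch)  # -> [[1, 2, 3], [1, 4, 5, 6]]
```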
In the code above, on the line `torch.tensor(x, dtype=torch.int32) for x in batch_input_ids`: our model is compiled in fp16, so during input parsing shouldn't we use `torch.int16` instead?
The model outputs we get are:

```
{'output_ids': tensor([[[1602, 298, 4430, ..., 2, 2, 2]]], device='cuda:0',
     dtype=torch.int32),
 'sequence_lengths': tensor([[6]], device='cuda:0', dtype=torch.int32)}
```
Hence, will these output ids be decodable at the predict step, where we use the tokenizer as follows:
```python
if self.runtime_rank == 0:
    output_ids = outputs['output_ids']
    sequence_lengths = outputs['sequence_lengths']
    batch_size, num_beams, _ = output_ids.size()
    for batch_idx in range(batch_size):
        for beam in range(num_beams):
            output_begin = input_lengths[batch_idx]
            output_end = sequence_lengths[batch_idx][beam]
            outputs = output_ids[batch_idx][beam][
                output_begin:output_end].tolist()
            output_text = self.tokenizer.decode(outputs)
    return {"output": output_text}
```
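For reference, here is the slice-and-decode step in isolation, with a hypothetical toy tokenizer in place of `self.tokenizer` (the ids and lengths below are made up; `.tolist()` turns the int32 tensor entries into plain Python ints before decoding):

```python
# Hypothetical toy tokenizer: id -> token lookup, mimicking tokenizer.decode().
class ToyTokenizer:
    vocab = {1602: 'hello', 298: 'to', 4430: 'identify', 2: '</s>'}

    def decode(self, ids):
        return ' '.join(self.vocab.get(i, '<unk>') for i in ids)


# output_ids shaped [batch, beams, seq]; shown here as nested lists, as if
# .tolist() had already been called on the int32 tensor.
output_ids = [[[1602, 298, 4430, 2, 2, 2]]]
input_lengths = [1]        # prompt length for each batch entry
sequence_lengths = [[4]]   # total length per beam (prompt + generated)

tokenizer = ToyTokenizer()
batch_idx, beam = 0, 0
output_begin = input_lengths[batch_idx]
output_end = sequence_lengths[batch_idx][beam]
ids = output_ids[batch_idx][beam][output_begin:output_end]
print(tokenizer.decode(ids))  # -> "to identify </s>"
```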