
The truss server model.py #1

@ranjanshivaji

Description

```python
    def parse_input(self,
                    tokenizer,
                    input_text=None,
                    prompt_template=None,
                    input_file=None,
                    add_special_tokens=True,
                    max_input_length=923,
                    pad_id=None,
                    num_prepend_vtokens=[],
                    model_name=None,
                    model_version=None):
        if pad_id is None:
            pad_id = tokenizer.pad_token_id

        batch_input_ids = []
        if input_file is None:
            for curr_text in input_text:
                if prompt_template is not None:
                    curr_text = prompt_template.format(input_text=curr_text)
                input_ids = tokenizer.encode(curr_text,
                                             add_special_tokens=add_special_tokens,
                                             truncation=True,
                                             max_length=max_input_length)
                batch_input_ids.append(input_ids)

        batch_input_ids = [
            torch.tensor(x, dtype=torch.int32) for x in batch_input_ids
        ]
        return batch_input_ids
```
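To make the result's shape concrete, here is a minimal sketch of the same tokenization loop using a dummy tokenizer and plain lists instead of torch tensors (the tokenizer, its character-based ids, and the BOS id `1` are all made up for illustration):

```python
# Sketch of parse_input's loop: dummy tokenizer, plain lists (illustration only).
class DummyTokenizer:
    def encode(self, text, add_special_tokens=True, truncation=True, max_length=923):
        # Pretend each character maps to its codepoint; truncate to max_length.
        ids = [ord(c) for c in text][:max_length]
        return ([1] + ids) if add_special_tokens else ids  # 1 = assumed BOS id

def parse_input(tokenizer, input_text, prompt_template=None, max_input_length=923):
    batch_input_ids = []
    for curr_text in input_text:
        if prompt_template is not None:
            curr_text = prompt_template.format(input_text=curr_text)
        batch_input_ids.append(tokenizer.encode(curr_text, max_length=max_input_length))
    return batch_input_ids

batch = parse_input(DummyTokenizer(), ["hi", "ok"], prompt_template="Q: {input_text}")
print(batch)
```

The real code wraps each of these lists in a `torch.tensor(..., dtype=torch.int32)` at the end; the ids themselves are ordinary Python ints either way.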

In the above part of the code, on this line:

`torch.tensor(x, dtype=torch.int32) for x in batch_input_ids`

our model is compiled in fp16, so shouldn't we be using `torch.int16` here during parsing?
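(For context on the dtype question: token ids are vocabulary indices, not activations, so their dtype is independent of the precision the weights were compiled in. A quick sketch of the range argument, with an assumed GPT-2-style vocab size for illustration:)

```python
# Token ids index the vocabulary; they are unrelated to fp16 weights.
# int16 tops out at 32767, which is smaller than many vocabularies
# (the vocab size below is an assumed GPT-2-style value, illustration only).
INT16_MAX = 2**15 - 1        # 32767
vocab_size = 50257           # assumed, e.g. a GPT-2-style tokenizer

# Ids above INT16_MAX could not be stored in an int16 tensor:
overflowing = [tid for tid in (1602, 298, 4430, vocab_size - 1) if tid > INT16_MAX]
print(overflowing)
```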
The model outputs we get are:

`{'output_ids': tensor([[[1602, 298, 4430, ..., 2, 2, 2]]], device='cuda:0', dtype=torch.int32), 'sequence_lengths': tensor([[6]], device='cuda:0', dtype=torch.int32)}`

Given that, will these outputs decode correctly at the predict step, where we use the tokenizer like this:

```python
if self.runtime_rank == 0:
    output_ids = outputs['output_ids']
    sequence_lengths = outputs['sequence_lengths']
    batch_size, num_beams, _ = output_ids.size()
    for batch_idx in range(batch_size):
        for beam in range(num_beams):
            output_begin = input_lengths[batch_idx]
            output_end = sequence_lengths[batch_idx][beam]
            outputs = output_ids[batch_idx][beam][
                      output_begin:output_end].tolist()
            output_text = self.tokenizer.decode(outputs)
            return {"output": output_text}
```
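For what it's worth, `.tolist()` converts the int32 tensor slice into plain Python ints before decoding, so the tensor dtype is erased by the time `tokenizer.decode` runs. A minimal sketch with a dummy tokenizer and made-up ids:

```python
# Dummy tokenizer with a made-up vocabulary; real decoding goes through
# the Hugging Face tokenizer, but the call shape is the same.
class DummyTokenizer:
    vocab = {1602: "Hello", 298: ",", 4430: " world", 2: ""}  # 2 = assumed pad/eos

    def decode(self, ids):
        # decode() receives plain Python ints, regardless of the dtype
        # of the tensor they were sliced from.
        return "".join(self.vocab.get(i, "<unk>") for i in ids)

tokenizer = DummyTokenizer()
row = [1602, 298, 4430, 2, 2, 2]   # stands in for output_ids[b][beam].tolist()
sequence_length = 3                # stands in for sequence_lengths[b][beam]
output_text = tokenizer.decode(row[:sequence_length])
print(output_text)
```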
