
The truss server model.py #1

@ranjanshivaji

Description

```python
    def parse_input(self,
                    tokenizer,
                    input_text=None,
                    prompt_template=None,
                    input_file=None,
                    add_special_tokens=True,
                    max_input_length=923,
                    pad_id=None,
                    num_prepend_vtokens=[],
                    model_name=None,
                    model_version=None):
        if pad_id is None:
            pad_id = tokenizer.pad_token_id

        batch_input_ids = []
        if input_file is None:
            for curr_text in input_text:
                if prompt_template is not None:
                    curr_text = prompt_template.format(input_text=curr_text)
                input_ids = tokenizer.encode(curr_text,
                                             add_special_tokens=add_special_tokens,
                                             truncation=True,
                                             max_length=max_input_length)
                batch_input_ids.append(input_ids)

        batch_input_ids = [
            torch.tensor(x, dtype=torch.int32) for x in batch_input_ids
        ]
        return batch_input_ids
```
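To make the result's shape concrete, here is a minimal sketch of the same tokenization loop using a dummy tokenizer and plain lists instead of torch tensors (the tokenizer, its character-based ids, and the BOS id `1` are all made up for illustration):

```python
# Sketch of parse_input's loop: dummy tokenizer, plain lists (illustration only).
class DummyTokenizer:
    def encode(self, text, add_special_tokens=True, truncation=True, max_length=923):
        # Pretend each character maps to its codepoint; truncate to max_length.
        ids = [ord(c) for c in text][:max_length]
        return ([1] + ids) if add_special_tokens else ids  # 1 = assumed BOS id

def parse_input(tokenizer, input_text, prompt_template=None, max_input_length=923):
    batch_input_ids = []
    for curr_text in input_text:
        if prompt_template is not None:
            curr_text = prompt_template.format(input_text=curr_text)
        batch_input_ids.append(tokenizer.encode(curr_text, max_length=max_input_length))
    return batch_input_ids

batch = parse_input(DummyTokenizer(), ["hi", "ok"], prompt_template="Q: {input_text}")
print(batch)
```

The real code wraps each of these lists in a `torch.tensor(..., dtype=torch.int32)` at the end; the ids themselves are ordinary Python ints either way.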

In the above part of the code, on this line:

`torch.tensor(x, dtype=torch.int32) for x in batch_input_ids`

our model is compiled in fp16, so shouldn't we be using `torch.int16` here during parsing?
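(For context on the dtype question: token ids are vocabulary indices, not activations, so their dtype is independent of the precision the weights were compiled in. A quick sketch of the range argument, with an assumed GPT-2-style vocab size for illustration:)

```python
# Token ids index the vocabulary; they are unrelated to fp16 weights.
# int16 tops out at 32767, which is smaller than many vocabularies
# (the vocab size below is an assumed GPT-2-style value, illustration only).
INT16_MAX = 2**15 - 1        # 32767
vocab_size = 50257           # assumed, e.g. a GPT-2-style tokenizer

# Ids above INT16_MAX could not be stored in an int16 tensor:
overflowing = [tid for tid in (1602, 298, 4430, vocab_size - 1) if tid > INT16_MAX]
print(overflowing)
```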
The model outputs we get are:

`{'output_ids': tensor([[[1602, 298, 4430, ..., 2, 2, 2]]], device='cuda:0', dtype=torch.int32), 'sequence_lengths': tensor([[6]], device='cuda:0', dtype=torch.int32)}`

Given that, will these outputs decode correctly at the predict step, where we use the tokenizer like this:

```python
if self.runtime_rank == 0:
    output_ids = outputs['output_ids']
    sequence_lengths = outputs['sequence_lengths']
    batch_size, num_beams, _ = output_ids.size()
    for batch_idx in range(batch_size):
        for beam in range(num_beams):
            output_begin = input_lengths[batch_idx]
            output_end = sequence_lengths[batch_idx][beam]
            outputs = output_ids[batch_idx][beam][
                      output_begin:output_end].tolist()
            output_text = self.tokenizer.decode(outputs)
            return {"output": output_text}
```
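For what it's worth, `.tolist()` converts the int32 tensor slice into plain Python ints before decoding, so the tensor dtype is erased by the time `tokenizer.decode` runs. A minimal sketch with a dummy tokenizer and made-up ids:

```python
# Dummy tokenizer with a made-up vocabulary; real decoding goes through
# the Hugging Face tokenizer, but the call shape is the same.
class DummyTokenizer:
    vocab = {1602: "Hello", 298: ",", 4430: " world", 2: ""}  # 2 = assumed pad/eos

    def decode(self, ids):
        # decode() receives plain Python ints, regardless of the dtype
        # of the tensor they were sliced from.
        return "".join(self.vocab.get(i, "<unk>") for i in ids)

tokenizer = DummyTokenizer()
row = [1602, 298, 4430, 2, 2, 2]   # stands in for output_ids[b][beam].tolist()
sequence_length = 3                # stands in for sequence_lengths[b][beam]
output_text = tokenizer.decode(row[:sequence_length])
print(output_text)
```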
