
How to use inference mode of ASMv2 (llama.cpp)? #24

@sailfish009

Description


The GPU memory consumption of the model was too high, so I converted it to a GGUF file for llama.cpp, and GPU memory usage is now fine.
However, because of how a llama.cpp model is invoked at inference time, the input parameters passed in the original code need to be converted to llama.cpp's format. If there are any llama.cpp experts here, I would appreciate guidance on how to do this conversion.

        # all-seeing/all-seeing-v2/llava/eval/model_vqa_loader_vocab_rank.py, line 156
        # `model` is the module that was converted to ASMv2.gguf
        # This is the call that needs to be expressed in llama.cpp terms:
        with torch.inference_mode():
            logits = model(
                input_ids=input_ids,
                attention_mask=attention_mask,
                images=image_tensor.to(dtype=torch.float16, device=args.device, non_blocking=True),
            ).logits
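
A minimal sketch of the text-only side, assuming llama-cpp-python (the model path, prompt, and context size below are placeholders), would look like the following. The `images` tensor is the part that still needs an equivalent, presumably via the LLaVA-style mmproj/CLIP projector, which this sketch does not cover:

    # Sketch only (not a confirmed solution): reading per-token logits from the
    # converted GGUF model with llama-cpp-python. Paths and prompt are assumptions.
    import numpy as np
    from llama_cpp import Llama

    llm = Llama(
        model_path="ASMv2.gguf",  # hypothetical path to the converted model
        n_ctx=2048,
        logits_all=True,          # keep logits for every position, not just the last
    )

    prompt = b"USER: <question text> ASSISTANT:"  # placeholder prompt
    tokens = llm.tokenize(prompt)
    llm.eval(tokens)

    # llm.scores holds one row of vocab-sized logits per evaluated position
    logits = np.array(llm.scores[: llm.n_tokens])
    print(logits.shape)  # (len(tokens), n_vocab)

Is this roughly the right direction, and how should the `images` input be wired in?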
