[OpenVINO] Add support for minicpmv4/4_5 #1412
Conversation
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.
@IlyasMoutawwakil could you help take a look?
Thanks for the fix! Let's create a tiny random model with llama as the decoder to test this 🤗 tell me if you need help with that!
But I guess we need to merge this PR first? Otherwise the test case will not work.
@openvino-dev-samples no need to merge it now, you can simply pin that PR in setup.py so that the tests run with it 🤗
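For illustration, such a pin could look like this in setup.py (a minimal sketch; the repository and git ref below are placeholders, assuming the dependent change lives on a transformers branch):

```python
# Minimal sketch of pinning an unmerged PR in setup.py; the git ref below is a
# placeholder, not the actual dependent PR.
INSTALL_REQUIRE = [
    # PEP 508 direct reference: pip installs the branch that carries the fix
    "transformers @ git+https://github.com/huggingface/transformers.git@<branch-with-fix>",
]
```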
Hi, since minicpmv4 and minicpmv share the same model type but use different LLMs, is it possible to add both of them in utils_tests.py?
@openvino-dev-samples yes, you can name it minicpmv4 in utils_tests.py
Hi @openvino-dev-samples, it would be faster if you made sure the minicpmv4 tests pass locally. The CI is slow and shouldn't be used as a testing mechanism; only use it for validation once local tests already pass.
Sorry for that, and I fully understand, but I always run into connection issues in local test runs, e.g.
You can target the minicpmv tests specifically to avoid this issue with `pytest -k "minicpmv"`.
tests/openvino/utils_tests.py (outdated):
| "minicpm3": "katuni4ka/tiny-random-minicpm3", | ||
| "minicpmv": "katuni4ka/tiny-random-minicpmv-2_6", | ||
| "minicpmv4": "snake7gun/minicpm-v-4-tiny", | ||
| "minicpmv4_5": "snake7gun/tiny-minicpmv-4_5", |
The model is 158M in size; it makes sense to try to reduce it.
Please add inference tests that exercise the generate() method and compare the outputs with transformers.
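A rough sketch of what such a test could look like, assuming the tiny checkpoint added above; the `preprocess_inputs` kwargs and the reference `generate()` call are simplified compared to the real test suite:

```python
import numpy as np
import torch
from PIL import Image
from transformers import AutoModelForCausalLM, AutoProcessor, AutoTokenizer
from optimum.intel import OVModelForVisualCausalLM

model_id = "snake7gun/minicpm-v-4-tiny"
processor = AutoProcessor.from_pretrained(model_id, trust_remote_code=True)
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
image = Image.fromarray(np.random.randint(0, 255, (224, 224, 3), dtype=np.uint8))

# export to OpenVINO on the fly and run greedy generation
ov_model = OVModelForVisualCausalLM.from_pretrained(model_id, export=True, trust_remote_code=True)
inputs = ov_model.preprocess_inputs(
    text="What is on the image?", image=image, processor=processor, tokenizer=tokenizer, config=ov_model.config
)
ov_tokens = ov_model.generate(**inputs, max_new_tokens=8, do_sample=False)

# reference run with the original transformers implementation
ref_model = AutoModelForCausalLM.from_pretrained(model_id, trust_remote_code=True)
ref_tokens = ref_model.generate(**inputs, max_new_tokens=8, do_sample=False)
torch.testing.assert_close(ov_tokens, ref_tokens)
```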
@IlyasMoutawwakil could you help trigger the CI? Thanks.
```python
if isinstance(behavior, str) and not isinstance(behavior, MiniCPMVConfigBehavior):
    behavior = MiniCPMVConfigBehavior(behavior)

model_mapping = {2.6: "llama", 4.0: "qwen2", 4.5: "qwen3"}
```
should use str for versions
May I ask why? The version in the model's config is a number:
https://huggingface.co/openbmb/MiniCPM-V-4_5/blob/main/config.json#L3
Ah okay, I see! Thanks for the clarification.
(It's generally a bad idea to use numbers for versions: 4.0 becomes 4, and 4.10 and 4.1 are the same version 😅)
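A quick illustration of the pitfall:

```python
# Distinct version strings collapse once they are parsed as floats.
assert 4.10 == 4.1
assert str(4.10) == "4.1"  # the trailing zero is lost
```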
```python
if isinstance(behavior, str) and not isinstance(behavior, MiniCPMVConfigBehavior):
    behavior = MiniCPMVConfigBehavior(behavior)

model_mapping = {2.6: "llama", 4.0: "qwen2", 4.5: "qwen3"}
```
I think it is generally a bad idea to make decisions about the architecture based on the model version. You should instead inspect the model object and use isinstance checks on its inner objects to make the decision.
Yes, it's a better approach in this case, but I don't know if we can access the modeling file at this stage.
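For reference, the reviewer's idea might look like this (a sketch only; the `model.llm` attribute is an assumption for illustration, it requires a transformers version that ships Qwen3, and as noted above the model object may not be available at this stage):

```python
from transformers import LlamaForCausalLM, Qwen2ForCausalLM, Qwen3ForCausalLM

def detect_decoder_arch(model) -> str:
    # Dispatch on the type of the inner LLM instead of the version number.
    for cls, name in ((Qwen3ForCausalLM, "qwen3"), (Qwen2ForCausalLM, "qwen2"), (LlamaForCausalLM, "llama")):
        if isinstance(model.llm, cls):
            return name
    raise ValueError(f"Unsupported decoder: {type(model.llm).__name__}")
```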
Force-pushed b1e9ace to 02a4acf.
@openvino-dev-samples Please fix the failing tests.
Done.
Please add an additional patch for `temporal_ids`, as we discussed. Without it, functionality is limited.
Please also update the PR description according to this comment #1491 (comment)
```python
max_size = self.config.vision_config.image_size // self.config.vision_config.patch_size
self._pos_embeds = torch.from_numpy(self._get_2d_sincos_pos_embed(self.embed_dim, max_size)).float()
self.max_size = (max_size, max_size)
self.max_temporal_size = 72000
```
Why 72000? Should this value be loaded from the config?
It's the default for this model: https://huggingface.co/openbmb/MiniCPM-V-4_5/blob/main/resampler.py#L100
and it is not initialized from the config: https://huggingface.co/openbmb/MiniCPM-V-4_5/blob/main/modeling_minicpmv.py#L50
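If one wanted to make it configurable anyway, a hedged alternative could look like this (illustrative only; the `max_temporal_size` config key is hypothetical):

```python
DEFAULT_MAX_TEMPORAL_SIZE = 72000  # upstream hard-coded default

def get_max_temporal_size(config) -> int:
    # Prefer a config-provided value, falling back to the upstream default.
    return getattr(config, "max_temporal_size", DEFAULT_MAX_TEMPORAL_SIZE)
```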
```python
all_temporal_ids = None
if temporal_ids is not None:
    all_temporal_ids = []
    for t in temporal_ids:
        all_temporal_ids.extend(t)
```
Suggested change:
```diff
-all_temporal_ids = None
-if temporal_ids is not None:
-    all_temporal_ids = []
-    for t in temporal_ids:
-        all_temporal_ids.extend(t)
+all_temporal_ids = [t for seq_t in temporal_ids for t in seq_t] if temporal_ids is not None else None
```
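A quick check that the loop, the comprehension, and itertools.chain all agree, using the example ids that appear in the snippet below:

```python
from itertools import chain

temporal_ids = [[-1], [-1], [2, 6, 9]]

flat_loop = []
for t in temporal_ids:
    flat_loop.extend(t)

flat_comp = [t for seq_t in temporal_ids for t in seq_t]
flat_chain = list(chain.from_iterable(temporal_ids))

assert flat_loop == flat_comp == flat_chain == [-1, -1, 2, 6, 9]
```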
It's copied from the original model: https://huggingface.co/openbmb/MiniCPM-V-4_5/blob/main/modeling_minicpmv.py#L94
```python
# example: [[-1], [-1], [2, 6, 9]]
temporal_ids_flatten = list(chain.from_iterable(temporal_ids))
```
Do we actually need an additional flattening pass here? As I understand it, `all_temporal_ids` is already prepared flattened inside `get_vision_embeddings()`. If it's not needed, I'd remove the flattening logic from `get_vision_embeddings()` and keep it only here.
```python
if max_temporal_size > -1:
    temporal_pos_emb = True
    if max_temporal_size > self.max_temporal_size:
        self._adjust_temporal_pos_cache(max_temporal_size, "cpu")
```
I don't see a definition of `self._adjust_temporal_pos_cache()`. Since the tests pass, this means the code does not reach this point in any of the existing tests. Please clarify this; ideally, every scenario should be tested.
Updated to align with the original model.
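For context, a hypothetical shape such a helper could take, mirroring the spatial `_adjust_pos_cache` pattern (the 1D sin/cos helper is written out here for self-containment; this is not the PR's actual code):

```python
import numpy as np
import torch

def get_1d_sincos_pos_embed(embed_dim: int, size: int) -> np.ndarray:
    # Standard 1D sin/cos table of shape (size, embed_dim).
    pos = np.arange(size, dtype=np.float64)[:, None]
    omega = 1.0 / 10000 ** (np.arange(embed_dim // 2, dtype=np.float64) / (embed_dim / 2))
    angles = pos * omega[None, :]
    return np.concatenate([np.sin(angles), np.cos(angles)], axis=1)

def _adjust_temporal_pos_cache(self, max_temporal_size: int, device: str) -> None:
    # Grow the cached temporal position table when a longer sequence shows up.
    if max_temporal_size > self.max_temporal_size:
        self.max_temporal_size = max_temporal_size
        self.temporal_pos_embed = (
            torch.from_numpy(get_1d_sincos_pos_embed(self.embed_dim, max_temporal_size)).float().to(device)
        )
```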
```python
if temporal_ids_flatten[i] == -1:
    pos_embed_temporal.append(torch.zeros(self.embed_dim, dtype=torch.float32, device="cpu"))
else:
    pos_embed_temporal.append(self.temporal_pos_embed[temporal_ids_flatten[i]].to(torch.float32))  # D
```
Where is self.temporal_pos_embed defined?
Fixed
```diff
-def resampling(self, x, tgt_sizes):
+def resampling(self, x, tgt_sizes, temporal_ids=None):
+    from itertools import chain
```
Should be imported at the top of the file.
This import is used by minicpmv only, so I think it can be left here; e.g. https://github.com/huggingface/optimum-intel/blob/main/optimum/intel/openvino/modeling_visual_language.py#L1229
```python
self._adjust_pos_cache(tgt_sizes)

temporal_pos_emb = False
```
These names are a bit confusing to me: `temporal_pos_emb`, `pos_embed_temporal`, `self.temporal_pos_embed`, `temporal_embed`. I would suggest renaming these variables to something more meaningful, for example `use_temporal_pos_embed` instead of `temporal_pos_emb`.
I only created `temporal_embed`; the others come directly from the original modeling file.
```python
    1, 0, 2
)  # BLD => L * B * D
res = torch.from_numpy(self.resampler(image_feature=x, pos_embed=pos_embed, key_padding_mask=key_padding_mask))
if temporal_pos_emb:
```
Suggested change:
```diff
-if temporal_pos_emb:
+if len(pos_embed_temporal) > 0:
```
It's copied from the original model: https://huggingface.co/openbmb/MiniCPM-V-4_5/blob/main/resampler.py#L216
```diff
 if is_transformers_version("<", "4.49"):
-    expected = {"llama4", "qwen2_5_vl", "phi4mm"}
+    expected = {"llama4", "qwen2_5_vl", "phi4mm", "minicpmv4", "minicpmv4_5"}
 elif is_transformers_version("<", "4.51"):
     expected = {"llama4", "phi4mm"}
 elif is_transformers_version("<", "4.52"):
     expected = set()
 else:
-    expected = {"llava-qwen2", "phi3_v", "phi4mm", "minicpmo"}
+    expected = {"llava-qwen2", "phi3_v", "phi4mm", "minicpmo", "minicpmv4", "minicpmv4_5"}
```
From this, I understand that minicpmv4/minicpmv4_5 are supported for transformers 4.49 .. 4.51. Is this correct? If so, please set `MIN_TRANSFORMERS_VERSION = "4.49.0"` and `MAX_TRANSFORMERS_VERSION = "4.51.3"` for `MiniCPMVOpenVINOConfig`.
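As a sketch, the suggestion amounts to the following class attributes (class body omitted; the reply below argues the cap is not actually needed):

```python
# Version bounds on MiniCPMVOpenVINOConfig, as suggested; illustrative only.
MIN_TRANSFORMERS_VERSION = "4.49.0"
MAX_TRANSFORMERS_VERSION = "4.51.3"
```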
I don't see any limitation on these two models; they can share the same transformers version as minicpm-v-2.6.
Force-pushed e0385e6 to 09a3f19.
Depends on PR
As the LLM of minicpmv4 has switched to Llama:
https://huggingface.co/openbmb/MiniCPM-V-4/blob/main/modeling_minicpmv.py#L26
What does this PR do?
Conversion command line for openbmb/MiniCPM-V-4 or MiniCPM-V-4_5:
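The command itself was not preserved in this capture; a plausible reconstruction via the Python export API (the CLI form in the comment is an assumption based on the standard optimum-cli OpenVINO workflow):

```python
# Likely CLI equivalent (assumption):
#   optimum-cli export openvino -m openbmb/MiniCPM-V-4_5 --trust-remote-code MiniCPM-V-4_5-ov
from optimum.intel import OVModelForVisualCausalLM

model = OVModelForVisualCausalLM.from_pretrained(
    "openbmb/MiniCPM-V-4_5", export=True, trust_remote_code=True
)
model.save_pretrained("MiniCPM-V-4_5-ov")
```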
Inference of MiniCPM-V-4_5 using OpenVINO backend:
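The snippet was likewise not preserved; a minimal sketch, assuming the exported directory from the previous step and optimum-intel's `preprocess_inputs` helper (the image path and prompt are placeholders):

```python
from PIL import Image
from transformers import AutoProcessor, AutoTokenizer
from optimum.intel import OVModelForVisualCausalLM

model_dir = "MiniCPM-V-4_5-ov"
model = OVModelForVisualCausalLM.from_pretrained(model_dir, trust_remote_code=True)
processor = AutoProcessor.from_pretrained(model_dir, trust_remote_code=True)
tokenizer = AutoTokenizer.from_pretrained(model_dir, trust_remote_code=True)

image = Image.open("example.jpg").convert("RGB")
inputs = model.preprocess_inputs(
    text="Describe this image.", image=image, processor=processor, tokenizer=tokenizer, config=model.config
)
generated = model.generate(**inputs, max_new_tokens=100)
print(tokenizer.batch_decode(generated, skip_special_tokens=True)[0])
```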
Before submitting