
@helena-intel (Collaborator)

This is #1297 updated to the latest main branch.

Currently, inference on Phi-3-mini and Phi-4-mini returns bad outputs (random characters) once the context grows beyond about 2000 tokens. This PR, contributed by @eaidova, fixes that. This is not my code; the original PR is no longer being updated, so I am opening this as a new PR to make it easier to discuss and to add updates.

I saw no negative impact on inference speed. I do see slightly different outputs with shorter contexts on SPR (comparing inference with the model exported with this PR versus the model exported with main). Any suggestions to fix that would be much appreciated.

This is a draft PR for now, awaiting some feedback and testing, but I hope we can merge it soon.
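
For anyone who wants to try reproducing the issue, here is a minimal sketch. The checkpoint name, prompt, and generation settings are illustrative choices of mine, not taken from this PR; any affected Phi-3/Phi-4-mini variant with a sufficiently long prompt should behave similarly:

```python
# Minimal long-context repro sketch (checkpoint and prompt are illustrative).
from optimum.intel import OVModelForCausalLM
from transformers import AutoTokenizer

model_id = "microsoft/Phi-3-mini-128k-instruct"  # assumed checkpoint, any affected variant works
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = OVModelForCausalLM.from_pretrained(model_id, export=True)

# Build a prompt well beyond ~2000 tokens to hit the reported garbage output.
long_prompt = "Summarize the following text. " + "The quick brown fox jumps over the lazy dog. " * 400
inputs = tokenizer(long_prompt, return_tensors="pt")
print("prompt tokens:", inputs["input_ids"].shape[-1])

outputs = model.generate(**inputs, max_new_tokens=64)
# With an export from main, the continuation comes out as random characters.
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:]))
```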

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

Comment on lines +361 to +364
```python
# if config.model_type == "phi3" and config.max_position_embeddings != getattr(
#     config, "original_max_position_embeddings", config.max_position_embeddings
# ):
#     config.max_position_embeddings = config.original_max_position_embeddings
```
@helena-intel (Collaborator, Author)


This should be deleted, but I left it here for now for context.
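
For context, the commented-out lines touch the LongRoPE-related fields of the Phi-3 config, where `max_position_embeddings` (the extended context window) differs from `original_max_position_embeddings` (the pre-extension window). A quick way to inspect them (checkpoint name illustrative; the printed values are what I would expect, not verified here):

```python
from transformers import AutoConfig

config = AutoConfig.from_pretrained("microsoft/Phi-3-mini-128k-instruct")
# With LongRoPE scaling these two fields differ: max_position_embeddings is
# the extended window, original_max_position_embeddings the pre-extension one.
print(config.max_position_embeddings)                             # e.g. 131072
print(getattr(config, "original_max_position_embeddings", None))  # e.g. 4096
print(config.rope_scaling.get("type") if config.rope_scaling else None)  # e.g. "longrope"
```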

@helena-intel added the `openvino-slow` label (runs OpenVINO slow tests with different versions of transformers) on Oct 30, 2025.