
Refactor multiple flows to enhance performance, boost scalability, and ensure stability.#131

Draft
luuquangvu wants to merge 275 commits into Nativu5:main from luuquangvu:main

Conversation

@luuquangvu
Collaborator

This PR is still a work in progress and uses features that aren't yet officially available in the Gemini-API library, so we'll need to wait for the library's official update before merging. Feel free to try it out and share any feedback or report any issues you encounter. Thanks!

Here are some highlights of the changes:

  • The entire logic for storing conversation history has been rewritten, aiming for compatibility with various endpoints and easy scalability in the future.
  • The logic of the endpoints has been rewritten, and now all endpoints work correctly with both streaming and non-streaming flows.
  • Compatible with the latest library updates, including the ability to download full-size images and enable video or music generation.
  • All cookie-related errors are fully resolved, and users get a clear notification when the server invalidates their cookies, making it obvious when a manual refresh is needed.

…nd update response handling for consistency
…andling, and improved extension determination
- Introduced `model_strategy` configuration for "append" (default + custom models) or "overwrite" (custom models only).
- Enhanced `/v1/models` endpoint to return models based on the configured strategy.
- Improved model loading with environment variable overrides and validation.
- Refactored model handling logic for improved modularity and error handling.
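For illustration, the append/overwrite behavior described above could look roughly like this. This is a hypothetical sketch, not the PR's actual code: the default model names and the `resolve_models` helper are assumptions.

```python
# Illustrative defaults; the real list lives in the project's config.
DEFAULT_MODELS = ["gemini-2.5-flash", "gemini-2.5-pro"]

def resolve_models(custom_models: list[str], strategy: str = "append") -> list[str]:
    """Return the model list served by /v1/models for a given model_strategy."""
    if strategy == "overwrite":
        # Custom models only; built-in defaults are ignored.
        return list(custom_models)
    if strategy == "append":
        # Defaults first, then custom models, de-duplicated preserving order.
        seen: set[str] = set()
        merged: list[str] = []
        for name in DEFAULT_MODELS + custom_models:
            if name not in seen:
                seen.add(name)
                merged.append(name)
        return merged
    raise ValueError(f"unknown model_strategy: {strategy!r}")
```

With `"append"`, a custom model is added after the defaults; with `"overwrite"`, only the custom list is exposed.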
…eld support

- Enhanced `extract_gemini_models_env` to handle nested fields within environment variables.
- Updated type hints for more flexibility in model overrides.
- Improved `_merge_models_with_env` to better support field-level updates and appending new models.
- Moved utility functions like `strip_code_fence`, `extract_tool_calls`, and `iter_stream_segments` to a centralized helper module.
- Removed unused and redundant private methods from `chat.py`, including `_strip_code_fence`, `_strip_tagged_blocks`, and `_strip_system_hints`.
- Updated imports and references across modules for consistency.
- Simplified tool call and streaming logic by replacing inline implementations with shared helper functions.
- Replaced unused model placeholder in `config.yaml` with an empty list.
- Added JSON parsing validators for `model_header` and `models` to enhance flexibility and error handling.
- Improved validation to filter out incomplete model configurations.
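As a rough idea of what a centralized helper like `strip_code_fence` might do (the regex and exact behavior here are assumptions, not the PR's implementation):

```python
import re

# Matches a string fully wrapped in a single Markdown code fence,
# with an optional language tag after the opening backticks.
_FENCE_RE = re.compile(r"^```[\w-]*\n(.*)\n```$", re.DOTALL)

def strip_code_fence(text: str) -> str:
    """Remove one surrounding Markdown code fence, if present; else return text unchanged."""
    match = _FENCE_RE.match(text.strip())
    return match.group(1) if match else text
```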
…N support

- Replaced prefix-based parsing with a root key approach.
- Added JSON parsing to handle list-based model configurations.
- Improved handling of errors and cleanup of environment variables.
…to Python literals

- Added `ast.literal_eval` as a fallback for parsing environment variables when JSON decoding fails.
- Improved error handling and logging for invalid configurations.
- Ensured proper cleanup of environment variables post-parsing.
- Adjusted `TOOL_CALL_RE` regex pattern for better accuracy.
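The JSON-first parsing with an `ast.literal_eval` fallback could be sketched as follows (function name and the final pass-through behavior are illustrative assumptions):

```python
import ast
import json

def parse_env_value(raw: str):
    """Parse an environment variable value as JSON first, then as a Python literal."""
    try:
        return json.loads(raw)
    except json.JSONDecodeError:
        pass
    try:
        # Handles Python-style literals, e.g. single-quoted dicts/lists.
        return ast.literal_eval(raw)
    except (ValueError, SyntaxError):
        # Neither format applies; return the raw string unchanged.
        return raw
```

This lets users set values like `MODELS='["a", "b"]'` (JSON) or `MODELS="{'k': 1}"` (Python literal) interchangeably.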
…nvironment variables; enhance error logging in config validation
…tring or list structure for enhanced flexibility in automated environments
…s found in either the raw or cleaned history.
@luuquangvu
Collaborator Author

Using the latest image, it seems multi-turn conversation isn't supported, am I right? With every message I send, it forgets the previous ones.

I don't know how you tested it, but it works fine for me. Note that to keep conversations running continuously through restarts, you need to enable Gemini Activities.

@Vigno04
Contributor

Vigno04 commented Mar 26, 2026

Gemini Activities is enabled, but I think it is using temporary chats (the best option to avoid saturating my chat history with all the users' requests). I think it should work regardless; the original code worked.

@luuquangvu
Collaborator Author

The library's author is preparing to release a major 2.0 update. Therefore, this PR will also have to wait for it, as some features are being developed based on the latest code from the library.

@luuquangvu
Collaborator Author

@Nativu5 The library is now at version 2.0. There's currently PR #134 waiting for you to merge it. Would you like to merge it before this PR? This PR has undergone many changes, so I need a stable main branch to effectively manage and resolve all merge conflicts. Thank you!

@Vigno04
Contributor

Vigno04 commented Apr 14, 2026

I think one improvement you could make is stripping unnecessary tokens before sending messages to the chat.

For example, when using Open WebUI, I see messages formatted like this when viewing them in the Gemini web UI:

```
<|im_start|>user
help me evaluate ...
<|im_end|>
<|im_start|>assistant
```

As you can see, it includes all the special tags. However, these tags are already reintroduced by Gemini on the backend, since the message is sent as part of a web ui chat request.

Removing them would have a few benefits:

  • Based on tests with tiktoken (using Gemma), it reduces around ~30 tokens per request
  • It may improve model performance by avoiding duplicated start/end tokens
  • It could make the traffic less detectable by Google, as the message would resemble a more standard chat format

Overall, stripping these redundant tokens seems like a simple optimization with multiple advantages. I'm writing it here since opening a pull request for such a small feature seems a bit pointless, and I also wanted a second opinion on the matter.
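A minimal sketch of the stripping idea: remove ChatML role markers before forwarding the prompt. The pattern and function name here are illustrative assumptions, not code from this PR.

```python
import re

# Matches <|im_start|>/<|im_end|> markers, optionally followed by a role name
# and surrounding whitespace.
CHATML_RE = re.compile(r"<\|im_(?:start|end)\|>(?:\s*(?:user|assistant|system))?\s*")

def strip_chatml(text: str) -> str:
    """Remove ChatML role markers from a message before sending it upstream."""
    return CHATML_RE.sub("", text).strip()
```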

@luuquangvu
Collaborator Author

@Vigno04 Thank you for your feedback. For the reasons we need to add ChatML tags and those seemingly unnecessary system hints, see previous issues like #59. Stripping them might save a few tokens, but it would break some clients that require tool calls.
