Commit 3c01117

Merge pull request #10 from Unsupervisedcom/upstream-v0.6.2
Upstream v0.6.2
2 parents 51dcc80 + 63533c9 commit 3c01117

144 files changed: +7129 -2147 lines changed

.github/dependabot.yml

Lines changed: 14 additions & 0 deletions
@@ -1,12 +1,26 @@
 version: 2
 updates:
+  - package-ecosystem: uv
+    directory: '/'
+    schedule:
+      interval: monthly
+    target-branch: 'dev'
+
   - package-ecosystem: pip
     directory: '/backend'
     schedule:
       interval: monthly
     target-branch: 'dev'
+
+  - package-ecosystem: npm
+    directory: '/'
+    schedule:
+      interval: monthly
+    target-branch: 'dev'
+
   - package-ecosystem: 'github-actions'
     directory: '/'
     schedule:
       # Check for updates to GitHub Actions every week
       interval: monthly
+    target-branch: 'dev'

.github/workflows/format-backend.yaml

Lines changed: 12 additions & 2 deletions
@@ -5,10 +5,18 @@ on:
     branches:
       - main
       - dev
+    paths:
+      - 'backend/**'
+      - 'pyproject.toml'
+      - 'uv.lock'
   pull_request:
     branches:
       - main
       - dev
+    paths:
+      - 'backend/**'
+      - 'pyproject.toml'
+      - 'uv.lock'

 jobs:
   build:
@@ -17,15 +25,17 @@ jobs:

     strategy:
       matrix:
-        python-version: [3.11]
+        python-version:
+          - 3.11.x
+          - 3.12.x

     steps:
       - uses: actions/checkout@v4

       - name: Set up Python
         uses: actions/setup-python@v5
         with:
-          python-version: ${{ matrix.python-version }}
+          python-version: '${{ matrix.python-version }}'

       - name: Install dependencies
         run: |

.github/workflows/format-build-frontend.yaml

Lines changed: 9 additions & 1 deletion
@@ -5,10 +5,18 @@ on:
     branches:
       - main
       - dev
+    paths-ignore:
+      - 'backend/**'
+      - 'pyproject.toml'
+      - 'uv.lock'
   pull_request:
     branches:
       - main
       - dev
+    paths-ignore:
+      - 'backend/**'
+      - 'pyproject.toml'
+      - 'uv.lock'

 jobs:
   build:
@@ -21,7 +29,7 @@ jobs:
       - name: Setup Node.js
         uses: actions/setup-node@v4
         with:
-          node-version: '22' # Or specify any other version you want to use
+          node-version: '22'

       - name: Install Dependencies
         run: npm install
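
Note on the two workflow filters above: the backend workflow uses `paths` and the frontend workflow uses `paths-ignore` with the same three patterns, so between them they split changes into backend and non-backend. The sketch below illustrates that split. It is an approximation: it assumes Python's `fnmatch` is close enough to GitHub's glob matching for these three patterns, and the function names and the example path `src/lib/app.ts` are hypothetical.

```python
from fnmatch import fnmatchcase

# Patterns copied from the two workflow diffs above.
BACKEND_PATTERNS = ["backend/**", "pyproject.toml", "uv.lock"]


def matches_any(path: str, patterns: list[str]) -> bool:
    """Approximate GitHub's glob matching with fnmatch (an assumption)."""
    return any(fnmatchcase(path, pattern) for pattern in patterns)


def backend_workflow_runs(changed_files: list[str]) -> bool:
    # `paths`: run when at least one changed file matches a pattern.
    return any(matches_any(f, BACKEND_PATTERNS) for f in changed_files)


def frontend_workflow_runs(changed_files: list[str]) -> bool:
    # `paths-ignore`: skip only when every changed file matches a pattern.
    return any(not matches_any(f, BACKEND_PATTERNS) for f in changed_files)


if __name__ == "__main__":
    for files in (["uv.lock"], ["src/lib/app.ts"], ["src/lib/app.ts", "uv.lock"]):
        print(files, backend_workflow_runs(files), frontend_workflow_runs(files))
```

Under these assumptions, a push that touches only `uv.lock` runs the backend workflow alone, and a push that mixes backend and frontend files runs both.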

CHANGELOG.md

Lines changed: 40 additions & 0 deletions
@@ -5,6 +5,46 @@ All notable changes to this project will be documented in this file.
 The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.1.0/),
 and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).

+## [0.6.2] - 2025-04-06
+
+### Added
+
+- 🌍 **Improved Global Language Support**: Expanded and refined translations across multiple languages to enhance clarity and consistency for international users.
+
+### Fixed
+
+- 🛠️ **Accurate Tool Descriptions from OpenAPI Servers**: External tools now use full endpoint descriptions instead of summaries when generating tool specifications—helping AI models understand tool purpose more precisely and choose the right tool more accurately in tool workflows.
+- 🔧 **Precise Web Results Source Attribution**: Fixed a key issue where all web search results showed the same source ID—now each result gets its correct and distinct source, ensuring accurate citations and traceability.
+- 🔍 **Clean Web Search Retrieval**: Web search now retains only results from URLs where real content was successfully fetched—improving accuracy and removing empty or broken links from citations.
+- 🎵 **Audio File Upload Response Restored**: Resolved an issue where uploading audio files did not return valid responses, restoring smooth file handling for transcription and audio-based workflows.
+
+### Changed
+
+- 🧰 **General Backend Refactoring**: Multiple behind-the-scenes improvements streamline backend performance, reduce complexity, and ensure a more stable, maintainable system overall—making everything smoother without changing your workflow.
+
+## [0.6.1] - 2025-04-05
+
+### Added
+
+- 🛠️ **Global Tool Servers Configuration**: Admins can now centrally configure global external tool servers from Admin Settings > Tools, allowing seamless sharing of tool integrations across all users without manual setup per user.
+- 🔐 **Direct Tool Usage Permission for Users**: Introduced a new user-level permission toggle that grants non-admin users access to direct external tools, empowering broader team collaboration while maintaining control.
+- 🧠 **Mistral OCR Content Extraction Support**: Added native support for Mistral OCR as a high-accuracy document loader, drastically improving text extraction from scanned documents in RAG workflows.
+- 🖼️ **Tools Indicator UI Redesign**: Enhanced message input now smartly displays both built-in and external tools via a unified dropdown, making it simpler and more intuitive to activate tools during conversations.
+- 📄 **RAG Prompt Improved and More Coherent**: Default RAG system prompt has been revised to be more clear and citation-focused—admins can leave the template field empty to use this new gold-standard prompt.
+- 🧰 **Performance & Developer Improvements**: Major internal restructuring of several tool-related components, simplifying styling and merging external/internal handling logic, resulting in better maintainability and performance.
+- 🌍 **Improved Translations**: Updated translations for Tibetan, Polish, Chinese (Simplified & Traditional), Arabic, Russian, Ukrainian, Dutch, Finnish, and French to improve clarity and consistency across the interface.
+
+### Fixed
+
+- 🔑 **External Tool Server API Key Bug Resolved**: Fixed a critical issue where authentication headers were not being sent when calling tools from external OpenAPI tool servers, ensuring full security and smooth tool operations.
+- 🚫 **Conditional Export Button Visibility**: UI now gracefully hides export buttons when there's nothing to export in models, prompts, tools, or functions, improving visual clarity and reducing confusion.
+- 🧪 **Hybrid Search Failure Recovery**: Resolved edge case in parallel hybrid search where empty or unindexed collections caused backend crashes—these are now cleanly skipped to ensure system stability.
+- 📂 **Admin Folder Deletion Fix**: Addressed an issue where folders created in the admin workspace couldn't be deleted, restoring full organizational flexibility for admins.
+- 🔐 **Improved Generic Error Feedback on Login**: Authentication errors now show simplified, non-revealing messages for privacy and improved UX, especially with federated logins.
+- 📝 **Tool Message with Images Improved**: Enhanced how tool-generated messages with image outputs are shown in chat, making them more readable and consistent with the overall UI design.
+- ⚙️ **Auto-Exclusion for Broken RAG Collections**: Auto-skips document collections that fail to fetch data or return "None", preventing silent errors and streamlining retrieval workflows.
+- 📝 **Docling Text File Handling Fix**: Fixed file parsing inconsistency that broke docling-based RAG functionality for certain plain text files, ensuring wider file compatibility.
+
 ## [0.6.0] - 2025-03-31

 ### Added

backend/open_webui/config.py

Lines changed: 31 additions & 7 deletions
@@ -331,12 +331,14 @@ def __getattr__(self, key):
 # OAuth config
 ####################################

+
 ENABLE_OAUTH_SIGNUP = PersistentConfig(
     "ENABLE_OAUTH_SIGNUP",
     "oauth.enable_signup",
     os.environ.get("ENABLE_OAUTH_SIGNUP", "False").lower() == "true",
 )

+
 OAUTH_MERGE_ACCOUNTS_BY_EMAIL = PersistentConfig(
     "OAUTH_MERGE_ACCOUNTS_BY_EMAIL",
     "oauth.merge_accounts_by_email",
@@ -466,6 +468,7 @@ def __getattr__(self, key):
     os.environ.get("OAUTH_USERNAME_CLAIM", "name"),
 )

+
 OAUTH_PICTURE_CLAIM = PersistentConfig(
     "OAUTH_PICTURE_CLAIM",
     "oauth.oidc.avatar_claim",
@@ -878,6 +881,17 @@ def oidc_oauth_register(client):
     pass
 OPENAI_API_BASE_URL = "https://api.openai.com/v1"

+####################################
+# TOOL_SERVERS
+####################################
+
+
+TOOL_SERVER_CONNECTIONS = PersistentConfig(
+    "TOOL_SERVER_CONNECTIONS",
+    "tool_server.connections",
+    [],
+)
+
 ####################################
 # WEBUI
 ####################################
@@ -1034,6 +1048,11 @@ def oidc_oauth_register(client):
     == "true"
 )

+USER_PERMISSIONS_FEATURES_DIRECT_TOOL_SERVERS = (
+    os.environ.get("USER_PERMISSIONS_FEATURES_DIRECT_TOOL_SERVERS", "False").lower()
+    == "true"
+)
+
 USER_PERMISSIONS_FEATURES_WEB_SEARCH = (
     os.environ.get("USER_PERMISSIONS_FEATURES_WEB_SEARCH", "True").lower() == "true"
 )
@@ -1071,6 +1090,7 @@ def oidc_oauth_register(client):
         "temporary_enforced": USER_PERMISSIONS_CHAT_TEMPORARY_ENFORCED,
     },
     "features": {
+        "direct_tool_servers": USER_PERMISSIONS_FEATURES_DIRECT_TOOL_SERVERS,
         "web_search": USER_PERMISSIONS_FEATURES_WEB_SEARCH,
         "image_generation": USER_PERMISSIONS_FEATURES_IMAGE_GENERATION,
         "code_interpreter": USER_PERMISSIONS_FEATURES_CODE_INTERPRETER,
@@ -1727,6 +1747,11 @@ class BannerModel(BaseModel):
     os.getenv("DOCUMENT_INTELLIGENCE_KEY", ""),
 )

+MISTRAL_OCR_API_KEY = PersistentConfig(
+    "MISTRAL_OCR_API_KEY",
+    "rag.mistral_ocr_api_key",
+    os.getenv("MISTRAL_OCR_API_KEY", ""),
+)

 BYPASS_EMBEDDING_AND_RETRIEVAL = PersistentConfig(
     "BYPASS_EMBEDDING_AND_RETRIEVAL",
@@ -1875,26 +1900,25 @@ class BannerModel(BaseModel):
 )

 DEFAULT_RAG_TEMPLATE = """### Task:
-Respond to the user query using the provided context, incorporating inline citations in the format [source_id] **only when the <source_id> tag is explicitly provided** in the context.
+Respond to the user query using the provided context, incorporating inline citations in the format [id] **only when the <source> tag includes an explicit id attribute** (e.g., <source id="1">).

 ### Guidelines:
 - If you don't know the answer, clearly state that.
 - If uncertain, ask the user for clarification.
 - Respond in the same language as the user's query.
 - If the context is unreadable or of poor quality, inform the user and provide the best possible answer.
 - If the answer isn't present in the context but you possess the knowledge, explain this to the user and provide the answer using your own understanding.
-- **Only include inline citations using [source_id] (e.g., [1], [2]) when a `<source_id>` tag is explicitly provided in the context.**
-- Do not cite if the <source_id> tag is not provided in the context.
+- **Only include inline citations using [id] (e.g., [1], [2]) when the <source> tag includes an id attribute.**
+- Do not cite if the <source> tag does not contain an id attribute.
 - Do not use XML tags in your response.
 - Ensure citations are concise and directly related to the information provided.

 ### Example of Citation:
-If the user asks about a specific topic and the information is found in "whitepaper.pdf" with a provided <source_id>, the response should include the citation like so:
-* "According to the study, the proposed method increases efficiency by 20% [whitepaper.pdf]."
-If no <source_id> is present, the response should omit the citation.
+If the user asks about a specific topic and the information is found in a source with a provided id attribute, the response should include the citation like in the following example:
+* "According to the study, the proposed method increases efficiency by 20% [1]."

 ### Output:
-Provide a clear and direct response to the user's query, including inline citations in the format [source_id] only when the <source_id> tag is present in the context.
+Provide a clear and direct response to the user's query, including inline citations in the format [id] only when the <source> tag with id attribute is present in the context.

 <context>
 {{CONTEXT}}
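
Note on the revised `DEFAULT_RAG_TEMPLATE`: the prompt now expects context wrapped in `<source>` tags, with an `id` attribute present only when a citation is allowed. The diff does not show how the backend fills `{{CONTEXT}}`, so the following is only a minimal sketch of one way such a context string could be assembled. The function name `build_context` and the chunk dictionaries with `source` and `text` keys are hypothetical, not the project's API.

```python
def build_context(chunks: list[dict]) -> str:
    """Wrap each chunk in a <source> tag; add an id only when the source is known."""
    source_ids: dict[str, int] = {}
    parts = []
    for chunk in chunks:
        name = chunk.get("source")
        if name:
            # Reuse the id of a repeated source, otherwise assign the next integer.
            source_id = source_ids.setdefault(name, len(source_ids) + 1)
            parts.append(f'<source id="{source_id}">{chunk["text"]}</source>')
        else:
            # No id attribute, so the template tells the model not to cite this chunk.
            parts.append(f'<source>{chunk["text"]}</source>')
    return "\n".join(parts)


if __name__ == "__main__":
    print(
        build_context(
            [
                {"source": "a.pdf", "text": "First passage."},
                {"source": "b.pdf", "text": "Second passage."},
                {"text": "Passage with unknown origin."},
            ]
        )
    )
```

Giving each distinct source its own integer id matches the `[1]`, `[2]` citation style the template asks for.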

backend/open_webui/main.py

Lines changed: 18 additions & 0 deletions
@@ -105,6 +105,8 @@
     OPENAI_API_CONFIGS,
     # Direct Connections
     ENABLE_DIRECT_CONNECTIONS,
+    # Tool Server Configs
+    TOOL_SERVER_CONNECTIONS,
     # Code Execution
     ENABLE_CODE_EXECUTION,
     CODE_EXECUTION_ENGINE,
@@ -191,6 +193,7 @@
     DOCLING_SERVER_URL,
     DOCUMENT_INTELLIGENCE_ENDPOINT,
     DOCUMENT_INTELLIGENCE_KEY,
+    MISTRAL_OCR_API_KEY,
     RAG_TOP_K,
     RAG_TOP_K_RERANKER,
     RAG_TEXT_SPLITTER,
@@ -355,6 +358,7 @@

 from open_webui.utils.auth import (
     get_license_data,
+    get_http_authorization_cred,
     decode_token,
     get_admin_user,
     get_verified_user,
@@ -477,6 +481,15 @@ async def lifespan(app: FastAPI):

 app.state.OPENAI_MODELS = {}

+########################################
+#
+# TOOL SERVERS
+#
+########################################
+
+app.state.config.TOOL_SERVER_CONNECTIONS = TOOL_SERVER_CONNECTIONS
+app.state.TOOL_SERVERS = []
+
 ########################################
 #
 # DIRECT CONNECTIONS
@@ -582,6 +595,7 @@ async def lifespan(app: FastAPI):
 app.state.config.DOCLING_SERVER_URL = DOCLING_SERVER_URL
 app.state.config.DOCUMENT_INTELLIGENCE_ENDPOINT = DOCUMENT_INTELLIGENCE_ENDPOINT
 app.state.config.DOCUMENT_INTELLIGENCE_KEY = DOCUMENT_INTELLIGENCE_KEY
+app.state.config.MISTRAL_OCR_API_KEY = MISTRAL_OCR_API_KEY

 app.state.config.TEXT_SPLITTER = RAG_TEXT_SPLITTER
 app.state.config.TIKTOKEN_ENCODING_NAME = TIKTOKEN_ENCODING_NAME
@@ -862,6 +876,10 @@ async def commit_session_after_request(request: Request, call_next):
 @app.middleware("http")
 async def check_url(request: Request, call_next):
     start_time = int(time.time())
+    request.state.token = get_http_authorization_cred(
+        request.headers.get("Authorization")
+    )
+
     request.state.enable_api_key = app.state.config.ENABLE_API_KEY
     response = await call_next(request)
     process_time = int(time.time()) - start_time
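
Note on the `check_url` change: the middleware now stores the parsed `Authorization` header on `request.state.token` before the request is handled. The body of `get_http_authorization_cred` is not part of this diff, so the sketch below only illustrates what a helper of that kind might do. `parse_authorization` and `Credentials` are hypothetical names, and the real helper may behave differently on malformed input.

```python
from typing import NamedTuple, Optional


class Credentials(NamedTuple):
    scheme: str
    credentials: str


def parse_authorization(header: Optional[str]) -> Optional[Credentials]:
    """Split 'Bearer <token>' into scheme and token; None if absent or malformed."""
    if not header:
        return None
    parts = header.split(" ")
    # Expect exactly two non-empty parts: the scheme and the credential.
    if len(parts) != 2 or not all(parts):
        return None
    return Credentials(scheme=parts[0], credentials=parts[1])


if __name__ == "__main__":
    print(parse_authorization("Bearer abc123"))
    print(parse_authorization(None))
```

Returning `None` instead of raising lets a middleware run for unauthenticated requests and leaves the decision to reject them to later handlers.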

backend/open_webui/retrieval/loaders/main.py

Lines changed: 27 additions & 11 deletions
@@ -20,6 +20,9 @@
     YoutubeLoader,
 )
 from langchain_core.documents import Document
+
+from open_webui.retrieval.loaders.mistral import MistralLoader
+
 from open_webui.env import SRC_LOG_LEVELS, GLOBAL_LOG_LEVEL

 logging.basicConfig(stream=sys.stdout, level=GLOBAL_LOG_LEVEL)
@@ -181,13 +184,16 @@ def load(
             for doc in docs
         ]

+    def _is_text_file(self, file_ext: str, file_content_type: str) -> bool:
+        return file_ext in known_source_ext or (
+            file_content_type and file_content_type.find("text/") >= 0
+        )
+
     def _get_loader(self, filename: str, file_content_type: str, file_path: str):
         file_ext = filename.split(".")[-1].lower()

         if self.engine == "tika" and self.kwargs.get("TIKA_SERVER_URL"):
-            if file_ext in known_source_ext or (
-                file_content_type and file_content_type.find("text/") >= 0
-            ):
+            if self._is_text_file(file_ext, file_content_type):
                 loader = TextLoader(file_path, autodetect_encoding=True)
             else:
                 loader = TikaLoader(
@@ -196,11 +202,14 @@ def _get_loader(self, filename: str, file_content_type: str, file_path: str):
                     mime_type=file_content_type,
                 )
         elif self.engine == "docling" and self.kwargs.get("DOCLING_SERVER_URL"):
-            loader = DoclingLoader(
-                url=self.kwargs.get("DOCLING_SERVER_URL"),
-                file_path=file_path,
-                mime_type=file_content_type,
-            )
+            if self._is_text_file(file_ext, file_content_type):
+                loader = TextLoader(file_path, autodetect_encoding=True)
+            else:
+                loader = DoclingLoader(
+                    url=self.kwargs.get("DOCLING_SERVER_URL"),
+                    file_path=file_path,
+                    mime_type=file_content_type,
+                )
         elif (
             self.engine == "document_intelligence"
             and self.kwargs.get("DOCUMENT_INTELLIGENCE_ENDPOINT") != ""
@@ -222,6 +231,15 @@ def _get_loader(self, filename: str, file_content_type: str, file_path: str):
                 api_endpoint=self.kwargs.get("DOCUMENT_INTELLIGENCE_ENDPOINT"),
                 api_key=self.kwargs.get("DOCUMENT_INTELLIGENCE_KEY"),
             )
+        elif (
+            self.engine == "mistral_ocr"
+            and self.kwargs.get("MISTRAL_OCR_API_KEY") != ""
+            and file_ext
+            in ["pdf"]  # Mistral OCR currently only supports PDF and images
+        ):
+            loader = MistralLoader(
+                api_key=self.kwargs.get("MISTRAL_OCR_API_KEY"), file_path=file_path
+            )
         else:
             if file_ext == "pdf":
                 loader = PyPDFLoader(
@@ -257,9 +275,7 @@ def _get_loader(self, filename: str, file_content_type: str, file_path: str):
                 loader = UnstructuredPowerPointLoader(file_path)
             elif file_ext == "msg":
                 loader = OutlookMessageLoader(file_path)
-            elif file_ext in known_source_ext or (
-                file_content_type and file_content_type.find("text/") >= 0
-            ):
+            elif self._is_text_file(file_ext, file_content_type):
                 loader = TextLoader(file_path, autodetect_encoding=True)
             else:
                 loader = TextLoader(file_path, autodetect_encoding=True)
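
Note on the loader selection above: the refactor moves the text-file check into `_is_text_file` and reuses it so that the docling engine, like tika, falls back to `TextLoader` for plain text. The sketch below restates that decision as a standalone function returning loader names. It makes several simplifying assumptions: `KNOWN_SOURCE_EXT` is a small hypothetical subset of the real `known_source_ext` list (not shown in the diff), the server URL and API key checks are omitted, and the `document_intelligence` branch is left out. Unlike the original helper, this version wraps the content-type test in `bool()` so it always returns a boolean.

```python
from typing import Optional

# Hypothetical subset; the real `known_source_ext` list is not shown in the diff.
KNOWN_SOURCE_EXT = {"py", "js", "md", "txt", "yaml"}


def is_text_file(file_ext: str, file_content_type: Optional[str]) -> bool:
    """True for known source extensions or any text/* content type."""
    return file_ext in KNOWN_SOURCE_EXT or bool(
        file_content_type and "text/" in file_content_type
    )


def pick_loader(engine: str, filename: str, file_content_type: Optional[str]) -> str:
    """Return the name of the loader the branches above would select (simplified)."""
    file_ext = filename.split(".")[-1].lower()
    if engine in ("tika", "docling"):
        # Both engines now hand plain text to TextLoader instead of the server.
        if is_text_file(file_ext, file_content_type):
            return "TextLoader"
        return "TikaLoader" if engine == "tika" else "DoclingLoader"
    if engine == "mistral_ocr" and file_ext == "pdf":
        return "MistralLoader"
    return "built-in loader chain"


if __name__ == "__main__":
    print(pick_loader("docling", "notes.txt", "text/plain"))
    print(pick_loader("mistral_ocr", "scan.pdf", "application/pdf"))
```

In this simplified model, a non-PDF file sent through the `mistral_ocr` engine falls through to the built-in loaders, mirroring the `file_ext in ["pdf"]` guard in the diff.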
