Python: feat(oracle): add new Oracle connector for Semantic Kernel #13229
base: main
Conversation
@microsoft-github-policy-service agree [company="Oracle"]
@microsoft-github-policy-service agree company="Oracle"
@@ -0,0 +1,1349 @@
# Copyright (c) 2025, Oracle Corporation. All rights reserved.

from __future__ import annotations
Reviewer: this shouldn't be necessary.
Author: Removed it.
    VectorSearchExecutionException,
    VectorStoreOperationException,
)
from semantic_kernel.exceptions.memory_connector_exceptions import MemoryConnectorConnectionException
Reviewer suggested change:
- from semantic_kernel.exceptions.memory_connector_exceptions import MemoryConnectorConnectionException
+ from semantic_kernel.exceptions import MemoryConnectorConnectionException
Author: Updated code.
    async connection pools for Oracle.
    """

    user: str | None = Field(default=None, validation_alias=ORACLE_USER_ENV_VAR)
Reviewer: This is not the intended use of a KernelBaseSettings object. In this class, set env_prefix: ClassVar[str] = "ORACLE_", and then each of the parameters will be prefixed by that plus the name of the parameter capitalized, so you can remove all the validation_aliases. Only min and max should then become pool_min and pool_max. It is also important to document in the docstring which parameters there are and what their respective env variable names are.
Author: Updated code!
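The env_prefix behavior the reviewer describes can be sketched without the Semantic Kernel dependency. OracleSettingsSketch below is a hypothetical stand-in for the real KernelBaseSettings subclass, which delegates this mapping to pydantic-settings:

```python
import os
from typing import ClassVar

class OracleSettingsSketch:
    """Hypothetical stand-in for a KernelBaseSettings subclass.

    With env_prefix = "ORACLE_", each field loads from the environment
    variable named prefix + FIELD_NAME.upper(): user -> ORACLE_USER,
    pool_min -> ORACLE_POOL_MIN, and so on -- no validation_alias needed.
    """

    env_prefix: ClassVar[str] = "ORACLE_"

    def __init__(self) -> None:
        self.user = os.environ.get(self.env_prefix + "USER")
        self.pool_min = os.environ.get(self.env_prefix + "POOL_MIN")
        self.pool_max = os.environ.get(self.env_prefix + "POOL_MAX")

os.environ["ORACLE_USER"] = "scott"
os.environ["ORACLE_POOL_MIN"] = "2"
settings = OracleSettingsSketch()
print(settings.user)      # scott
print(settings.pool_min)  # 2
```

Because the prefix fixes the variable names mechanically, the docstring can simply list each field next to its derived environment variable name.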
    connection_pool: oracledb.AsyncConnectionPool | None = None

    model_config = SettingsConfigDict(
Reviewer: this is also likely not needed.
Author: Removed it!
    wallet_location: str | None = Field(default=None, validation_alias=ORACLE_WALLET_LOCATION_ENV_VAR)
    wallet_password: SecretStr | None = Field(default=None, validation_alias=ORACLE_WALLET_PASSWORD_ENV_VAR)

    connection_pool: oracledb.AsyncConnectionPool | None = None
Reviewer: this should be a PrivateAttr or changed to _connection_pool.
Author: Updated code!
    def _unwrap_secret(self, value):
        if value is None:
            return None
        return value.get_secret_value() if hasattr(value, "get_secret_value") else str(value)
Reviewer: is this really needed, since you only use this for parameters that you know are secrets?
Author: Updated code!
    # Create pool with extra user-supplied kwargs
    self.connection_pool = oracledb.create_pool_async(
        user=self.user,
        password=self._unwrap_secret(self.password),
Reviewer suggested change:
- password=self._unwrap_secret(self.password),
+ password=self.password.get_secret_value() if self.password else None,
Author: Updated code!
    connection_pool: oracledb.AsyncConnectionPool | None = None
    db_schema: str | None = None
    pool_args: dict[str, Any] | None = None
    supported_key_types: ClassVar[set[str] | None] = {"str", "int", "UUID"}
Reviewer: is UUID a separate type (in Python)?
Author: Yes, UUID is a separate type in Python, not just a string.
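Indeed, uuid.UUID is its own class in the standard library, distinct from str:

```python
import uuid

key = uuid.uuid4()
print(type(key) is uuid.UUID)  # True
print(isinstance(key, str))    # False
# A canonical string form is available for storage in a text key column:
print(len(str(key)))           # 36
```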
    query, bind, columns = await self._inner_search_vector(options, values, vector, **kwargs)

    # If total count is requested, fetch all rows to count.
    if options.include_total_count:
Reviewer: if this is set but the database doesn't support a parameter, then we shouldn't pull everything in; just ignore the setting.
Author: Having considered options.include_total_count for Oracle, here are some possible approaches:
1. Raise a warning or log message (e.g., RuntimeWarning or logger.warning) indicating that include_total_count is not supported for Oracle and will be ignored.
2. Raise a NotSupportedError if we want stricter enforcement and to prevent misuse in performance-sensitive scenarios.
3. Fetch all rows and log a warning, a balanced approach:
   - Users still get the total count.
   - A clear log message or warning highlights that fetching all rows may be inefficient for large datasets.
   - The behavior will be properly documented to ensure developers are aware of the performance implications.
I recommend option 3, as it maintains correctness while providing transparency and guidance, but I'd like to hear your thoughts or whether you prefer a stricter approach.
Author: @eavanvalkenburg Need your input here.
Reviewer: I would go with 1; I think that's what we do in most other cases. I just looked at some, and in many we actually don't even log anything when the option isn't supported. I will update that and log something in those cases. Thanks for looking at this.
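Option 1 (ignore the flag and log a warning) can be sketched as follows; run_search and the options object here are hypothetical stand-ins for the connector's search path, not the actual Semantic Kernel API:

```python
import logging

logger = logging.getLogger("oracle_connector_sketch")

def run_search(options, rows):
    """Hypothetical search wrapper: ignores include_total_count with a warning."""
    if getattr(options, "include_total_count", False):
        logger.warning(
            "include_total_count is not supported by the Oracle connector "
            "and will be ignored."
        )
    # Results are returned without fetching every row just to count them.
    return rows

class _Options:
    include_total_count = True

results = run_search(_Options(), ["row1", "row2"])
print(results)
```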
    # Third-party Libraries
    import numpy as np
    import oracledb
Reviewer: are these present in the pyproject under an appropriate extra?
Author: Good catch, this dependency isn't included under an appropriate extra in pyproject.toml yet. I'll add it so the extras section is complete and consistent.
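A sketch of what such an extra could look like; the group name and version bound are assumptions, not the actual contents of semantic_kernel's pyproject.toml:

```toml
# Hypothetical extras group; actual names and pins may differ.
[project.optional-dependencies]
oracle = [
    "oracledb >= 2.0",
]
```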
    """

    env_prefix: ClassVar[str] = "ORACLE_"
    user: str | None = Field(default=None, validation_alias=USER_ENV_VAR)
Reviewer: do we need all these aliases? As far as I can see, the environment variable used for this line would be ORACLE_USER, and that is also the value of USER_ENV_VAR, so this doesn't add anything.
Author: You're right that USER_ENV_VAR currently has the value "ORACLE_USER", and with validation_alias=USER_ENV_VAR the model just maps ORACLE_USER → user. Technically this doesn't add functionality if we don't need a difference between the field name and the environment variable name. The only reason to keep the alias would be if we intentionally want Python-friendly field names (user) while still reading uppercase environment variables (ORACLE_USER). If we don't need that separation, we can remove the alias and either rename the field to match the env var, or rely on the field name directly without aliasing. So yes, if the field name and the env var name always match, the alias doesn't provide additional value.
Reviewer: you're misunderstanding: because of the env_prefix, the field user will already map to the env variable ORACLE_USER, so that is already the name it loads from and the validation_alias doesn't change anything. We can always reintroduce it later if we want to change the mapping without changing the code that uses the settings. TL;DR: if we remove it, the same variables are loaded.
Author: Thanks for the clarification, that makes sense now. Since env_prefix already ensures user maps to ORACLE_USER, the validation alias wasn't actually doing anything there. I had assumed it was required for the mapping, but if the prefix already guarantees the correct variable name, then removing it is fine and keeps things clean. I did keep the validation alias for a few fields like the pool min/max/increment values, because those use different environment variable names, so in those cases the alias is still needed. Good point as well about being able to reintroduce aliases later if we ever need backward-compatible names.
    )

    # Build settings from env if we need to manage our own pool
    self._settings = settings or OracleSettings(env_file_path=env_file_path, env_file_encoding=env_file_encoding)
Reviewer: I would prefer that this is created first and then used to create the connection_pool, which is then always present. Or are there benefits to late initialization of the connection pool?
Author: Just to confirm my understanding of your suggestion:
- "This is created first" → you want the OracleSettings object to be created in the store, not inside the collection.
- "Then used to create the connection_pool" → the store would also manage the pool creation (either eagerly or via centralized async logic), so that collections never handle settings or pool creation themselves.
- Collections would then simply consume the store's connection_pool, without reading env vars or creating _settings.
Is this the approach you were envisioning?
Reviewer: No, collections should be usable without a store. So what we do in most connectors is:
- check if there is something that we need before calling super().__init__()
- create those (often involving creating settings and, with those settings, creating a client or connection pool)
- pass the newly created connection_pool and the managed_client flag, appropriately set, to super
Remember that super().__init__ in this case is all about pydantic, so it's about filling out the defined and validated fields. The advantage of this approach is that you can change connection_pool: oracledb.AsyncConnectionPool | None = None to connection_pool: oracledb.AsyncConnectionPool, simplifying the logic later on. I do recognize that connection pools are always somewhat special, so if it makes more sense for Oracle connection pools, then by all means create them on first run instead.
Author: Thanks, with your guidance I'm going to switch to eager initialization. That means, before calling super().__init__, I will:
- Build the connector settings.
- Create the Oracle AsyncConnectionPool eagerly.
- Determine whether the collection should manage the client (managed_client=True if the pool was created internally).
Then I will call super().__init__(...) with the final validated fields, including connection_pool= and managed_client=<True/False>. Because initialization is now eager, there's no need for lazy initialization inside __aenter__; __aenter__ will simply return self. In __aexit__, I'll close the pool only if managed_client=True.
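The flow agreed on above can be sketched as follows; the pool factory, the default DSN, and the collection class are stand-ins, not the real oracledb or Semantic Kernel APIs:

```python
import asyncio

def fake_create_pool(dsn: str) -> str:
    # Stand-in for oracledb.create_pool_async(...).
    return f"pool({dsn})"

class OracleCollectionSketch:
    def __init__(self, connection_pool=None, settings=None):
        managed_client = False
        if connection_pool is None:
            # Eager path: build settings and the pool up front, before the
            # real connector would call the pydantic super().__init__().
            settings = settings or {"dsn": "localhost/FREEPDB1"}  # assumed default
            connection_pool = fake_create_pool(settings["dsn"])
            managed_client = True  # we created it, so we must close it
        self.connection_pool = connection_pool
        self.managed_client = managed_client

    async def __aenter__(self):
        return self  # pool already exists, nothing lazy to do

    async def __aexit__(self, *exc):
        if self.managed_client:
            self.connection_pool = None  # stand-in for closing the pool

async def demo():
    async with OracleCollectionSketch() as col:
        return col.managed_client

print(asyncio.run(demo()))  # True
```

A collection built around an externally supplied pool would get managed_client=False and leave the pool open on exit.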
    env_file_path: str | None = None,
    env_file_encoding: str | None = None,
    settings: OracleSettings | None = None,
    pool_args: dict[str, Any] | None = None,
Reviewer: these aren't documented.
Author: Will add to the docstring.
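A hedged sketch of the docstring entries these parameters could get; the wording is an assumption, not the final docstring:

```python
def init_sketch(env_file_path=None, env_file_encoding=None, settings=None, pool_args=None):
    """Show candidate docstring entries for the undocumented parameters.

    Args:
        env_file_path: Path to a .env file holding the Oracle settings.
        env_file_encoding: Encoding of that .env file (commonly "utf-8").
        settings: A pre-built OracleSettings instance; when provided, the
            env-file arguments are ignored.
        pool_args: Extra keyword arguments forwarded to the connection
            pool factory.
    """
    return sorted(locals())

print(init_sketch())  # ['env_file_encoding', 'env_file_path', 'pool_args', 'settings']
```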
    VectorSearchExecutionException,
    VectorStoreOperationException,
)
from semantic_kernel.exceptions import MemoryConnectorConnectionException
Reviewer: when you have the pre-commit setup, these imports should be merged; this is also checked here as a GH action, so have a look at the DEV_SETUP.md in the python root!
Motivation and Context
This change enables Semantic Kernel users to store and retrieve embeddings using Oracle databases. Semantic Kernel already supports vector storage for several backends, but Oracle was missing; this connector fills that gap by providing full async support, native VECTOR type handling, and vector index management.
Description
This PR introduces a new Oracle connector for Semantic Kernel with the following features:
- Asynchronous upsert, get, delete, and search operations for memory records.
- Native Oracle VECTOR type support for storing embeddings efficiently.
- Support for HNSW and IVFFLAT vector indexes for similarity search.
- Integration with Semantic Kernel collections, enabling semantic search and memory operations.
- Comprehensive unit tests to ensure correctness and stability.
The connector is designed to work seamlessly with existing Semantic Kernel memory abstractions and follows the same async patterns as other vector stores.
Integration tests have also been implemented and verified locally; however, they are not included in this PR because the CI environment setup for Oracle Database support is currently unknown. Once guidance is provided on Oracle DB availability in the CI pipeline, integration tests can be enabled in a follow-up PR.
Contribution Checklist