
Commit a67a695 ("Merge main")
2 parents: 1ef26cd + 2a013d8


44 files changed: +487 / -1501 lines

.pre-commit-config.yaml

Lines changed: 6 additions & 6 deletions

@@ -45,12 +45,12 @@ repos:
         args: [--fix, --ignore, UP007]
         exclude: samples

-  - repo: https://github.com/astral-sh/uv-pre-commit
-    # uv version.
-    rev: 0.5.20
-    hooks:
-      # Update the uv lockfile
-      - id: uv-lock
+  # - repo: https://github.com/astral-sh/uv-pre-commit
+  #   # uv version.
+  #   rev: 0.5.20
+  #   hooks:
+  #     # Update the uv lockfile
+  #     - id: uv-lock

   - repo: local
     hooks:

deploy_ai_search/README.md

Lines changed: 3 additions & 3 deletions

@@ -1,8 +1,8 @@
 # AI Search Indexing Pre-built Index Setup

-The associated scripts in this portion of the repository contains pre-built scripts to deploy the skillset with Azure Document Intelligence.
+The associated scripts in this portion of the repository contain pre-built scripts to deploy the skillsets needed for both Text2SQL and Image Processing.

-## Steps for Rag Documents Index Deployment (For Unstructured RAG)
+## Steps for Rag Documents Index Deployment (For Image Processing)

 1. Update the `.env` file with the associated values. Not all values are required, depending on whether you are using System / User Assigned Identities or key-based authentication.
 2. Adjust `rag_documents.py` with any changes to the index / indexer. The `get_skills()` method implements the skills pipeline. Make any adjustments here to the skills needed to enrich the data source.

@@ -13,7 +13,7 @@
 - `rebuild`. Whether to delete and rebuild the index.
 - `suffix`. Optional parameter that will apply a suffix onto the deployed index and indexer. This is useful if you want to deploy a test version before overwriting the main version.

-## Steps for Text2SQL Index Deployment (For Structured RAG)
+## Steps for Text2SQL Index Deployment (For Text2SQL)

 ### Schema Store Index
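The README above describes `rebuild` and `suffix` options for the deployment scripts. As a rough sketch of the behaviour being described (the function name and option handling here are hypothetical illustrations, not the repository's deployment code), applying a suffix and rebuilding an Azure AI Search index might look like this:

```python
# Hypothetical sketch of the `rebuild` / `suffix` behaviour described above.
# See the scripts in deploy_ai_search for the real implementation.
import os

from azure.core.credentials import AzureKeyCredential
from azure.search.documents.indexes import SearchIndexClient
from azure.search.documents.indexes.models import SearchIndex


def deploy_index(index: SearchIndex, rebuild: bool = False, suffix: str | None = None) -> None:
    client = SearchIndexClient(
        endpoint=os.environ["AIService__AzureSearchOptions__Endpoint"],
        credential=AzureKeyCredential(os.environ["AIService__AzureSearchOptions__Key"]),
    )

    # An optional suffix lets a test index live alongside the main one.
    if suffix:
        index.name = f"{index.name}-{suffix}"

    # `rebuild` deletes any existing index before recreating it.
    if rebuild:
        client.delete_index(index.name)

    client.create_or_update_index(index)
```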

deploy_ai_search/src/deploy_ai_search/text_2_sql_column_value_store.py

Lines changed: 2 additions & 1 deletion

@@ -85,9 +85,10 @@ def get_index_fields(self) -> list[SearchableField]:
             name="Warehouse",
             type=SearchFieldDataType.String,
         ),
-        SimpleField(
+        SearchableField(
             name="Column",
             type=SearchFieldDataType.String,
+            hidden=False,
         ),
         SearchableField(
             name="Value",

image_processing/requirements.txt

Lines changed: 0 additions & 1238 deletions
Large diffs are not rendered by default.

text_2_sql/.env.example

Lines changed: 44 additions & 0 deletions

@@ -0,0 +1,44 @@
+# Environment variables for Text2SQL
+IdentityType=<identityType> # system_assigned or user_assigned or key
+
+Text2Sql__DatabaseEngine=<DatabaseEngine> # TSQL or PostgreSQL or Snowflake or Databricks
+Text2Sql__UseQueryCache=<Determines if the Query Cache will be used to speed up query generation. Defaults to True.> # True or False
+Text2Sql__PreRunQueryCache=<Determines if the results from the Query Cache will be pre-run to speed up answer generation. Defaults to True.> # True or False
+Text2Sql__UseColumnValueStore=<Determines if the Column Value Store will be used for schema selection. Defaults to True.> # True or False
+
+# Open AI Connection Details
+OpenAI__CompletionDeployment=<openAICompletionDeploymentId. Used for data dictionary creator>
+OpenAI__MiniCompletionDeployment=<OpenAI__MiniCompletionDeploymentId. Used for agentic text2sql>
+OpenAI__Endpoint=<openAIEndpoint>
+OpenAI__ApiKey=<openAIKey if using non identity based connection>
+OpenAI__ApiVersion=<openAIApiVersion>
+
+# Azure AI Search Connection Details
+AIService__AzureSearchOptions__Endpoint=<AI search endpoint>
+AIService__AzureSearchOptions__Key=<AI search key if using non identity based connection>
+AIService__AzureSearchOptions__Text2SqlSchemaStore__Index=<Schema store index name. Default is created as "text-2-sql-schema-store-index">
+AIService__AzureSearchOptions__Text2SqlSchemaStore__SemanticConfig=<Schema store semantic config. Default is created as "text-2-sql-schema-store-semantic-config">
+AIService__AzureSearchOptions__Text2SqlQueryCache__Index=<Query cache index name. Default is created as "text-2-sql-query-cache-index">
+AIService__AzureSearchOptions__Text2SqlQueryCache__SemanticConfig=<Query cache semantic config. Default is created as "text-2-sql-query-cache-semantic-config">
+AIService__AzureSearchOptions__Text2SqlColumnValueStore__Index=<Column value store index name. Default is created as "text-2-sql-column-value-store-index">
+
+# TSQL
+Text2Sql__Tsql__ConnectionString=<Tsql databaseConnectionString if using Tsql Data Source>
+Text2Sql__Tsql__Database=<Tsql database if using Tsql Data Source>
+
+# PostgreSQL Specific Connection Details
+Text2Sql__Postgresql__ConnectionString=<Postgresql databaseConnectionString if using Postgresql Data Source>
+Text2Sql__Postgresql__Database=<Postgresql database if using Postgresql Data Source>
+
+# Snowflake Specific Connection Details
+Text2Sql__Snowflake__User=<snowflakeUser if using Snowflake Data Source>
+Text2Sql__Snowflake__Password=<snowflakePassword if using Snowflake Data Source>
+Text2Sql__Snowflake__Account=<snowflakeAccount if using Snowflake Data Source>
+Text2Sql__Snowflake__Warehouse=<snowflakeWarehouse if using Snowflake Data Source>
+Text2Sql__Snowflake__Database=<snowflakeDatabase if using Snowflake Data Source>
+
+# Databricks Specific Connection Details
+Text2Sql__Databricks__Catalog=<databricksCatalog if using Databricks Data Source with Unity Catalog>
+Text2Sql__Databricks__ServerHostname=<databricksServerHostname if using Databricks Data Source with Unity Catalog>
+Text2Sql__Databricks__HttpPath=<databricksHttpPath if using Databricks Data Source with Unity Catalog>
+Text2Sql__Databricks__AccessToken=<databricks AccessToken if using Databricks Data Source with Unity Catalog>
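The double-underscore naming above maps environment variables onto nested settings (for example `AIService__AzureSearchOptions__Endpoint`). A minimal sketch of reading a few of these values in Python, assuming `python-dotenv` is available; the repository may load them through its own settings layer:

```python
# Minimal sketch: reading a few of the settings defined in text_2_sql/.env.
# Assumes python-dotenv; the repository may use its own settings loader.
import os

from dotenv import load_dotenv

load_dotenv("text_2_sql/.env")

database_engine = os.environ["Text2Sql__DatabaseEngine"]  # e.g. "TSQL"
use_query_cache = os.environ.get("Text2Sql__UseQueryCache", "True") == "True"

# "__" separates nesting levels: AIService -> AzureSearchOptions -> Endpoint.
search_endpoint = os.environ["AIService__AzureSearchOptions__Endpoint"]

if database_engine == "TSQL":
    connection_string = os.environ["Text2Sql__Tsql__ConnectionString"]
```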

text_2_sql/GETTING_STARTED.md

Lines changed: 21 additions & 6 deletions

@@ -2,10 +2,25 @@
 To get started, perform the following steps:

+**Execute the following commands in the `deploy_ai_search` directory:**
+
 1. Setup Azure OpenAI in your subscription with **gpt-4o-mini** & an embedding model, alongside a SQL Server sample database, AI Search and a storage account.
-2. Clone this repository and deploy the AI Search text2sql indexes from `deploy_ai_search`.
-3. Run `uv sync` within the text_2_sql directory to install dependencies.
-4. Configure the .env file based on the provided sample
-5. Generate a data dictionary for your target server using the instructions in `data_dictionary`.
-6. Upload these data dictionaries to the relevant contains in your storage account. Wait for them to be automatically indexed.
-7. Navigate to `autogen` directory to view the AutoGen implementation. Follow the steps in `Iteration 5 - Agentic Vector Based Text2SQL.ipynb` to get started.
+2. Create your `.env` file based on the provided sample `deploy_ai_search/.env.example`. Place this file at `deploy_ai_search/.env`.
+3. Clone this repository and deploy the AI Search text2sql indexes from `deploy_ai_search`. See the instructions in the **Steps for Text2SQL Index Deployment (For Structured RAG)** section of `deploy_ai_search/README.md`.
+
+**Execute the following commands in the `text_2_sql_core` directory:**
+
+4. Create your `.env` file based on the provided sample `text_2_sql/.env.example`. Place this file at `text_2_sql/.env`.
+5. Run `uv sync` within the text_2_sql directory to install dependencies.
+   - Install the optional dependencies if you need a database connector other than TSQL: `uv sync --extra <DATABASE ENGINE>`
+   - See the supported connectors in `text_2_sql_core/src/text_2_sql_core/connectors`.
+6. Create your `.env` file based on the provided sample `text_2_sql/.env.example`. Place this file at `text_2_sql/.env`.
+7. Generate a data dictionary for your target server using the instructions in the **Running** section of `data_dictionary/README.md`.
+8. Upload these generated data dictionary files to the relevant containers in your storage account. Wait for them to be automatically indexed with the included skillsets.
+
+**Execute the following commands in the `autogen` directory:**
+
+9. Run `uv sync` within the text_2_sql directory to install dependencies.
+   - Install the optional dependencies if you need a database connector other than TSQL: `uv sync --extra <DATABASE ENGINE>`
+   - See the supported connectors in `text_2_sql_core/src/text_2_sql_core/connectors`.
+10. Navigate to the `autogen` directory to view the AutoGen implementation. Follow the steps in `Iteration 5 - Agentic Vector Based Text2SQL.ipynb` to get started.
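Step 8 in the new instructions uploads the generated data dictionary files to a storage container so the deployed skillsets can index them. A rough sketch of that upload using `azure-storage-blob`; the container name, local path and connection string are placeholders, not values defined by the repository:

```python
# Rough sketch of step 8: uploading generated data dictionary JSON files.
# Container name, local path and connection string are placeholders.
from pathlib import Path

from azure.storage.blob import BlobServiceClient

service = BlobServiceClient.from_connection_string("<storage connection string>")
container = service.get_container_client("<data-dictionary-container>")

for file in Path("<path to generated data dictionaries>").glob("*.json"):
    # Each uploaded file is picked up by the indexer and enriched by the skillset.
    with open(file, "rb") as data:
        container.upload_blob(name=file.name, data=data, overwrite=True)
```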

text_2_sql/README.md

Lines changed: 17 additions & 17 deletions

@@ -54,7 +54,20 @@
 ![Vector Based with Query Cache Logical Flow.](./images/Agentic%20Text2SQL%20Query%20Cache.png "Agentic Vector Based with Query Cache Logical Flow")

-#### Parallel execution
+## Agents
+
+This agentic system contains the following agents:
+
+- **Query Cache Agent:** Responsible for checking the cache for previously asked questions.
+- **Query Decomposition Agent:** Responsible for decomposing complex questions into sub-questions that can be answered with SQL.
+- **Schema Selection Agent:** Responsible for extracting key terms from the question and checking the index store for the queries.
+- **SQL Query Generation Agent:** Responsible for using the previously extracted schemas and generated SQL queries to answer the question. This agent can request more schemas if needed. This agent will run the query.
+- **SQL Query Verification Agent:** Responsible for verifying that the SQL query and results will answer the question.
+- **Answer Generation Agent:** Responsible for taking the database results and generating the final answer for the user.
+
+The combination of these agents allows the system to answer complex questions, whilst staying under the token limits when including the database schemas. The query cache ensures that previously asked questions can be answered quickly to avoid degrading the user experience.
+
+### Parallel execution

 After the first agent has rewritten and decomposed the user input, we execute each of the individual questions in parallel for the quickest time to generate an answer.

@@ -189,22 +202,9 @@
 }
 ```

-See `./data_dictionary` for more details on how the data dictionary is structured and ways to **automatically generate it**.
-
-## Agentic Vector Based Approach (Iteration 5)
-
-This approach builds on the the Vector Based SQL Plugin approach that was previously developed, but adds a agentic approach to the solution.
-
-This agentic system contains the following agents:
-
-- **Query Cache Agent:** Responsible for checking the cache for previously asked questions.
-- **Query Decomposition Agent:** Responsible for decomposing complex questions, into sub questions that can be answered with SQL.
-- **Schema Selection Agent:** Responsible for extracting key terms from the question and checking the index store for the queries.
-- **SQL Query Generation Agent:** Responsible for using the previously extracted schemas and generated SQL queries to answer the question. This agent can request more schemas if needed. This agent will run the query.
-- **SQL Query Verification Agent:** Responsible for verifying that the SQL query and results question will answer the question.
-- **Answer Generation Agent:** Responsible for taking the database results and generating the final answer for the user.
-
-The combination of this agent allows the system to answer complex questions, whilst staying under the token limits when including the database schemas. The query cache ensures that previously asked questions, can be answered quickly to avoid degrading user experience.
+> [!NOTE]
+>
+> - See `./data_dictionary` for more details on how the data dictionary is structured and ways to **automatically generate it**.

 ## Tips for good Text2SQL performance.
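The "Parallel execution" section moved above states that the decomposed sub-questions are executed concurrently. A generic asyncio sketch of that pattern follows; this is not the repository's implementation, and the helper below is only a stand-in for schema selection, SQL generation and execution:

```python
# Generic illustration of running decomposed sub-questions in parallel.
# Not the repository's code; answer_sub_question is a placeholder.
import asyncio


async def answer_sub_question(sub_question: str) -> str:
    # Stand-in for schema selection, SQL generation and query execution.
    await asyncio.sleep(0.1)
    return f"result for: {sub_question}"


async def answer(sub_questions: list[str]) -> list[str]:
    # Launch every sub-question at once and gather the results.
    return await asyncio.gather(*(answer_sub_question(q) for q in sub_questions))


results = asyncio.run(answer(["total sales by region", "total sales by year"]))
```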

text_2_sql/__init__.py

Whitespace-only changes.

text_2_sql/autogen/Iteration 5 - Agentic Vector Based Text2SQL.ipynb

Lines changed: 14 additions & 5 deletions

@@ -35,11 +35,13 @@
     "\n",
     "### Dependencies\n",
     "\n",
-    "To install dependencies for this demo:\n",
+    "To install dependencies for this demo, navigate to the autogen directory:\n",
     "\n",
-    "`uv sync --package autogen_text_2_sql`\n",
+    "`uv sync`\n",
     "\n",
-    "`uv add --editable text_2_sql_core`"
+    "If you need a different connector to TSQL:\n",
+    "\n",
+    "`uv sync --extra <DATABASE ENGINE>`"
    ]
   },
   {
@@ -87,6 +89,13 @@
     "agentic_text_2_sql = AutoGenText2Sql(use_case=\"Analysing sales data\")"
    ]
   },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": []
+  },
   {
    "cell_type": "markdown",
    "metadata": {},
@@ -100,7 +109,7 @@
    "metadata": {},
    "outputs": [],
    "source": [
-    "async for message in agentic_text_2_sql.process_user_message(UserMessagePayload(user_message=\"What is the total number of sales?\")):\n",
+    "async for message in agentic_text_2_sql.process_user_message(UserMessagePayload(user_message=\"what are the total sales\")):\n",
     "    logging.info(\"Received %s Message from Text2SQL System\", message)"
    ]
   },
@@ -128,7 +137,7 @@
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
-  "version": "3.12.7"
+  "version": "3.12.8"
  }
 },
 "nbformat": 4,

text_2_sql/autogen/README.md

Lines changed: 1 addition & 1 deletion

@@ -134,7 +134,7 @@
 ## Query Cache Implementation Details

-The vector based with query cache uses the `fetch_queries_from_cache()` method to fetch the most relevant previous query and injects it into the prompt before the initial LLM call. The use of Auto-Function Calling here is avoided to reduce the response time as the cache index will always be used first.
+The vector based approach with query cache uses the `fetch_sql_queries_with_schemas_from_cache()` method to fetch the most relevant previous query and injects it into the prompt before the initial LLM call. The use of Auto-Function Calling here is avoided to reduce the response time, as the cache index will always be used first.

 If the score of the top result is higher than the defined threshold, the query will be executed against the target data source and the results included in the prompt. This allows us to prompt the LLM to evaluate whether it can use these results to answer the question, **without further SQL Query generation**, to speed up the process.
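The cache-first flow described above can be summarised in a short sketch. Only the method name `fetch_sql_queries_with_schemas_from_cache()` comes from the diff; the connector interface, signature, return shape and the query-execution helper are assumptions:

```python
# Sketch of the cache-first flow described above. The connector interface,
# return shape and threshold handling are assumptions, not the real API.
async def answer_with_cache(connector, question: str, threshold: float = 0.8) -> dict:
    cached = await connector.fetch_sql_queries_with_schemas_from_cache(question)  # assumed signature

    if cached and cached[0]["score"] >= threshold:
        # Pre-run the cached SQL so the results can be injected into the prompt,
        # letting the LLM answer without generating a new query.
        results = await connector.query_execution(cached[0]["sql_query"])  # hypothetical helper
        return {"from_cache": True, "results": results}

    # Below the threshold: fall back to full agentic SQL generation.
    return {"from_cache": False, "results": None}
```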
