Commit 396ce24

update readme
1 parent 2af47bb commit 396ce24

1 file changed: +35 -2 lines changed
Python/CosmosDB-NoSQL_SemanticSearchDemo/README.md

Lines changed: 35 additions & 2 deletions
@@ -28,7 +28,13 @@ This repository contains a Python Streamlit application that demonstrates advanc
 
 ## Prerequisites
 
-- [Azure Cosmos DB](https://azure.microsoft.com/services/cosmos-db/) account with NoSQL API with vector search and full text search enabled
+- [Azure Cosmos DB](https://azure.microsoft.com/services/cosmos-db/) account with NoSQL API
+- **⚠️ IMPORTANT:** You must enable the following preview features in your Azure Cosmos DB account:
+- Navigate to your Cosmos DB account in the Azure Portal
+- Go to **Settings** → **Features**
+- Enable **"Vector Search for NoSQL API"**
+- Enable **"Preview Features for Full Text Search"**
+- These features are required for vector indexing and full text search capabilities
 - [Azure OpenAI](https://azure.microsoft.com/products/ai-services/openai-service) account
 - [Azure CLI](https://docs.microsoft.com/cli/azure/install-azure-cli) (for local authentication)
 
@@ -107,7 +113,34 @@ The app uses **keyless authentication** with Azure Cosmos DB:
 az login
 ```
 
-### 7. Run the Application
+### 7. Load Sample Data
+
+**⚠️ IMPORTANT:** You must load data into your Azure Cosmos DB database before running the application. The search functionality requires data to be present.
+
+Make sure your virtual environment is activated, then run the data loader:
+
+**PowerShell (Windows):**
+```powershell
+.\venv\Scripts\Activate.ps1
+python src\data\data-loader.py --text_field_name "overview" --path_to_json_array "https://raw.githubusercontent.com/microsoft/AzureDataRetrievalAugmentedGenerationSamples/refs/heads/main/DataSet/Movies/MovieLens-4489-256D.json" --database_name "searchdemo2" --concurrency 5 --vector_field_name "vector" --re_embed True
+```
+
+**Bash (Linux/macOS):**
+```bash
+source venv/bin/activate
+python src/data/data-loader.py --text_field_name "overview" --path_to_json_array "https://raw.githubusercontent.com/microsoft/AzureDataRetrievalAugmentedGenerationSamples/refs/heads/main/DataSet/Movies/MovieLens-4489-256D.json" --database_name "searchdemo2" --concurrency 5 --vector_field_name "vector" --re_embed True
+```
+
+**What this does:**
+- Downloads the movie dataset (4,489 records) from GitHub
+- Creates Azure Cosmos DB containers with vector and full text indexing
+- Generates embeddings using Azure OpenAI
+- Loads all data into both `search_qflat` and `search_diskann` containers
+- Takes approximately 10-15 minutes to complete
+
+**Note:** Make sure the `COSMOS_DB_DATABASE` in your `.env` file matches the `--database_name` parameter (e.g., "searchdemo2").
+
+### 8. Run the Application
 
 Make sure your virtual environment is activated, then choose one of the options below:
 
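The note added in step 7 asks that `COSMOS_DB_DATABASE` in `.env` match the `--database_name` argument passed to the loader. A minimal pre-flight check could look like the sketch below. The helper names `read_env_value` and `database_names_match` are hypothetical and not part of the repository, and the parser only handles plain `KEY=value` lines, not the full `.env` syntax.

```python
from typing import Optional


def read_env_value(env_text: str, key: str) -> Optional[str]:
    """Return the value assigned to `key` in .env-style text, or None if absent.

    Hypothetical helper: handles only simple KEY=value lines, skips blank
    lines and '#' comments, and lets a later assignment override an earlier one.
    """
    found = None
    for raw_line in env_text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        name, _, value = line.partition("=")
        if name.strip() == key:
            # Drop surrounding whitespace and optional single/double quotes.
            found = value.strip().strip("'\"")
    return found


def database_names_match(env_text: str, cli_database_name: str) -> bool:
    """True when COSMOS_DB_DATABASE in the .env text equals the CLI value."""
    return read_env_value(env_text, "COSMOS_DB_DATABASE") == cli_database_name


if __name__ == "__main__":
    # Sample text standing in for the contents of a real .env file.
    sample_env = '# local settings\nCOSMOS_DB_DATABASE="searchdemo2"\n'
    if database_names_match(sample_env, "searchdemo2"):
        print("Database names match.")
    else:
        print("Mismatch: update .env or --database_name before loading data.")
```

Running a check like this before the loader would catch a mismatch up front, so the data does not end up in a database the app never reads.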

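The "What this does" list in the added step 7 says the loader creates containers with vector and full text indexing. The diff does not show how `data-loader.py` does this, so the following is only a sketch of what such container policies might look like, using the JSON policy shapes that Azure Cosmos DB for NoSQL documents. Several values are assumptions: the function name `build_policies` is hypothetical, the mapping of `search_qflat` to a `quantizedFlat` index and `search_diskann` to a `diskANN` index is inferred from the container names, the 256 dimensions are inferred from the dataset file name, and the `float32` data type, `cosine` distance function and `en-US` language are placeholders.

```python
from typing import Any, Dict

# Assumed mapping, inferred from the container names in the README.
VECTOR_INDEX_TYPES = {
    "search_qflat": "quantizedFlat",
    "search_diskann": "diskANN",
}


def build_policies(
    container_name: str,
    vector_field: str = "vector",   # matches --vector_field_name in the README
    text_field: str = "overview",   # matches --text_field_name in the README
    dimensions: int = 256,          # assumption based on "256D" in the file name
) -> Dict[str, Dict[str, Any]]:
    """Build illustrative policy dictionaries for one container."""
    vector_path = f"/{vector_field}"
    text_path = f"/{text_field}"

    vector_embedding_policy = {
        "vectorEmbeddings": [
            {
                "path": vector_path,
                "dataType": "float32",          # assumed
                "distanceFunction": "cosine",   # assumed
                "dimensions": dimensions,
            }
        ]
    }
    full_text_policy = {
        "defaultLanguage": "en-US",             # assumed
        "fullTextPaths": [{"path": text_path, "language": "en-US"}],
    }
    indexing_policy = {
        "includedPaths": [{"path": "/*"}],
        # Keep the raw vector out of the regular range index.
        "excludedPaths": [{"path": f"{vector_path}/*"}],
        "vectorIndexes": [
            {"path": vector_path, "type": VECTOR_INDEX_TYPES[container_name]}
        ],
        "fullTextIndexes": [{"path": text_path}],
    }
    return {
        "vector_embedding_policy": vector_embedding_policy,
        "full_text_policy": full_text_policy,
        "indexing_policy": indexing_policy,
    }


if __name__ == "__main__":
    for name in VECTOR_INDEX_TYPES:
        policies = build_policies(name)
        print(name, policies["indexing_policy"]["vectorIndexes"])
```

As far as I know, recent versions of the `azure-cosmos` package accept dictionaries of this shape when creating a container, but the exact keyword arguments and minimum SDK version should be checked against the SDK reference before relying on this.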