This example builds an embedding index from files stored in an Azure Blob Storage container.
It continuously updates the index as files are added, updated, or deleted in the source container,
keeping the index in sync with the container without manual re-indexing.
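
Conceptually, staying in sync means comparing what the container holds now with what was indexed before. The sketch below is a simplified, hypothetical illustration of that idea, not CocoIndex's actual implementation. It assumes each blob is identified by its name and versioned by an ETag string, and it sorts the differences into added, updated, and deleted blobs.

```python
from dataclasses import dataclass


@dataclass
class BlobChanges:
    """Hypothetical result type: blob names grouped by kind of change."""
    added: list[str]
    updated: list[str]
    deleted: list[str]


def diff_snapshots(previous: dict[str, str], current: dict[str, str]) -> BlobChanges:
    """Compare two {blob_name: etag} snapshots of the same container.

    - added:   present now, absent before
    - deleted: present before, absent now
    - updated: present in both, but the ETag changed
    """
    added = sorted(name for name in current if name not in previous)
    deleted = sorted(name for name in previous if name not in current)
    updated = sorted(
        name for name in current
        if name in previous and current[name] != previous[name]
    )
    return BlobChanges(added=added, updated=updated, deleted=deleted)


if __name__ == "__main__":
    before = {"a.md": "0x1", "b.md": "0x2", "c.md": "0x3"}
    after = {"a.md": "0x1", "b.md": "0x9", "d.md": "0x4"}
    # Only added/updated blobs need (re-)embedding; deleted ones are removed from the index.
    print(diff_snapshots(before, after))
    # prints BlobChanges(added=['d.md'], updated=['b.md'], deleted=['c.md'])
```

Unchanged blobs (here `a.md`) never appear in the result, which is why incremental updates avoid re-processing files that did not change.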

## Quick Start (Public Test Container)

🚀 **Try it immediately!** We provide a public test container with sample documents:
- **Account:** `testnamecocoindex1`
- **Container:** `testpublic1` (public access)
- **No authentication required!**

Copy `.env.example` to `.env` and run the example; it works out of the box with anonymous access.
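
To check that the public container is reachable before running the example, you can list it anonymously with Azure's List Blobs REST operation. This is a minimal sketch using only the Python standard library; it assumes the container allows anonymous *listing* (container-level public access), and the helper names are hypothetical, not part of this example.

```python
import urllib.parse
import urllib.request
import xml.etree.ElementTree as ET


def list_blobs_url(account: str, container: str, prefix: str = "") -> str:
    """Build the List Blobs URL for a container (no credentials attached)."""
    query = {"restype": "container", "comp": "list"}
    if prefix:
        query["prefix"] = prefix  # optional name-prefix filter
    return (
        f"https://{account}.blob.core.windows.net/{container}?"
        + urllib.parse.urlencode(query)
    )


def parse_blob_names(xml_body: bytes) -> list[str]:
    """Extract blob names from a List Blobs XML response body."""
    root = ET.fromstring(xml_body)
    return [node.text or "" for node in root.findall("./Blobs/Blob/Name")]


def fetch_blob_names(account: str, container: str, prefix: str = "") -> list[str]:
    """Anonymously fetch the first page of blob names (up to 5,000; paging is ignored)."""
    url = list_blobs_url(account, container, prefix)
    with urllib.request.urlopen(url, timeout=30) as resp:
        return parse_blob_names(resp.read())
```

For example, `print(fetch_blob_names("testnamecocoindex1", "testpublic1"))` should print the sample documents' names if anonymous listing is allowed; an HTTP error instead means the container does not permit it or a name is misspelled.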

## Prerequisites

Before running the example, you need to:

1. [Install Postgres](https://cocoindex.io/docs/getting_started/installation#-install-postgres) if you don't have it already.

2. Prepare Azure Blob Storage.
   You'll need an Azure Storage account and a container. Supported authentication methods:
   - **Connection String** (recommended for development)
   - **SAS Token** (recommended for production)
   - **Account Key** (full access)
   - **Anonymous access** (public containers only)

3. Create a `.env` file with your Azure Blob Storage configuration.
   Start by copying `.env.example`, then edit it to fill in your credentials.

   ```bash
   cp .env.example .env
   $EDITOR .env
   ```

   Example `.env` file using a connection string:
   ```
   # Database Configuration
   DATABASE_URL=postgresql://localhost:5432/cocoindex

   # Azure Blob Storage Configuration
   AZURE_STORAGE_ACCOUNT_NAME=mystorageaccount
   AZURE_BLOB_CONTAINER_NAME=mydocuments
   AZURE_BLOB_PREFIX=

   # Authentication (choose one)
   AZURE_BLOB_CONNECTION_STRING=DefaultEndpointsProtocol=https;AccountName=mystorageaccount;AccountKey=mykey123;EndpointSuffix=core.windows.net
   ```
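
Before the first run, a quick check of `.env` can catch empty or missing settings. The hypothetical sketch below (not part of the example's code) parses simple `KEY=VALUE` lines and reports which of the settings from the `.env` example above are empty. It assumes `DATABASE_URL`, `AZURE_STORAGE_ACCOUNT_NAME`, and `AZURE_BLOB_CONTAINER_NAME` are the required ones; note that it splits on the first `=` only, because connection-string values contain `=` themselves.

```python
from pathlib import Path

# Assumed required settings, taken from the .env example above.
REQUIRED_SETTINGS = ("DATABASE_URL", "AZURE_STORAGE_ACCOUNT_NAME", "AZURE_BLOB_CONTAINER_NAME")


def parse_env(text: str) -> dict[str, str]:
    """Parse simple KEY=VALUE lines, skipping blank lines and # comments.

    Splits on the first '=' only, so values such as connection strings
    (which contain '=' and ';') stay intact. Quotes are not handled.
    """
    values: dict[str, str] = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition("=")
        if sep:
            values[key.strip()] = value.strip()
    return values


def missing_settings(values: dict[str, str]) -> list[str]:
    """Return required settings that are absent or empty."""
    return [key for key in REQUIRED_SETTINGS if not values.get(key)]


if __name__ == "__main__":
    env_file = Path(".env")
    if env_file.exists():
        missing = missing_settings(parse_env(env_file.read_text()))
        print("missing:", ", ".join(missing) if missing else "none")
    else:
        print(".env not found; copy .env.example first")
```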

## Run

Install dependencies:

```sh
pip install -e .
```

Run:

```sh
python main.py
```

While running, it continuously monitors the Azure Blob Storage container for changes and updates the index automatically.
At the same time, it accepts queries from the terminal and runs searches against the up-to-date index.
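
Conceptually, each query is embedded with the same model as the documents and ranked against the stored chunk embeddings. The sketch below shows only that ranking step, using cosine similarity in plain Python. It is a simplified stand-in: the example itself relies on Postgres for its index, and the 2-D vectors here are toy values, not real embeddings.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors (0.0 if either is all zeros)."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(query: list[float], index: dict[str, list[float]], k: int = 3) -> list[tuple[str, float]]:
    """Rank indexed chunks by similarity to the query and keep the best k."""
    scored = [(name, cosine_similarity(query, vec)) for name, vec in index.items()]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:k]


if __name__ == "__main__":
    # Toy "embeddings"; real ones come from an embedding model and have many more dimensions.
    index = {"a.md": [1.0, 0.0], "b.md": [0.0, 1.0], "c.md": [0.7, 0.7]}
    for name, score in top_k([1.0, 0.1], index, k=2):
        print(f"{name}: {score:.3f}")
```

Because the index is kept current, a file added to the container becomes a search candidate as soon as its embedding has been written.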

## CocoInsight

CocoInsight is in Early Access now (free) 😊 You found us! A quick 3-minute video tutorial about CocoInsight: [Watch on YouTube](https://youtu.be/ZnmyoHslBSc?si=pPLXWALztkA710r9).

Run CocoInsight to understand your RAG data pipeline:

```sh
cocoindex server -ci main.py
```

You can also add the `-L` flag to make the server keep the index updated with source changes at the same time:

```sh
cocoindex server -ci -L main.py
```

Then open the CocoInsight UI at [https://cocoindex.io/cocoinsight](https://cocoindex.io/cocoinsight).

## Authentication Methods & Troubleshooting

### Connection String (Recommended for Development)
```bash
AZURE_BLOB_CONNECTION_STRING="DefaultEndpointsProtocol=https;AccountName=testnamecocoindex1;AccountKey=your-key;EndpointSuffix=core.windows.net"
```
- **Pros:** Easiest to set up; contains all the necessary information
- **Cons:** Contains the account key (full access)
- **⚠️ Important:** Use an **Account Key** connection string, NOT a SAS connection string!
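
A connection string is a `;`-separated list of `Key=Value` pairs, so the two kinds are easy to tell apart by their keys. The hypothetical helpers below split a connection string (on the first `=` of each segment only, since base64 account keys end in `=` padding) and classify it as an Account Key or a SAS connection string.

```python
def parse_connection_string(conn_str: str) -> dict[str, str]:
    """Split 'Key1=Value1;Key2=Value2;...' into a dict.

    Each segment is split on its first '=' only, because values such as
    base64 account keys ('...==') or SAS signatures contain '=' themselves.
    """
    fields: dict[str, str] = {}
    for segment in conn_str.strip().split(";"):
        if not segment:
            continue  # tolerate a trailing ';'
        key, sep, value = segment.partition("=")
        if not sep:
            raise ValueError(f"malformed segment: {segment!r}")
        fields[key.strip()] = value
    return fields


def connection_string_kind(conn_str: str) -> str:
    """Classify a connection string as 'account-key', 'sas', or 'unknown'."""
    fields = parse_connection_string(conn_str)
    if "AccountKey" in fields:
        return "account-key"  # the kind this example expects
    if "SharedAccessSignature" in fields:
        return "sas"  # documented here as not working
    return "unknown"
```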

### SAS Token (Recommended for Production)
```bash
AZURE_BLOB_SAS_TOKEN="sp=rl&st=2024-01-01T00:00:00Z&se=2025-12-31T23:59:59Z&spr=https&sv=2022-11-02&sr=c&sig=..."
```
- **Pros:** Fine-grained permissions, time-limited
- **Cons:** More complex to generate and manage

**SAS Token Requirements:**
- `sp=rl`: Read (`r`) and List (`l`) permissions; `r` alone can read blobs but cannot enumerate the container
- `sr=c`: Container scope (to access all blobs)
- Valid time range (`st` and `se` in UTC)
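
The requirements above can be checked before running the example. This is a hypothetical sketch using only the standard library: it parses the token's query parameters and reports missing `r`/`l` permissions, a scope other than `sr=c`, and an expired or not-yet-valid time range. It assumes `st`/`se` use one of the common UTC forms (`YYYY-MM-DDThh:mm:ssZ`, `YYYY-MM-DDThh:mmZ`, or `YYYY-MM-DD`).

```python
import os
from datetime import datetime, timezone
from typing import Optional
from urllib.parse import parse_qs

# Common ISO 8601 UTC forms used for SAS 'st'/'se' values.
_SAS_TIME_FORMATS = ("%Y-%m-%dT%H:%M:%SZ", "%Y-%m-%dT%H:%MZ", "%Y-%m-%d")


def _parse_sas_time(value: str) -> datetime:
    for fmt in _SAS_TIME_FORMATS:
        try:
            return datetime.strptime(value, fmt).replace(tzinfo=timezone.utc)
        except ValueError:
            continue
    raise ValueError(f"unrecognized SAS time: {value!r}")


def sas_problems(sas_token: str, now: Optional[datetime] = None) -> list[str]:
    """Return human-readable problems with a SAS token (empty list if none found)."""
    # parse_qs returns lists of values; a leading '?' (present in some copied tokens) is dropped.
    params = {key: values[0] for key, values in parse_qs(sas_token.lstrip("?")).items()}
    now = now or datetime.now(timezone.utc)
    problems = []

    permissions = params.get("sp", "")
    if "r" not in permissions:
        problems.append("missing read permission: sp must include 'r'")
    if "l" not in permissions:
        problems.append("missing list permission: sp must include 'l'")
    if params.get("sr") != "c":
        problems.append("not container-scoped: expected sr=c")

    if "se" not in params:
        problems.append("no expiry time (se)")
    elif _parse_sas_time(params["se"]) <= now:
        problems.append(f"expired at {params['se']}")
    if "st" in params and _parse_sas_time(params["st"]) > now:
        problems.append(f"not valid until {params['st']}")
    return problems


if __name__ == "__main__":
    token = os.environ.get("AZURE_BLOB_SAS_TOKEN", "")
    if token:
        print(sas_problems(token) or "no problems found")
```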

### Account Key
```bash
AZURE_BLOB_ACCOUNT_KEY="your-account-key-here"
```
- **Pros:** Simple to use
- **Cons:** Grants full access to the storage account, so a leaked key is a security risk

### Anonymous Access
Leave all authentication options empty. This only works with public containers.

## Common Issues

### 401 Authentication Error
```
Error: server returned error status which will not be retried: 401
Error Code: NoAuthenticationInformation
```

`NoAuthenticationInformation` means the request reached Azure without any credentials, which typically happens when the container is private but no authentication option took effect.

**Solutions:**
1. **Check authentication priority:** Connection String > SAS Token > Account Key > Anonymous
2. **Verify SAS token permissions:** Must include `r` (read) and `l` (list) permissions
3. **Check SAS token expiry:** Ensure `se` (expiry time) is in the future
4. **Verify container scope:** Use `sr=c` for container-level access
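
The priority order in step 1 decides which credential is actually used when several are set. A hypothetical sketch of that rule, assuming the variable names from the `.env` example: one practical consequence is that a stale connection string silently overrides a freshly generated SAS token.

```python
import os
from collections.abc import Mapping

# Highest priority first, following the order in step 1 above.
AUTH_PRIORITY = (
    ("AZURE_BLOB_CONNECTION_STRING", "connection string"),
    ("AZURE_BLOB_SAS_TOKEN", "SAS token"),
    ("AZURE_BLOB_ACCOUNT_KEY", "account key"),
)


def effective_auth(env: Mapping[str, str]) -> str:
    """Return which authentication method wins for the given settings."""
    for variable, method in AUTH_PRIORITY:
        if env.get(variable, "").strip():
            return method
    return "anonymous"  # nothing set: only public containers will work


if __name__ == "__main__":
    print("effective authentication:", effective_auth(os.environ))
```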

### Connection String Issues

**⚠️ CRITICAL: Use an Account Key connection string, NOT a SAS connection string!**

**✅ Correct (Account Key connection string):**
```
DefaultEndpointsProtocol=https;AccountName=testnamecocoindex1;AccountKey=your-key;EndpointSuffix=core.windows.net
```

**❌ Wrong (SAS connection string, will not work):**
```
BlobEndpoint=https://testnamecocoindex1.blob.core.windows.net/;SharedAccessSignature=sp=r&st=...
```

**Other tips:**
- Don't include quotes in the actual connection string value
- The `AccountName` in the connection string should match `AZURE_STORAGE_ACCOUNT_NAME`
- The connection string must contain an `AccountKey=` parameter
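
These tips can be checked mechanically. The hypothetical sketch below flags a value wrapped in quotes, a missing `AccountKey=` (which also catches SAS connection strings), and an `AccountName` that differs from `AZURE_STORAGE_ACCOUNT_NAME`.

```python
import os


def connection_string_problems(conn_str: str, expected_account: str) -> list[str]:
    """Check a connection string against the tips above; return any problems found."""
    problems = []
    value = conn_str.strip()
    if value[:1] in ("'", '"') or value[-1:] in ("'", '"'):
        problems.append("value is wrapped in quotes; remove them")
        value = value.strip("'\"")
    # Split each 'Key=Value' segment on its first '=' only.
    fields = dict(segment.split("=", 1) for segment in value.split(";") if "=" in segment)
    if "AccountKey" not in fields:
        problems.append("no AccountKey= parameter (SAS connection strings are not supported)")
    if fields.get("AccountName") != expected_account:
        problems.append(
            f"AccountName {fields.get('AccountName')!r} does not match {expected_account!r}"
        )
    return problems


if __name__ == "__main__":
    conn = os.environ.get("AZURE_BLOB_CONNECTION_STRING", "")
    if conn:
        account = os.environ.get("AZURE_STORAGE_ACCOUNT_NAME", "")
        print(connection_string_problems(conn, account) or "no problems found")
```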

### Container Access Issues
- Verify that the container exists and that your credentials have access to it
- Check the spelling of `AZURE_BLOB_CONTAINER_NAME`
- For anonymous access, the container must be public
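
A misspelled name often also breaks Azure's container naming rules: names must be 3 to 63 characters long and use only lowercase letters, digits, and single hyphens that are neither first nor last. A small hypothetical check (it does not cover the special `$root` and `$web` containers):

```python
import os
import re

# Lowercase letters and digits, with single hyphens only between them.
_CONTAINER_NAME = re.compile(r"[a-z0-9]+(?:-[a-z0-9]+)*")


def is_valid_container_name(name: str) -> bool:
    """Check Azure's naming rules for ordinary blob containers (3-63 chars)."""
    return 3 <= len(name) <= 63 and _CONTAINER_NAME.fullmatch(name) is not None


if __name__ == "__main__":
    name = os.environ.get("AZURE_BLOB_CONTAINER_NAME", "")
    if name and not is_valid_container_name(name):
        print(f"{name!r} is not a valid container name; check for typos or uppercase letters")
```

A name that passes this check can still be wrong; it only rules out names Azure would never accept.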