* initial checkin
* s
* s
* use workaround to use latest models
* use workaround to use latest models
* s
* s
* s
* s
* s
* s
* s
* format black
* fix uts
* fix uts
* Update README.md
Co-authored-by: Pamela Fox <[email protected]>
* Update README.md
Co-authored-by: Pamela Fox <[email protected]>
* Update README.md
Co-authored-by: Pamela Fox <[email protected]>
* Update README.md
Co-authored-by: Pamela Fox <[email protected]>
* Update README.md
Co-authored-by: Pamela Fox <[email protected]>
* Update README.md
Co-authored-by: Pamela Fox <[email protected]>
* Update infra/main.bicep
Co-authored-by: Pamela Fox <[email protected]>
* Update infra/main.bicep
Co-authored-by: Pamela Fox <[email protected]>
* Update scripts/prepdocs.py
Co-authored-by: Pamela Fox <[email protected]>
* Update scripts/prepdocslib/integratedvectorizerstrategy.py
Co-authored-by: Pamela Fox <[email protected]>
* Update scripts/prepdocslib/integratedvectorizerstrategy.py
Co-authored-by: Pamela Fox <[email protected]>
* Update scripts/prepdocslib/integratedvectorizerstrategy.py
Co-authored-by: Pamela Fox <[email protected]>
* s
* s
* fixut
* s
* s
* s
* fix black format
* fix UT
* add blob test'
* s
* s
* s
* add new ut
* s
* s
* s
* Docs tweaks
* Rewords
* Add close, fix typo
* s
* s
* fix formatting
---------
Co-authored-by: Pamela Fox <[email protected]>
Co-authored-by: Pamela Fox <[email protected]>
- [Enabling login and document level access control](#enabling-login-and-document-level-access-control)
- [Enabling CORS for an alternate frontend](#enabling-cors-for-an-alternate-frontend)
@@ -246,6 +247,17 @@ either you or they can follow these steps:
This section covers the integration of GPT-4 Vision with Azure AI Search. Learn how to enhance your search capabilities with the power of image and text indexing, enabling advanced search functionalities over diverse document types. For a detailed guide on setup and usage, visit our [Enabling GPT-4 Turbo with Vision](docs/gpt4v.md) page.

+### Enabling Integrated Vectorization
+
+Azure AI Search recently introduced an [integrated vectorization feature in preview mode](https://techcommunity.microsoft.com/t5/ai-azure-ai-services-blog/announcing-the-public-preview-of-integrated-vectorization-in/ba-p/3960809#:~:text=Integrated%20vectorization%20is%20a%20new%20feature%20of%20Azure,pull-indexers%2C%20and%20vectorization%20of%20text%20queries%20through%20vectorizers). This feature is a cloud-based approach to data ingestion that takes care of document format cracking, data extraction, chunking, vectorization, and indexing, all with Azure technologies.
+
+To enable integrated vectorization with this sample:
+
+1. If you've previously deployed, delete the existing search index.
+2. Run `azd env set USE_FEATURE_INT_VECTORIZATION true`.
+3. Run `azd up` to update system and user roles.
+4. View the resources, such as the indexer and skillset, in the Azure portal and monitor the status of the vectorization process.
+
### Enabling authentication

By default, the deployed Azure web app will have no authentication or access restrictions enabled, meaning anyone with routable network access to the web app can chat with your indexed data. You can require authentication to your Azure Active Directory by following the [Add app authentication](https://learn.microsoft.com/azure/app-service/scenario-secure-app-authentication-app-service) tutorial and set it up against the deployed web app.
The `scripts/prepdocs.py` script is responsible for both uploading and indexing documents. The typical usage is to call it using `scripts/prepdocs.sh` (Mac/Linux) or `scripts/prepdocs.ps1` (Windows), as these scripts will set up a Python virtual environment and pass in the required parameters based on the current `azd` environment. Whenever `azd up` or `azd provision` is run, the script is called automatically.
@@ -23,16 +32,44 @@ Chunking allows us to limit the amount of information we send to OpenAI due to t
If needed, you can modify the chunking algorithm in `scripts/prepdocslib/textsplitter.py`.

-## Indexing additional documents
+### Indexing additional documents

To upload more PDFs, put them in the data/ folder and run `./scripts/prepdocs.sh` or `./scripts/prepdocs.ps1`.

A [recent change](https://github.com/Azure-Samples/azure-search-openai-demo/pull/835) added checks to see what's been uploaded before. The prepdocs script now writes an .md5 file with an MD5 hash of each file that gets uploaded. Whenever the prepdocs script is re-run, that hash is checked against the current hash and the file is skipped if it hasn't changed.
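
The hash check described above can be sketched roughly as follows. This is a minimal illustration, not the actual `prepdocs` code: the function names `file_md5` and `needs_upload`, and the choice to store the hash in a sidecar file named `<filename>.md5` next to the document, are assumptions made for the example.

```python
import hashlib
from pathlib import Path


def file_md5(path: Path) -> str:
    """Return the hex MD5 digest of the file's bytes."""
    return hashlib.md5(path.read_bytes()).hexdigest()


def needs_upload(path: Path) -> bool:
    """Return True if the file is new or has changed since the last run.

    Assumption: the previous hash lives in "<filename>.md5" beside the file.
    """
    hash_path = path.with_name(path.name + ".md5")
    current = file_md5(path)
    if hash_path.exists() and hash_path.read_text(encoding="utf-8").strip() == current:
        return False  # unchanged since the last recorded hash: skip it
    hash_path.write_text(current, encoding="utf-8")  # record the new hash
    return True
```

Calling `needs_upload` twice on an unchanged file returns `True` the first time and `False` the second, which mirrors the skip-if-unchanged behavior the paragraph describes.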

-## Removing documents
+### Removing documents

You may want to remove documents from the index. For example, if you're using the sample data, you may want to remove the documents that are already in the index before adding your own.

To remove all documents, use the `--removeall` flag. Open either `scripts/prepdocs.sh` or `scripts/prepdocs.ps1` and add `--removeall` to the command at the bottom of the file. Then run the script as usual.

You can also remove individual documents by using the `--remove` flag. Open either `scripts/prepdocs.sh` or `scripts/prepdocs.ps1`, add `--remove` to the command at the bottom of the file, and replace `/data/*` with `/data/YOUR-DOCUMENT-FILENAME-GOES-HERE.pdf`. Then run the script as usual.
+
+## Overview of Integrated Vectorization
+
+Azure AI Search recently introduced an [integrated vectorization feature in preview mode](https://techcommunity.microsoft.com/t5/ai-azure-ai-services-blog/announcing-the-public-preview-of-integrated-vectorization-in/ba-p/3960809#:~:text=Integrated%20vectorization%20is%20a%20new%20feature%20of%20Azure,pull-indexers%2C%20and%20vectorization%20of%20text%20queries%20through%20vectorizers). This feature is a cloud-based approach to data ingestion that takes care of document format cracking, data extraction, chunking, vectorization, and indexing, all with Azure technologies.
+
+See [this notebook](https://github.com/Azure/azure-search-vector-samples/blob/main/demo-python/code/azure-search-integrated-vectorization-sample.ipynb) to understand the process of setting up integrated vectorization.
+We have integrated that code into our `prepdocs` script, so you can use it without needing to understand the details.
+
+This feature cannot be used on an existing index. You need to create a new index or drop and recreate an existing index.
+In the newly created index schema, a new field `parent_id` is added. The indexer uses it internally to manage the life cycle of chunks.
+
+This feature is not supported in the free SKU for Azure AI Search.
+
+### Indexing of additional documents
+
+To add additional documents to the index, first upload them to your data source (Blob storage, by default).
+Then navigate to the Azure portal, find the indexer, and run it.
+The Azure AI Search indexer will identify the new documents and ingest them into the index.
+
+### Removing documents
+
+To remove documents from the index, remove them from your data source (Blob storage, by default).
+Then navigate to the Azure portal, find the indexer, and run it.
+The Azure AI Search indexer will take care of removing those documents from the index.
+
+### Scheduled indexing
+
+If you would like the indexer to run automatically, you can set it up to [run on a schedule](https://learn.microsoft.com/azure/search/search-howto-schedule-indexers).
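
As a rough illustration of what such a schedule looks like, the sketch below builds the `schedule` fragment of an indexer definition, with the interval written as an ISO 8601 duration such as `PT2H`. The helper names `iso8601_duration` and `schedule_fragment` are hypothetical, and the 5-minute and 24-hour bounds are taken from the linked scheduling documentation as understood here, so confirm them there before relying on them.

```python
from datetime import timedelta

# Assumed bounds for an indexer schedule interval; verify against the linked docs.
MIN_INTERVAL = timedelta(minutes=5)
MAX_INTERVAL = timedelta(days=1)


def iso8601_duration(interval: timedelta) -> str:
    """Format a timedelta as an ISO 8601 duration, e.g. PT2H or PT1H30M."""
    if not MIN_INTERVAL <= interval <= MAX_INTERVAL:
        raise ValueError("interval must be between 5 minutes and 1 day")
    total = int(interval.total_seconds())
    days, rem = divmod(total, 86400)
    hours, rem = divmod(rem, 3600)
    minutes, seconds = divmod(rem, 60)
    date_part = f"{days}D" if days else ""
    time_part = "".join(
        f"{value}{unit}"
        for value, unit in ((hours, "H"), (minutes, "M"), (seconds, "S"))
        if value
    )
    return "P" + date_part + ("T" + time_part if time_part else "")


def schedule_fragment(interval: timedelta) -> dict:
    """Build the schedule portion of an indexer definition as a plain dict."""
    return {"schedule": {"interval": iso8601_duration(interval)}}
```

For example, `schedule_fragment(timedelta(hours=2))` produces a dictionary whose interval is `PT2H`, which could then be merged into an indexer definition sent through the portal's JSON editor or the REST API.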
    help="Required if --searchimages is specified and --keyvaultname is provided. Fetch the Azure AI Vision key from this key vault instead of using the current user identity to login.",
)
+parser.add_argument(
+    "--useintvectorization",
+    required=False,
+    help="Optional. Enable integrated vectorization indexer support, which is in preview.",
+)
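
Because the argument above declares no `action`, argparse's default `store` action applies, so the flag expects a value and stores it as a string. The sketch below shows that behavior in isolation. It is not the `prepdocs.py` code: `build_parser` and `is_enabled` are hypothetical helpers, and treating the string `"true"` as the enabling value is an assumption.

```python
import argparse
from typing import Optional


def build_parser() -> argparse.ArgumentParser:
    """Build a parser containing only the flag under discussion."""
    parser = argparse.ArgumentParser(description="Sketch of the new flag only")
    parser.add_argument(
        "--useintvectorization",
        required=False,  # optional flag; the attribute is None when omitted
        help="Enable integrated vectorization indexer support (preview).",
    )
    return parser


def is_enabled(value: Optional[str]) -> bool:
    """Hypothetical helper: treat the string 'true' (any case) as enabled."""
    return value is not None and value.strip().lower() == "true"


if __name__ == "__main__":
    args = build_parser().parse_args(["--useintvectorization", "true"])
    print(is_enabled(args.useintvectorization))
```

If a plain on/off switch were preferred, `action="store_true"` would be the usual argparse choice, but that would change how the calling scripts pass the flag.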