Commit ea19e14

Prepdocs: Support additional args (#1813)
Prepdocs shell scripts support additional arguments
1 parent 8f4af71 commit ea19e14

3 files changed: +16 -5 lines changed

docs/data_ingestion.md

Lines changed: 3 additions & 3 deletions
```diff
@@ -30,7 +30,7 @@ The Blob indexer used by the Integrated Vectorization approach also supports a f
 
 ## Overview of the manual indexing process
 
-The [`prepdocs.py`](../app/backend/prepdocs.py) script is responsible for both uploading and indexing documents. The typical usage is to call it using `scripts/prepdocs.sh` (Mac/Linux) or `scripts/prepdocs.ps1` (Windows), as these scripts will set up a Python virtual environment and pass in the required parameters based on the current `azd` environment. Whenever `azd up` or `azd provision` is run, the script is called automatically.
+The [`prepdocs.py`](../app/backend/prepdocs.py) script is responsible for both uploading and indexing documents. The typical usage is to call it using `scripts/prepdocs.sh` (Mac/Linux) or `scripts/prepdocs.ps1` (Windows), as these scripts will set up a Python virtual environment and pass in the required parameters based on the current `azd` environment. You can pass additional arguments directly to the script, for example `scripts/prepdocs.ps1 --removeall`. Whenever `azd up` or `azd provision` is run, the script is called automatically.
 
 ![Diagram of the indexing process](images/diagram_prepdocs.png)
 
```
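The forwarding pattern the updated documentation describes can be sketched in Bash: the wrapper passes its fixed arguments first and appends whatever the caller supplied. In this minimal sketch, `fake_prepdocs` is a hypothetical stand-in that prints its argument vector instead of running `prepdocs.py`, and `run_prepdocs` is a hypothetical, heavily reduced version of `scripts/prepdocs.sh`. It forwards with `"$@"`; the committed script routes the arguments through an `additionalArgs` variable instead, which behaves the same for simple flags such as `--removeall`.

```shell
#!/usr/bin/env bash
# Hypothetical stand-in for `./.venv/bin/python ./app/backend/prepdocs.py`:
# prints each argument it receives in brackets, one per line.
fake_prepdocs() {
  printf '[%s]\n' "$@"
}

# Hypothetical, reduced scripts/prepdocs.sh: fixed arguments come first,
# then every argument the caller supplied, forwarded unchanged.
run_prepdocs() {
  fake_prepdocs './data/*' --verbose "$@"
}

run_prepdocs --removeall
# [./data/*]
# [--verbose]
# [--removeall]
```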
```diff
@@ -59,9 +59,9 @@ A [recent change](https://github.com/Azure-Samples/azure-search-openai-demo/pull
 
 You may want to remove documents from the index. For example, if you're using the sample data, you may want to remove the documents that are already in the index before adding your own.
 
-To remove all documents, use the `--removeall` flag. Open either `scripts/prepdocs.sh` or `scripts/prepdocs.ps1` and add `--removeall` to the command at the bottom of the file. Then run the script as usual.
+To remove all documents, use `scripts/prepdocs.sh --removeall` or `scripts/prepdocs.ps1 --removeall`.
 
-You can also remove individual documents by using the `--remove` flag. Open either `scripts/prepdocs.sh` or `scripts/prepdocs.ps1`, add `--remove` to the command at the bottom of the file, and replace `/data/*` with `/data/YOUR-DOCUMENT-FILENAME-GOES-HERE.pdf`. Then run the script as usual.
+You can also remove individual documents by using the `--remove` flag. Open either `scripts/prepdocs.sh` or `scripts/prepdocs.ps1` and replace `/data/*` with `/data/YOUR-DOCUMENT-FILENAME-GOES-HERE.pdf`. Then run `scripts/prepdocs.sh --remove` or `scripts/prepdocs.ps1 --remove`.
 
 ## Overview of Integrated Vectorization
 
```
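Removing a single document still requires editing the script because the data pattern is hardcoded there rather than taken from the caller; forwarded flags such as `--remove` are simply appended after it. In `scripts/prepdocs.sh` the pattern is wrapped in single quotes, so Bash does not expand the glob and the literal `./data/*` reaches `prepdocs.py`, which presumably resolves it itself. The minimal sketch below illustrates that quoting behavior; `show_args` is a hypothetical helper that prints what a program would receive.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print each received argument on its own line.
show_args() {
  printf '%s\n' "$@"
}

# Single quotes stop Bash from expanding the glob, so the receiver gets
# the literal pattern regardless of which files exist in ./data.
show_args './data/*' --remove
# ./data/*
# --remove

# Editing the pattern to one filename (the docs' placeholder) narrows the target.
show_args './data/YOUR-DOCUMENT-FILENAME-GOES-HERE.pdf' --remove
```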
scripts/prepdocs.ps1

Lines changed: 6 additions & 1 deletion
```diff
@@ -73,6 +73,10 @@ if ($env:AZURE_OPENAI_API_KEY) {
 
 $cwd = (Get-Location)
 $dataArg = "`"$cwd/data/*`""
+$additionalArgs = ""
+if ($args) {
+$additionalArgs = "$args"
+}
 
 $argumentList = "./app/backend/prepdocs.py $dataArg --verbose " + `
 "--subscriptionid $env:AZURE_SUBSCRIPTION_ID " + `
```
```diff
@@ -88,7 +92,8 @@ $argumentList = "./app/backend/prepdocs.py $dataArg --verbose " + `
 "$adlsGen2StorageAccountArg $adlsGen2FilesystemArg $adlsGen2FilesystemPathArg " + `
 "$tenantArg $aclArg " + `
 "$disableVectorsArg $localPdfParserArg $localHtmlParserArg " + `
-"$integratedVectorizationArg "
+"$integratedVectorizationArg " + `
+"$additionalArgs "
 
 $argumentList
 
```
scripts/prepdocs.sh

Lines changed: 7 additions & 1 deletion
```diff
@@ -69,6 +69,11 @@ elif [ -n "$OPENAI_API_KEY" ]; then
 openAiApiKeyArg="--openaikey $OPENAI_API_KEY"
 fi
 
+additionalArgs=""
+if [ $# -gt 0 ]; then
+additionalArgs="$@"
+fi
+
 ./.venv/bin/python ./app/backend/prepdocs.py './data/*' --verbose \
 --subscriptionid $AZURE_SUBSCRIPTION_ID \
 --storageaccount "$AZURE_STORAGE_ACCOUNT" --container "$AZURE_STORAGE_CONTAINER" --storageresourcegroup $AZURE_STORAGE_RESOURCE_GROUP \
```
```diff
@@ -83,4 +88,5 @@ $searchImagesArg $visionEndpointArg \
 $adlsGen2StorageAccountArg $adlsGen2FilesystemArg $adlsGen2FilesystemPathArg \
 $tenantArg $aclArg \
 $disableVectorsArg $localPdfParserArg $localHtmlParserArg \
-$integratedVectorizationArg
+$integratedVectorizationArg \
+$additionalArgs
```
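The Bash change assigns `"$@"` to the plain string `additionalArgs` and later expands `$additionalArgs` unquoted, so the shell splits it again on whitespace (and would also expand any glob characters in it). That is harmless for flags such as `--removeall`, but an option value containing a space would arrive as two separate arguments. The minimal sketch below contrasts that pattern with a Bash array, a common way to forward arguments verbatim; `count_args`, `forward_as_string`, and `forward_as_array` are hypothetical helpers, and `--category "Sample Docs"` is only an illustrative input.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print how many arguments were received.
count_args() {
  echo "$#"
}

# Mirrors the committed approach: a plain assignment joins "$@" into one
# space-separated string, and the unquoted expansion re-splits it.
forward_as_string() {
  local additionalArgs
  additionalArgs="$@"
  count_args $additionalArgs
}

# Alternative: keep each argument as its own array element and expand
# the array quoted, so embedded spaces survive.
forward_as_array() {
  local -a additionalArgs=("$@")
  count_args "${additionalArgs[@]}"
}

forward_as_string --removeall              # prints 1
forward_as_string --category "Sample Docs" # prints 3: "Sample Docs" was split
forward_as_array --category "Sample Docs"  # prints 2
```

Because the PowerShell script interpolates `"$args"` into the single `$argumentList` string, a similar whitespace caveat plausibly applies there as well.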