* feat: add local html parser
* format with ruff
* format with ruff
* fix package name
* run ruff
* add comments to explain regex
* add localhtmlparser flag option to ps1 py sh
* add tests
* add beautiful soup types
* run black again
* docs: add HTML parser to low cost
* docs: add local parsers info
* docs: remove spaces and add to TOC
* Update docs/deploy_lowcost.md
* Update tests/test_htmlparser.py
* add more tests to cover all cases
* Move to deploy features branch
* Reformat the args
* Reformat the args
* Add output to indicate parser used
* Coverage for verbose
---------
Co-authored-by: Pamela Fox <[email protected]>
@@ -176,7 +175,6 @@ It will look like the following:
> NOTE: It may take 5-10 minutes after you see 'SUCCESS' for the application to be fully deployed. If you see a "Python Developer" welcome screen or an error page, then wait a bit and refresh the page. See [guide on debugging App Service deployments](docs/appservice.md).
-
### Deploying again
If you've only changed the backend/frontend code in the `app` folder, then you don't need to re-provision the Azure resources. You can just run:
* [Enabling login and document level access control](#enabling-login-and-document-level-access-control)
* [Enabling CORS for an alternate frontend](#enabling-cors-for-an-alternate-frontend)
+* [Using local parsers](#using-local-parsers)

## Using GPT-4
@@ -56,3 +57,10 @@ For an alternate frontend that's written in Web Components and deployed to Stati
[azure-search-openai-javascript](https://github.com/Azure-Samples/azure-search-openai-javascript) and its guide
on [using a different backend](https://github.com/Azure-Samples/azure-search-openai-javascript#using-a-different-backend).
Both these repositories adhere to the same [HTTP protocol for RAG chat apps](https://github.com/Azure-Samples/ai-chat-app-protocol).
+
+## Using local parsers
+
+If you want to reduce costs by using local parsers instead of Azure Document Intelligence, you can set environment variables before running the [data ingestion script](/docs/data_ingestion.md). Note that the local parsers are generally less sophisticated.
+
+1. Run `azd env set USE_LOCAL_PDF_PARSER true` to use the local PDF parser.
+1. Run `azd env set USE_LOCAL_HTML_PARSER true` to use the local HTML parser.
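
The two environment variables above are presumably translated into command-line flags by the ingestion wrapper scripts. The following is only a rough sketch of that idea in Python, not the repository's actual script logic. The `ENV_TO_FLAG` table and the `parser_flags` helper are hypothetical names. `--localhtmlparser` is the flag added in this change, while `--localpdfparser` and the rule that only the value `true` enables a flag are assumptions made for illustration.

```python
import os

# Hypothetical mapping from azd environment variables to script flags.
# "--localhtmlparser" is the flag added in this change; "--localpdfparser"
# is assumed by analogy and may not match the real script.
ENV_TO_FLAG = {
    "USE_LOCAL_PDF_PARSER": "--localpdfparser",
    "USE_LOCAL_HTML_PARSER": "--localhtmlparser",
}


def parser_flags(env=None):
    """Return the flags whose environment variable is set to "true".

    Treating only "true" (case-insensitive) as enabled is an assumption.
    """
    env = os.environ if env is None else env
    return [
        flag
        for name, flag in ENV_TO_FLAG.items()
        if env.get(name, "").strip().lower() == "true"
    ]


if __name__ == "__main__":
    # Print the extra arguments that would be appended to the ingestion command.
    print(" ".join(parser_flags()))
```

Passing a plain dictionary instead of `os.environ` keeps the helper easy to test without touching the real environment.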
help="Use PyPdf local PDF parser (supports only digital PDFs) instead of Azure Document Intelligence service to extract text, tables and layout from the documents",
)
+parser.add_argument(
+    "--localhtmlparser",
+    action="store_true",
+    help="Use Beautiful Soup local HTML parser instead of Azure Document Intelligence service to extract text, tables and layout from the documents",
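
To illustrate how a `store_true` flag like this one might be consumed, here is a minimal, self-contained sketch. It is not the code from this change: the names `TextExtractor`, `extract_text`, `build_arg_parser`, and `choose_html_parser` are hypothetical, and the text extraction uses the standard library's `html.parser` rather than Beautiful Soup so that the example runs without third-party packages.

```python
import argparse
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collects visible text, skipping the contents of <script> and <style>."""

    def __init__(self):
        super().__init__()
        self._skip_depth = 0  # > 0 while inside <script> or <style>
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep only non-empty text found outside script/style elements.
        if self._skip_depth == 0 and data.strip():
            self.chunks.append(data.strip())


def extract_text(html: str) -> str:
    """Return the document's visible text joined by single spaces."""
    extractor = TextExtractor()
    extractor.feed(html)
    extractor.close()
    return " ".join(extractor.chunks)


def build_arg_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(description="Illustrative ingestion CLI")
    parser.add_argument(
        "--localhtmlparser",
        action="store_true",
        help="Use a local HTML parser instead of a remote service",
    )
    return parser


def choose_html_parser(args: argparse.Namespace) -> str:
    """Name the parser implied by the flag (the labels are hypothetical)."""
    return "local" if args.localhtmlparser else "documentintelligence"


if __name__ == "__main__":
    args = build_arg_parser().parse_args()
    print(f"HTML parser in use: {choose_html_parser(args)}")
```

Because the flag uses `action="store_true"`, it defaults to `False` when omitted, so the remote service stays the default and the local parser is opt-in.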