Refactor scripts to avoid anti-patterns, redundancy #1986

pamelafox · 2024-09-23T15:13:30Z

Purpose

This PR refactors our scripts in several key ways:

Removes the anti-pattern of exporting all of the values from azd env get-values into the current shell environment. Instead, we load the current azd environment only into the active Python process. When a shell script needs just a few variables, it fetches each one with azd env get-value.
Uses os.getenv instead of argparse arguments wherever the "argument" is actually set in the azd environment. This affected prepdocs.py/sh/ps1 the most, since they had a growing, unmaintainable list of arguments. Now only non-azd arguments can be specified on the command line.
Removes the sh/ps1 wrapper scripts for manageacl.py and adlsgen2setup.py, where the wrapper doesn't seem necessary. The developer just has to run them inside the Python environment with the right packages installed, and they'll work.
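As a minimal sketch of the first point, the code below loads azd values into the current Python process instead of exporting them into the shell. It assumes that azd env get-values prints one dotenv-style KEY="value" pair per line. The helper names are hypothetical, and the PR's own load_azd_env may work differently (for example, by delegating to python-dotenv).

```python
import os
import subprocess
from typing import Mapping, MutableMapping


def parse_env_values(text: str) -> dict[str, str]:
    """Parse dotenv-style KEY="value" lines (simplified: no escapes, no multiline values)."""
    values: dict[str, str] = {}
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines, comments, and anything that isn't KEY=VALUE
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        value = value.strip()
        # Strip one pair of matching surrounding quotes, if present
        if len(value) >= 2 and value[0] == value[-1] and value[0] in ("'", '"'):
            value = value[1:-1]
        values[key.strip()] = value
    return values


def load_into_environ(
    values: Mapping[str, str],
    environ: MutableMapping[str, str] = os.environ,
    override: bool = True,
) -> None:
    """Copy values into this process's environment; the parent shell is never modified."""
    for key, value in values.items():
        if override or key not in environ:
            environ[key] = value


def load_current_azd_env(override: bool = True) -> None:
    # Requires the azd CLI on PATH and a selected azd environment
    result = subprocess.run(
        ["azd", "env", "get-values"], capture_output=True, text=True, check=True
    )
    load_into_environ(parse_env_values(result.stdout), override=override)
```

Only this Python process and any child processes it starts see the values. The calling shell's environment is left untouched, which avoids the export anti-pattern described above.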

Does this introduce a breaking change?

When developers merge from main and run the server, azd up, or azd deploy, will this produce an error?
If you're not sure, try it out on an old environment.

[ ] Yes
[X] No - In theory, no, but more testing is needed.

Does this require changes to learn.microsoft.com docs?

This repository is referenced by this tutorial,
which includes deployment, settings, and usage instructions. If text or screenshots need to change in the tutorial,
check the box below and notify the tutorial author. A Microsoft employee can do this for you if you're an external contributor.

[ ] Yes
[X] No

Type of change

[ ] Bugfix
[ ] Feature
[ ] Code style update (formatting, local variables)
[X] Refactoring (no functional changes, no api changes)
[ ] Documentation content changes
[ ] Other... Please describe:

Code quality checklist

See CONTRIBUTING.md for more details.

The current tests all pass (python -m pytest).
I added tests that prove my fix is effective or that my feature works
I ran python -m pytest --cov to verify 100% coverage of added lines
I ran python -m mypy to check for type errors
I either used the pre-commit hooks or ran ruff and black manually on my code.

github-actions · 2024-09-24T11:53:17Z

Check Broken URLs

We have automatically detected the following broken URLs in your files. Review and fix the paths to resolve this issue.

Check the file paths and associated broken URLs inside them. For more details, check our Contributing Guide.

File Full Path Issues

README.md

#	Link	Line Number
1	`https://stackoverflow.com/questions/35569042/ssl-certificate-verify-failed-with-python3/43855394#43855394`	`262`

samples/chat/README.md

#	Link	Line Number
1	`https://stackoverflow.com/questions/35569042/ssl-certificate-verify-failed-with-python3/43855394#43855394`	`265`

github-actions · 2024-09-25T14:36:40Z

Check Broken Paths

We have automatically detected the following broken relative paths in your files.
Review and fix the paths to resolve this issue.

Check the file paths and associated broken paths inside them. For more details, check our Contributing Guide.

File Full Path Issues

samples/document-security/README.md

#	Link	Line Number
1	`../app/backend/core/authentication.py`	`176`
2	`../scripts/manageacl.ps1`	`189`
3	`../scripts/manageacl.ps1`	`193`
4	`../scripts/manageacl.sh`	`193`
5	`../scripts/adlsgen2setup.py`	`237`
6	`../scripts/sampleacls.json`	`249`
7	`../scripts/sampleacls.json`	`251`
8	`../scripts/sampleacls.json`	`252`
9	`../scripts/sampleacls.json`	`253`
10	`../app/backend/prepdocs.py`	`261`

docs/login_and_acl.md

#	Link	Line Number
1	`../scripts/manageacl.ps1`	`170`
2	`../scripts/manageacl.ps1`	`174`
3	`../scripts/manageacl.sh`	`174`

github-actions · 2024-09-25T21:14:14Z

Check Broken Paths

We have automatically detected the following broken relative paths in your files.
Review and fix the paths to resolve this issue.

Check the file paths and associated broken paths inside them. For more details, check our Contributing Guide.

File Full Path Issues

samples/document-security/README.md

#	Link	Line Number
1	`../app/backend/core/authentication.py`	`176`
2	`../scripts/manageacl.ps1`	`189`
3	`../scripts/manageacl.ps1`	`193`
4	`../scripts/manageacl.sh`	`193`
5	`../scripts/adlsgen2setup.py`	`237`
6	`../scripts/sampleacls.json`	`249`
7	`../scripts/sampleacls.json`	`251`
8	`../scripts/sampleacls.json`	`252`
9	`../scripts/sampleacls.json`	`253`
10	`../app/backend/prepdocs.py`	`261`

github-actions · 2024-09-26T17:51:19Z

Check Broken Paths

We have automatically detected the following broken relative paths in your files.
Review and fix the paths to resolve this issue.

Check the file paths and associated broken paths inside them. For more details, check our Contributing Guide.

File Full Path Issues

samples/document-security/README.md

#	Link	Line Number
1	`../app/backend/core/authentication.py`	`176`
2	`../scripts/manageacl.ps1`	`189`
3	`../scripts/manageacl.ps1`	`193`
4	`../scripts/manageacl.sh`	`193`
5	`../scripts/adlsgen2setup.py`	`237`
6	`../scripts/sampleacls.json`	`249`
7	`../scripts/sampleacls.json`	`251`
8	`../scripts/sampleacls.json`	`252`
9	`../scripts/sampleacls.json`	`253`
10	`../app/backend/prepdocs.py`	`261`

github-actions · 2024-09-26T18:00:39Z

Check Broken Paths

We have automatically detected the following broken relative paths in your files.
Review and fix the paths to resolve this issue.

Check the file paths and associated broken paths inside them. For more details, check our Contributing Guide.

File Full Path Issues

samples/document-security/README.md

#	Link	Line Number
1	`../app/backend/core/authentication.py`	`177`
2	`../scripts/manageacl.py`	`190`
3	`../scripts/adlsgen2setup.py`	`241`
4	`../scripts/sampleacls.json`	`253`
5	`../scripts/sampleacls.json`	`255`
6	`../scripts/sampleacls.json`	`256`
7	`../scripts/sampleacls.json`	`257`
8	`../app/backend/prepdocs.py`	`265`
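The entries for samples/document-security/README.md follow from that file being a copy of docs/login_and_acl.md, as the review discussion notes. A ../ link resolves against each Markdown file's own folder, so the same link text points at different places in the two copies. The sketch below approximates this kind of check. It is not the actual GitHub Action, and its link pattern is deliberately simplified.

```python
import re
from pathlib import Path

# Inline Markdown links [text](target); simplified: ignores link titles and nested brackets
_LINK = re.compile(r"\[[^\]]*\]\(([^)\s]+)\)")


def broken_relative_links(markdown_file: Path) -> list[str]:
    """Return relative link targets in markdown_file that don't exist on disk."""
    broken: list[str] = []
    text = markdown_file.read_text(encoding="utf-8")
    for target in _LINK.findall(text):
        # External URLs and in-page anchors are out of scope for this check
        if target.startswith(("http://", "https://", "mailto:", "#")):
            continue
        path_part = target.split("#", 1)[0]
        # Relative links resolve against the Markdown file's own folder, not the repo root
        if path_part and not (markdown_file.parent / path_part).exists():
            broken.append(target)
    return broken
```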

pamelafox · 2024-09-26T20:39:06Z

app/backend/prepdocs.py

-    html_parser: Parser
-    pdf_parser: Parser
-    doc_int_parser: DocumentAnalysisParser
+    sentence_text_splitter = SentenceTextSplitter(has_image_embeddings=search_images)


I changed this code around a bit as I found some issues once I wrote tests that touched it.

pamelafox · 2024-09-26T20:39:41Z

app/backend/app.py

+    # Set our own logger levels to INFO by default
+    app_level = os.getenv("APP_LOG_LEVEL", "INFO")
+    app.logger.setLevel(os.getenv("APP_LOG_LEVEL", app_level))
+    logging.getLogger("scripts").setLevel(app_level)


Since prepdocs uses the "scripts" logger, I wanted to make sure we also see its logs when using the user upload feature.
I wonder, though, if I should give it a different name, like "ragapp".

I think scripts is an acceptable name for now - we can change it later if it's not working
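The sketch below shows why setting the level on the "scripts" logger is enough. Python loggers form a dot-separated hierarchy, so a child logger with no level of its own inherits its parent's effective level. The child name "scripts.prepdocs" and the helper function are hypothetical. The sketch reads APP_LOG_LEVEL once and reuses the result.

```python
import logging
import os


def configure_app_loggers(app_logger: logging.Logger) -> str:
    # Read the desired level once; fall back to INFO when APP_LOG_LEVEL is unset
    level = os.getenv("APP_LOG_LEVEL", "INFO")
    # Logger.setLevel accepts level names such as "INFO" or "DEBUG"
    app_logger.setLevel(level)
    # prepdocs logs through "scripts"; children like "scripts.prepdocs" inherit this level
    logging.getLogger("scripts").setLevel(level)
    return level
```

Renaming the logger to "ragapp" later would only require changing the string in one place, as long as the prepdocs modules use the same name or a child of it.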

pamelafox · 2024-09-26T20:40:08Z

app/backend/load_azd_env.py

+logger = logging.getLogger("scripts")
+
+
+def load_azd_env():


This file actually shows up in two places, both here and in the scripts folder, for convenience.
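azd keeps one .env file per environment under .azure/<environment-name>/, so loading "the current environment" first means finding the default one. The sketch below assumes that azd env list -o json returns a list of objects with IsDefault and DotEnvPath keys. Treat those key names and the helper names as assumptions, not a confirmed description of the PR's load_azd_env.

```python
import json
import subprocess
from typing import Optional


def find_default_env_file(env_list_json: str) -> Optional[str]:
    """Return the .env path of the default azd environment, or None if none is marked default."""
    for entry in json.loads(env_list_json):
        # Assumed keys: "IsDefault" (bool) and "DotEnvPath" (str)
        if entry.get("IsDefault"):
            return entry.get("DotEnvPath")
    return None


def default_azd_env_file() -> Optional[str]:
    # Requires the azd CLI on PATH; "-o json" requests machine-readable output
    result = subprocess.run(
        ["azd", "env", "list", "-o", "json"], capture_output=True, text=True, check=True
    )
    return find_default_env_file(result.stdout)
```

The resulting path could then be passed to python-dotenv's load_dotenv(path, override=...) to populate os.environ.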

pamelafox · 2024-09-26T20:40:39Z

app/backend/main.py

+from load_azd_env import load_azd_env
+
+# WEBSITE_HOSTNAME is always set by App Service, RUNNING_IN_PRODUCTION is set in main.bicep
+RUNNING_ON_AZURE = os.getenv("WEBSITE_HOSTNAME") is not None or os.getenv("RUNNING_IN_PRODUCTION") is not None


This code also exists in two places:

Here, so that we load the environment when starting up the app locally.

In app.py, so that we can decide which kind of credential to use.
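Both uses reduce to one check, as sketched below. The environment variables come from the diff. The function names, the credential labels, and the injected load_env callback are hypothetical stand-ins for the real startup code and azure-identity credentials.

```python
import os
from typing import Callable, Mapping


def running_on_azure(environ: Mapping[str, str] = os.environ) -> bool:
    # WEBSITE_HOSTNAME is always set by App Service; RUNNING_IN_PRODUCTION is set in main.bicep
    return environ.get("WEBSITE_HOSTNAME") is not None or environ.get("RUNNING_IN_PRODUCTION") is not None


def credential_kind(environ: Mapping[str, str] = os.environ) -> str:
    # Hypothetical labels: managed identity when deployed, the developer's azd login locally
    return "managed_identity" if running_on_azure(environ) else "azd_cli"


def startup(load_env: Callable[[], None], environ: Mapping[str, str] = os.environ) -> None:
    # main.py-style startup: only read the local azd .env file when not deployed
    if not running_on_azure(environ):
        load_env()
```

When deployed, the app settings already provide the variables, so skipping the azd lookup avoids depending on the azd CLI in production.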

pamelafox · 2024-09-26T20:41:07Z

app/backend/prepdocs.py

    parser = argparse.ArgumentParser(
        description="Prepare documents by extracting content from PDFs, splitting content into sections, uploading to blob storage, and indexing in a search index.",
-        epilog="Example: prepdocs.py '.\\data\*' --storageaccount myaccount --container mycontainer --searchservice mysearch --index myindex -v",
+        epilog="Example: prepdocs.py '.\\data\*' -v",


I only kept the arguments that don't come directly from the azd environment.
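Here is a minimal sketch of that split. The command line carries only run-specific options, and service settings are read from the environment. The positional argument name, the --verbose flag spelling, and the AZURE_* variable names are assumptions for illustration. The epilog escapes both backslashes, so it renders the same text as the diff without Python's warning about the invalid \* escape.

```python
import argparse
import os
from typing import Mapping, Optional


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(
        description="Prepare documents and index them in a search index (abbreviated).",
        # Renders as: Example: prepdocs.py '.\data\*' -v
        epilog="Example: prepdocs.py '.\\data\\*' -v",
    )
    # Only options that are not stored in the azd environment stay on the command line
    parser.add_argument("files", nargs="?", help="Files to be processed")
    parser.add_argument("--verbose", "-v", action="store_true", help="Verbose output")
    return parser


def azd_settings(environ: Mapping[str, str] = os.environ) -> dict[str, Optional[str]]:
    # Assumed variable names; values come from the azd environment, not from argparse
    return {
        "search_service": environ.get("AZURE_SEARCH_SERVICE"),
        "search_index": environ.get("AZURE_SEARCH_INDEX"),
        "storage_account": environ.get("AZURE_STORAGE_ACCOUNT"),
        "storage_container": environ.get("AZURE_STORAGE_CONTAINER"),
    }
```

With this split, adding a new infrastructure setting only means reading one more environment variable. The sh/ps1 wrappers no longer need to forward it as a flag.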

pamelafox · 2024-09-26T20:41:28Z

app/backend/prepdocs.py

+                ".heic": FileProcessor(doc_int_parser, sentence_text_splitter),
+            }
+        )
+    return file_processors


^^ all of that code change was due to issues found via tests/mypy
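The sketch below shows one pattern that keeps this kind of setup friendly to mypy and to tests. It builds the extension-to-processor mapping as a single typed dict and adds optional entries through .update(...) only when their parser exists. In contrast, a bare annotation such as pdf_parser: Parser inside a function declares a type but assigns nothing, so reading it on a path that never assigned it raises UnboundLocalError. The real classes (FileProcessor, SentenceTextSplitter, DocumentAnalysisParser) aren't shown in full here, so the sketch uses a stand-in dataclass with string fields. Which extensions require Document Intelligence is also an assumption.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class FileProcessor:
    # Hypothetical stand-in: pairs a parser name with a splitter name
    parser: str
    splitter: str


def setup_file_processors(
    doc_int_parser: Optional[str], sentence_text_splitter: str = "sentence"
) -> dict[str, FileProcessor]:
    # Formats with a local parser are always registered
    file_processors: dict[str, FileProcessor] = {
        ".json": FileProcessor("json", "simple"),
        ".md": FileProcessor("text", sentence_text_splitter),
        ".txt": FileProcessor("text", sentence_text_splitter),
    }
    # Formats that need Document Intelligence are only added when that parser exists,
    # so no name is ever read before it has been assigned
    if doc_int_parser is not None:
        file_processors.update(
            {
                ".docx": FileProcessor(doc_int_parser, sentence_text_splitter),
                ".heic": FileProcessor(doc_int_parser, sentence_text_splitter),
            }
        )
    return file_processors
```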

pamelafox · 2024-09-26T20:41:58Z

app/backend/prepdocs.py

    )
    parser.add_argument(
-        "--useintvectorization",
+        "--searchserviceassignedid",


I cannot find ANY evidence of us telling folks how to use this, or even actually using it. Do we even need this arg??

integratedvectorization sets search_user_assigned_identity but then doesn't use it.

Yeah, let's create a follow-up PR to remove this.

pamelafox · 2024-09-26T20:43:27Z

app/frontend/src/pages/chat/Chat.tsx

                    event["message"] = event["delta"];
                    askResponse = event as ChatAppResponse;
-                } else if (event["delta"]["content"]) {
+                } else if (event["delta"] && event["delta"]["content"]) {


Unrelated error that I found while presenting. It meant we aren't currently rendering errors correctly: {error: ...} responses don't contain a delta key, so reading event["delta"]["content"] threw before the error could be displayed.
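The same guard, expressed in Python, might look like this for a client consuming the streamed events. The event shapes are inferred from the diff above, and the function is hypothetical. Indexing event["delta"]["content"] directly would raise KeyError on an error event, much as the TypeScript version throws a TypeError when event["delta"] is undefined.

```python
from typing import Any, Iterable


def collect_stream(events: Iterable[dict[str, Any]]) -> tuple[str, list[str]]:
    """Concatenate streamed content and collect error messages from parsed NDJSON events."""
    content_parts: list[str] = []
    errors: list[str] = []
    for event in events:
        if "error" in event:
            # Error events carry no "delta" key, so handle them before touching delta
            errors.append(str(event["error"]))
        elif event.get("delta") and event["delta"].get("content"):
            content_parts.append(event["delta"]["content"])
    return "".join(content_parts), errors
```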

pamelafox · 2024-09-26T20:44:08Z

infra/main.parameters.json

      "value": "${AZURE_OPENAI_EMB_DIMENSIONS}"
    },
+    "gpt4vDeploymentCapacity":{
+      "value": "${AZURE_OPENAI_GPT4V_DEPLOYMENT_CAPACITY=10}"


Also unrelated: I needed to be able to set the capacity to get the vision endpoint deployed in our restrictive tenants. I can move this to another branch if desired.
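azd fills ${NAME} placeholders in main.parameters.json from the active environment, and ${NAME=default} supplies a fallback. The resolver below is a hypothetical approximation for illustration, not azd's implementation. It assumes the default applies only when the variable is unset, and that an unset variable with no default becomes an empty string.

```python
import re
from typing import Mapping

# Matches ${NAME} or ${NAME=default}
_PLACEHOLDER = re.compile(r"\$\{([A-Za-z_][A-Za-z0-9_]*)(?:=([^}]*))?\}")


def substitute(template: str, environ: Mapping[str, str]) -> str:
    def replace(match: re.Match) -> str:
        name, default = match.group(1), match.group(2)
        if name in environ:
            return environ[name]
        # Assumption: fall back to the default (or "") only when the variable is unset
        return default if default is not None else ""

    return _PLACEHOLDER.sub(replace, template)
```

With a default of 10, deployments keep their current capacity unless someone runs azd env set AZURE_OPENAI_GPT4V_DEPLOYMENT_CAPACITY with a different value.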

pamelafox · 2024-09-26T20:44:34Z

samples/document-security/README.md


 - [Requirements](#requirements)
- [Setting up Microsoft Entra ID Apps](#setting-up-entra-id-apps)
+- [Setting up Microsoft Entra applications](#setting-up-microsoft-entra-applications)


This file is just a copy-paste of docs/login_and_acl.md.

Can we eventually remove this? OK to leave it for now.

I assume you mean: can we eventually remove the duplicate copy of login_and_acl.md?

Arun wants the ACL feature to show up as a completely separate sample in the Microsoft Learn Samples browser, which requires the copy. If it's annoying to maintain, we could discuss removing it, and it wouldn't show up in the samples browser anymore.
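Renaming a heading also changes its anchor, which is why the table-of-contents link above had to change. The function below approximates GitHub's heading-anchor rule: lowercase, drop punctuation, turn spaces into hyphens. It is an approximation only. For example, GitHub also disambiguates repeated headings by appending -1, -2, and so on, which this sketch does not do.

```python
import re


def github_heading_slug(heading: str) -> str:
    """Approximate a GitHub heading anchor (no duplicate-heading suffixes)."""
    slug = heading.strip().lower()
    # Keep word characters (letters, digits, underscore), hyphens, and spaces
    slug = re.sub(r"[^\w\- ]", "", slug)
    return slug.replace(" ", "-")
```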

mattgotteiner · 2024-09-26T23:04:10Z

app/backend/prepdocs.py

-        logger.info("Connecting to Azure services using the azd credential for tenant %s", args.tenantid)
-        azd_credential = AzureDeveloperCliCredential(tenant_id=args.tenantid, process_timeout=60)
+    # Use the current user identity to connect to Azure services. See infra/main.bicep for role assignments.
+    if tenant_id := os.getenv("AZURE_TENANT_ID"):


I like the idea of using the env vars instead of args - great improvement!

mattgotteiner

Thanks, this is amazing

1yefuwang1 and others added 30 commits August 22, 2024 13:19

Update bicep for ACA

d721099

First working version

30f00e5

Support workload profile

72e34d2

Merge branch 'Azure-Samples:main' into main

55a97fd

Add support for CORS and fix identity for openai

7edd2db

Add aca-host

8fc2d5a

Make acr unique

9cadd14

Add doc for aca host

0623e9b

Merge branch 'Azure-Samples:main' into yefu/aca

73b2bb3

Update ACA docs

e362545

Remove unneeded bicep files

24d668a

Revert chanes to infra/main.parameters.json

fbb4b05

Fix markdown lint issues

4ced7ce

Run frontend build before building docker image

625866f

remove symlinks and update scripts with paths relative to its own folder instead of cwd

40287f2

Merge with main.bicep

a99a6c5

output AZURE_CONTAINER_REGISTRY_ENDPOINT

9dc65ca

Fix deployment with app service

7f523a0

Improve naming and README

9e6e145

Fix identity name and cost esitmation for aca

4ec32f7

Share env vars in bicep and update docs

4174fd3

Revert "remove symlinks and update scripts with paths relative to its own folder instead of cwd"

This reverts commit 40287f2.

7e49c99

Add containerapps as a commented out host option

259e7a5

Update app/backend/.dockerignore

920e979

Apply suggestions from code review

eb09e46

Merge branch 'main' into yefu/aca

56025eb

More steps for deployment guide

13021cb

Update azure.yaml

8b19702

Merge branch 'main' into yefu/aca

6550960

Update comment

d49f60c

Fix error handling

a4a4f11

pamelafox added 2 commits September 25, 2024 09:03

Update manageacl.py commands

e4a7abf

Doc update

0ab84da

Adding more tests for prepdocs

4fef884

Fix markdown copy

7d57de8

pamelafox added 2 commits September 26, 2024 11:12

Fix relative links

1980845

Make prepdocs mypy happy

b378727

pamelafox requested a review from mattgotteiner September 26, 2024 20:37

pamelafox commented Sep 26, 2024

Fix auth_update if check

697fa01

mattgotteiner reviewed Sep 26, 2024

app/backend/prepdocs.py (resolved)

mattgotteiner reviewed Sep 26, 2024

mattgotteiner approved these changes Sep 26, 2024

pamelafox merged commit b8f0a74 into Azure-Samples:main Sep 26, 2024
17 checks passed

pamelafox deleted the loadazdenv branch September 26, 2024 23:15

egor-yudkin mentioned this pull request Dec 11, 2024

Start.sh doesn't pre-load environment variables anymore - this breaks automatic VITE_* environment variables in frontend #2227

Open
