pamelafox
Collaborator

Purpose

This PR refactors our scripts in several key ways:

  • Removes the anti-pattern of exporting all of azd env get-values to the current environment. Instead, we load the current azd environment only into the currently active Python environment, or, if a shell script only needs a few variables, we use azd env get-value to fetch each value.
  • Uses os.getenv instead of argparse arguments for all cases where the "argument" is actually set in the azd environment. This affected prepdocs.py/sh/ps1 the most, which had a growing, unmaintainable list of arguments. Now, only non-azd arguments can be specified.
  • Removes the sh/ps1 wrapper scripts for manageacl.py and adlsgen2setup.py, where the wrapper doesn't seem necessary. The developer just has to run them inside the Python env with the right packages, and they'll work.
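
As a rough illustration of the first bullet, the sketch below parses KEY="value" lines (the shape that azd env get-values prints) and applies them only to the running Python process, rather than exporting them into the parent shell. The function names parse_env_lines and load_into_process are hypothetical, the sample values are made up, and the PR's own load_azd_env helper may work differently (for example, by reading the environment's .env file). Escaped quotes inside values are not handled.

```python
import os


def parse_env_lines(text: str) -> dict[str, str]:
    """Parse KEY="value" lines into a dict, skipping blank lines and comments."""
    values: dict[str, str] = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Strip one pair of surrounding double quotes, if present.
        if len(value) >= 2 and value[0] == value[-1] == '"':
            value = value[1:-1]
        values[key.strip()] = value
    return values


def load_into_process(values: dict[str, str], environ=os.environ) -> None:
    """Copy parsed values into this process's environment only."""
    environ.update(values)


# Made-up sample text standing in for real `azd env get-values` output.
sample = 'AZURE_TENANT_ID="contoso-tenant"\nAZURE_OPENAI_EMB_DIMENSIONS="1536"\n'
load_into_process(parse_env_lines(sample))
print(os.getenv("AZURE_OPENAI_EMB_DIMENSIONS"))
```

Because the values land in os.environ of the Python process, later os.getenv calls see them, while the shell that launched the script is left untouched.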

Does this introduce a breaking change?

When developers merge from main and run the server, azd up, or azd deploy, will this produce an error?
If you're not sure, try it out on an old environment.

[ ] Yes
[X] No - In theory, no, but more testing needed.

Does this require changes to learn.microsoft.com docs?

This repository is referenced by this tutorial
which includes deployment, settings and usage instructions. If text or screenshots need to change in the tutorial,
check the box below and notify the tutorial author. A Microsoft employee can do this for you if you're an external contributor.

[ ] Yes
[X] No

Type of change

[ ] Bugfix
[ ] Feature
[ ] Code style update (formatting, local variables)
[X] Refactoring (no functional changes, no api changes)
[ ] Documentation content changes
[ ] Other... Please describe:

Code quality checklist

See CONTRIBUTING.md for more details.

  • The current tests all pass (python -m pytest).
  • I added tests that prove my fix is effective or that my feature works
  • I ran python -m pytest --cov to verify 100% coverage of added lines
  • I ran python -m mypy to check for type errors
  • I either used the pre-commit hooks or ran ruff and black manually on my code.

Check Broken URLs

We have automatically detected the following broken URLs in your files. Review and fix the paths to resolve this issue.

Check the file paths and associated broken URLs inside them. For more details, check our Contributing Guide.

File Full Path: README.md
Issues:
# | Link | Line Number
1 | https://stackoverflow.com/questions/35569042/ssl-certificate-verify-failed-with-python3/43855394#43855394 | 262

File Full Path: samples/chat/README.md
Issues:
# | Link | Line Number
1 | https://stackoverflow.com/questions/35569042/ssl-certificate-verify-failed-with-python3/43855394#43855394 | 265

Check Broken Paths

We have automatically detected the following broken relative paths in your files.
Review and fix the paths to resolve this issue.

Check the file paths and associated broken paths inside them. For more details, check our Contributing Guide.

File Full Path: samples/document-security/README.md
Issues:
# | Link | Line Number
1 | ../app/backend/core/authentication.py | 176
2 | ../scripts/manageacl.ps1 | 189
3 | ../scripts/manageacl.ps1 | 193
4 | ../scripts/manageacl.sh | 193
5 | ../scripts/adlsgen2setup.py | 237
6 | ../scripts/sampleacls.json | 249
7 | ../scripts/sampleacls.json | 251
8 | ../scripts/sampleacls.json | 252
9 | ../scripts/sampleacls.json | 253
10 | ../app/backend/prepdocs.py | 261

File Full Path: docs/login_and_acl.md
Issues:
# | Link | Line Number
1 | ../scripts/manageacl.ps1 | 170
2 | ../scripts/manageacl.ps1 | 174
3 | ../scripts/manageacl.sh | 174

Check Broken Paths

We have automatically detected the following broken relative paths in your files.
Review and fix the paths to resolve this issue.

Check the file paths and associated broken paths inside them. For more details, check our Contributing Guide.

File Full Path: samples/document-security/README.md
Issues:
# | Link | Line Number
1 | ../app/backend/core/authentication.py | 176
2 | ../scripts/manageacl.ps1 | 189
3 | ../scripts/manageacl.ps1 | 193
4 | ../scripts/manageacl.sh | 193
5 | ../scripts/adlsgen2setup.py | 237
6 | ../scripts/sampleacls.json | 249
7 | ../scripts/sampleacls.json | 251
8 | ../scripts/sampleacls.json | 252
9 | ../scripts/sampleacls.json | 253
10 | ../app/backend/prepdocs.py | 261

Check Broken Paths

We have automatically detected the following broken relative paths in your files.
Review and fix the paths to resolve this issue.

Check the file paths and associated broken paths inside them. For more details, check our Contributing Guide.

File Full Path: samples/document-security/README.md
Issues:
# | Link | Line Number
1 | ../app/backend/core/authentication.py | 177
2 | ../scripts/manageacl.py | 190
3 | ../scripts/adlsgen2setup.py | 241
4 | ../scripts/sampleacls.json | 253
5 | ../scripts/sampleacls.json | 255
6 | ../scripts/sampleacls.json | 256
7 | ../scripts/sampleacls.json | 257
8 | ../app/backend/prepdocs.py | 265

html_parser: Parser
pdf_parser: Parser
doc_int_parser: DocumentAnalysisParser
sentence_text_splitter = SentenceTextSplitter(has_image_embeddings=search_images)
Collaborator Author

I changed this code around a bit as I found some issues once I wrote tests that touched it.

# Set our own logger levels to INFO by default
app_level = os.getenv("APP_LOG_LEVEL", "INFO")
app.logger.setLevel(os.getenv("APP_LOG_LEVEL", app_level))
logging.getLogger("scripts").setLevel(app_level)
Collaborator Author

Since prepdocs uses the "scripts" logger, I wanted to make sure we also see its logs when using the user upload feature.
Though I wonder if I should give it a different name, like "ragapp".
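
A minimal sketch of the idea in this comment, assuming the level comes from APP_LOG_LEVEL as in the snippet above: one value is read once and applied to every logger whose output should be visible. The helper name configure_log_levels is hypothetical, and the real app also sets the level on app.logger, which is omitted here to keep the example standard-library only.

```python
import logging
import os


def configure_log_levels(logger_names, environ=os.environ, default="INFO"):
    """Apply one level, read from APP_LOG_LEVEL, to each named logger."""
    level = environ.get("APP_LOG_LEVEL", default)
    for name in logger_names:
        # setLevel accepts level names such as "INFO" or "DEBUG".
        logging.getLogger(name).setLevel(level)
    return level


# "scripts" is the logger name used by prepdocs in this PR.
configure_log_levels(["scripts"])
```

Renaming the logger later (for example to "ragapp", as floated above) would only require changing the name passed in here and in the modules that call logging.getLogger.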

Collaborator

I think scripts is an acceptable name for now - we can change it later if it's not working

logger = logging.getLogger("scripts")


def load_azd_env():
Collaborator Author

This file actually shows up in two places, both here and in the scripts folder, for convenience.

from load_azd_env import load_azd_env

# WEBSITE_HOSTNAME is always set by App Service, RUNNING_IN_PRODUCTION is set in main.bicep
RUNNING_ON_AZURE = os.getenv("WEBSITE_HOSTNAME") is not None or os.getenv("RUNNING_IN_PRODUCTION") is not None
Collaborator Author

This code also exists in two places:

  1. here, so that we load the environment when starting up the app locally
  2. in app.py, so that we can decide what kind of credential to use
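
The two uses listed above can be sketched roughly as follows. The check itself mirrors the RUNNING_ON_AZURE line in the snippet; the helper names and the returned labels are hypothetical placeholders, not the credential classes app.py actually picks.

```python
import os


def running_on_azure(environ=os.environ) -> bool:
    """True when either hosting marker is present in the environment."""
    return environ.get("WEBSITE_HOSTNAME") is not None or environ.get("RUNNING_IN_PRODUCTION") is not None


def should_load_azd_env(environ=os.environ) -> bool:
    """Use 1: only load the local azd environment when running locally."""
    return not running_on_azure(environ)


def credential_kind(environ=os.environ) -> str:
    """Use 2: choose a credential type (labels here are placeholders)."""
    return "hosted-identity" if running_on_azure(environ) else "local-developer"


print(should_load_azd_env({}), credential_kind({"WEBSITE_HOSTNAME": "example"}))
```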

parser = argparse.ArgumentParser(
description="Prepare documents by extracting content from PDFs, splitting content into sections, uploading to blob storage, and indexing in a search index.",
- epilog="Example: prepdocs.py '.\\data\*' --storageaccount myaccount --container mycontainer --searchservice mysearch --index myindex -v",
+ epilog="Example: prepdocs.py '.\\data\*' -v",
Collaborator Author

I only kept the arguments that did not come directly from azd env
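
A small sketch of that split, under stated assumptions: per-run options stay on the command line, while deployment settings are read from the environment. The environment variable names AZURE_STORAGE_ACCOUNT and AZURE_SEARCH_SERVICE and the helper names are hypothetical, chosen only for illustration; the actual prepdocs.py keeps a different set of arguments.

```python
import argparse
import os


def build_parser() -> argparse.ArgumentParser:
    """Only options that vary per run remain CLI arguments."""
    parser = argparse.ArgumentParser(description="Sketch of a slimmed-down CLI.")
    parser.add_argument("files", nargs="?", help="Files to be processed")
    parser.add_argument("--verbose", "-v", action="store_true", help="Verbose output")
    return parser


def resolve_settings(argv, environ=os.environ) -> dict:
    """Combine CLI arguments with settings taken from the environment."""
    args = build_parser().parse_args(argv)
    return {
        "files": args.files,
        "verbose": args.verbose,
        # Deployment settings come from the environment, not the command line.
        "storage_account": environ.get("AZURE_STORAGE_ACCOUNT"),
        "search_service": environ.get("AZURE_SEARCH_SERVICE"),
    }


print(resolve_settings(["./data/*", "-v"], {"AZURE_STORAGE_ACCOUNT": "myaccount"}))
```

One consequence of this design is that the example invocation shrinks to the file pattern plus -v, as in the updated epilog.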

".heic": FileProcessor(doc_int_parser, sentence_text_splitter),
}
)
return file_processors
Collaborator Author

^^ all of that code change was due to issues found via tests/mypy

Collaborator

yay mypy

)
parser.add_argument(
"--useintvectorization",
"--searchserviceassignedid",
Collaborator Author

I cannot find ANY evidence of us telling folks how to use this, or even actually using it. Do we even need this arg??

Collaborator Author

integratedvectorization sets search_user_assigned_identity but then doesn't use it.

Collaborator

yeah, let's create a follow-up pr to remove this

event["message"] = event["delta"];
askResponse = event as ChatAppResponse;
- } else if (event["delta"]["content"]) {
+ } else if (event["delta"] && event["delta"]["content"]) {
Collaborator Author

Unrelated error that I found while presenting. This error meant we aren't currently rendering errors correctly, as {error: } responses don't contain a delta key.
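
The frontend fix itself is TypeScript; the sketch below restates the same guard in Python, purely as an illustration. It checks that the delta key exists before reading content from it, so an error-shaped event falls through instead of raising. The function name extract_content and the sample events are hypothetical.

```python
def extract_content(event: dict):
    """Return streamed text from an event, or None if the event has none."""
    delta = event.get("delta")
    # Guard first: error events carry no "delta" key at all.
    if delta and delta.get("content"):
        return delta["content"]
    return None


print(extract_content({"delta": {"content": "Hello"}}))
print(extract_content({"error": "something went wrong"}))
```

Without the guard, indexing event["delta"]["content"] on an error event would fail before the error-handling branch could run.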

"value": "${AZURE_OPENAI_EMB_DIMENSIONS}"
},
"gpt4vDeploymentCapacity":{
"value": "${AZURE_OPENAI_GPT4V_DEPLOYMENT_CAPACITY=10}"
Collaborator Author

Also unrelated: I needed to be able to set capacity to get the vision endpoint deployed to our restrictive tenants. I can move it to another branch if desired.


- [Requirements](#requirements)
-- [Setting up Microsoft Entra ID Apps](#setting-up-entra-id-apps)
+- [Setting up Microsoft Entra applications](#setting-up-microsoft-entra-applications)
Collaborator Author

This file is just a copy paste of login_and_acl.md

Collaborator

can we eventually remove this? OK to leave it for now

Collaborator Author

I assume you mean, can we eventually remove the duplicate copy of login_and_acl.md?

Arun wants the ACL feature to show up as a completely separate sample in the Microsoft Learn Samples browser, which requires the copy. If it's annoying to maintain, we could discuss removing it, and it wouldn't show up in the samples browser anymore.

- logger.info("Connecting to Azure services using the azd credential for tenant %s", args.tenantid)
- azd_credential = AzureDeveloperCliCredential(tenant_id=args.tenantid, process_timeout=60)
+ # Use the current user identity to connect to Azure services. See infra/main.bicep for role assignments.
+ if tenant_id := os.getenv("AZURE_TENANT_ID"):
Collaborator

I like the idea of using the env vars instead of args - great improvement!
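
A hedged sketch of the pattern in the snippet above: the walrus operator reads AZURE_TENANT_ID once and only passes tenant-specific options when it is set and non-empty. The helper name credential_kwargs is hypothetical; the tenant_id and process_timeout=60 values mirror the snippet, while returning no arguments when the variable is unset is an assumption about the fallback branch, which the snippet does not show.

```python
import os


def credential_kwargs(environ=os.environ) -> dict:
    """Build keyword arguments for a credential constructor from the environment."""
    # An unset or empty AZURE_TENANT_ID is falsy, so the branch is skipped.
    if tenant_id := environ.get("AZURE_TENANT_ID"):
        return {"tenant_id": tenant_id, "process_timeout": 60}
    return {}


print(credential_kwargs({"AZURE_TENANT_ID": "contoso-tenant"}))
```

The resulting dict could then be unpacked into a constructor such as AzureDeveloperCliCredential(**kwargs), keeping the tenant lookup out of the argument parser entirely.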

Collaborator

@mattgotteiner mattgotteiner left a comment

Thanks, this is amazing

@pamelafox pamelafox merged commit b8f0a74 into Azure-Samples:main Sep 26, 2024
17 checks passed
@pamelafox pamelafox deleted the loadazdenv branch September 26, 2024 23:15