-
Notifications
You must be signed in to change notification settings - Fork 5k
Refactor scripts to avoid anti-patterns, redundancy #1986
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
…der instead of cwd
… own folder instead of cwd" This reverts commit 40287f2.
Check Broken URLsWe have automatically detected the following broken URLs in your files. Review and fix the paths to resolve this issue. Check the file paths and associated broken URLs inside them. For more details, check our Contributing Guide.
|
Check Broken PathsWe have automatically detected the following broken relative paths in your files. Check the file paths and associated broken paths inside them. For more details, check our Contributing Guide.
|
Check Broken PathsWe have automatically detected the following broken relative paths in your files. Check the file paths and associated broken paths inside them. For more details, check our Contributing Guide.
|
Check Broken PathsWe have automatically detected the following broken relative paths in your files. Check the file paths and associated broken paths inside them. For more details, check our Contributing Guide.
|
Check Broken PathsWe have automatically detected the following broken relative paths in your files. Check the file paths and associated broken paths inside them. For more details, check our Contributing Guide.
|
html_parser: Parser | ||
pdf_parser: Parser | ||
doc_int_parser: DocumentAnalysisParser | ||
sentence_text_splitter = SentenceTextSplitter(has_image_embeddings=search_images) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I changed this code around a bit as I found some issues once I wrote tests that touched it.
# Set our own logger levels to INFO by default | ||
app_level = os.getenv("APP_LOG_LEVEL", "INFO") | ||
app.logger.setLevel(os.getenv("APP_LOG_LEVEL", app_level)) | ||
logging.getLogger("scripts").setLevel(app_level) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since prepdocs uses the "scripts" logger, I wanted to make sure we also see its logs when using user upload feature.
Though I wonder if I should give it a different name, like "ragapp".
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think scripts is an acceptable name for now - we can change it later if it's not working
logger = logging.getLogger("scripts") | ||
|
||
|
||
def load_azd_env(): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This file actually shows up in two places, both here and in scripts folder, for convenience.
from load_azd_env import load_azd_env | ||
|
||
# WEBSITE_HOSTNAME is always set by App Service, RUNNING_IN_PRODUCTION is set in main.bicep | ||
RUNNING_ON_AZURE = os.getenv("WEBSITE_HOSTNAME") is not None or os.getenv("RUNNING_IN_PRODUCTION") is not None |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This code also exists in two places-
- here, so that we load the environment when starting up the app locally
- in app.py, so that we can decide what kind of credential to use
parser = argparse.ArgumentParser( | ||
description="Prepare documents by extracting content from PDFs, splitting content into sections, uploading to blob storage, and indexing in a search index.", | ||
epilog="Example: prepdocs.py '.\\data\*' --storageaccount myaccount --container mycontainer --searchservice mysearch --index myindex -v", | ||
epilog="Example: prepdocs.py '.\\data\*' -v", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I only kept the arguments that did not come directly from azd env
".heic": FileProcessor(doc_int_parser, sentence_text_splitter), | ||
} | ||
) | ||
return file_processors |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
^^ all of that code change was due to issues found via tests/mypy
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yay mypy
) | ||
parser.add_argument( | ||
"--useintvectorization", | ||
"--searchserviceassignedid", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I cannot find ANY evidence of us telling folks how to use this, or even actually using it. Do we even need this arg??
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
integratedvectorization sets search_user_assigned_identity but then doesnt use it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yeah, let's create a follow-up pr to remove this
event["message"] = event["delta"]; | ||
askResponse = event as ChatAppResponse; | ||
} else if (event["delta"]["content"]) { | ||
} else if (event["delta"] && event["delta"]["content"]) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Unrelated error that I found while presenting. This error meant we arent currently rendering errors correctly, as {error: } responses dont contain a delta key.
"value": "${AZURE_OPENAI_EMB_DIMENSIONS}" | ||
}, | ||
"gpt4vDeploymentCapacity":{ | ||
"value": "${AZURE_OPENAI_GPT4V_DEPLOYMENT_CAPACITY=10}" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also unrelated, I needed to be able to set capacity to get the vision endpoint deployed to our restrictive tenants. I can move to other branch if desired.
|
||
- [Requirements](#requirements) | ||
- [Setting up Microsoft Entra ID Apps](#setting-up-entra-id-apps) | ||
- [Setting up Microsoft Entra applications](#setting-up-microsoft-entra-applications) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This file is just a copy paste of login_and_acl.md
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can we eventually remove this? OK to leave it for now
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I assume you mean, can we we eventually remove the duplicate copy of login_and_acl.md?
Arun wants the ACL feature to show up as a completely separate sample in the Microsoft Learn Samples browser, which requires the copy. If it's annoying to maintain, we could discuss removing it, and it wouldn't show up in the samples browser anymore.
logger.info("Connecting to Azure services using the azd credential for tenant %s", args.tenantid) | ||
azd_credential = AzureDeveloperCliCredential(tenant_id=args.tenantid, process_timeout=60) | ||
# Use the current user identity to connect to Azure services. See infra/main.bicep for role assignments. | ||
if tenant_id := os.getenv("AZURE_TENANT_ID"): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like the idea of using the env vars instead of args - great improvement!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, this is amazing
Purpose
This PR refactors our scripts in several key ways:
azd env get-values
to the current environment. Instead, we only load the current environment into the currently active Python environment, or if we're only accessing a few variables in a shell script, we useazd env get-value
to fetch the value.Does this introduce a breaking change?
When developers merge from main and run the server, azd up, or azd deploy, will this produce an error?
If you're not sure, try it out on an old environment.
Does this require changes to learn.microsoft.com docs?
This repository is referenced by this tutorial
which includes deployment, settings and usage instructions. If text or screenshot need to change in the tutorial,
check the box below and notify the tutorial author. A Microsoft employee can do this for you if you're an external contributor.
Type of change
Code quality checklist
See CONTRIBUTING.md for more details.
python -m pytest
).python -m pytest --cov
to verify 100% coverage of added linespython -m mypy
to check for type errorsruff
andblack
manually on my code.