Increase embedding TPM capacity and add note in cloud ingestion guide #2846
Conversation
Pull request overview
This PR increases the default embedding model capacity to address rate limit issues encountered during cloud ingestion. The default capacity changes from 30K TPM to 200K TPM, which aligns with typical Azure OpenAI quota defaults for embedding models.
Key Changes:
- Increased default embedding deployment capacity from 30 to 200 in infrastructure configuration
- Added optional configuration step in cloud ingestion guide recommending capacity increase to 400
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated no comments.
| File | Description |
|---|---|
| infra/main.bicep | Updated default embedding deployment capacity from 30 to 200 |
| docs/data_ingestion.md | Added recommended step to increase embedding capacity to 400 for cloud ingestion, renumbered subsequent steps accordingly |
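For context, the infra change amounts to bumping a single default value. Here is a minimal sketch of what that looks like, assuming the capacity is exposed as a top-level Bicep parameter (the actual parameter name and surrounding structure in infra/main.bicep may differ):

```bicep
// Azure OpenAI deployment capacity is expressed in units of 1,000 tokens per minute (TPM),
// so 200 requests 200K TPM; the previous default of 30 requested 30K TPM.
param embeddingDeploymentCapacity int = 200
```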
```diff
-      - name: Run markdownlint
-        uses: articulate/actions-markdownlint@v1
+      - name: Run markdownlint-cli2
+        uses: DavidAnson/markdownlint-cli2-action@v21
```
We suddenly started getting markdownlint errors, and while debugging, I realized that our markdownlint action was deprecated in favor of this one. David Anson also authors the VS Code extension that we recommend in the repo configuration, so this makes CI consistent with VS Code errors, in theory.
sounds good
```yaml
globs: |
  **/*.md
  !data/**
  !.github/**
```
;)
The default markdown files like SECURITY.md are riddled with issues, and we seemed to ignore them before, so I ignored them again here. We could fix them up in the future.
Purpose
Some developers have already tried out the cloud ingestion approach and ran into errors due to rate limits with the embedding model. This PR increases the embedding model's default capacity to 200 (200K TPM) and adds a note to the ingestion guide about how to increase it even further. This appears to be a safe capacity to request, as Azure OpenAI quotas default quite high for embedding models, at least in my account.
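For illustration, here is roughly how a raised capacity flows into the embedding model deployment, and what requesting 400K TPM would look like. This is a hedged sketch using the standard Microsoft.CognitiveServices deployment schema, not the exact contents of infra/main.bicep; the parameter name, deployment name, and model shown are assumptions:

```bicep
// Assumed parameter: 400 requests 400K TPM for heavy cloud ingestion,
// subject to the quota available in your subscription and region.
param embeddingDeploymentCapacity int = 400
param openAiServiceName string

// Reference the existing Azure OpenAI account (name supplied at deployment time).
resource openAiAccount 'Microsoft.CognitiveServices/accounts@2023-05-01' existing = {
  name: openAiServiceName
}

// The deployment's SKU capacity is what the embedding rate limit is derived from.
resource embeddingDeployment 'Microsoft.CognitiveServices/accounts/deployments@2023-05-01' = {
  parent: openAiAccount
  name: 'embedding'
  sku: {
    name: 'Standard'
    capacity: embeddingDeploymentCapacity
  }
  properties: {
    model: {
      format: 'OpenAI'
      name: 'text-embedding-3-large' // illustrative; the repo's default embedding model may differ
      version: '1'
    }
  }
}
```

In practice a developer would likely override the capacity through their azd environment or the Azure portal rather than editing the template directly, which is what the new note in docs/data_ingestion.md is for.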
Does this introduce a breaking change?
When developers merge from main and run the server, azd up, or azd deploy, will this produce an error?
If you're not sure, try it out on an old environment.
Does this require changes to learn.microsoft.com docs?
This repository is referenced by this tutorial, which includes deployment, settings, and usage instructions. If text or screenshots need to change in the tutorial, check the box below and notify the tutorial author. A Microsoft employee can do this for you if you're an external contributor.
Type of change
Code quality checklist
See CONTRIBUTING.md for more details.
- The current tests all pass (`python -m pytest`).
- I ran `python -m pytest --cov` to verify 100% coverage of added lines
- I ran `python -m mypy` to check for type errors
- I ran `ruff` and `black` manually on my code.