Releases: Azure-Samples/azure-search-openai-demo
2024-03-06: Token-based text splitting for data ingestion
The highlight of this release is a new token-based text splitter, used by the prepdocs script when splitting content into chunks for the search index. The previous algorithm was based solely on character count, so the prepdocs script did not work well for non-English documents or for any documents that produced a higher-than-usual number of tokens. If you experience any regression in splitting quality as a result of this change, please file an issue.
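To illustrate why a token budget behaves differently from a character budget, here is a minimal sketch of token-based splitting. It is not the splitter shipped in this release: `approx_token_count` is a crude stand-in for a real model tokenizer, and `split_by_tokens` and its sentence-boundary rule are hypothetical names and choices made for this example.

```python
import re
from typing import Callable, List


def approx_token_count(text: str) -> int:
    """Stand-in token counter (assumption, not a real model tokenizer).

    Runs of ASCII letters/digits count as one token; every other
    non-space character (punctuation, CJK characters, ...) counts as one.
    """
    return len(re.findall(r"[A-Za-z0-9]+|\S", text))


def split_by_tokens(
    text: str,
    max_tokens: int,
    count_tokens: Callable[[str], int] = approx_token_count,
) -> List[str]:
    """Greedily pack sentences into chunks that stay within max_tokens.

    A single sentence that exceeds the budget on its own is kept as one
    oversized chunk; a fuller splitter would break it down further.
    """
    # Split right after sentence-ending punctuation (ASCII or CJK full stop).
    sentences = [s for s in re.split(r"(?<=[.!?。])", text) if s]
    chunks: List[str] = []
    current = ""
    for sentence in sentences:
        candidate = current + sentence
        if current and count_tokens(candidate) > max_tokens:
            chunks.append(current.strip())
            current = sentence
        else:
            current = candidate
    if current.strip():
        chunks.append(current.strip())
    return chunks
```

With this stand-in counter, a five-character Japanese greeting counts as five tokens while a five-letter English word counts as one, which is the kind of mismatch a character-only budget cannot see.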
What's Changed
- Improve text splitter for non-English documents by @tonybaloney in #1326
- Restrict GitHub workflows run by @john0isaac in #1366
- Improvements to load balancer setup script by @pamelafox in #1348
- Update productionizing.md with link to search service size guide by @pamelafox in #1354
- Update README.md to delete old links by @pamelafox in #1372
- Update deploy_features.md link by @pamelafox in #1373
- Add suggestion to use [azd auth login] in the free low-cost deploy tutorial by @elbruno in #1214
- Bump the python-requirements group with 18 updates by @dependabot in #1368
Full Changelog: 2024-03-01...2024-03-06
2024-03-01: Local HTML parser
This release adds the option of using a local HTML parser instead of Azure Document Intelligence, for developers who want to reduce costs or have more flexibility in the HTML parsing. See the docs for information on enabling the parser.
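For a sense of what local HTML parsing involves, here is a minimal sketch that extracts visible text using only the Python standard library. It is an illustration under assumptions, not the parser added in this release: `TextExtractor` and `html_to_text` are hypothetical names, and the actual implementation may use a different library and keep more structure.

```python
from html.parser import HTMLParser
from typing import List


class TextExtractor(HTMLParser):
    """Collects visible text and ignores <script> and <style> contents."""

    SKIPPED = {"script", "style"}

    def __init__(self) -> None:
        super().__init__()
        self._skip_depth = 0
        self._parts: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIPPED and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep only non-empty text that is outside skipped elements.
        if self._skip_depth == 0 and data.strip():
            self._parts.append(data.strip())

    def text(self) -> str:
        return " ".join(self._parts)


def html_to_text(html: str) -> str:
    """Return the visible text of an HTML document as a single string."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return parser.text()
```

The resulting text can then be passed to the same chunking step as any other document type.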
What's Changed
- Readme refactor by @pamelafox in #1346
- feat: add local html parser by @john0isaac in #1351
New Contributors
- @john0isaac made their first contribution in #1351
Full Changelog: 2024-02-28...2024-03-01
2024-02-28: OpenTelemetry instrumentation for OpenAI calls
The primary change in this release is the integration of the opentelemetry-instrumentation-openai package for tracing OpenAI calls. You should now see traces for all calls made by the OpenAI SDK in Azure Monitor.
We are still using the HTTPX instrumentation package as well. It should also trace the calls, since the OpenAI SDK uses HTTPX for HTTP calls behind the scenes, but they recently made a change that results in inconsistent HTTPX tracing. You may sometimes see multiple traces for the same call, one from the HTTPX instrumentor and one from the new OpenAI instrumentor.
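As a rough sketch of how both instrumentors can end up active at once, the snippet below enables whichever of the two packages is installed. It assumes the `opentelemetry-instrumentation-openai` and `opentelemetry-instrumentation-httpx` packages; the helper name `enable_openai_tracing` is hypothetical, and the app's actual wiring and its Azure Monitor exporter configuration are not shown here.

```python
from typing import List


def enable_openai_tracing() -> List[str]:
    """Enable the OpenAI and HTTPX instrumentors if they are installed.

    Returns the names of the instrumentors that were enabled. When both
    are active, one OpenAI call can produce a span from each of them.
    """
    enabled: List[str] = []
    try:
        from opentelemetry.instrumentation.openai import OpenAIInstrumentor

        OpenAIInstrumentor().instrument()
        enabled.append("openai")
    except ImportError:
        pass  # Package not installed; skip OpenAI-level spans.
    try:
        from opentelemetry.instrumentation.httpx import HTTPXClientInstrumentor

        HTTPXClientInstrumentor().instrument()
        enabled.append("httpx")
    except ImportError:
        pass  # Package not installed; skip HTTP-level spans.
    return enabled
```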
What's Changed
- Implement better LLM tracing with llmetry OpenAI instrumentation by @tonybaloney in #1319
- Improve PR template to reference tutorial changes and contributing checklist by @pamelafox in #1335
- Update launch.json with python backend path by @pamelafox in #1022
- Update app diagram by @pamelafox in #1338
- Show users a special message for context length errors by @pamelafox in #1337
Full Changelog: 2024-02-27...2024-02-28
2024-02-27: HTML parsing via Azure Document Intelligence
We updated prepdocs.py so that HTML files will be processed by Azure Document Intelligence. Here's a stream demonstrating ingestion of HTML docs. You can just update to the latest version, put HTML files in the data/ folder, and they will get picked up.
What's Changed
- Add HTML parsing via Azure Document Intelligence by @pamelafox in #1325
Full Changelog: 2024-02-23...2024-02-27
2024-02-23: AAD for Computer Vision API, Load balancer script
This release updates our Python dependencies, switches to using AAD for authenticating to the Computer Vision API, and adds a script for easier integration with the ContainerApps-based OpenAI load balancer.
If you are using the GPT-4-vision feature, run "azd provision" to get the new roles needed to authenticate to the Computer Vision API.
What's Changed
- Add setup script for setting up ACA loadbalancer by @diberry in #1310
- Use AAD instead of key vault for Computer Vision API by @pamelafox in #1062
- Bump the github-actions group with 1 update by @dependabot in #1290
- Bump the python-requirements group with 10 updates by @dependabot in #1291
Full Changelog: 2024-02-16...2024-02-23
2024-02-16: Bug fixes (PDF page jumping) and doc improvements
This release includes a variety of bug fixes and documentation improvements. The most notable fix corrects a regression issue where PDFs stopped jumping to their page numbers. This release also includes an environment variable setting for specifying a custom Azure OpenAI base URL, which is helpful if you're hosting an OpenAI proxy on APIM or ACA.
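The idea behind a custom base URL setting can be sketched as follows. The environment variable names `AZURE_OPENAI_CUSTOM_URL` and `AZURE_OPENAI_SERVICE` and the helper `resolve_openai_endpoint` are assumptions made for this example; check the repository docs for the actual setting.

```python
from typing import Mapping


def resolve_openai_endpoint(env: Mapping[str, str]) -> str:
    """Prefer a custom base URL (e.g. a proxy on APIM or ACA) when set.

    Variable names are illustrative assumptions, not confirmed settings.
    """
    custom_url = env.get("AZURE_OPENAI_CUSTOM_URL", "").strip()
    if custom_url:
        return custom_url.rstrip("/")
    # Fall back to the default endpoint derived from the service name.
    service = env["AZURE_OPENAI_SERVICE"]
    return f"https://{service}.openai.azure.com"
```

In an app, `os.environ` could be passed as the `env` argument.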
What's Changed
- Update README.md with new login doc link by @pamelafox in #1277
- Fix Markdown typos in App Service Debugging doc by @pamelafox in #1280
- Adding support for custom Azure deployments by @andredewes in #1273
- Add missing env vars needed for GPT4v deployment by @pamelafox in #1276
- Clarify that gpt-4-v cant be used with integrated vectorization by @pamelafox in #1282
- Get PDFs to jump to their pages again by @pamelafox in #1283
New Contributors
- @andredewes made their first contribution in #1273
Full Changelog: 2024-02-15b...2024-02-16
2024-02-15b: Azure Document Intelligence upgrade (new doc types)
This release updates Azure Document Intelligence to use the new SDK and preview API version. We are choosing to use the preview version due to its support for many more doc types (docx, xlsx, pptx, images), and the huge demand for parsing those types. The new API version is only supported in a limited set of regions, so you will be prompted to create a new Azure Document Intelligence service and select a supported region upon your next provision. You can then delete the former service. If your existing service is already in a supported region, follow the steps for reusing an existing Document Intelligence service before running azd up or azd provision.
What's Changed
- Support more doc formats with new documentintelligence SDK by @pamelafox in #1224
Full Changelog: 2024-02-15...2024-02-15b
2024-02-15: Support for CI/CD with GitHub Actions and Azure DevOps
This release includes a GitHub Actions workflow file and Azure DevOps pipeline file to enable continuous deployment using azd up and your existing environments. See documentation on setting up continuous deployment. We discovered one azd issue which requires a workaround for now, but the azd team hopes to have a fix for that released this week.
This release also includes a few bug fixes related to authentication.
What's Changed
- Bump the python-requirements group with 21 updates by @dependabot in #1267
- Fix ADLS Gen2 scripts and authentication checks by @mattgotteiner in #1272
- Add workflow for azd deployment on GitHub Actions and ADO by @vhvb1989 in #1083
- New documentation on App Service debugging, and some docs organizational changes by @pamelafox in #1261
- Proper event loop setup by @pamelafox in #1275
Full Changelog: 2024-02-09...2024-02-15
2024-02-09: Integrated vectorization
This release includes a big new optional feature, Azure AI Search integrated vectorization, which automates the data ingestion entirely in the cloud, using indexers, skillsets, and embedding models. To try it out, follow the steps here:
https://github.com/Azure-Samples/azure-search-openai-demo?tab=readme-ov-file#enabling-integrated-vectorization
Learn more in the ingestion guide:
https://github.com/Azure-Samples/azure-search-openai-demo/blob/main/docs/data_ingestion.md#overview-of-integrated-vectorization
Please file any issues you encounter if you try out this new feature. It's in preview mode now, so the team is eager for feedback.
What's Changed
- Improvements to token counting for images and type annotations by @pamelafox in #1244
- Add integrated vectorizer support by @srbalakr in #1159
- Only output identity for non-free SKU by @pamelafox in #1258
- Add document describing the HTTP chat protocol by @pamelafox in #1193
- Better conditional around searchservice principalId output by @pamelafox in #1260
Full Changelog: 2024-02-06...2024-02-09
2024-02-06: Temperature settings, AI Search SDK version
This release adds a slider for temperature, sets default temperature to 0.3 for both chat and ask, and updates the Azure AI Search SDK version.
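A minimal sketch of how a per-request temperature override might fall back to the new default. The `overrides` dictionary shape, the `resolve_temperature` helper, and the clamping range of 0.0 to 1.0 are assumptions for illustration, not the app's confirmed behavior.

```python
DEFAULT_TEMPERATURE = 0.3  # Default stated in this release.


def resolve_temperature(overrides: dict) -> float:
    """Return the requested temperature, or the default when none is given.

    Clamping to [0.0, 1.0] is an assumption made for this sketch.
    """
    value = overrides.get("temperature")
    if value is None:
        return DEFAULT_TEMPERATURE
    return min(max(float(value), 0.0), 1.0)
```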
What's Changed
- Change default temp to 0.3 for final chat completion call in chat approaches by @pamelafox in #1238
- Update Azure AI Search SDK by @pamelafox in #1241
- Added FluentUI Slider to chat settings panel for overriding temperature by @bscheurm in #54
Full Changelog: 2024-02-05...2024-02-06