Releases: Azure/gpt-rag-ingestion
v2.3.2
Release Notes v2.3.2
Changed
Default `INDEXER_MAX_CONCURRENCY` lowered to 2
Reduced the default concurrency for all indexers (blob storage, SharePoint, NL2SQL) from the previous defaults of 8 and 4 down to 2.
This reduces memory pressure and rate-limit contention when processing large documents.
Fixed
Dashboard retries column showing inflated count during processing
Display now shows the actual retry count (`processingAttempts - 1`) instead of the pre-incremented attempt counter.
Cost estimate displayed with excessive decimal places
Rounded to 2 decimal places in both frontend and backend.
Stale running jobs stuck forever after container crash/restart
Admin API detects runs started more than 2 hours ago without finishing and marks them as interrupted.
Literal `\u21b3` text in 429 rate-limit display
Fixed JSX rendering to show the actual arrow character.
Unclear 429 rate-limit display
Changed the format to: `429 Rate-limit — N retries, Xm Ys wait`
Upgrading from Earlier Versions
If you are running an older version of the data ingestion component (e.g., v2.0.6, v2.1.0, v2.2.x) and want to upgrade to v2.3.2, follow the instructions below before running `azd deploy`. The required steps depend on your current version; review each section that applies to your upgrade path.
Upgrading from v2.0.x or v2.1.x (versions prior to v2.2.0)
These versions predate the document-level security enforcement feature introduced in v2.2.0. The following steps are required:
1. Add RBAC Security Fields to Azure AI Search Index
Starting with v2.2.0, the ingestion pipeline writes security metadata to the search index. If your index was created before this version, you must manually add the following fields using the Azure Portal JSON editor or the Azure AI Search REST API:
```json
{
  "name": "metadata_security_user_ids",
  "type": "Collection(Edm.String)",
  "filterable": true,
  "searchable": false,
  "sortable": false,
  "facetable": false
},
{
  "name": "metadata_security_group_ids",
  "type": "Collection(Edm.String)",
  "filterable": true,
  "searchable": false,
  "sortable": false,
  "facetable": false
},
{
  "name": "metadata_security_rbac_scope",
  "type": "Edm.String",
  "filterable": true,
  "searchable": false,
  "sortable": false,
  "facetable": false
}
```
How to add fields via Azure Portal:
- Navigate to your Azure AI Search resource.
- Go to Indexes and select your index (e.g., `ragindex`).
- Click Edit JSON (top toolbar).
- In the `fields` array, add the three field definitions above.
- Click Save.
Note: Azure AI Search allows adding fields to an existing index, but does not allow modifying or removing fields once they exist.
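If you prefer to script the change via the REST API instead of the Portal, the flow is: GET the index definition, append the three fields, and PUT the result back. Below is a minimal sketch of the merge step only; the HTTP calls, endpoint URL, API version, and authentication are left out and must be adapted to your environment.

```python
# Field definitions match the JSON shown above. In a real script you would
# GET https://<service>.search.windows.net/indexes/<index>?api-version=...,
# pass the parsed body through add_security_fields(), and PUT it back.
SECURITY_FIELDS = [
    {"name": "metadata_security_user_ids", "type": "Collection(Edm.String)",
     "filterable": True, "searchable": False, "sortable": False, "facetable": False},
    {"name": "metadata_security_group_ids", "type": "Collection(Edm.String)",
     "filterable": True, "searchable": False, "sortable": False, "facetable": False},
    {"name": "metadata_security_rbac_scope", "type": "Edm.String",
     "filterable": True, "searchable": False, "sortable": False, "facetable": False},
]

def add_security_fields(index_definition: dict) -> dict:
    """Append the RBAC security fields, skipping any that already exist."""
    existing = {f["name"] for f in index_definition["fields"]}
    index_definition["fields"].extend(
        f for f in SECURITY_FIELDS if f["name"] not in existing
    )
    return index_definition

# Hypothetical minimal index definition for illustration:
index = {"name": "ragindex", "fields": [{"name": "id", "type": "Edm.String"}]}
updated = add_security_fields(index)
```

Skipping fields that already exist keeps the script safe to re-run, since Azure AI Search rejects duplicate field names.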
2. Update Container Port Configuration
Starting with v2.2.1, the container uses port 8080 instead of the previously common port 80. If your Azure Container App is configured for port 80, you must update it:
- Navigate to your Azure Container App resource (e.g., `ca-xxxx-dataingest`).
- Go to Ingress and change the Target port to `8080`.
- Go to Containers → Health probes and update:
  - Liveness probe port: `8080`
  - Readiness probe port: `8080`
  - Startup probe port (if configured): `8080`
- Save the configuration and wait for a new revision to deploy.
Alternatively, using Azure CLI:
```shell
az containerapp ingress update \
  --name <your-container-app-name> \
  --resource-group <your-resource-group> \
  --target-port 8080
```
Upgrading from v2.2.0
1. Update Container Port Configuration
If you are on v2.2.0, you still need to update the container port from 80 to 8080 (introduced in v2.2.1). Follow the steps in the previous section.
2. RBAC Role Assignment for Elevated Read
Starting with v2.2.5, the ingestion service uses elevated-read operations to query the index without permission filtering (required when `permissionFilterOption` is enabled). The managed identity running the Container App must have the Search Index Data Contributor role on the Azure AI Search resource.
```shell
az role assignment create \
  --assignee <managed-identity-object-id> \
  --role "Search Index Data Contributor" \
  --scope /subscriptions/<sub-id>/resourceGroups/<rg>/providers/Microsoft.Search/searchServices/<search-service>
```
The `Search Index Data Contributor` role includes the `elevatedOperations/read` RBAC data action required for the `x-ms-enable-elevated-read` header.
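For illustration, here is a sketch of what the elevated-read requirement looks like from client code. The helper name `elevated_query_kwargs` is hypothetical; the header name and API version follow the release notes, and the `SearchClient` usage in the comment assumes the `azure-search-documents` package.

```python
# The header and API version the service requires for elevated read,
# per the v2.2.5 release notes.
ELEVATED_READ_HEADER = {"x-ms-enable-elevated-read": "true"}
REQUIRED_API_VERSION = "2025-11-01-preview"

def elevated_query_kwargs() -> dict:
    """Extra kwargs for index query operations so the service bypasses
    permission filtering during service-side ingestion."""
    return {"headers": dict(ELEVATED_READ_HEADER)}

# Hedged usage sketch (requires azure-search-documents and the Search
# Index Data Contributor role on the Container App's managed identity):
#
#   from azure.search.documents import SearchClient
#   from azure.identity import DefaultAzureCredential
#   client = SearchClient(endpoint, index_name, DefaultAzureCredential(),
#                         api_version=REQUIRED_API_VERSION)
#   results = client.search(search_text="*", **elevated_query_kwargs())
```

Without the role assignment above, requests carrying this header are rejected, so assign the role before deploying.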
Upgrading from v2.2.1, v2.2.2, v2.2.3, or v2.2.4
1. RBAC Role Assignment for Elevated Read
As noted above, v2.2.5 introduced elevated-read headers. Ensure the Search Index Data Contributor role is assigned to the Container App managed identity.
2. (Optional) Configure Vision Deployment
If you use multimodal processing and your primary chat model does not support vision (e.g., gpt-5-nano), configure the `VISION_DEPLOYMENT_NAME` setting in Azure App Configuration to point to a vision-capable model (e.g., gpt-4o-mini). This setting was introduced in v2.2.4.
Upgrading from v2.2.5
1. Verify Azure AI Foundry Account
Starting with v2.3.0, the default document analysis path uses Azure AI Foundry Content Understanding (`prebuilt-layout`) instead of Document Intelligence, reducing costs by ~69% per page. Ensure you have:
- An Azure AI Foundry account configured.
- The `AI_FOUNDRY_ACCOUNT_ENDPOINT` setting in App Configuration.

If you prefer to continue using Document Intelligence, set `USE_DOCUMENT_INTELLIGENCE=true` in App Configuration.
Resource Recommendations for Processing Large Files
Large document processing (e.g., 100+ page PDFs, large spreadsheets) can be memory-intensive. The following container resource configuration is recommended:
| Component | CPU | Memory |
|---|---|---|
| Data Ingestion | 1.0 | 3 GB |
| Orchestrator | 0.5 | 1 GB |
| Frontend | 0.5 | 1 GB |
If you are on a shared workload profile with limited CPU capacity (e.g., 4 CPUs total), ensure the sum of all container CPU allocations does not exceed the profile limit.
To update container resources via CLI:
```shell
az containerapp update \
  --name <your-container-app-name> \
  --resource-group <your-resource-group> \
  --cpu 1.0 \
  --memory 3Gi
```
Post-Deployment Verification
After deployment, verify the running version:
```shell
az containerapp show \
  --name <your-container-app-name> \
  --resource-group <your-resource-group> \
  --query "properties.template.containers[0].image" \
  -o tsv
```
The image tag corresponds to the Git commit SHA. You can map it to a release by checking the repository tags:
```shell
git log --oneline --decorate v2.3.2
```
To validate the ingestion pipeline:
- Upload a small test file to the documents container.
- Monitor the ingestion logs via the admin dashboard (`/dashboard`) or Container App logs.
- Verify the document appears in the search index.
Summary by Source Version
| Version | Port Change | Index Fields | RBAC Role |
|---|---|---|---|
| v2.0.x | Required | Required | Required |
| v2.1.x | Required | Required | Required |
| v2.2.0 | Required | Not required | Required |
| v2.2.1–v2.2.4 | Not required | Not required | Required |
| v2.2.5 | Not required | Not required | Not required |
v2.3.1
Added
- Processing timings breakdown in dashboard: Each file processing run now records per-phase timing data (download, analysis, chunking + embeddings, index upload) and stores it in the file log. The admin dashboard detail dialog displays a stacked color bar and a legend with durations for each phase, plus a total. Rate-limit retry wait time (429 backoff) is tracked separately and shown as a sub-item under chunking + embeddings. Run history entries also show a Duration column. This makes it easy to identify bottlenecks when processing large documents.
- 429 rate-limit count and improved display: The number of 429 (Too Many Requests) retries is now tracked per file and displayed alongside the rate-limit wait time in the format "N× 429 Rate-limit wait (duration)". Both the count and the wait time are only shown when retries actually occurred.
- Per-file cost estimation: Processing cost is now estimated per file, broken down by service: analysis (Content Understanding or Document Intelligence, per page), Azure OpenAI Embeddings (per token), and Azure OpenAI Completions (per token, when applicable). Unit prices are configurable via App Config keys (`COST_PER_PAGE_ANALYSIS`, `COST_PER_1K_EMBEDDING_TOKENS`, `COST_PER_1K_COMPLETION_INPUT_TOKENS`, `COST_PER_1K_COMPLETION_OUTPUT_TOKENS`) with sensible defaults based on April 2026 list pricing. The dashboard displays the breakdown in a dedicated "Cost Estimate" section with a short disclaimer.
- Automatic PDF splitting for large documents: PDFs exceeding the Azure analysis service page limit (configurable via `MAX_PAGES_PER_ANALYSIS`, default 300) are now automatically split into smaller parts before analysis. Each part is analyzed separately and the markdown results are concatenated with correct absolute page numbering. This prevents `InputPageCountExceeded` errors and is transparent to the rest of the pipeline — same `parent_id`, same chunk keys, same search index behavior. Requires the new `pypdf` dependency.
- Memory guard before blob download: Before downloading a blob for processing, the indexer now checks the file size against available container memory (via cgroups + `psutil`). If the estimated peak memory usage would exceed available capacity, processing is skipped with a descriptive error instead of risking an OOM crash that restarts the container. Configurable via `MEMORY_SAFETY_MULTIPLIER` (default 4.0) and `MEMORY_SAFETY_THRESHOLD` (default 0.85).
- Temp file download for large PDFs: PDFs larger than 10 MB are now downloaded to a temporary file on disk instead of being held entirely in memory. The auto-split logic operates on these temp files, keeping peak memory usage bounded to one part at a time (~200 MB) instead of the full document (~1.5 GB+).
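The memory-guard decision boils down to a single comparison. The sketch below is illustrative, not the indexer's actual code: `would_fit` is a hypothetical name, and the real implementation reads available memory from cgroup limits and `psutil` rather than taking it as a parameter. The defaults match the documented settings.

```python
MEMORY_SAFETY_MULTIPLIER = 4.0   # estimated peak usage = file size x multiplier
MEMORY_SAFETY_THRESHOLD = 0.85   # only budget up to 85% of available memory

def would_fit(file_size_bytes: int, available_bytes: int,
              multiplier: float = MEMORY_SAFETY_MULTIPLIER,
              threshold: float = MEMORY_SAFETY_THRESHOLD) -> bool:
    """Return True when processing is expected to stay within the
    container's memory budget; otherwise the file is skipped with an
    error instead of risking an OOM restart."""
    estimated_peak = file_size_bytes * multiplier
    return estimated_peak <= available_bytes * threshold
```

For example, a 100 MB file against 3 GiB of available memory fits (400 MB estimated peak vs. a ~2.6 GiB budget), while a 1 GiB file does not.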
Fixed
- `_as_datetime` NameError crashing every indexer run: The helper function `_as_datetime` was called in four places within `blob_storage_indexer.py` but was never defined, causing a `NameError` on every run after the retry-tracking feature was added. Added the missing function definition at module level.
- Orphaned `value` variable causing NameError in memory guard: A leftover code block from an earlier refactor inside `_check_memory_capacity()` referenced an undefined variable `value`, crashing the memory guard check before any file could be processed. Removed the dead code.
- Dashboard unresponsive during file processing: The FastAPI event loop was blocked by synchronous chunking and document iteration calls, making the admin dashboard and health endpoints unresponsive for the entire duration of large file processing (20+ minutes). Wrapped the blocking `list(docs_iter)` calls with `asyncio.to_thread()` so they run in a worker thread without blocking the event loop.
- Stale error field on successful re-processing: When a file was re-processed successfully after previous failures, the top-level `error` field in the file log retained the last error message despite `status` being `success`. The field is now explicitly cleared to `null` on success.
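The event-loop fix follows the standard asyncio pattern for offloading blocking iteration. A self-contained sketch, where `slow_docs_iter` is a stand-in for the real document iterator:

```python
import asyncio
import time

def slow_docs_iter():
    """Stand-in for a blocking document iterator (network/CPU bound)."""
    for i in range(3):
        time.sleep(0.05)  # simulates blocking work per document
        yield {"id": i}

async def materialize_docs():
    # list(docs_iter) would block the event loop for the whole iteration;
    # asyncio.to_thread runs it in a worker thread instead, so dashboard
    # and health-check coroutines keep getting scheduled meanwhile.
    return await asyncio.to_thread(list, slow_docs_iter())

docs = asyncio.run(materialize_docs())
```

The generator object is created on the event-loop thread, but all of its (blocking) iteration happens inside the worker thread.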
See CHANGELOG.md for details.
v2.3.0
What's New
- Per-file retry tracking and automatic block list: Files exceeding `MAX_FILE_PROCESSING_ATTEMPTS` (default 3) are automatically blocked. Applies to both blob storage and SharePoint indexers.
- Content Understanding integration: New `ContentUnderstandingClient` using Azure AI Foundry `prebuilt-layout` as the default analysis path (~69% cost reduction).
- Admin dashboard: React frontend at `/dashboard` with paginated job/file tables, search, filters, and an unblock action.
- Scheduled log cleanup: Automatic cleanup of old run-summary blobs via APScheduler (`CRON_RUN_LOG_CLEANUP`, default hourly).
v2.2.5
Fixed
- Ingestion re-indexes every file when `permissionFilterOption` is enabled: When the Azure AI Search index has `permissionFilterOption` set to `enabled`, all `search()` and `get_document()` calls returned empty or 404 results because there is no end-user token during service-side ingestion. This caused `_load_latest_index_state()` to return an empty state map, making the indexer treat every blob as new and triggering a full re-index on every run, with significant cost implications. Fixed by adding the `x-ms-enable-elevated-read: true` header to all index query operations across the blob storage indexer, SharePoint indexer, SharePoint purger, NL2SQL purger, and the AI Search client utility. Also pinned `api_version` to `2025-11-01-preview` on all `SearchClient` instances, which is required for the elevated-read header to be recognized by the service. Requires the `Search Index Data Contributor` role (which includes the `elevatedOperations/read` RBAC data action).
v2.2.4
What's Changed
Added
- Vision deployment configuration (`VISION_DEPLOYMENT_NAME`): Added a new optional App Configuration setting `VISION_DEPLOYMENT_NAME` that specifies the Azure OpenAI deployment to use for multimodal (image + text) requests such as figure caption generation. When set, `get_completion()` automatically routes vision requests to this deployment, allowing the use of a vision-capable model (e.g., `gpt-4o-mini`) separately from the primary chat model. Falls back to `CHAT_DEPLOYMENT_NAME` if not configured.
Fixed
- Empty image captions when chat model lacks vision support: When `CHAT_DEPLOYMENT_NAME` pointed to a model without vision capabilities (e.g., `gpt-5-nano`), `get_completion()` silently returned `None` for multimodal requests, producing empty `imageCaptions` in the search index. Added a guard in both `AzureOpenAIClient.get_completion()` (logs a warning with `finish_reason` and model name) and `MultimodalChunker._generate_caption_for_figure()` (falls back to "No caption available.") to prevent empty captions from propagating to the index.
v2.2.3
What's Changed
Changed
- Default chunk overlap increased to 200 tokens: Changed the default value of `TOKEN_OVERLAP` from 100 to 200 across all chunkers (doc_analysis, json, langchain, nl2sql, transcription), improving context continuity between chunks during document ingestion.
- Cron fallback defaults for blob ingestion jobs: Added cron fallback defaults for when `CRON_RUN_BLOB_INDEX` and `CRON_RUN_BLOB_PURGE` are not configured.
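To see what the overlap change means in practice, here is a minimal sliding-window chunker over a token list. This is a simplification for illustration: the repo's chunkers work on tokenized text and document structure, and the chunk size of 1000 here is an assumed value, not a documented default.

```python
def chunk_tokens(tokens: list, num_tokens: int = 1000,
                 token_overlap: int = 200) -> list[list]:
    """Split `tokens` into windows of `num_tokens`, where each window
    shares its first `token_overlap` tokens with the end of the previous
    window, preserving context across chunk boundaries."""
    step = num_tokens - token_overlap
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + num_tokens])
        if start + num_tokens >= len(tokens):
            break
    return chunks

# 2000 synthetic "tokens" -> windows at offsets 0, 800, 1600
chunks = chunk_tokens(list(range(2000)), num_tokens=1000, token_overlap=200)
```

Raising the overlap from 100 to 200 doubles the shared region between consecutive chunks, at the cost of slightly more chunks (and embedding tokens) per document.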
Fixed
- Multimodal image captions not generated: Added vision support to get_completion() by accepting an optional image_base64 parameter and constructing multimodal messages when an image is provided.
- Azure OpenAI API compatibility with newer models: Replaced max_tokens with max_completion_tokens in the chat completions API call, fixing a 400 error with newer models (e.g., GPT-4o).
Repository
- Added .github/copilot-instructions.md with development and release workflow rules.
v2.2.2
v2.2.1
What's Changed
Added
- Added robust retry logic with exponential backoff for Azure OpenAI calls, handling `429` and `Retry-After` responses. Retry behavior is now configurable via `OPENAI_RETRY_*` and `OPENAI_SDK_MAX_RETRIES`, improving reliability for large spreadsheet ingestion jobs.
Changed
- Standardized on a non-privileged port (`8080`) instead of port `80`, following container best practices and improving stability of long-running ingestion workloads.
Full Changelog: v2.2.0...v2.2.1
v2.2.0
v2.1.0
What's Changed
Added
- Added support for SharePoint Lists, expanding ingestion capabilities beyond document libraries.
Changed
- Improved robustness of Blob Storage indexing and enhanced data ingestion logging.
- Refined chunking logic to ensure consistent and reliable chunk ID incrementation.
- Updated the Azure CLI version in the development container (from 1.2.7 to 1.2.9) for improved tooling support.
Full Changelog: v2.0.6...v2.1.0