
Releases: Azure/gpt-rag-ingestion

v2.3.2

08 Apr 14:35


Release Notes v2.3.2

Changed

Default INDEXER_MAX_CONCURRENCY lowered to 2

Reduced the default concurrency for all indexers (blob storage, SharePoint, NL2SQL) from 8/4 to 2.
This reduces memory pressure and rate-limit contention when processing large documents.

Fixed

Dashboard retries column showing inflated count during processing

Display now shows actual retries (processingAttempts - 1) instead of the pre-incremented attempt counter.

Cost estimate displayed with excessive decimal places

Rounded to 2 decimal places in both frontend and backend.

Stale running jobs stuck forever after container crash/restart

Admin API detects runs started more than 2 hours ago without finishing and marks them as interrupted.
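The detection described above amounts to a timestamp comparison. A minimal sketch, assuming hypothetical field names (`status`, `started_at`) and the two-hour cutoff from the note:

```python
from datetime import datetime, timedelta, timezone

STALE_RUN_CUTOFF = timedelta(hours=2)  # cutoff from the release note

def mark_interrupted_runs(runs, now=None):
    """Mark runs still 'running' that started over 2 hours ago as 'interrupted'."""
    now = now or datetime.now(timezone.utc)
    for run in runs:
        if run["status"] == "running" and now - run["started_at"] > STALE_RUN_CUTOFF:
            run["status"] = "interrupted"
    return runs
```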

Literal \u21b3 text in 429 rate-limit display

Fixed JSX rendering to show the actual arrow character.

Unclear 429 rate-limit display

Changed to: 429 Rate-limit — N retries, Xm Ys wait
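The new display string can be rendered with a small formatting helper; this is a hypothetical sketch of the format above, not the actual frontend code:

```python
def format_rate_limit(retries: int, wait_seconds: int) -> str:
    """Render the 429 status line as '429 Rate-limit — N retries, Xm Ys wait'."""
    minutes, seconds = divmod(wait_seconds, 60)
    return f"429 Rate-limit — {retries} retries, {minutes}m {seconds}s wait"
```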


Upgrading from Earlier Versions

If you are running an older version of the data ingestion component (e.g., v2.0.6, v2.1.0, v2.2.x) and want to upgrade to v2.3.2, follow the instructions below before running azd deploy. The required steps depend on your current version. Review each section that applies to your upgrade path.
 

Upgrading from v2.0.x or v2.1.x (versions prior to v2.2.0)

 
These versions predate the document-level security enforcement feature introduced in v2.2.0. The following steps are required:
 

1. Add RBAC Security Fields to Azure AI Search Index

 
Starting with v2.2.0, the ingestion pipeline writes security metadata to the search index. If your index was created before this version, you must manually add the following fields using the Azure Portal JSON editor or the Azure AI Search REST API:
 

{
  "name": "metadata_security_user_ids",
  "type": "Collection(Edm.String)",
  "filterable": true,
  "searchable": false,
  "sortable": false,
  "facetable": false
},
{
  "name": "metadata_security_group_ids",
  "type": "Collection(Edm.String)",
  "filterable": true,
  "searchable": false,
  "sortable": false,
  "facetable": false
},
{
  "name": "metadata_security_rbac_scope",
  "type": "Edm.String",
  "filterable": true,
  "searchable": false,
  "sortable": false,
  "facetable": false
}

 
How to add fields via Azure Portal:
 

  1. Navigate to your Azure AI Search resource.
  2. Go to Indexes and select your index (e.g., ragindex).
  3. Click Edit JSON (top toolbar).
  4. In the fields array, add the three field definitions above.
  5. Click Save.
     

Note: Azure AI Search allows adding fields to an existing index, but does not allow modifying or removing fields once they exist.
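If you update the index definition via the REST API instead of the portal, the three field definitions above can be merged into the JSON returned by a GET on the index before issuing the update. A minimal sketch of that merge step (the helper name is hypothetical; sending the request is left out):

```python
# The three RBAC security fields from the section above, as a payload fragment.
SECURITY_FIELDS = [
    {"name": "metadata_security_user_ids", "type": "Collection(Edm.String)",
     "filterable": True, "searchable": False, "sortable": False, "facetable": False},
    {"name": "metadata_security_group_ids", "type": "Collection(Edm.String)",
     "filterable": True, "searchable": False, "sortable": False, "facetable": False},
    {"name": "metadata_security_rbac_scope", "type": "Edm.String",
     "filterable": True, "searchable": False, "sortable": False, "facetable": False},
]

def add_security_fields(index_definition: dict) -> dict:
    """Append the security fields to a fetched index definition, skipping any that already exist."""
    existing = {f["name"] for f in index_definition["fields"]}
    index_definition["fields"] += [f for f in SECURITY_FIELDS if f["name"] not in existing]
    return index_definition
```

Remember that fields can only be added, not modified or removed, so the skip-if-present check keeps the operation safe to re-run.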
 

2. Update Container Port Configuration

 
Starting with v2.2.1, the container uses port 8080 instead of the previously common port 80. If your Azure Container App is configured for port 80, you must update it:
 

  1. Navigate to your Azure Container App resource (e.g., ca-xxxx-dataingest).
  2. Go to Ingress and change the Target port to 8080.
  3. Go to Containers > Health probes and update:
  • Liveness probe port: 8080
  • Readiness probe port: 8080
  • Startup probe port (if configured): 8080
  4. Save the configuration and wait for a new revision to deploy.

Alternatively, using Azure CLI:
     
az containerapp ingress update \
--name <your-container-app-name> \
--resource-group <your-resource-group> \
--target-port 8080

 

Upgrading from v2.2.0

 

1. Update Container Port Configuration

 
If you are on v2.2.0, you still need to update the container port from 80 to 8080 (introduced in v2.2.1). Follow the steps in the previous section.
 

2. RBAC Role Assignment for Elevated Read

 
Starting with v2.2.5, the ingestion service uses elevated-read operations to query the index without permission filtering (required when permissionFilterOption is enabled). The managed identity running the Container App must have the Search Index Data Contributor role on the Azure AI Search resource.
 

az role assignment create \
--assignee <managed-identity-object-id> \
--role "Search Index Data Contributor" \
--scope /subscriptions/<sub-id>/resourceGroups/<rg>/providers/Microsoft.Search/searchServices/<search-service>

 

The Search Index Data Contributor role includes the elevatedOperations/read RBAC data action required for the x-ms-enable-elevated-read header.
 

Upgrading from v2.2.1, v2.2.2, v2.2.3, or v2.2.4

 

1. RBAC Role Assignment for Elevated Read

 
As noted above, v2.2.5 introduced elevated-read headers. Ensure the Search Index Data Contributor role is assigned to the Container App managed identity.
 

2. (Optional) Configure Vision Deployment

 
If you use multimodal processing and your primary chat model does not support vision (e.g., gpt-5-nano), configure the VISION_DEPLOYMENT_NAME setting in Azure App Configuration to point to a vision-capable model (e.g., gpt-4o-mini). This was introduced in v2.2.4.
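The routing behavior amounts to a fallback lookup: vision requests use VISION_DEPLOYMENT_NAME when it is set, otherwise the primary chat deployment. A minimal sketch (reading the settings from environment variables is an assumption; in the actual service they come from Azure App Configuration):

```python
import os

def pick_deployment(has_image: bool) -> str:
    """Route multimodal requests to VISION_DEPLOYMENT_NAME when set;
    fall back to CHAT_DEPLOYMENT_NAME otherwise."""
    if has_image:
        return os.environ.get("VISION_DEPLOYMENT_NAME") or os.environ["CHAT_DEPLOYMENT_NAME"]
    return os.environ["CHAT_DEPLOYMENT_NAME"]
```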
 

Upgrading from v2.2.5

 

1. Verify Azure AI Foundry Account

 
Starting with v2.3.0, the default document analysis path uses Azure AI Foundry Content Understanding (prebuilt-layout) instead of Document Intelligence, reducing costs by ~69% per page. Ensure you have:
 

  • An Azure AI Foundry account configured.
  • The AI_FOUNDRY_ACCOUNT_ENDPOINT setting in App Configuration.

If you prefer to continue using Document Intelligence, set USE_DOCUMENT_INTELLIGENCE=true in App Configuration.
     

Resource Recommendations for Processing Large Files

 
Large document processing (e.g., 100+ page PDFs, large spreadsheets) can be memory-intensive. The following container resource configuration is recommended:
 

Component        CPU   Memory
Data Ingestion   1.0   3 GB
Orchestrator     0.5   1 GB
Frontend         0.5   1 GB

 
If you are on a shared workload profile with limited CPU capacity (e.g., 4 CPUs total), ensure the sum of all container CPU allocations does not exceed the profile limit.
 
To update container resources via CLI:
 

az containerapp update \
--name <your-container-app-name> \
--resource-group <your-resource-group> \
--cpu 1.0 \
--memory 3Gi

 

Post-Deployment Verification

 
After deployment, verify the running version:
 

az containerapp show \
--name <your-container-app-name> \
--resource-group <your-resource-group> \
--query "properties.template.containers[0].image" \
-o tsv

 
The image tag corresponds to the Git commit SHA. You can map it to a release by checking the repository tags:
 

git log --oneline --decorate v2.3.2

 
To validate the ingestion pipeline:
 

  1. Upload a small test file to the documents container.
  2. Monitor the ingestion logs via the admin dashboard (/dashboard) or Container App logs.
  3. Verify the document appears in the search index.
     

Summary by Source Version

 

Version          Port Change     Index Fields      RBAC Role
v2.0.x           Required        Required          Required
v2.1.x           Required        Required          Required
v2.2.0           Required        Already present   Required
v2.2.1–v2.2.4    Already done    Already present   Required
v2.2.5           Already done    Already present   Already assigned

v2.3.1

08 Apr 09:39


Added

  • Processing timings breakdown in dashboard: Each file processing run now records per-phase timing data (download, analysis, chunking + embeddings, index upload) and stores it in the file log. The admin dashboard detail dialog displays a stacked color bar and a legend with durations for each phase, plus a total. Rate-limit retry wait time (429 backoff) is tracked separately and shown as a sub-item under chunking + embeddings. Run history entries also show a Duration column. This makes it easy to identify bottlenecks when processing large documents.
  • 429 rate-limit count and improved display: The number of 429 (Too Many Requests) retries is now tracked per file and displayed alongside the rate-limit wait time in the format "N× 429 Rate-limit wait (duration)". Both the count and the wait time are only shown when retries actually occurred.
  • Per-file cost estimation: Processing cost is now estimated per file, broken down by service: analysis (Content Understanding or Document Intelligence, per page), Azure OpenAI Embeddings (per token), and Azure OpenAI Completions (per token, when applicable). Unit prices are configurable via App Config keys (COST_PER_PAGE_ANALYSIS, COST_PER_1K_EMBEDDING_TOKENS, COST_PER_1K_COMPLETION_INPUT_TOKENS, COST_PER_1K_COMPLETION_OUTPUT_TOKENS) with sensible defaults based on April 2026 list pricing. The dashboard displays the breakdown in a dedicated "Cost Estimate" section with a short disclaimer.
  • Automatic PDF splitting for large documents: PDFs exceeding the Azure analysis service page limit (configurable via MAX_PAGES_PER_ANALYSIS, default 300) are now automatically split into smaller parts before analysis. Each part is analyzed separately and the markdown results are concatenated with correct absolute page numbering. This prevents InputPageCountExceeded errors and is transparent to the rest of the pipeline — same parent_id, same chunk keys, same search index behavior. Requires the new pypdf dependency.
  • Memory guard before blob download: Before downloading a blob for processing, the indexer now checks the file size against available container memory (via cgroups + psutil). If the estimated peak memory usage would exceed available capacity, processing is skipped with a descriptive error instead of risking an OOM crash that restarts the container. Configurable via MEMORY_SAFETY_MULTIPLIER (default 4.0) and MEMORY_SAFETY_THRESHOLD (default 0.85).
  • Temp file download for large PDFs: PDFs larger than 10 MB are now downloaded to a temporary file on disk instead of being held entirely in memory. The auto-split logic operates on these temp files, keeping peak memory usage bounded to one part at a time (~200 MB) instead of the full document (~1.5 GB+).
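The auto-split behavior above can be sketched as a pure page-range calculation: parts of at most MAX_PAGES_PER_ANALYSIS pages, with absolute 1-based page numbers preserved so the concatenated markdown stays correctly numbered. A hypothetical helper, not the actual pipeline code:

```python
def split_page_ranges(total_pages: int, max_pages: int = 300) -> list:
    """Split a document into (start, end) 1-based inclusive page ranges of at
    most max_pages pages, preserving absolute page numbering across parts."""
    return [(start, min(start + max_pages - 1, total_pages))
            for start in range(1, total_pages + 1, max_pages)]
```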
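The memory guard reduces to one comparison: estimated peak usage (file size times MEMORY_SAFETY_MULTIPLIER) against the usable share of available memory (available times MEMORY_SAFETY_THRESHOLD). A minimal sketch with the documented defaults; obtaining the available-memory figure via cgroups/psutil is left out:

```python
def should_skip_for_memory(file_size_bytes: int, available_bytes: int,
                           multiplier: float = 4.0, threshold: float = 0.85) -> bool:
    """Return True when the estimated peak (size x MEMORY_SAFETY_MULTIPLIER)
    would exceed the usable capacity (available x MEMORY_SAFETY_THRESHOLD)."""
    return file_size_bytes * multiplier > available_bytes * threshold
```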

Fixed

  • _as_datetime NameError crashing every indexer run: The helper function _as_datetime was called in four places within blob_storage_indexer.py but was never defined, causing a NameError on every run after the retry-tracking feature was added. Added the missing function definition at module level.
  • Orphaned value variable causing NameError in memory guard: A leftover code block from an earlier refactor inside _check_memory_capacity() referenced an undefined variable value, crashing the memory guard check before any file could be processed. Removed the dead code.
  • Dashboard unresponsive during file processing: The FastAPI event loop was blocked by synchronous chunking and document iteration calls, making the admin dashboard and health endpoints unresponsive for the entire duration of large file processing (20+ minutes). Wrapped the blocking list(docs_iter) calls with asyncio.to_thread() so they run in a worker thread without blocking the event loop.
  • Stale error field on successful re-processing: When a file was re-processed successfully after previous failures, the top-level error field in the file log retained the last error message despite status being success. The field is now explicitly cleared to null on success.
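The event-loop fix above follows the standard asyncio pattern: hand the blocking call to a worker thread with asyncio.to_thread so coroutines keep running. A self-contained sketch (the chunker here is a stand-in, not the real one):

```python
import asyncio

def chunk_document(pages):
    """Stand-in for the synchronous, CPU-bound chunking/iteration step."""
    return [p.upper() for p in pages]

async def process_file(pages):
    # Run the blocking call in a worker thread so the FastAPI event loop
    # (dashboard, health endpoints) stays responsive during long jobs.
    return await asyncio.to_thread(chunk_document, pages)
```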

See CHANGELOG.md for details.

v2.3.0

07 Apr 14:20


What's New

  • Per-file retry tracking and automatic block list: Files exceeding MAX_FILE_PROCESSING_ATTEMPTS (default 3) are automatically blocked. Applies to both blob storage and SharePoint indexers.
  • Content Understanding integration: New ContentUnderstandingClient using Azure AI Foundry prebuilt-layout as default analysis path (~69% cost reduction).
  • Admin dashboard: React frontend at /dashboard with paginated job/file tables, search, filters, and unblock action.
  • Scheduled log cleanup: Automatic old run-summary blob cleanup via APScheduler (CRON_RUN_LOG_CLEANUP, default hourly).
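The block-list rule from the first bullet is a simple counter check against MAX_FILE_PROCESSING_ATTEMPTS. A hypothetical sketch; the exact boundary condition (at versus past the limit) is an assumption:

```python
MAX_FILE_PROCESSING_ATTEMPTS = 3  # default from the release notes

def should_block(attempts: int, max_attempts: int = MAX_FILE_PROCESSING_ATTEMPTS) -> bool:
    """Block a file once its processing-attempt counter reaches the limit
    (assumed inclusive boundary)."""
    return attempts >= max_attempts
```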

v2.2.5

31 Mar 22:21


Fixed

  • Ingestion re-indexes every file when permissionFilterOption is enabled: When the Azure AI Search index has permissionFilterOption set to enabled, all search() and get_document() calls returned empty or 404 results because there is no end-user token during service-side ingestion. This caused _load_latest_index_state() to return an empty state map, making the indexer treat every blob as new and triggering a full re-index on every run with significant cost implications. Fixed by adding the x-ms-enable-elevated-read: true header to all index query operations across blob storage indexer, SharePoint indexer, SharePoint purger, NL2SQL purger, and the AI Search client utility. Also pinned api_version to 2025-11-01-preview on all SearchClient instances, which is required for the elevated-read header to be recognized by the service. Requires the Search Index Data Contributor role (which includes the elevatedOperations/read RBAC data action).

v2.2.4

30 Mar 17:57


What's Changed

Added

  • Vision deployment configuration (VISION_DEPLOYMENT_NAME): Added a new optional App Configuration setting VISION_DEPLOYMENT_NAME that specifies the Azure OpenAI deployment to use for multimodal (image + text) requests such as figure caption generation. When set, get_completion() automatically routes vision requests to this deployment, allowing the use of a vision-capable model (e.g., gpt-4o-mini) separately from the primary chat model. Falls back to CHAT_DEPLOYMENT_NAME if not configured.

Fixed

  • Empty image captions when chat model lacks vision support: When CHAT_DEPLOYMENT_NAME pointed to a model without vision capabilities (e.g., gpt-5-nano), get_completion() returned None silently for multimodal requests, producing empty imageCaptions in the search index. Added a guard in both AzureOpenAIClient.get_completion() (logs a warning with finish_reason and model name) and MultimodalChunker._generate_caption_for_figure() (falls back to "No caption available.") to prevent empty captions from propagating to the index.

v2.2.3

24 Mar 15:10
cc329ea


What's Changed

Changed

  • Default chunk overlap increased to 200 tokens: Changed the default value of TOKEN_OVERLAP from 100 to 200 across all chunkers (doc_analysis, json, langchain, nl2sql, transcription), improving context continuity between chunks during document ingestion.
  • Cron fallback defaults for blob ingestion jobs: Added cron fallback defaults when CRON_RUN_BLOB_INDEX and CRON_RUN_BLOB_PURGE are not configured.

Fixed

  • Multimodal image captions not generated: Added vision support to get_completion() by accepting an optional image_base64 parameter and constructing multimodal messages when an image is provided.
  • Azure OpenAI API compatibility with newer models: Replaced max_tokens with max_completion_tokens in the chat completions API call, fixing a 400 error with newer models (e.g., GPT-4o).

Repository

  • Added .github/copilot-instructions.md with development and release workflow rules.

v2.2.2

04 Feb 21:45
32fcf4e


What's Changed

Full Changelog: v2.2.1...v2.2.2

v2.2.1

20 Jan 11:40
739ac1f


What's Changed

Added

  • Added robust retry logic with exponential backoff for Azure OpenAI calls, handling 429 and Retry-After responses. Retry behavior is now configurable via OPENAI_RETRY_* and OPENAI_SDK_MAX_RETRIES, improving reliability for large spreadsheet ingestion jobs.
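The retry behavior described above is the usual pattern of honoring a server-supplied Retry-After delay when present and otherwise backing off exponentially with jitter. A minimal sketch of the delay calculation only (function name, base, and cap are assumptions, not the OPENAI_RETRY_* defaults):

```python
import random

def backoff_delay(attempt, retry_after=None, base=1.0, cap=60.0):
    """Seconds to wait before retry number `attempt` (0-based): use the
    server's Retry-After value when given, else capped exponential
    backoff with up to 1s of jitter."""
    if retry_after is not None:
        return retry_after
    return min(cap, base * 2 ** attempt) + random.uniform(0, 1)
```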

Changed

  • Standardized on using a non-privileged port (8080) instead of port 80, following container best practices and improving stability of long-running ingestion workloads.

Full Changelog: v2.2.0...v2.2.1

v2.2.0

16 Jan 01:41
7d2eec0


What's Changed

Full Changelog: v2.1.0...v2.2.0

v2.1.0

15 Dec 18:24
dac2e4a


What's Changed

Added

  • Added support for SharePoint Lists, expanding ingestion capabilities beyond document libraries.

Changed

  • Improved robustness of Blob Storage indexing and enhanced data ingestion logging.
  • Refined chunking logic to ensure consistent and reliable chunk ID incrementation.
  • Updated the Azure CLI version in the development container (from 1.2.7 to 1.2.9) for improved tooling support.

Full Changelog: v2.0.6...v2.1.0