refactor: replace references to preview API version 2025-11-15-preview with stable version by kmandryk · Pull Request #111 · bcgov/ai-hub-tracking

kmandryk · 2026-02-25T00:32:50Z

… the GA 2024-11-01 version. Updated docs (_pages and public page), APIM README, and the APIM fragment (pii-anonymization.xml) to call /language/:analyze-text?api-version=2024-11-01. No behavioral changes to PII recognition or redaction logic; this is an API version bump and documentation update.
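This PR moves the fragment to `/language/:analyze-text?api-version=2024-11-01`. Below is a minimal Python sketch of the kind of request that path receives. It assumes a hypothetical resource base URL and uses only the core PII fields (`kind`, `parameters.modelVersion`, `analysisInput.documents`). The fragment's actual body is not shown in this PR (for example, any redaction or excluded-category settings), so it is not reproduced here. Document ids are the message indexes, following the fragment comments in the plan below.

```python
from __future__ import annotations

import json

API_VERSION = "2024-11-01"  # GA version this PR switches to


def build_pii_request(endpoint: str, messages: list[str], language: str = "en") -> tuple[str, str]:
    """Build the URL and JSON body for a synchronous PII analyze-text call.

    `endpoint` is a hypothetical Language resource base URL (no path).
    Each non-empty message becomes one document whose id is its message
    index, so redacted results can be mapped back to the messages array.
    """
    url = f"{endpoint.rstrip('/')}/language/:analyze-text?api-version={API_VERSION}"
    documents = [
        {"id": str(i), "language": language, "text": text}
        for i, text in enumerate(messages)
        if text  # skip empty content; the service rejects empty documents
    ]
    body = {
        "kind": "PiiEntityRecognition",
        "parameters": {"modelVersion": "latest"},
        "analysisInput": {"documents": documents},
    }
    return url, json.dumps(body)


url, body = build_pii_request(
    "https://example-language.cognitiveservices.azure.com",
    ["Hello", "", "Contact Erin Sanchez at 604-555-7890"],
)
print(url)
# https://example-language.cognitiveservices.azure.com/language/:analyze-text?api-version=2024-11-01
```

Only the version in the query string changes between the preview and GA calls. Before relying on any further request parameters, check them against the 2024-11-01 REST reference.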

AI Hub Infra Changes

Summary: 1 to add, 2 to change, 0 to destroy (across 2 stack(s))


Terraform will perform the following actions:

  # azurerm_api_management_policy_fragment.pii_anonymization[0] will be updated in-place
  ~ resource "azurerm_api_management_policy_fragment" "pii_anonymization" {
        id                = "/subscriptions/****/resourceGroups/ai-services-hub-test/providers/Microsoft.ApiManagement/service/ai-services-hub-test-apim/policyFragments/pii-anonymization"
        name              = "pii-anonymization"
      ~ value             = <<-EOT
          - <fragment>
          - 	<!-- ================================================================== -->
          - 	<!-- PII Anonymization via Azure Language Service                       -->
          - 	<!-- Per-message multi-document scanning for robust large-payload support-->
          - 	<!-- Enterprise-grade PII detection using Azure AI Language API         -->
          - 	<!-- Supports: Names, addresses, SSN, medical terms, financial data     -->
          - 	<!-- ================================================================== -->
          - 	<!--
          + <fragment>
          +     <!-- ================================================================== -->
          +     <!-- PII Anonymization via Azure Language Service                       -->
          +     <!-- Per-message multi-document scanning for robust large-payload support-->
          +     <!-- Enterprise-grade PII detection using Azure AI Language API         -->
          +     <!-- Supports: Names, addresses, SSN, medical terms, financial data     -->
          +     <!-- ================================================================== -->
          +     <!--
                    Architecture:
                    - Parses the JSON request body and extracts each message's content
                    - Sends message contents as separate documents in one PII API call
                      (up to 5 documents per synchronous request per Azure Language Service limits)
                    - Large messages are automatically chunked at word boundaries to stay
                      within the per-document character limit (5000 chars, safe for all tiers)
                    - Chunked documents use compound IDs (e.g., "1_0", "1_1") and are
                      reassembled in order after redaction
                    - Replaces each message content with its redacted version
                    - JSON envelope (roles, parameters, model) is never scanned
                    - Falls back to raw-body single-document mode for non-JSON payloads
            
                    Document Chunking & Reassembly:
                    The Azure Language Service enforces a per-document character limit
                    (5,120 on F0, 125K on S tier). To safely handle payloads of any size
                    we use a conservative 5,000-char threshold.
            
                    1. Splitting — For each chat message whose content exceeds the limit,
                       the text is split into consecutive chunks at the nearest word
                       boundary (space character) before the limit. Each chunk becomes a
                       separate document in the PII API request with a compound ID:
                         message index 1, chunk 0 → id "1_0"
                         message index 1, chunk 1 → id "1_1"
                         message index 1, chunk 2 → id "1_2"  …and so on
                       Short messages that fit in a single document keep their simple ID
                       (e.g., "0", "1"), so existing behaviour is fully preserved.
            
                    2. Redaction — The Language Service returns a `redactedText` field for
                       every document it successfully processes. Detected PII entities are
                       replaced in-place with a mask character ('#' by default) using the
                       CharacterMask redaction policy.  For example:
                         Input:  "Contact Erin Sanchez at 604-555-7890"
                         Output: "Contact #### ####### at ############"
                       The mask length always matches the original text length, so
                       character offsets and chunk boundaries remain stable.
            
                    3. Reassembly — After the PII API responds, a lookup map is built
                       from document id → redactedText. Reconstruction works per-message:
                       a) If the map contains the simple key (e.g., "1"), the message was
                          not chunked — apply the redacted text directly.
                       b) Otherwise, iterate compound keys "1_0", "1_1", "1_2", … in
                          order and concatenate the redacted fragments. Because chunks
                          were split at the same boundaries, concatenation reproduces
                          the full redacted content with masks in the correct positions.
                       The redacted content replaces the original message content in the
                       JSON body, and the rest of the envelope (role, parameters, model
                       settings) is left untouched.
            
                    4. Error handling — If the Language Service rejects individual
                       documents (e.g., empty text), those are reported in
                       piiDiagnostics.docErrors but do not block successfully-redacted
                       documents. The fail-closed / fail-open mode controls whether an
                       overall failure blocks the request (503) or passes through the
                       original content.
            
                    5. Redaction Coverage Verification (P1 Safety) — After the PII API
                       responds, the fragment verifies that EVERY message with content
                       received complete redaction. This catches two real loopholes:
            
                       a) Document-limit protection: The Language Service accepts at most
                          5 documents per synchronous request. When the payload requires
                          more (many messages, or large messages that chunk into many
                          documents), excess content is silently dropped — the API
                          never sees it. The coverage check detects this by comparing
                          the set of messages that have redacted output against those
                          that don't.
                          Example: 6 short messages = 6 docs needed, but only 5 get
                          sent. The remaining message passes through unscanned.
            
                       b) Partial-chunking detection: When a large message is split into
                          chunks and only some chunks fit within the 5-document limit,
                          the reassembled text would be shorter than the original. The
                          fragment detects this by comparing total redacted chunk length
                          against original message length. CharacterMask preserves text
                          length, so any discrepancy means missing chunks.
                          Example: A 30K-char message needs 6 chunks. If the 5-doc
                          limit only allows 3 chunks, the reassembled text is ~15K chars
                          — the trailing 15K chars (potentially containing PII) would
                          be silently lost.
            
                       c) Document-error tolerance: If the Language Service returns
                          errors for specific documents (e.g., unsupported language),
                          those documents have no redactedText in the response. The
                          coverage check detects these as unscanned messages.
            
                       Coverage result: piiRedactionCoverage JSON with:
                         - msgsWithContent: total messages that had non-empty content
                         - msgsRedacted: messages with complete redacted output
                         - msgsPartial: messages partially chunked (some chunks missing)
                         - msgsUnscanned: messages completely skipped (no output at all)
                         - fullCoverage: true only if msgsRedacted == msgsWithContent
            
                       Fail-closed mode blocks when fullCoverage == false, returning 503
                       with failure_reason "partial-redaction-N-msgs-unscanned" or
                       "partial-redaction-N-msgs-truncated" for diagnostics.
                       Fail-open mode passes through original content for unscanned/
                       partial messages (no silent truncation) and logs coverage metrics
                       to App Insights for monitoring.
            
                    Performance optimizations:
                    - Single Body.As<string>() read (cached in piiResponseBodyStr)
                    - Single JObject.Parse() pass for diagnostics (cached in piiDiagnostics)
                    - Timing via piiStartTimeTicks / piiDurationMs for latency monitoring
            
                    Prerequisites - Set these variables before including this fragment:
                    - piiInputContent: The request body as a string (JSON with messages array)
                    - piiAnonymizationEnabled: "true" or "false" based on tenant config
            
                    Optional configuration variables:
                    - piiExcludedCategories: JSON array of PII categories to exclude
                    - piiDetectionLanguage: Language code for detection (default: "en")
                    - piiFailClosed: "true" to block requests when redaction fails (default: "false" = fail-open)
            
                    Failure behavior:
                    - If piiFailClosed="true" and redaction fails, the request is blocked with HTTP 503
                    - If piiFailClosed="false" (default), the original unredacted content is forwarded
                    - Detailed failure diagnostics (failure_reason, MSI status) are always logged regardless of mode
            
                    Required Named Value:
                    - piiServiceUrl: The Language Service endpoint URL (set in APIM Named Values)
            
                    Output:
                    - piiAnonymizedContent: The reconstructed JSON body with redacted message content
                    - piiRedactionSucceeded: "true" or "false" (at least one doc redacted)
                    - piiRedactionCoverage: JSON with fullCoverage, msgsWithContent, msgsRedacted,
                      msgsPartial, msgsUnscanned — comprehensive per-message coverage report
                    - piiDetectionStatusCode: HTTP status code from PII API
                    - piiDurationMs: PII API call latency in milliseconds
                    - piiDiagnostics: JSON object with docCount, entityCount, entityTypes, docErrors,
                      plus coverage metrics (msgsWithContent, msgsRedacted, msgsPartial, msgsUnscanned)
                    - piiContentChanged: "true" if content was modified by PII redaction
          -     -->
          - 	<choose>
          - 		<when condition="@(context.Variables.GetValueOrDefault&lt;string&gt;(&quot;piiAnonymizationEnabled&quot;, &quot;false&quot;) == &quot;true&quot;)">
          - 			<!-- Get MSI token for Cognitive Services -->
          - 			<!-- ignore-error=true: if MSI fails, send-request will fail too, handled by fail-closed/fail-open -->
          - 			<authentication-managed-identity resource="https://cognitiveservices.azure.com" output-token-variable-name="pii-msi-access-token" ignore-error="true" />
          - 			<!-- Capture MSI token status for failure diagnostics -->
          - 			<set-variable name="piiMsiTokenStatus" value="@{
          +     -->
          +     <choose>
          +         <when condition="@(context.Variables.GetValueOrDef
(truncated, see workflow logs for complete plan)

Updated by CI — plan against test environment (run #234) at 2026-03-02 19:48:58 UTC.
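Here is a Python sketch of the chunking, reassembly, and coverage check described in the fragment comments above. It is a minimal illustration, not the fragment's code. The fragment does this work in APIM C# policy expressions, and the truncated plan does not show them. The 5,000-character chunk limit and the 5-document cap come from those comments. The function names and the `fake_redact` stand-in for the Language Service are hypothetical. The sketch also assumes each split happens just before a space, with the space carried into the next chunk, so concatenating the chunks reproduces the original text exactly.

```python
from __future__ import annotations

CHUNK_LIMIT = 5000  # conservative per-document limit quoted in the fragment comments
MAX_DOCS = 5        # synchronous documents-per-request limit quoted in the fragment comments


def split_at_word_boundary(text: str, limit: int = CHUNK_LIMIT) -> list[str]:
    """Split text into consecutive chunks of at most `limit` chars.

    Cuts at the last space before the limit (that space starts the next
    chunk), or hard-cuts when there is none, so "".join(chunks) == text.
    """
    chunks: list[str] = []
    start = 0
    while len(text) - start > limit:
        cut = text.rfind(" ", start, start + limit)
        if cut <= start:  # no usable space in this window
            cut = start + limit
        chunks.append(text[start:cut])
        start = cut
    chunks.append(text[start:])
    return chunks


def build_documents(messages: list[str], limit: int = CHUNK_LIMIT) -> list[dict]:
    """Simple ids ("1") for short messages, compound ids ("1_0", "1_1") for chunked ones."""
    docs: list[dict] = []
    for i, content in enumerate(messages):
        if not content:
            continue
        if len(content) <= limit:
            docs.append({"id": str(i), "text": content})
        else:
            for j, chunk in enumerate(split_at_word_boundary(content, limit)):
                docs.append({"id": f"{i}_{j}", "text": chunk})
    return docs


def reassemble(messages: list[str], redacted: dict[str, str]) -> tuple[list[str], dict]:
    """Rebuild messages from an id -> redactedText map and report coverage.

    Messages without complete output keep their original content (fail-open);
    a fail-closed caller would block when fullCoverage is False.
    """
    out = list(messages)
    with_content = ok = partial = unscanned = 0
    for i, content in enumerate(messages):
        if not content:
            continue
        with_content += 1
        if str(i) in redacted:
            text = redacted[str(i)]
        else:
            parts, j = [], 0
            while f"{i}_{j}" in redacted:  # stop at the first missing chunk
                parts.append(redacted[f"{i}_{j}"])
                j += 1
            text = "".join(parts)
        if not text:
            unscanned += 1
        elif len(text) != len(content):  # character masking preserves length
            partial += 1
        else:
            out[i] = text
            ok += 1
    coverage = {
        "msgsWithContent": with_content,
        "msgsRedacted": ok,
        "msgsPartial": partial,
        "msgsUnscanned": unscanned,
        "fullCoverage": ok == with_content,
    }
    return out, coverage


def fake_redact(text: str) -> str:
    """Hypothetical stand-in for the Language Service: masks digits only."""
    return "".join("#" if c.isdigit() else c for c in text)


# Six short messages need six documents; only the first five are "sent".
messages = [f"msg {n}: call 604-555-789{n}" for n in range(6)]
sent = build_documents(messages)[:MAX_DOCS]
_, cov = reassemble(messages, {d["id"]: fake_redact(d["text"]) for d in sent})
print(cov)
# {'msgsWithContent': 6, 'msgsRedacted': 5, 'msgsPartial': 0, 'msgsUnscanned': 1, 'fullCoverage': False}
```

Masking preserves length, so comparing the reassembled length with the original length is enough to separate a truncated message (`msgsPartial`) from a fully covered one. For example, a message of roughly 30,000 characters splits into at least six chunks, so capping the request at five documents leaves that message partial.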

kmandryk changed the title ~~Replace references to the preview API version 2025-11-15-preview with…~~ refactor: replace references to preview API version 2025-11-15-preview with stable version Feb 25, 2026

kmandryk temporarily deployed to tools February 25, 2026 00:36 with GitHub Actions (inactive)

kmandryk temporarily deployed to tools February 25, 2026 00:37 with GitHub Actions (inactive)

kmandryk temporarily deployed to test February 25, 2026 00:37 with GitHub Actions (inactive)

kmandryk had a problem deploying to dev February 25, 2026 00:39 with GitHub Actions (failure)

kmandryk temporarily deployed to tools February 27, 2026 00:10 with GitHub Actions (inactive)

kmandryk temporarily deployed to dev February 27, 2026 00:11 with GitHub Actions (inactive)

Merge branch 'main' into refactor/set-PII-version (c70f2d7)

kmandryk temporarily deployed to tools February 27, 2026 21:46 with GitHub Actions (inactive)

kmandryk temporarily deployed to test February 27, 2026 21:47 with GitHub Actions (inactive)

Merge branch 'main' into refactor/set-PII-version (dcbca34)

kmandryk temporarily deployed to tools March 2, 2026 19:44 with GitHub Actions (inactive)

kmandryk temporarily deployed to test March 2, 2026 19:46 with GitHub Actions (inactive)

kmandryk temporarily deployed to tools March 3, 2026 00:21 with GitHub Actions (inactive)

kmandryk had a problem deploying to dev March 3, 2026 00:22 with GitHub Actions (failure)

kmandryk wants to merge 3 commits into main from refactor/set-PII-version

kmandryk commented Feb 25, 2026 (edited by github-actions bot)
