
chore: update tenant-info to show live endpoints#120

Merged
mishraomp merged 1 commit into main from feat/tenant-info-endpoints
Feb 28, 2026
Conversation


@mishraomp mishraomp commented Feb 28, 2026

AI Hub Infra Changes

Summary: 0 to add, 4 to change, 0 to destroy (across 1 stack(s))

Show plan details
Terraform will perform the following actions:

  # azurerm_api_management_api_policy.tenant["gcpe-media-monitoring"] will be updated in-place
  ~ resource "azurerm_api_management_api_policy" "tenant" {
        id                  = "/subscriptions/****/resourceGroups/ai-services-hub-test/providers/Microsoft.ApiManagement/service/ai-services-hub-test-apim/apis/gcpe-media-monitoring"
      ~ xml_content         = <<-EOT
          - <policies>
          - 	<inbound>
          - 		<base />
          - 		<!-- Extract tracking dimensions from headers -->
          - 		<include-fragment fragment-id="tracking-dimensions" />
          - 		<!-- Tenant identification -->
          - 		<set-header name="X-Tenant-Id" exists-action="override">
          - 			<value>gcpe-media-monitoring</value>
          - 		</set-header>
          - 		<!-- Per-model token rate limiting for OpenAI requests only -->
          - 		<!-- Each model has its own rate limit matching its Azure OpenAI deployment capacity -->
          - 		<!-- Only applies to /openai/* paths; DocInt/Speech/Search/Storage are not rate-limited by token count -->
          - 		<!-- CRITICAL: estimate-prompt-tokens reads the entire request body for tokenization. -->
          - 		<!-- On large binary payloads (e.g., 500KB base64 DocInt images) this causes APIM to hang. -->
          - 		<!-- Extracts deployment name from URL: /openai/deployments/{deployment-name}/... -->
          - 		<!-- For /v1/ format: extracts from request body "model" field (deployment name lookup key) -->
          - 		<choose>
          - 			<when condition="@(context.Request.Url.Path.ToLower().Contains(&quot;openai&quot;))">
          - 				<set-variable name="deploymentName" value="@{
          + <policies>
          +     <inbound>
          +         <base />
          +         <!-- Extract tracking dimensions from headers -->
          +         <include-fragment fragment-id="tracking-dimensions" />
          +         <!-- Tenant identification -->
          +         <set-header name="X-Tenant-Id" exists-action="override">
          +             <value>gcpe-media-monitoring</value>
          +         </set-header>
          +         <!-- Per-model token rate limiting for OpenAI requests only -->
          +         <!-- Each model has its own rate limit matching its Azure OpenAI deployment capacity -->
          +         <!-- Only applies to /openai/* paths; DocInt/Speech/Search/Storage are not rate-limited by token count -->
          +         <!-- CRITICAL: estimate-prompt-tokens reads the entire request body for tokenization. -->
          +         <!-- On large binary payloads (e.g., 500KB base64 DocInt images) this causes APIM to hang. -->
          +         <!-- Extracts deployment name from URL: /openai/deployments/{deployment-name}/... -->
          +         <!-- For /v1/ format: extracts from request body "model" field (deployment name lookup key) -->
          +         <choose>
          +             <when condition="@(context.Request.Url.Path.ToLower().Contains(&quot;openai&quot;))">
          +                 <set-variable name="deploymentName" value="@{
                                var path = context.Request.Url.Path;
                                var match = System.Text.RegularExpressions.Regex.Match(path, @&quot;/deployments/([^/]+)/&quot;);
                                if (match.Success) { return match.Groups[1].Value; }
                                // For /v1/ format: model field is the deployment name lookup key on Azure OpenAI
                                // Client sends e.g. "gpt-4.1-mini"; tenant-prefix to match deployment name
                                if (path.ToLower().Contains(&quot;/v1/&quot;)) {
                                    try {
                                        var body = context.Request.Body.As&lt;JObject&gt;(preserveContent: true);
                                        var model = body?[&quot;model&quot;]?.ToString();
                                        if (!string.IsNullOrEmpty(model)) { return &quot;gcpe-media-monitoring-&quot; + model; }
                                    } catch { }
                                }
                                return &quot;unknown&quot;;
          -                 }" />
          - 				<choose>
          - 					<when condition="@(context.Variables.GetValueOrDefault&lt;string&gt;(&quot;deploymentName&quot;, &quot;&quot;) == &quot;gpt-4.1&quot;)">
          - 						<!-- Rate limit for gpt-4.1: 300k TPM (deployment capacity is in thousands of TPM) -->
          - 						<llm-token-limit counter-key="@(context.Subscription.Id + &quot;-gpt-4.1&quot;)" tokens-per-minute="300000" estimate-prompt-tokens="true" remaining-tokens-variable-name="remainingTokens" remaining-tokens-header-name="x-ratelimit-remaining-tokens" tokens-consumed-variable-name="tokensConsumed" />
          - 					</when>
          - 					<when condition="@(context.Variables.GetValueOrDefault&lt;string&gt;(&quot;deploymentName&quot;, &quot;&quot;) == &quot;gpt-4.1-mini&quot;)">
          - 						<!-- Rate limit for gpt-4.1-mini: 1500k TPM (deployment capacity is in thousands of TPM) -->
          - 						<llm-token-limit counter-key="@(context.Subscription.Id + &quot;-gpt-4.1-mini&quot;)" tokens-per-minute="1500000" estimate-prompt-tokens="true" remaining-tokens-variable-name="remainingTokens" remaining-tokens-header-name="x-ratelimit-remaining-tokens" tokens-consumed-variable-name="tokensConsumed" />
          - 					</when>
          - 					<when condition="@(context.Variables.GetValueOrDefault&lt;string&gt;(&quot;deploymentName&quot;, &quot;&quot;) == &quot;gpt-4.1-nano&quot;)">
          - 						<!-- Rate limit for gpt-4.1-nano: 1500k TPM (deployment capacity is in thousands of TPM) -->
          - 						<llm-token-limit counter-key="@(context.Subscription.Id + &quot;-gpt-4.1-nano&quot;)" tokens-per-minute="1500000" estimate-prompt-tokens="true" remaining-tokens-variable-name="remainingTokens" remaining-tokens-header-name="x-ratelimit-remaining-tokens" tokens-consumed-variable-name="tokensConsumed" />
          - 					</when>
          - 					<when condition="@(context.Variables.GetValueOrDefault&lt;string&gt;(&quot;deploymentName&quot;, &quot;&quot;) == &quot;gpt-4o&quot;)">
          - 						<!-- Rate limit for gpt-4o: 300k TPM (deployment capacity is in thousands of TPM) -->
          - 						<llm-token-limit counter-key="@(context.Subscription.Id + &quot;-gpt-4o&quot;)" tokens-per-minute="300000" estimate-prompt-tokens="true" remaining-tokens-variable-name="remainingTokens" remaining-tokens-header-name="x-ratelimit-remaining-tokens" tokens-consumed-variable-name="tokensConsumed" />
          - 					</when>
          - 					<when condition="@(context.Variables.GetValueOrDefault&lt;string&gt;(&quot;deploymentName&quot;, &quot;&quot;) == &quot;gpt-4o-mini&quot;)">
          - 						<!-- Rate limit for gpt-4o-mini: 1500k TPM (deployment capacity is in thousands of TPM) -->
          - 						<llm-token-limit counter-key="@(context.Subscription.Id + &quot;-gpt-4o-mini&quot;)" tokens-per-minute="1500000" estimate-prompt-tokens="true" remaining-tokens-variable-name="remainingTokens" remaining-tokens-header-name="x-ratelimit-remaining-tokens" tokens-consumed-variable-name="tokensConsumed" />
          - 					</when>
          - 					<when condition="@(context.Variables.GetValueOrDefault&lt;string&gt;(&quot;deploymentName&quot;, &quot;&quot;) == &quot;gpt-5-mini&quot;)">
          - 						<!-- Rate limit for gpt-5-mini: 100k TPM (deployment capacity is in thousands of TPM) -->
          - 						<llm-token-limit counter-key="@(context.Subscription.Id + &quot;-gpt-5-mini&quot;)" tokens-per-minute="100000" estimate-prompt-tokens="true" remaining-tokens-variable-name="remainingTokens" remaining-tokens-header-name="x-ratelimit-remaining-tokens" tokens-consumed-variable-name="tokensConsumed" />
          - 					</when>
          - 					<when condition="@(context.Variables.GetValueOrDefault&lt;string&gt;(&quot;deploymentName&quot;, &quot;&quot;) == &quot;gpt-5-nano&quot;)">
          - 						<!-- Rate limit for gpt-5-nano: 1500k TPM (deployment capacity is in thousands of TPM) -->
          - 						<llm-token-limit counter-key="@(context.Subscription.Id + &quot;-gpt-5-nano&quot;)" tokens-per-minute="1500000" estimate-prompt-tokens="true" remaining-tokens-variable-name="remainingTokens" remaining-tokens-header-name="x-ratelimit-remaining-tokens" tokens-consumed-variable-name="tokensConsumed" />
          - 					</when>
          - 					<when condition="@(context.Variables.GetValueOrDefault&lt;string&gt;(&quot;deploymentName&quot;, &quot;&quot;) == &quot;gpt-5.1-chat&quot;)">
          - 						<!-- Rate limit for gpt-5.1-chat: 50k TPM (deployment capacity is in thousands of TPM) -->
          - 						<llm-token-limit counter-key="@(context.Subscription.Id + &quot;-gpt-5.1-chat&quot;)" tokens-per-minute="50000" estimate-prompt-tokens="true" remaining-tokens-variable-name="remainingTokens" remaining-tokens-header-name="x-ratelimit-remaining-tokens" tokens-consumed-variable-name="tokensConsumed" />
          - 					</when>
          - 					<when condition="@(context.Variables.GetValueOrDefault&lt;string&gt;(&quot;deploymentName&quot;, &quot;&quot;) == &quot;gpt-5.1-codex-mini&quot;)">
          - 						<!-- Rate limit for gpt-5.1-codex-mini: 100k TPM (deployment capacity is in thousands of TPM) -->
          - 						<llm-token-limit counter-key="@(context.Subscription.Id + &quot;-gpt-5.1-codex-mini&quot;)" tokens-per-minute="100000" estimate-prompt-tokens="true" remaining-tokens-variable-name="remainingTokens" remaining-tokens-header-name="x-ratelimit-remaining-tokens" tokens-consumed-variable-name="tokensConsumed" />
          - 					</when>
          - 					<when condition="@(context.Variables.GetValueOrDefault&lt;string&gt;(&quot;deploymentName&quot;, &quot;&quot;) == &quot;o1&quot;)">
          - 						<!-- Rate limit for o1: 50k TPM (deployment capacity is in thousands of TPM) -->
          - 						<llm-token-limit counter-key="@(context.Subscription.Id + &quot;-o1&quot;)" tokens-per-minute="50000" estimate-prompt-tokens="true" remaining-tokens-variable-name="remainingTokens" remaining-tokens-header-name="x-ratelimit-remaining-tokens" tokens-consumed-variable-name="tokensConsumed" />
          - 					</when>
          - 					<when condition="@(context.Variables.GetValueOrDefault&lt;string&gt;(&quot;deploymentName&quot;, &quot;&quot;) == &quot;o3-mini&quot;)">
          - 						<!-- Rate limit for o3-mini: 50k TPM (deployment capacity is in thousands of TPM) -->
          - 						<llm-token-limit counter-key="@(context.Subscription.Id + &quot;-o3-mini&quot;)" tokens-per-minute="50000" estimate-prompt-tokens="true" remaining-tokens-variable-name="remainingTokens" remaining-tokens-header-name="x-ratelimit-remaining-tokens" tokens-consumed-variable-name="tokensConsumed" />
          - 					</when>
          - 					<when condition="@(context.Variables.GetValueOrDefault&lt;string&gt;(&quot;deploymentName&quot;, &quot;&quot;) == &quot;o4-mini&quot;)">
          - 						<!-- Rate limit for o4-mini: 100k TPM (deployment capacity is in thousands of TPM) -->
          - 						<llm-token-limit counter-key="@(context.Subscription.Id + &quot;-o4-mini&quot;)" tokens-per-minute="100000" estimate-prompt-tokens="true" remaining-tokens-variable-name="remainingTokens" remaining-tokens-header-name="x-ratelimit-remaining-tokens" tokens-consumed-variable-name="tokensConsumed" />
          - 					</when>
          - 					<when condition="@(context.Variables.GetValueOrDefault&lt;string&gt;(&quot;deploymentName&quot;, &quot;&quot;) == &quot;text-embedding-ada-002&quot;)">
          - 						<!-- Rate limit for text-embedding-ada-002: 100k TPM (deployment capacity is in thousands of TPM) -->
          - 						<llm-token-limit counte
(truncated, see workflow logs for complete plan)
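The deployment-name extraction in the policy above (URL regex first, then the `model` field of the request body for `/v1/` requests, tenant-prefixed to match the deployment name) can be sketched as follows. This is a minimal Python sketch of the same logic, not the policy expression itself; the function name is hypothetical.

```python
import json
import re

TENANT_PREFIX = "gcpe-media-monitoring-"  # tenant prefix used by this policy


def extract_deployment_name(path: str, body: bytes) -> str:
    """Mirror the policy's deployment-name lookup.

    1. Prefer the /deployments/{deployment-name}/ segment of the URL path.
    2. For /v1/ requests, fall back to the JSON body's "model" field,
       prefixed with the tenant name to match the deployment name.
    3. Otherwise return "unknown".
    """
    match = re.search(r"/deployments/([^/]+)/", path)
    if match:
        return match.group(1)
    if "/v1/" in path.lower():
        try:
            model = json.loads(body).get("model")
            if model:
                return TENANT_PREFIX + model
        except (ValueError, AttributeError):
            pass  # unparseable or non-object body: fall through to "unknown"
    return "unknown"
```

As in the policy, an unmatched request yields `"unknown"` and falls outside every model-specific rate-limit branch.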

Updated by CI — plan against test environment (run #216).
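The per-model limits in the plan key a tokens-per-minute counter on subscription ID plus model name (the policy's `counter-key`). A rough fixed-window sketch of that behavior, in Python with a hypothetical class name and a hypothetical subset of the TPM budgets shown above:

```python
import time
from collections import defaultdict

# Hypothetical subset of the per-model TPM budgets from the policy.
TPM_LIMITS = {"gpt-4.1": 300_000, "gpt-4.1-mini": 1_500_000, "gpt-5.1-chat": 50_000}


class TokenLimiter:
    """Fixed-window tokens-per-minute counter keyed by subscription + model,
    analogous to the policy's llm-token-limit counter-key."""

    def __init__(self, limits):
        self.limits = limits
        # key -> [window_start_seconds, tokens_used_in_window]
        self.windows = defaultdict(lambda: [0.0, 0])

    def try_consume(self, subscription_id: str, model: str, tokens: int, now=None) -> bool:
        limit = self.limits.get(model)
        if limit is None:
            return True  # models without a branch are not token-limited
        now = time.time() if now is None else now
        window = self.windows[f"{subscription_id}-{model}"]
        if now - window[0] >= 60:
            window[0], window[1] = now, 0  # start a fresh one-minute window
        if window[1] + tokens > limit:
            return False  # budget exceeded: APIM would answer 429 here
        window[1] += tokens
        return True
```

Because the counter key includes the subscription ID, each tenant subscription gets its own per-model budget; APIM's actual accounting is distributed across gateway instances, which this local sketch does not model.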

@mishraomp mishraomp self-assigned this Feb 28, 2026
@mishraomp mishraomp added documentation Improvements or additions to documentation enhancement New feature or request Task Terraform devops labels Feb 28, 2026
@mishraomp mishraomp merged commit ac5bbec into main Feb 28, 2026
11 checks passed
@mishraomp mishraomp deleted the feat/tenant-info-endpoints branch February 28, 2026 02:17
