Commit 7f06fa0

author
yelevin
committed
Merge branch 'main' of https://github.com/MicrosoftDocs/azure-docs-pr into yelevin/usx-cxe-open-issues
2 parents cd4b7fe + 2107d3b commit 7f06fa0

217 files changed: +3940 -2003 lines changed

.openpublishing.redirection.json

Lines changed: 11 additions & 1 deletion
Original file line number | Diff line number | Diff line change
@@ -3849,6 +3849,11 @@
38493849
"redirect_url": "/azure/reliability/reliability-guidance-overview",
38503850
"redirect_document_id": false
38513851
},
3852+
{
3853+
"source_path_from_root": "/articles/aks/cluster-configuration.md",
3854+
"redirect_url": "/azure/aks/concepts-clusters-workloads.md",
3855+
"redirect_document_id": false
3856+
},
38523857
{
38533858
"source_path_from_root": "/articles/orbital/overview-analytics.md",
38543859
"redirect_url": "/azure/orbital/overview",
@@ -3984,6 +3989,11 @@
39843989
"source_path_from_root":"/articles/container-instances/availability-zones.md",
39853990
"redirect_url":"/azure/reliability/reliability-containers",
39863991
"redirect_document_id":false
3987-
}
3992+
},
3993+
{
3994+
"source_path_from_root":"/articles/service-connector/quickstart-cli-aks-connection.md",
3995+
"redirect_url":"/azure/service-connector/quickstart-portal-aks-connection",
3996+
"redirect_document_id":false
3997+
}
39883998
]
39893999
}
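
Most `redirect_url` values in this file omit a file extension, while the newly added AKS entry ends in `.md`. The commit does not say whether that is intended. A reviewer who wanted to flag such entries could use a check along these lines; this is a minimal sketch, the helper name `find_md_suffixed_redirects` is hypothetical, and the assumption that redirect targets should be extensionless comes only from the neighboring entries in this diff.

```python
def find_md_suffixed_redirects(redirections):
    """Return redirect entries whose redirect_url ends with '.md'.

    Assumption: extensionless targets are the convention, inferred only
    from the other entries visible in this diff.
    """
    return [
        entry for entry in redirections
        if entry.get("redirect_url", "").endswith(".md")
    ]


# Two of the entries added in this commit, copied from the hunks above.
entries = [
    {
        "source_path_from_root": "/articles/aks/cluster-configuration.md",
        "redirect_url": "/azure/aks/concepts-clusters-workloads.md",
        "redirect_document_id": False,
    },
    {
        "source_path_from_root": "/articles/service-connector/quickstart-cli-aks-connection.md",
        "redirect_url": "/azure/service-connector/quickstart-portal-aks-connection",
        "redirect_document_id": False,
    },
]

flagged = find_md_suffixed_redirects(entries)
for entry in flagged:
    print(entry["source_path_from_root"], "->", entry["redirect_url"])
```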

.openpublishing.redirection.sentinel.json

Lines changed: 5 additions & 0 deletions
@@ -1089,6 +1089,11 @@
10891089
"source_path_from_root": "/articles/sentinel/notebooks-with-synapse-hunt.md",
10901090
"redirect_url": "/azure/sentinel/notebooks-hunt",
10911091
"redirect_document_id": false
1092+
},
1093+
{
1094+
"source_path_from_root": "/articles/sentinel/data-connectors/dns.md",
1095+
"redirect_url": "/azure/sentinel/data-connectors/windows-dns-events-via-ama",
1096+
"redirect_document_id": false
10921097
}
10931098
]
10941099
}

articles/ai-services/openai/how-to/latency.md

Lines changed: 2 additions & 2 deletions
@@ -59,7 +59,7 @@ Latency varies based on what model you're using. For an identical request, expec
5959
When you send a completion request to the Azure OpenAI endpoint, your input text is converted to tokens that are then sent to your deployed model. The model receives the input tokens and then begins generating a response. It's an iterative sequential process, one token at a time. Another way to think of it is like a for loop with `n tokens = n iterations`. For most models, generating the response is the slowest step in the process.
6060

6161
At the time of the request, the requested generation size (max_tokens parameter) is used as an initial estimate of the generation size. The compute-time for generating the full size is reserved by the model as the request is processed. Once the generation is completed, the remaining quota is released. Ways to reduce the number of tokens:
62-
- Set the `max_token` parameter on each call as small as possible.
62+
- Set the `max_tokens` parameter on each call as small as possible.
6363
- Include stop sequences to prevent generating extra content.
6464
- Generate fewer responses: The best_of & n parameters can greatly increase latency because they generate multiple outputs. For the fastest response, either don't specify these values or set them to 1.
6565

@@ -136,4 +136,4 @@ Time from the first token to the last token, divided by the number of generated
136136

137137
* **Streaming**: Enabling streaming can be useful in managing user expectations in certain situations by allowing the user to see the model response as it is being generated rather than having to wait until the last token is ready.
138138

139-
* **Content Filtering** improves safety, but it also impacts latency. Evaluate if any of your workloads would benefit from [modified content filtering policies](./content-filters.md).
139+
* **Content Filtering** improves safety, but it also impacts latency. Evaluate if any of your workloads would benefit from [modified content filtering policies](./content-filters.md).
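
The token-reduction advice in the first hunk (small `max_tokens`, stop sequences, a single response) can be sketched as a request-parameter builder. This is an illustration only: the helper name `build_completion_params` is hypothetical, it sends no request, and the values used are placeholders rather than recommendations from the article.

```python
def build_completion_params(prompt, max_tokens, stop=None):
    """Assemble completion parameters that keep generation short.

    - max_tokens is set explicitly so the reserved generation size stays small.
    - n is fixed at 1 and best_of is omitted, so only one output is generated.
    - stop sequences are included only when the caller supplies them.
    """
    if max_tokens < 1:
        raise ValueError("max_tokens must be a positive integer")
    params = {"prompt": prompt, "max_tokens": max_tokens, "n": 1}
    if stop:
        params["stop"] = list(stop)
    return params


# Placeholder values for illustration.
params = build_completion_params(
    "Summarize the release notes in one sentence.",
    max_tokens=64,
    stop=["\n\n"],
)
```

The resulting dictionary would then be passed to whichever client the application already uses.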

articles/ai-services/openai/how-to/monitoring.md

Lines changed: 4 additions & 2 deletions
@@ -6,7 +6,7 @@ ms.author: mbullwin
66
ms.service: azure-ai-openai
77
ms.topic: how-to
88
ms.custom: subject-monitoring
9-
ms.date: 03/29/2024
9+
ms.date: 04/16/2024
1010
---
1111

1212
# Monitoring Azure OpenAI Service
@@ -60,7 +60,9 @@ The following table summarizes the current subset of metrics available in Azure
6060
| `Processed FineTuned Training Hours` | Usage |Sum| Number of training hours processed on an Azure OpenAI fine-tuned model. | `ApiName`, `ModelDeploymentName`,`ModelName`, `Region`|
6161
| `Processed Inference Tokens` | Usage | Sum| Number of inference tokens processed by an Azure OpenAI model. Calculated as prompt tokens (input) + generated tokens. Applies to PayGo, PTU, and PTU-managed SKUs.|`ApiName`, `ModelDeploymentName`,`ModelName`, `Region`|
6262
| `Processed Prompt Tokens` | Usage | Sum | Total number of prompt tokens (input) processed on an Azure OpenAI model. Applies to PayGo, PTU, and PTU-managed SKUs.|`ApiName`, `ModelDeploymentName`,`ModelName`, `Region`|
63-
| `Provision-managed Utilization V2` | Usage | Average | Provision-managed utilization is the utilization percentage for a given provisioned-managed deployment. Calculated as (PTUs consumed/PTUs deployed)*100. When utilization is at or above 100%, calls are throttled and return a 429 error code. | `ModelDeploymentName`,`ModelName`,`ModelVersion`, `Region`, `StreamType`|
63+
| `Provision-managed Utilization V2` | HTTP | Average | Provision-managed utilization is the utilization percentage for a given provisioned-managed deployment. Calculated as (PTUs consumed/PTUs deployed)*100. When utilization is at or above 100%, calls are throttled and return a 429 error code. | `ModelDeploymentName`,`ModelName`,`ModelVersion`, `Region`, `StreamType`|
64+
|`Prompt Token Cache Match Rate` | HTTP | Average | **Provisioned-managed only**. The prompt token cache hit ratio expressed as a percentage. | `ModelDeploymentName`, `ModelVersion`, `ModelName`, `Region`|
65+
|`Time to Response` | HTTP | Average | Recommended latency (responsiveness) measure for streaming requests. **Applies to PTU, and PTU-managed deployments**. This metric does not apply to standard pay-go deployments. Calculated as time taken for the first response to appear after a user sends a prompt, as measured by the API gateway. This number increases as the prompt size increases and/or cache hit size reduces. Note: this metric is an approximation as measured latency is heavily dependent on multiple factors, including concurrent calls and overall workload pattern. In addition, it does not account for any client-side latency that may exist between your client and the API endpoint. Please refer to your own logging for optimal latency tracking.| `ModelDeploymentName`, `ModelName`, and `ModelVersion` |
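
The utilization formula quoted in the `Provision-managed Utilization V2` row, (PTUs consumed/PTUs deployed)*100, works out as in the sketch below. The function names and sample numbers are hypothetical; only the formula and the 100% throttling threshold come from the table.

```python
def utilization_percent(ptus_consumed, ptus_deployed):
    """Utilization as (PTUs consumed / PTUs deployed) * 100."""
    if ptus_deployed <= 0:
        raise ValueError("ptus_deployed must be positive")
    return (ptus_consumed / ptus_deployed) * 100


def is_throttled(utilization):
    """Per the table, calls are throttled (HTTP 429) at or above 100%."""
    return utilization >= 100


# Made-up sample values: 75 PTUs consumed on a 100-PTU deployment.
current = utilization_percent(75, 100)
throttled = is_throttled(current)
```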
6466

6567
## Configure diagnostic settings
6668

articles/aks/TOC.yml

Lines changed: 5 additions & 3 deletions
@@ -240,9 +240,9 @@
240240
items:
241241
- name: Deploy a cluster in an Edge Zone
242242
href: edge-zones.md
243-
- name: Cluster configuration options
244-
href: cluster-configuration.md
245-
- name: Manually scale nodes in an AKS cluster
243+
- name: Deploy a cluster with a fully managed resource group
244+
href: node-resource-group-lockdown.md
245+
- name: Scale an AKS cluster
246246
href: scale-cluster.md
247247
- name: Stop and start an AKS cluster
248248
href: start-stop-cluster.md
@@ -749,6 +749,8 @@
749749
href: use-windows-hpc.md
750750
- name: Upgrade from Windows Server 2019 to 2022
751751
href: upgrade-windows-2019-2022.md
752+
- name: Use generation 2 VMs
753+
href: generation-2-vm-windows.md
752754
- name: Create Dockerfiles for Windows Server containers
753755
href: /virtualization/windowscontainers/manage-docker/manage-windows-dockerfile?context=/azure/aks/context/aks-context
754756
- name: Optimize Dockerfiles for Windows Server containers
