
Commit b8d58d9

Addressing Reviewer Comments
1 parent 3e46e32 commit b8d58d9

12 files changed: +15 -15 lines changed

learn-pr/advocates/guide-ai-operations-center-excellence/1-implement-genaiops-processes.yml

Lines changed: 2 additions & 2 deletions

@@ -1,8 +1,8 @@
 ### YamlMime:ModuleUnit
 uid: learn.workload-operations-generative-ai-center-excellence.implement-genaiops-processes
-title: Implement GenAIOps processes.
+title: Implement GenAIOps processes
 metadata:
-  title: Implement GenAIOps processes.
+  title: Implement GenAIOps processes
   description: Learn about GenAIOps processes and how they can help organizations effectively deploy, manage, and maintain generative AI workloads.
   ms.date: 06/04/2025
   author: Orin-Thomas

learn-pr/advocates/guide-ai-operations-center-excellence/2-security-methods-practices.yml

Lines changed: 2 additions & 2 deletions

@@ -1,8 +1,8 @@
 ### YamlMime:ModuleUnit
 uid: learn.workload-operations-generative-ai-center-excellence.security-methods-practices
-title: Security methods practices
+title: Security methods and practices
 metadata:
-  title: Security methods practices
+  title: Security methods and practices
   description: Learn about security methods and practices for generative AI workloads.
   ms.date: 06/04/2025
   author: Orin-Thomas

learn-pr/advocates/guide-ai-operations-center-excellence/5-cost-management-processes.yml

Lines changed: 1 addition & 1 deletion

@@ -1,6 +1,6 @@
 ### YamlMime:ModuleUnit
 uid: learn.workload-operations-generative-ai-center-excellence.cost-management-processes
-title: Cost Management processes
+title: Cost management processes
 metadata:
   title: Cost Management processes
   description: Learn how to manage costs for generative AI workloads.

learn-pr/advocates/guide-ai-operations-center-excellence/9-knowledge-check.yml

Lines changed: 1 addition & 1 deletion

@@ -1,6 +1,6 @@
 ### YamlMime:ModuleUnit
 uid: learn.workload-operations-generative-ai-center-excellence.knowledge-check
-title: Knowledge Check
+title: Knowledge check
 metadata:
   title: Knowledge Check
   description: Check your understanding of the module.

learn-pr/advocates/guide-ai-operations-center-excellence/includes/1-implement-genaiops-processes.md

Lines changed: 1 addition & 1 deletion

@@ -26,7 +26,7 @@ The CoE should provide GenAIOps guidance on the following elements of workload o
 - Cost management processes
 - Monitoring and optimization processes

-![Diagram showing the relationships between DataOps, MLOps, GenAIOps, deployment and monitoring](../media/genaiops-processes.png)
+![Diagram showing the relationships between DataOps, MLOps, GenAIOps, deployment and monitoring.](../media/genaiops-processes.png)

 There should be bidirectional communication between the generative AI CoE and workload teams.

learn-pr/advocates/guide-ai-operations-center-excellence/includes/10-summary.md

Lines changed: 1 addition & 1 deletion

@@ -1,6 +1,6 @@
 In this module, you learned how a generative AI Center of Excellence (CoE) can serve as the central point of advice and procedure for teams who need to manage the deployment and operations of generative AI workloads. The AI CoE emphasizes the importance of implementing GenAIOps processes, ensuring that AI workloads are well-architected, and ensuring that workloads adhere to compliance and security requirements. The CoE can advise on best practices for the organization's infrastructure management to ensure that generative AI workloads meet scalability and reliability requirements.

-## Learn More
+## Learn more

 - Microsoft's responsible AI practices: <https://aka.ms/RAITransparencyReport2024PDF>
 - Microsoft's Frontier governance framework: <https://go.microsoft.com/fwlink/?linkid=2303737&clcid=0xc09>

learn-pr/advocates/guide-ai-operations-center-excellence/includes/3-compliance-governance-practices.md

Lines changed: 1 addition & 1 deletion

@@ -13,7 +13,7 @@ The CoE should ensure that the organization practices responsible AI governance.

 Generative AI workloads must comply with existing cloud-platform regulations, along with emerging AI-specific requirements related to provenance, bias, and safety. A well-defined governance framework ensures alignment across legal, risk, and engineering teams, enabling faster, more responsible releases. The following governance guidelines are recommended to establish robust compliance processes:

-- **Regulatory alignment (Compliance mapping):** ensures workloads meet compliance by clearly mapping controls and responsibilities for each model, dataset, and endpoint to established standards (GDPR, HIPAA, PCI-DSS) and emerging AI-specific regulations (EU AI Act, NIST AI Risk Management Framework).
+- **Regulatory alignment (Compliance mapping):** ensures workloads meet compliance by clearly mapping controls and responsibilities for each model, dataset, and endpoint to established standards (HIPAA, PCI-DSS) and emerging AI-specific regulations (EU AI Act, NIST AI Risk Management Framework).
 - **Audit & lineage (Logging and traceability)**: captures end-to-end logs showing who trained what, on which data, and how outputs are used. Lineage and tamper-proof logs support breach forensics, bias investigations, and external audits.
 - **Automated policy enforcement (Pre-deployment checks):** gate deployments with preflight checks (region, data-classification, Responsible AI scorecard). It also enforces tag/label requirements and triggers drift or bias retests on every retrain.
 - **AI-specific assurance (Risk monitoring and evidence collection)**: tracks fairness, robustness, and content-safety metrics, surfaces risk dashboards to business owners, and keeps evidence packages ready for auditors or customers.
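For illustration, here is a minimal sketch of the kind of pre-deployment gate the "Automated policy enforcement" bullet above describes. The region list, tag names, and scorecard field are assumptions made for this sketch, not part of the module content or of any specific platform API.

```python
# Hypothetical preflight gate: block a deployment unless region, classification
# tags, and the Responsible AI scorecard check out. All policy values and field
# names below are illustrative assumptions.

ALLOWED_REGIONS = {"eastus", "westeurope"}                        # assumed org policy
REQUIRED_TAGS = {"data-classification", "rai-scorecard", "owner"}

def preflight(deployment: dict) -> list[str]:
    """Return a list of policy violations; an empty list means the gate passes."""
    violations = []
    if deployment.get("region") not in ALLOWED_REGIONS:
        violations.append(f"region {deployment.get('region')!r} is not approved")
    missing = REQUIRED_TAGS - set(deployment.get("tags", {}))
    if missing:
        violations.append(f"missing required tags: {sorted(missing)}")
    if deployment.get("rai_scorecard_status") != "approved":
        violations.append("Responsible AI scorecard not approved")
    return violations

if __name__ == "__main__":
    candidate = {
        "region": "eastus",
        "tags": {"data-classification": "confidential", "rai-scorecard": "v2", "owner": "workload-team"},
        "rai_scorecard_status": "approved",
    }
    problems = preflight(candidate)
    print("PASS" if not problems else f"BLOCK: {problems}")
```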

learn-pr/advocates/guide-ai-operations-center-excellence/includes/7-well-architected-generative-ai-workloads.md

Lines changed: 4 additions & 4 deletions

@@ -20,11 +20,11 @@ Below are the core mechanisms that govern resource supply, cost, and performance
 - **Autoscaling:** expands or shrinks online endpoints and training clusters automatically as load changes, eliminating manual resizing and idle over-provisioning.
 - **Usage tracking and monitoring:** real-time tracking of key metrics such as tokens per second, queue depth, and response time, alongside resource consumption (CPU, GPU, memory, storage), workload performance (throughput, latency), and system health, to gain insights, identify trends, and detect potential issues.

-## Generative AI Application Workload Lifecycle
+## Generative AI application workload lifecycle

 Generative AI workloads swing from near-idle to peak demand in minutes and must survive zone or region failures without manual intervention. Infrastructure therefore needs to scale out horizontally on demand, continue operating through hardware or network failures, and recover automatically after a regional outage. Robust generative AI infrastructure must combine elasticity with fault-tolerant design for handling data growth, model evolution, and increasing user demand. The elements that follow set out the essential platform capabilities required to achieve that balance.

-- **Elastic scaling: adds or removes GPU nodes, inference pods, and storage shards based on queue depth or tokens-per-second. Horizontal elasticity keeps latency low without over-provisioning.**
-- **High availability: redundant instances sit in separate Availability Zones or regions and are fronted by global load-balancers that steer traffic to the healthiest endpoint. Health probes and automated fail-over remove single points of failure.**
-- **Fault tolerance & auto-healing: containers restart on crash, nodes are cordoned and recycled, and retry logic masks transient errors so users never notice.**
+- **Elastic scaling:** adds or removes GPU nodes, inference pods, and storage shards based on queue depth or tokens-per-second. Horizontal elasticity keeps latency low without over-provisioning.
+- **High availability:** redundant instances sit in separate Availability Zones or regions and are fronted by global load-balancers that steer traffic to the healthiest endpoint. Health probes and automated fail-over remove single points of failure.
+- **Fault tolerance & auto-healing:** containers restart on crash, nodes are cordoned and recycled, and retry logic masks transient errors so users never notice.
 - **Disaster recovery & geo-replication:** data (including vector indexes used by RAG pipelines) is synchronously copied to a paired region; run-books or automated workflows restore service within Recovery Point Objective (RPO)/Recovery Time Objective (RTO) targets.
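To make the autoscaling and elastic-scaling bullets above concrete, here is a small sketch of a queue-depth-driven scaling decision. The thresholds, replica bounds, and `scale_to` call are placeholders assumed for illustration; they do not correspond to a specific autoscaler API.

```python
# Toy scaling decision for a generative AI inference pool, based on the
# queue-depth signal mentioned above. Thresholds and scale_to() are
# illustrative assumptions, not a real platform API.

TARGET_QUEUE_PER_REPLICA = 8          # assumed target queue depth per replica
MIN_REPLICAS, MAX_REPLICAS = 2, 20    # assumed elasticity bounds

def desired_replicas(queue_depth: int) -> int:
    """Size the pool so each replica handles roughly TARGET_QUEUE_PER_REPLICA requests."""
    needed = -(-queue_depth // TARGET_QUEUE_PER_REPLICA)   # ceiling division
    return min(MAX_REPLICAS, max(MIN_REPLICAS, needed))

def scale_to(replicas: int) -> None:
    # Placeholder for the platform call that resizes the endpoint or cluster.
    print(f"scaling inference pool to {replicas} replicas")

if __name__ == "__main__":
    current_replicas = 3
    observed_queue_depth = 57          # e.g., sampled from the monitoring metrics above
    target = desired_replicas(observed_queue_depth)
    if target != current_replicas:
        scale_to(target)               # 57 requests / 8 per replica -> 8 replicas
```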

learn-pr/advocates/guide-ai-operations-center-excellence/includes/8-deploy-generative-ai-workloads.md

Lines changed: 1 addition & 1 deletion

@@ -2,7 +2,7 @@ A generative AI CoE should ensure that all deployments occur using automated dep

 Generative AI workloads are complex, and using automated deployment mechanisms makes deployments repeatable and easier to troubleshoot if the resulting workload is misconfigured. Including versioning and the ability to rapidly roll back deployments means that iterative changes can be made to the workload while minimizing the concern that, if something goes wrong with a new feature or change, systems are in place to roll back to a known good configuration.

-**Time-to-market is reduced because structured workflows eliminate ambiguity, allowing teams to focus on iterative improvements rather than resolving misaligned expectations. Processes like iterative refinement with Generative AI-specific workflows such as prompt optimization, fine-tuning pre-trained models for use case alignment, or automating the evaluation of generated outputs reduce bottlenecks, shorten time-to-market, and keep the system adaptable to evolving business and technological needs.**
+Time-to-market is reduced because structured workflows eliminate ambiguity, allowing teams to focus on iterative improvements rather than resolving misaligned expectations. Processes like iterative refinement with Generative AI-specific workflows such as prompt optimization, fine-tuning pre-trained models for use case alignment, or automating the evaluation of generated outputs reduce bottlenecks, shorten time-to-market, and keep the system adaptable to evolving business and technological needs.

 In controlled settings, generative AI base models perform well due to reduced complexity. However, real-world applications require comprehensive iterative refinement, ongoing, metric-guided evaluation, and precise alignment of AI outputs with business goals.

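As a rough sketch of the versioned deploy-and-roll-back flow described above, the example below deploys a new version and falls back to the last known good one if post-deployment checks fail. `deploy_version` and `health_check` are hypothetical placeholders, not a specific deployment tool or pipeline API.

```python
# Hypothetical versioned deployment with automatic rollback. The helper
# functions stand in for real pipeline steps and are assumptions made
# for this sketch.

def deploy_version(version: str) -> None:
    print(f"deploying workload version {version}")

def health_check(version: str) -> bool:
    # A real pipeline would probe endpoints and run output evaluations here.
    print(f"running post-deployment checks for {version}")
    return True

def release(new_version: str, last_known_good: str) -> str:
    """Deploy new_version; roll back to last_known_good if checks fail."""
    deploy_version(new_version)
    if health_check(new_version):
        return new_version              # the new version becomes the baseline
    deploy_version(last_known_good)     # automatic rollback to a known good configuration
    return last_known_good

if __name__ == "__main__":
    active = release(new_version="2025-06-04.1", last_known_good="2025-05-28.3")
    print(f"active version: {active}")
```
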
learn-pr/advocates/guide-ai-operations-center-excellence/index.yml

Lines changed: 1 addition & 1 deletion

@@ -9,7 +9,7 @@ metadata:
   ms.topic: module
   ms.collection: ce-advocates-ai-copilot
   ms.service: artificial-intelligence
-title: Guide AI workload operations with a Generative AI Center of Excellence
+title: Guide AI workload operations with a generative AI Center of Excellence
 summary: Learn how comprehensive operations guidance from a Generative AI Center of Excellence can help an organization to effectively deploy, manage, and maintain generative AI workloads.
 abstract: |
   In this module you learn how comprehensive guidance from a generative AI center of excellence can help an organization to effectively deploy, manage, and maintain generative AI workloads. Learn about how a generative AI CoE can help an organization:
