Commit 6ccfbeb

Merge pull request #271005 from v-thepet/sf-monitor
Azure Monitor horizontals - Service Fabric (replacement PR)
2 parents c535c78 + 698d3f0 commit 6ccfbeb

3 files changed

+228
-0
lines changed

Lines changed: 47 additions & 0 deletions
@@ -0,0 +1,47 @@
1+
---
2+
title: Monitoring data reference for Azure Service Fabric
3+
description: This article contains important reference material you need when you monitor Service Fabric.
4+
ms.date: 03/26/2024
5+
ms.custom: horz-monitor
6+
ms.topic: reference
7+
ms.author: tomcassidy
8+
author: tomvcassidy
9+
ms.service: service-fabric
10+
---
11+
12+
# Azure Service Fabric monitoring data reference
13+
14+
[!INCLUDE [horz-monitor-ref-intro](~/reusable-content/ce-skilling/azure/includes/azure-monitor/horizontals/horz-monitor-ref-intro.md)]
15+
16+
See [Monitor Service Fabric](monitor-service-fabric.md) for details on the data you can collect for Azure Service Fabric and how to use it.
17+
18+
Azure Monitor doesn't collect any platform metrics or resource logs for Service Fabric. You can monitor and collect:
19+
20+
- Service Fabric system, node, and application events. For the full event listing, see [List of Service Fabric events](service-fabric-diagnostics-event-generation-operational.md).
21+
- Windows performance counters on nodes and applications. For the list of performance counters, see [Performance metrics](service-fabric-diagnostics-event-generation-perf.md).
22+
- Cluster, node, and system service health data. Use the [FabricClient.HealthManager property](/dotnet/api/system.fabric.fabricclient.healthmanager) to get the health client for health-related operations, such as reporting health or getting entity health.
23+
- Metrics for the guest operating system (OS) that runs on a cluster node, through one or more agents that run on the guest OS.
24+
25+
Guest OS metrics include performance counters that track guest CPU percentage or memory usage, which are frequently used for autoscaling or alerting. You can use the agent to send guest OS metrics to Azure Monitor Logs, where you can query them by using Log Analytics.
26+
27+
> [!NOTE]
28+
> The Azure Monitor agent replaces the previously used Azure Diagnostics extension and Log Analytics agent. For more information, see [Overview of Azure Monitor agents](/azure/azure-monitor/agents/agents-overview).
29+
30+
[!INCLUDE [horz-monitor-ref-logs-tables](~/reusable-content/ce-skilling/azure/includes/azure-monitor/horizontals/horz-monitor-ref-logs-tables.md)]
31+
32+
### Service Fabric Clusters
33+
Microsoft.ServiceFabric/clusters
34+
35+
- [AzureActivity](/azure/azure-monitor/reference/tables/AzureActivity#columns)
36+
- [AzureMetrics](/azure/azure-monitor/reference/tables/AzureMetrics#columns)
37+
38+
[!INCLUDE [horz-monitor-ref-activity-log](~/reusable-content/ce-skilling/azure/includes/azure-monitor/horizontals/horz-monitor-ref-activity-log.md)]
39+
40+
- [Microsoft.ServiceFabric resource provider operations](/azure/role-based-access-control/permissions/compute#microsoftservicefabric)
41+
42+
## Related content
43+
44+
- See [Monitor Service Fabric](monitor-service-fabric.md) for a description of monitoring Service Fabric.
45+
- See [Monitor Azure resources with Azure Monitor](/azure/azure-monitor/essentials/monitor-azure-resource) for details on monitoring Azure resources.
46+
- See [List of Service Fabric events](service-fabric-diagnostics-event-generation-operational.md) for the list of Service Fabric system, node, and application events.
47+
- See [Performance metrics](service-fabric-diagnostics-event-generation-perf.md) for the list of Windows performance counters on nodes and applications.
Lines changed: 177 additions & 0 deletions
@@ -0,0 +1,177 @@
1+
---
2+
title: Monitor Azure Service Fabric
3+
description: Start here to learn how to monitor Service Fabric.
4+
ms.date: 03/26/2024
5+
ms.custom: horz-monitor
6+
ms.topic: conceptual
7+
ms.author: tomcassidy
8+
author: tomvcassidy
9+
ms.service: service-fabric
10+
---
11+
12+
# Monitor Azure Service Fabric
13+
14+
[!INCLUDE [horz-monitor-intro](~/reusable-content/ce-skilling/azure/includes/azure-monitor/horizontals/horz-monitor-intro.md)]
15+
16+
## Azure Service Fabric monitoring
17+
18+
Azure Service Fabric has the following layers that you can monitor:
19+
20+
- Service health and performance counters for the service *infrastructure*. For more information, see [Performance metrics](service-fabric-diagnostics-event-generation-perf.md).
21+
- Client metrics, logs, and events for the *platform* or *cluster* nodes, including container metrics. The metrics and logs differ between Linux and Windows nodes. For more information, see [Monitor the cluster](service-fabric-diagnostics-event-generation-infra.md).
22+
- The *applications* that run on the nodes. You can monitor applications with an Application Insights key or SDK, EventStore, or ASP.NET Core logging. For more information, see [Application logging](service-fabric-diagnostics-event-generation-app.md).
23+
24+
You can monitor how your applications are used, the actions taken by the Service Fabric platform, your resource utilization with performance counters, and the overall health of your cluster. [Azure Monitor logs](service-fabric-diagnostics-event-analysis-oms.md) and [Application Insights](service-fabric-diagnostics-event-analysis-appinsights.md) offer built-in integration with Service Fabric.
25+
26+
- For an overview of monitoring and diagnostics for Service Fabric infrastructure, platform, and applications, see [Monitoring and diagnostics for Azure Service Fabric](service-fabric-diagnostics-overview.md).
27+
- For a tutorial that shows how to view Service Fabric events and health reports, query the EventStore APIs, and monitor performance counters, see [Tutorial: Monitor a Service Fabric cluster in Azure](service-fabric-tutorial-monitor-cluster.md).
28+
29+
### Service Fabric Explorer
30+
31+
[Service Fabric Explorer](service-fabric-visualizing-your-cluster.md), a desktop application for Windows, macOS, and Linux, is an open-source tool for inspecting and managing Azure Service Fabric clusters. To enable automation, every action that can be taken through Service Fabric Explorer can also be done through PowerShell or a REST API.
32+
33+
### EventStore
34+
35+
[EventStore](service-fabric-diagnostics-eventstore.md) is a feature that shows Service Fabric platform events in Service Fabric Explorer and programmatically through the [Service Fabric Client Library](/dotnet/api/overview/azure/service-fabric#client-library) REST API. You can see a snapshot view of what's going on in your cluster for each node, service, and application, and query based on the time of the event.
36+
37+
The EventStore APIs are available only for Windows clusters running on Azure. On Windows machines, these events are fed into the Event Log, so you can see Service Fabric events in Event Viewer.
38+
39+
### Application Insights
40+
41+
Application Insights integrates with Service Fabric to provide Service Fabric-specific metrics and tooling experiences for Visual Studio and the Azure portal. Application Insights provides a comprehensive out-of-the-box logging experience. For more information, see [Event analysis and visualization with Application Insights](service-fabric-diagnostics-event-analysis-appinsights.md).
42+
43+
[!INCLUDE [horz-monitor-resource-types](~/reusable-content/ce-skilling/azure/includes/azure-monitor/horizontals/horz-monitor-resource-types.md)]
44+
45+
For more information about the resource types for Azure Service Fabric, see [Service Fabric monitoring data reference](monitor-service-fabric-reference.md).
46+
47+
[!INCLUDE [horz-monitor-data-storage](~/reusable-content/ce-skilling/azure/includes/azure-monitor/horizontals/horz-monitor-data-storage.md)]
48+
49+
[!INCLUDE [horz-monitor-no-platform-metrics](~/reusable-content/ce-skilling/azure/includes/azure-monitor/horizontals/horz-monitor-no-platform-metrics.md)]
50+
51+
[!INCLUDE [horz-monitor-non-monitor-metrics](~/reusable-content/ce-skilling/azure/includes/azure-monitor/horizontals/horz-monitor-non-monitor-metrics.md)]
52+
53+
### Performance counters
54+
55+
Service Fabric system performance is usually measured through performance counters. These performance counters can come from various sources including the operating system, the .NET framework, or the Service Fabric platform itself. For a list of performance counters that should be collected at the infrastructure level, see [Performance metrics](service-fabric-diagnostics-event-generation-perf.md).
56+
57+
Service Fabric also provides a set of performance counters for the Reliable Services and Actors programming models. For more information, see [Monitoring for Reliable Service Remoting](service-fabric-reliable-serviceremoting-diagnostics.md#performance-counters) and [Performance monitoring for Reliable Actors](service-fabric-reliable-actors-diagnostics.md#performance-counters).
58+
59+
Azure Monitor Logs is recommended for monitoring cluster-level events. After you configure the [Log Analytics agent](service-fabric-diagnostics-oms-agent.md) with your workspace, you can collect:
60+
61+
- Performance metrics such as CPU utilization.
- .NET performance counters such as process-level CPU utilization.
- Service Fabric performance counters such as the number of exceptions from a reliable service.
- Container metrics such as CPU utilization.
65+
66+
### Guest OS metrics
67+
68+
Metrics for the guest operating system (OS) that runs on Service Fabric cluster nodes must be collected through one or more agents that run on the guest OS. Guest OS metrics include performance counters that track guest CPU percentage or memory usage, both of which are frequently used for autoscaling or alerting.
69+
70+
As a best practice, configure the Azure Monitor agent to send guest OS performance metrics through the custom metrics API into the Azure Monitor metrics database. You can also send the guest OS metrics to Azure Monitor Logs by using the same agent, and then query those metrics and logs by using Log Analytics.
71+
72+
>[!NOTE]
73+
>The Azure Monitor agent replaces the Azure Diagnostics extension and Log Analytics agent for guest OS routing. For more information, see [Overview of Azure Monitor agents](/azure/azure-monitor/agents/agents-overview).
74+
75+
[!INCLUDE [horz-monitor-no-resource-logs](~/reusable-content/ce-skilling/azure/includes/azure-monitor/horizontals/horz-monitor-no-resource-logs.md)]
76+
77+
## Service Fabric logs and events
78+
79+
You can collect Service Fabric logs in the following ways:
80+
81+
- For Windows clusters, you can set up cluster monitoring with [Diagnostics Agent](service-fabric-diagnostics-event-aggregation-wad.md) and [Azure Monitor logs](service-fabric-diagnostics-oms-setup.md).
82+
- For Linux clusters, Azure Monitor Logs is also the recommended tool for Azure platform and infrastructure monitoring. Linux platform diagnostics require different configuration. For more information, see [Service Fabric Linux cluster events in Syslog](service-fabric-diagnostics-oms-syslog.md).
83+
- You can configure the Azure Monitor agent to send guest OS logs to Azure Monitor Logs, where you can query them by using Log Analytics.
84+
- You can write Service Fabric container logs to *stdout* or *stderr* so they're available in Azure Monitor Logs.
85+
86+
### Service Fabric events
87+
88+
Service Fabric provides a comprehensive set of diagnostics events out of the box, which you can access through the EventStore or the operational event channel that the platform exposes. These [Service Fabric events](service-fabric-diagnostics-events.md) describe actions taken by the platform on different entities such as nodes, applications, services, and partitions. The same events are available on both Windows and Linux clusters.
89+
90+
On Windows, Service Fabric events are available from a single Event Tracing for Windows (ETW) provider with a set of relevant `logLevelKeywordFilters` used to pick between Operational and Data & Messaging channels. On Linux, Service Fabric events come through LTTng and are put into one Azure Storage table, from where they can be filtered as needed. Diagnostics can be enabled at cluster creation time, which creates a Storage table where the events from these channels are sent.
91+
92+
The events are sent through standard channels on both Windows and Linux and can be read by any monitoring tool that supports them, including Azure Monitor Logs. For more information, see [Azure Monitor logs integration](service-fabric-diagnostics-event-analysis-oms.md).
93+
94+
### Health monitoring
95+
96+
The Service Fabric platform includes a health model, which provides extensible health reporting for the status of entities in a cluster. Each node, application, service, partition, replica, or instance has a continuously updatable health status. Each time the health of a particular entity transitions, an event is also emitted. You can set up queries and alerts for health events in your monitoring tool, just like any other event.
97+
98+
## Partner logging solutions
99+
100+
Many events are written out through ETW providers and can be consumed by other logging solutions. Examples include [Elastic Stack](https://www.elastic.co/products), which is especially useful if you're running a cluster in an offline environment, and [Dynatrace](https://www.dynatrace.com/). For a list of integrated partners, see [Azure Service Fabric Monitoring Partners](service-fabric-diagnostics-partners.md).
101+
102+
[!INCLUDE [horz-monitor-activity-log](~/reusable-content/ce-skilling/azure/includes/azure-monitor/horizontals/horz-monitor-activity-log.md)]
103+
104+
[!INCLUDE [horz-monitor-analyze-data](~/reusable-content/ce-skilling/azure/includes/azure-monitor/horizontals/horz-monitor-analyze-data.md)]
105+
106+
For an overview of common Service Fabric monitoring analytics scenarios, see [Diagnose common scenarios with Service Fabric](service-fabric-diagnostics-common-scenarios.md).
107+
108+
[!INCLUDE [horz-monitor-external-tools](~/reusable-content/ce-skilling/azure/includes/azure-monitor/horizontals/horz-monitor-external-tools.md)]
109+
110+
[!INCLUDE [horz-monitor-kusto-queries](~/reusable-content/ce-skilling/azure/includes/azure-monitor/horizontals/horz-monitor-kusto-queries.md)]
111+
112+
### Sample queries
113+
114+
The following queries return Service Fabric events, including actions on nodes. For other useful queries, see [Service Fabric Events](service-fabric-tutorial-monitor-cluster.md#view-service-fabric-events-including-actions-on-nodes).
115+
116+
Return operational events recorded in the last hour:
117+
118+
```kusto
// ServiceFabricEvent isn't defined in this snippet. It's assumed to resolve in your
// query scope, for example as an EventId-to-EventName lookup defined earlier.
ServiceFabricOperationalEvent
| where TimeGenerated > ago(1h)
| join kind=leftouter ServiceFabricEvent on EventId
| project EventId, EventName, TaskName, Computer, ApplicationName, EventMessage, TimeGenerated
| sort by TimeGenerated
```
125+
126+
Return health reports with `HealthState == 3` (Error), and extract more properties from the `EventMessage` field:
127+
128+
```kusto
// ServiceFabricEvent isn't defined in this snippet. It's assumed to resolve in your query scope.
ServiceFabricOperationalEvent
| join kind=leftouter ServiceFabricEvent on EventId
// Parse the numeric health state out of the message, then keep only Health Manager (HM) errors.
| extend HealthStateId = extract(@"HealthState=(\S+) ", 1, EventMessage, typeof(int))
| where TaskName == 'HM' and HealthStateId == 3
// Pull the remaining key=value fields out of EventMessage into typed columns.
| extend SourceId = extract(@"SourceId=(\S+) ", 1, EventMessage, typeof(string)),
    Property = extract(@"Property=(\S+) ", 1, EventMessage, typeof(string)),
    HealthState = case(HealthStateId == 0, 'Invalid', HealthStateId == 1, 'Ok', HealthStateId == 2, 'Warning', HealthStateId == 3, 'Error', 'Unknown'),
    TTL = extract(@"TTL=(\S+) ", 1, EventMessage, typeof(string)),
    SequenceNumber = extract(@"SequenceNumber=(\S+) ", 1, EventMessage, typeof(string)),
    Description = extract(@"Description='([\S\s, ^']+)' ", 1, EventMessage, typeof(string)),
    RemoveWhenExpired = extract(@"RemoveWhenExpired=(\S+) ", 1, EventMessage, typeof(bool)),
    SourceUTCTimestamp = extract(@"SourceUTCTimestamp=(\S+)", 1, EventMessage, typeof(datetime)),
    ApplicationName = extract(@"ApplicationName=(\S+) ", 1, EventMessage, typeof(string)),
    ServiceManifest = extract(@"ServiceManifest=(\S+) ", 1, EventMessage, typeof(string)),
    InstanceId = extract(@"InstanceId=(\S+) ", 1, EventMessage, typeof(string)),
    ServicePackageActivationId = extract(@"ServicePackageActivationId=(\S+) ", 1, EventMessage, typeof(string)),
    NodeName = extract(@"NodeName=(\S+) ", 1, EventMessage, typeof(string)),
    Partition = extract(@"Partition=(\S+) ", 1, EventMessage, typeof(string)),
    StatelessInstance = extract(@"StatelessInstance=(\S+) ", 1, EventMessage, typeof(string)),
    StatefulReplica = extract(@"StatefulReplica=(\S+) ", 1, EventMessage, typeof(string))
```
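To make the `extract` pattern in the health report query easier to reason about, here's a minimal Python sketch that applies the same `Name=(\S+) ` style of regular expression to a message string offline. The sample message is hypothetical and only illustrates the key=value layout the query assumes; real `EventMessage` text might order or format fields differently. The helper names `extract_field` and `parse_health_report` are illustrative, not part of any Service Fabric or Azure SDK.

```python
import re

# Hypothetical EventMessage text, made up for illustration only.
SAMPLE_MESSAGE = (
    "SourceId=System.FM Property=State HealthState=3 TTL=00:05:00 "
    "SequenceNumber=42 RemoveWhenExpired=false "
)

# Same ID-to-name mapping that the query's case() expression uses.
HEALTH_STATE_NAMES = {0: "Invalid", 1: "Ok", 2: "Warning", 3: "Error"}


def extract_field(name, message):
    """Return the non-whitespace value that follows 'name=', or None if absent."""
    match = re.search(rf"{re.escape(name)}=(\S+) ", message)
    return match.group(1) if match else None


def parse_health_report(message):
    """Extract a few health report fields, mirroring the query's extend step."""
    state_text = extract_field("HealthState", message)
    state_id = int(state_text) if state_text is not None and state_text.isdigit() else None
    return {
        "SourceId": extract_field("SourceId", message),
        "Property": extract_field("Property", message),
        "HealthStateId": state_id,
        "HealthState": HEALTH_STATE_NAMES.get(state_id, "Unknown"),
        "TTL": extract_field("TTL", message),
    }


report = parse_health_report(SAMPLE_MESSAGE)
print(report)
```

Because each pattern requires a trailing space after the value, a field that appears last in the message without a trailing space isn't matched. The query's `SourceUTCTimestamp` pattern omits the trailing space, presumably for that reason.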
150+
151+
Get a count of Service Fabric operational events aggregated by application, service, and node:
152+
153+
```kusto
ServiceFabricOperationalEvent
| where ApplicationName != "" and ServiceName != ""
| summarize AggregatedValue = count() by ApplicationName, ServiceName, Computer
```
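The aggregation query can be mirrored offline for exported rows. The following Python sketch is only an illustration under stated assumptions: the rows are hypothetical dictionaries that use the same column names as the query, and `count_events` is an illustrative helper, not an Azure API.

```python
from collections import Counter

# Hypothetical exported rows; application, service, and node names are made up.
SAMPLE_ROWS = [
    {"ApplicationName": "fabric:/Voting", "ServiceName": "fabric:/Voting/VotingWeb", "Computer": "node0"},
    {"ApplicationName": "fabric:/Voting", "ServiceName": "fabric:/Voting/VotingWeb", "Computer": "node0"},
    {"ApplicationName": "fabric:/Voting", "ServiceName": "fabric:/Voting/VotingData", "Computer": "node1"},
    {"ApplicationName": "", "ServiceName": "", "Computer": "node1"},  # dropped by the filter
]


def count_events(rows):
    """Count rows per (ApplicationName, ServiceName, Computer), skipping empty names."""
    counts = Counter()
    for row in rows:
        app = row.get("ApplicationName", "")
        service = row.get("ServiceName", "")
        if app != "" and service != "":
            counts[(app, service, row.get("Computer", ""))] += 1
    return counts


counts = count_events(SAMPLE_ROWS)
for key, value in sorted(counts.items()):
    print(key, value)
```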
158+
159+
[!INCLUDE [horz-monitor-alerts](~/reusable-content/ce-skilling/azure/includes/azure-monitor/horizontals/horz-monitor-alerts.md)]
160+
161+
### Service Fabric alert rules
162+
163+
The following table lists some alert rules for Service Fabric. These alerts are just examples. You can set alerts for any metric, log entry, or activity log entry listed in the [Service Fabric monitoring data reference](monitor-service-fabric-reference.md) or the [List of Service Fabric events](service-fabric-diagnostics-event-generation-operational.md#application-events).
164+
165+
| Alert type | Condition | Description |
|:---|:---|:---|
| Node event | Node goes down | `ServiceFabricOperationalEvent` where `EventId >= 25622 and EventId <= 25626`. These event IDs are listed in the [Node events reference](service-fabric-diagnostics-event-generation-operational.md#node-events). |
| Application event | Application upgrade rollback | `ServiceFabricOperationalEvent` where `EventId == 29623 or EventId == 29624`. These event IDs are listed in the [Application events reference](service-fabric-diagnostics-event-generation-operational.md#application-events). |
| Resource health | Upgrade service unreachable/unavailable | Cluster goes to UpgradeServiceUnreachable state. |
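The event ID conditions in the table can be expressed as a small classifier, for example to label exported events before you define real alert rules. This Python sketch is an illustration only: it uses the ID ranges exactly as listed in the table, and `classify_event` is a hypothetical helper name, not an Azure Monitor API.

```python
def classify_event(event_id):
    """Map an event ID to the example alert type from the table, or None if no rule applies."""
    # Node event rule from the table: IDs 25622 through 25626, inclusive.
    if 25622 <= event_id <= 25626:
        return "Node event"
    # Application upgrade rollback rule from the table: ID 29623 or 29624.
    if event_id in (29623, 29624):
        return "Application event"
    return None


for sample_id in (25622, 29624, 12345):
    print(sample_id, classify_event(sample_id))
```

The resource health condition isn't included because it depends on cluster state rather than on an event ID.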
170+
171+
[!INCLUDE [horz-monitor-advisor-recommendations](~/reusable-content/ce-skilling/azure/includes/azure-monitor/horizontals/horz-monitor-advisor-recommendations.md)]
172+
173+
## Related content
174+
175+
- See [Service Fabric monitoring data reference](monitor-service-fabric-reference.md) for a reference of the metrics, logs, and other important values created for Service Fabric.
176+
- See [Monitoring Azure resources with Azure Monitor](/azure/azure-monitor/essentials/monitor-azure-resource) for general details on monitoring Azure resources.
177+
- See the [List of Service Fabric events](service-fabric-diagnostics-event-generation-operational.md).

articles/service-fabric/toc.yml

Lines changed: 4 additions & 0 deletions
@@ -383,6 +383,8 @@
  href: service-fabric-cluster-resource-manager-inbuild-throttling.md
  - name: Monitoring and diagnostics
  items:
+ - name: Monitor
+ href: monitor-service-fabric.md
  - name: Monitoring overview
  href: service-fabric-diagnostics-overview.md
  - name: Application monitoring
@@ -977,6 +979,8 @@
  href: /azure/templates/microsoft.servicefabric/clusters
  - name: Service Fabric events
  href: service-fabric-diagnostics-event-generation-operational.md
+ - name: Monitoring data reference
+ href: monitor-service-fabric-reference.md
  - name: Configure cluster settings and fabric upgrade policy
  href: service-fabric-cluster-fabric-settings.md
  - name: Service model XML schema
