Commit 4a83e90

Merge pull request #269168 from rcdun/rda/aoi-data-factory-how-to
Add How-To guide for connecting Data Factory to an AOI Data Product for ingestion
2 parents ee82e5d + de1c4da commit 4a83e90

7 files changed

+226
-68
lines changed

articles/operator-insights/TOC.yml

Lines changed: 2 additions & 0 deletions
@@ -42,6 +42,8 @@
 items:
 - name: Set up an ingestion agent for uploading data
 href: set-up-ingestion-agent.md
+- name: Use Azure Data Factory for ingestion
+href: ingestion-with-data-factory.md
 - name: Use Azure Operator Insights Data Product dashboards
 href: dashboards-use.md
 - name: Query data in the Azure Operator Insights Data Product

articles/operator-insights/concept-mcc-data-product.md

Lines changed: 81 additions & 21 deletions
@@ -62,28 +62,39 @@ The following data types are provided for all Quality of Experience - Affirmed M
 
 To use the Quality of Experience - Affirmed MCC Data Product:
 
-1. Deploy the Data Product by following [Create an Azure Operator Insights Data Product](data-product-create.md).
-1. Configure your network to provide data by setting up an Azure Operator Insights ingestion agent on a virtual machine (VM).
+- Deploy the Data Product by following [Create an Azure Operator Insights Data Product](data-product-create.md).
+- Configure your network to provide data either using your own ingestion method, or by setting up the [Azure Operator Insights ingestion agent](ingestion-agent-overview.md).
+- Use the information in [Required ingestion configuration](#required-ingestion-configuration) when you're setting up ingestion.
+- We recommend the Azure Operator Insights ingestion agent for the `edr` data type. To ingest the `device` and `edr-validation` data types, you can use a separate instance of the ingestion agent, or set up your own ingestion method.
+- If you're using the Azure Operator Insights ingestion agent, also meet the requirements in [Requirements for the Azure Operator Insights ingestion agent](#requirements-for-the-azure-operator-insights-ingestion-agent).
+- Configure your Affirmed MCCs to send EDRs to the ingestion agent. See [Configuration for Affirmed MCCs](#configuration-for-affirmed-mccs).
+- If you're using the `edr-validation` data type, configure your Affirmed EMS to export performance management stats to a remote server. See [Configuration for Affirmed EMS](#configuration-for-affirmed-ems).
 
-1. Read [Requirements for the Azure Operator Insights ingestion agent](#requirements-for-the-azure-operator-insights-ingestion-agent).
-1. [Install the Azure Operator Insights ingestion agent and configure it to upload data](set-up-ingestion-agent.md).
+### Required ingestion configuration
 
-Alternatively, you can provide your own ingestion agent.
-1. Configure your Affirmed MCCs to send EDRs to the ingestion agent. See [Configuration for Affirmed MCCs](#configuration-for-affirmed-mccs).
+Use the information in this section to configure your ingestion method. Refer to the documentation for your chosen method to determine how to supply these values.
 
-## Requirements for the Azure Operator Insights ingestion agent
+| Data type | Required container name | Requirements for data |
+|---------|---------|---------|
+| `edr` | `edr` | MCC EDR data. |
+| `device` | `device` | Device reference data. |
+| `edr-validation` | `edr-validation` | PM Stat data for `EDR_HTTP_STATS`, `EDR_FLOW_STATS`, and `EDR_SESSION_STATS` datasets. File name prefixes must match the name of the dataset. |
+
+### Requirements for the Azure Operator Insights ingestion agent
+
+Use the VM requirements to set up one or more VMs for the ingestion agent. Use the example configuration to configure the ingestion agent to upload data to the Data Product, as part of following [Install the Azure Operator Insights ingestion agent and configure it to upload data](set-up-ingestion-agent.md).
 
-Use the VM requirements to set up a suitable VM for the ingestion agent. Use the example configuration to configure the ingestion agent to upload data to the Data Product, as part of following [Install the Azure Operator Insights ingestion agent and configure it to upload data](set-up-ingestion-agent.md).
+# [EDR ingestion](#tab/edr-ingestion)
 
-### VM requirements
+#### VM requirements
 
 Each agent instance must run on its own Linux VM. The number of VMs needed depends on the scale and redundancy characteristics of your deployment. This recommended specification can achieve 1.5-Gbps throughput on a standard D4s_v3 Azure VM. For any other VM spec, we recommend that you measure throughput at the network design stage.
 
 Latency on the MCC to agent connection can negatively affect throughput. Latency should usually be low if the MCC and agent are colocated or the agent runs in an Azure region close to the MCC.
 
 Talk to the Affirmed Support Team to determine your requirements.
 
-Each VM running the agent must meet the following minimum specifications.
+Each VM running the agent must meet the following minimum specifications for EDR ingestion.
 
 | Resource | Requirements |
 |----------|---------------------------------------------------------------------|
@@ -104,24 +115,58 @@ The agent doesn't buffer data, so if a persistent error or extended connectivity
 
 For extra fault tolerance, you can deploy multiple instances of the ingestion agent and configure the MCC to switch to a different instance if the original instance becomes unresponsive, or to share EDR traffic across a pool of agents. For more information, see the [Affirmed Networks Active Intelligent vProbe System Administration Guide](https://manuals.metaswitch.com/vProbe/latest/vProbe_System_Admin/Content/02%20AI-vProbe%20Configuration/Generating_SESSION__BEARER__FLOW__and_HTTP_Transac.htm) (only available to customers with Affirmed support) or speak to the Affirmed Networks Support Team.
 
-### Required agent configuration
+# [Performance management and device data ingestion](#tab/pm-stat-or-device-data-ingestion)
 
-Use the information in this section when [setting up the agent and configuring the agent software](set-up-ingestion-agent.md#configure-the-agent-software).
+#### Performance management ingestion via an SFTP server
 
-The ingestion agent must use MCC EDRs as a data source.
+If you're using the Azure Operator Insights ingestion agent to ingest performance management stats files for the `edr-validation` data type:
+- Configure the EMS to export performance management stats to an SFTP server.
+- Configure the ingestion agent to use SFTP pull from the SFTP server.
+- We recommend the following configuration settings in addition to the (required) settings in the previous table.
 
-|Information | Configuration setting for Azure Operator Ingestion agent | Value |
-|---------|---------|---------|
-|Container in the Data Product input storage account |`sink.container_name` | `edr` |
+|Information | Configuration setting for Azure Operator Ingestion agent | Recommended value |
+| --------- | --------- | --------- |
+| [Settling time](ingestion-agent-overview.md#processing-files) | `source.sftp_pull.filtering.settling_time` | `60s` (upload files that haven't been modified in the last 60 seconds) |
+| Schedule for checking for new files | `source.sftp_pull.scheduling.cron` | `0 */5 * * * * *` (every 5 minutes) |
 
-> [!IMPORTANT]
-> `sink.container_name` must be set exactly as specified here. You can change other configuration to meet your requirements.
+#### Device data ingestion via an SFTP server
+
+If the device data is stored on an SFTP server, you can ingest device data by configuring an extra `sftp_pull` ingestion pipeline on the same ingestion agent instance that you're using for PM stat ingestion. You can choose your own value for `source.sftp_pull.scheduling.cron` for the device data pipeline, depending on how frequently you want the ingestion pipeline to check for new device data files.
+
+> [!TIP]
+> For more information about all the configuration options for the ingestion agent, see [Configuration reference for Azure Operator Insights ingestion agent](ingestion-agent-configuration-reference.md).
+
+#### VM requirements
+
+Each agent instance running SFTP pull pipelines must run on a separate Linux VM to any agent instance used for EDR ingestion. The number of VMs needed depends on the scale and redundancy characteristics of your deployment.
+
+As a guide, this table documents the throughput that the recommended specification on a standard D4s_v3 Azure VM can achieve.
+
+| File count | File size (KiB) | Time (seconds) | Throughput (Mbps) |
+|------------|-----------------|----------------|-------------------|
+| 64 | 16,384 | 6 | 1,350 |
+| 1,024 | 1,024 | 10 | 910 |
+| 16,384 | 64 | 80 | 100 |
+| 65,536 | 16 | 300 | 25 |
+
+Each Linux VM running the agent must meet the following minimum specifications for SFTP pull ingestion.
 
-For more information about all the configuration options, see [Configuration reference for Azure Operator Insights ingestion agent](ingestion-agent-configuration-reference.md).
+| Resource | Requirements |
+|----------|---------------------------------------------------------------------|
+| OS | Red Hat Enterprise Linux 8.6 or later, or Oracle Linux 8.8 or later |
+| vCPUs | Minimum 4, recommended 8 |
+| Memory | Minimum 32 GB |
+| Disk | 30 GB |
+| Network | Connectivity to the SFTP server and to Azure |
+| Software | systemd, logrotate, and zip installed |
+| Other | SSH or alternative access to run shell commands |
+| DNS | (Preferable) Ability to resolve Microsoft hostnames. If not, you need to perform extra configuration when you set up the agent (described in [Map Microsoft hostnames to IP addresses for ingestion agents that can't resolve public hostnames](map-hostnames-ip-addresses.md).) |
+
+---
 
-## Configuration for Affirmed MCCs
+### Configuration for Affirmed MCCs
 
-When you have installed and configured your ingestion agents, configure the MCCs to send EDRs to them.
+After installing and configuring your ingestion agents, configure the MCCs to send EDRs to them.
 
 Follow the steps in "Generating SESSION, BEARER, FLOW, and HTTP Transaction EDRs" in the [Affirmed Networks Active Intelligent vProbe System Administration Guide](https://manuals.metaswitch.com/vProbe/latest) (only available to customers with Affirmed support), making the following changes:
 
@@ -132,6 +177,21 @@ Follow the steps in "Generating SESSION, BEARER, FLOW, and HTTP Transaction EDRs
 - `encoding`: protobuf
 - `keep-alive`: 2 seconds
 
+### Configuration for Affirmed EMS
+
+If you're using the `edr-validation` data type, configure the EMS to export the relevant performance management statistics to a remote server. If you're using the Azure Operator Insights ingestion agent to ingest performance management statistics, the remote server must be an [SFTP server](set-up-ingestion-agent.md#prepare-the-sftp-server), otherwise the remote server needs to be accessible by your ingestion method.
+
+1. Obtain the IP address, user, and password of the remote server.
+1. Configure the transfer of EMS statistics to a remote server
+- Use the instructions in [Copying Performance Management Statistics Files to Destination Server](https://manuals.metaswitch.com/MCC/13.1/Acuitas_Users_RevB/Content/Appendix%20Interfacing%20with%20Northbound%20Interfaces/Exported_Performance_Management_Data.htm#northbound_2817469247_308739) in the _Acuitas User's Guide_.
+- For `edr-validation`, you only need to export three CSV files. List these file names in the `opt/Affirmed/NMS/conf/pm/mcc.files.txt` file on the EMS:
+- `EDR_HTTP_STATS`
+- `EDR_FLOW_STATS`
+- `EDR_SESSION_STATS`
+
+> [!IMPORTANT]
+> Increase the frequency of the cron job by reducing the `timeInterval` argument from `15` (default) to `5` minutes.
+
 ## Related content
 
 - [Data Quality Monitoring](concept-data-quality-monitoring.md)
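
The `edr-validation` row added in this file requires that file name prefixes match the dataset name. For anyone writing their own ingestion method, here is a minimal Python sketch of that check. The dataset names come from the diff; the function name `dataset_for` and the sample file names are hypothetical, and this is not how the ingestion agent itself validates files.

```python
from pathlib import PurePosixPath
from typing import Optional

# Dataset names listed for the `edr-validation` data type in this change.
EDR_VALIDATION_DATASETS = ("EDR_HTTP_STATS", "EDR_FLOW_STATS", "EDR_SESSION_STATS")


def dataset_for(file_name: str) -> Optional[str]:
    """Return the dataset whose name prefixes the file's base name, or None."""
    # Only the base name is compared, so a directory path does not affect the result.
    base_name = PurePosixPath(file_name).name
    for dataset in EDR_VALIDATION_DATASETS:
        if base_name.startswith(dataset):
            return dataset
    return None


if __name__ == "__main__":
    # Hypothetical file names, used only to illustrate the rule.
    for name in ("EDR_HTTP_STATS_0001.csv", "/exports/EDR_FLOW_STATS.csv", "http_stats.csv"):
        print(name, "->", dataset_for(name))
```

A custom uploader could skip or flag any file for which `dataset_for` returns `None` before writing to the `edr-validation` container.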

articles/operator-insights/concept-monitoring-mcc-data-product.md

Lines changed: 28 additions & 33 deletions
@@ -39,30 +39,44 @@ The following data type is provided as part of the Monitoring - Affirmed MCC Dat
 To use the Monitoring - Affirmed MCC Data Product:
 
 1. Deploy the Data Product by following [Create an Azure Operator Insights Data Product](data-product-create.md).
-1. Configure your network to provide data by setting up an Azure Operator Insights ingestion agent on a virtual machine (VM).
+1. Configure your network to produce performance management data, as described in [Required network configuration](#required-network-configuration).
+1. Set up ingestion (data upload) from your network. For example, you could use the [Azure Operator Insights ingestion agent](ingestion-agent-overview.md) or [connect Azure Data Factory](ingestion-with-data-factory.md) to your Data Product.
+- Use the information in [Required ingestion configuration](#required-ingestion-configuration) when you're setting up ingestion.
+- If you're using the Azure Operator Insights ingestion agent, also meet the requirements in [Requirements for the Azure Operator Insights ingestion agent](#requirements-for-the-azure-operator-insights-ingestion-agent).
 
-1. Read [Requirements for the Azure Operator Insights ingestion agent](#requirements-for-the-azure-operator-insights-ingestion-agent).
-1. [Install the Azure Operator Insights ingestion agent and configure it to upload data](set-up-ingestion-agent.md).
+### Required network configuration
 
-Alternatively, you can provide your own ingestion agent.
+Configure the EMS server to export performance management data to a remote server. If you're using the Azure Operator Insights ingestion agent, the remote server must be an [SFTP server](set-up-ingestion-agent.md#prepare-the-sftp-server). If you're providing your own ingestion agent, the remote server needs to be accessible by your ingestion agent.
 
-1. Configure the EMS server to export PMStats to a remote server. If you are using the Azure Operator Insights ingestion agent, the remote server must be an [SFTP server](set-up-ingestion-agent.md#prepare-the-sftp-server). If you are providing your own ingestion agent, the remote server just needs to be accessible by your ingestion agent.
-
-1. IP address, user, and password of the remote server are required for this step.
-1. Follow the instructions in the section [Copying Performance Management Statistics Files to Destination Server](https://manuals.metaswitch.com/MCC/13.1/Acuitas_Users_RevB/Content/Appendix%20Interfacing%20with%20Northbound%20Interfaces/Exported_Performance_Management_Data.htm#northbound_2817469247_308739) to configure the transfer of EMS stats to the remote server.
+1. Obtain the IP address, user, and password of the remote server.
+1. Configure the transfer of EMS statistics to a remote server by following [Copying Performance Management Statistics Files to Destination Server](https://manuals.metaswitch.com/MCC/13.1/Acuitas_Users_RevB/Content/Appendix%20Interfacing%20with%20Northbound%20Interfaces/Exported_Performance_Management_Data.htm#northbound_2817469247_308739) in the _Acuitas User's Guide_.
 
 > [!IMPORTANT]
 > Increase the frequency of the cron job by reducing the `timeInterval` argument from `15` (default) to `5` minutes.
 
-
+### Required ingestion configuration
+
+Use the information in this section to configure your ingestion method. Refer to the documentation for your chosen method to determine how to supply these values.
 
-## Requirements for the Azure Operator Insights ingestion agent
+| Data type | Required container name | Requirements for data |
+|---------|---------|---------|
+| `pmstats` | `pmstats` | Performance data from MCC nodes. File names must start with the dataset name. For example, `WORKFLOWPERFSTATSSLOT` data must be ingested in files whose names start with `WORKFLOWPERFSTATSSLOT`. |
 
-Use the VM requirements to set up a suitable VM for the ingestion agent. Use the example configuration to configure the ingestion agent to upload data to the Data Product, as part of following [Install the Azure Operator Insights ingestion agent and configure it to upload data](set-up-ingestion-agent.md).
+If you're using the Azure Operator Insights ingestion agent:
+- Configure the ingestion agent to use SFTP pull from the SFTP server.
+- We recommend the following configuration settings in addition to the (required) settings in the previous table.
 
-## Choosing agents and VMs
+|Information | Configuration setting for Azure Operator Ingestion agent | Recommended value |
+| --------- | --------- | --------- |
+| [Settling time](ingestion-agent-overview.md#processing-files) | `source.sftp_pull.filtering.settling_time` | `60s` (upload files that haven't been modified in the last 60 seconds) |
+| Schedule for checking for new files | `source.sftp_pull.scheduling.cron` | `0 */5 * * * * *` (every 5 minutes) |
 
-An ingestion agent collects files from _ingestion pipelines_ that you configure on it. Ingestion pipelines include the details of the SFTP server, the files to collect from it and how to manage those files.
+> [!TIP]
+> For more information about all the configuration options for the ingestion agent, see [Configuration reference for Azure Operator Insights ingestion agent](ingestion-agent-configuration-reference.md).
+
+### Requirements for the Azure Operator Insights ingestion agent
+
+The Azure Operator Insights ingestion agent collects files from _ingestion pipelines_ that you configure on it. Ingestion pipelines include the details of the SFTP server, the files to collect from it and how to manage those files.
 
 You must choose how to set up your agents, pipelines, and VMs using the following rules.
 
@@ -84,11 +98,9 @@ As a guide, this table documents the throughput that the recommended specificati
 
 For example, if you need to collect from two file sources, you could:
 
-- Deploy one VM with one agent that collects from both file sources.
+- Deploy one VM with one agent, configured with two pipelines. Each pipeline collects from one file source.
 - Deploy two VMs, each with one agent. Each agent (and therefore each VM) collects from one file source.
 
-### VM requirements
-
 Each Linux VM running the agent must meet the following minimum specifications.
 
 | Resource | Requirements |
@@ -102,23 +114,6 @@ Each Linux VM running the agent must meet the following minimum specifications.
 | Other | SSH or alternative access to run shell commands |
 | DNS | (Preferable) Ability to resolve Microsoft hostnames. If not, you need to perform extra configuration when you set up the agent (described in [Map Microsoft hostnames to IP addresses for ingestion agents that can't resolve public hostnames](map-hostnames-ip-addresses.md).) |
 
-### Required agent configuration
-
-Use the information in this section when [setting up the agent and configuring the agent software](set-up-ingestion-agent.md#configure-the-agent-software).
-
-The ingestion agent must use SFTP pull as a data source.
-
-|Information | Configuration setting for Azure Operator Ingestion agent | Value |
-|---------|---------|---------|
-|Container in the Data Product input storage account |`sink.container_name` | `pmstats` |
-| [Settling time](ingestion-agent-overview.md#processing-files) | `source.sftp_pull.filtering.settling_time` | `60s` (upload files that haven't been modified in the last 60 seconds) |
-| Schedule for checking for new files | `source.sftp_pull.scheduling.cron` | `0 */5 * * * * *` (every 5 minutes) |
-
-> [!IMPORTANT]
-> `sink.container_name` must be set exactly as specified here. You can change other configuration to meet your requirements.
-
-For more information about all the configuration options, see [Configuration reference for Azure Operator Insights ingestion agent](ingestion-agent-configuration-reference.md).
-
 ## Related content
 
 - [Data Quality Monitoring](concept-data-quality-monitoring.md)
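
The recommended settling time of `60s` is described in this change as "upload files that haven't been modified in the last 60 seconds". The Python sketch below illustrates that idea for readers building their own ingestion method. It is not the ingestion agent's implementation: the function name `settled_files`, the `(name, mtime)` input shape, and the inclusive boundary at exactly 60 seconds are all assumptions.

```python
import time
from typing import Iterable, List, Optional, Tuple

# Mirrors the recommended `settling_time` value of `60s` from the diff.
SETTLING_TIME_SECONDS = 60.0


def settled_files(
    files: Iterable[Tuple[str, float]],
    now: Optional[float] = None,
    settling_time: float = SETTLING_TIME_SECONDS,
) -> List[str]:
    """Return names of files last modified at least `settling_time` seconds ago.

    `files` yields (name, mtime) pairs, where mtime is seconds since the epoch.
    """
    if now is None:
        now = time.time()
    # A file still being written has a recent mtime, so it is left for a later run.
    return [name for name, mtime in files if now - mtime >= settling_time]


if __name__ == "__main__":
    # Hypothetical listing: one file untouched for 100 s, one modified 50 s ago.
    listing = [("old.csv", 100.0), ("fresh.csv", 150.0)]
    print(settled_files(listing, now=200.0))
```

Combined with a five-minute schedule, a file that is skipped in one run because it was recently modified would be picked up in a later run once it has settled.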
