Commit 86310df

Merge pull request #274865 from samtarver/ingestion-agent-docs-fixes: Ingestion agent docs fixes
2 parents: 5119c9b + c21254a

File tree: 3 files changed, +28 -14 lines

articles/operator-insights/ingestion-agent-configuration-reference.md

Lines changed: 12 additions & 10 deletions

@@ -8,6 +8,7 @@ ms.service: operator-insights
 ms.topic: conceptual
 ms.date: 12/06/2023
 ---
+
 # Configuration reference for Azure Operator Insights ingestion agent
 
 This reference provides the complete set of configuration for the [Azure Operator Insights ingestion agent](ingestion-agent-overview.md), listing all fields with explanatory comments.
@@ -22,12 +23,12 @@ This reference shows two pipelines: one with an MCC EDR source and one with an S
 
 ```yaml
 # A unique identifier for this agent instance. Reserved URL characters must be percent-encoded. It's included in the upload path to the Data Product's input storage account.
-agent_id: agent01
+agent_id: agent01
 # Config for secrets providers. We support reading secrets from Azure Key Vault and from the VM's local filesystem.
 # Multiple secret providers can be defined and each must be given a unique name, which is referenced later in the config.
 # A secret provider of type `key_vault` which contains details required to connect to the Azure Key Vault and allow connection to the Data Product's input storage account. This is always required.
 # A secret provider of type `file_system`, which specifies a directory on the VM where secrets are stored. For example for an SFTP pull source, for storing credentials for connecting to an SFTP server.
-secret_providers:
+secret_providers:
   - name: data_product_keyvault_mi
     key_vault:
       vault_name: contoso-dp-kv
@@ -73,7 +74,7 @@ sink:
   # Optional. A string giving an optional base path to use in the container in the Data Product's input storage account. Reserved URL characters must be percent-encoded. See the Data Product for what value, if any, is required.
   base_path: base-path
   sas_token:
-    # This must reference a secret provider configured above.
+    # This must reference a secret provider configured above.
     secret_provider: data_product_keyvault_mi
     # The name of a secret in the corresponding provider.
     # This will be the name of a secret in the Key Vault.
@@ -102,13 +103,13 @@ source:
   mcc_edrs:
     # The maximum amount of data to buffer in memory before uploading. Units are B, KiB, MiB, GiB, etc.
     message_queue_capacity: 32 MiB
-    # Quick check on the maximum RAM that the agent should use.
-    # This is a guide to check the other tuning parameters, rather than a hard limit.
+    # Quick check on the maximum RAM that the agent should use.
+    # This is a guide to check the other tuning parameters, rather than a hard limit.
    maximum_overall_capacity: 1216 MiB
     listener:
       # The TCP port to listen on. Must match the port MCC is configured to send to. Defaults to 36001.
       port: 36001
-      # EDRs greater than this size are dropped. Subsequent EDRs continue to be processed.
+      # EDRs greater than this size are dropped. Subsequent EDRs continue to be processed.
       # This condition likely indicates MCC sending larger than expected EDRs. MCC is not normally expected
       # to send EDRs larger than the default size. If EDRs are being dropped because of this limit,
       # investigate and confirm that the EDRs are valid, and then increase this value. Units are B, KiB, MiB, GiB, etc.
@@ -118,7 +119,7 @@ source:
       # corrupt EDRs to Azure. You should not need to change this value. Units are B, KiB, MiB, GiB, etc.
       hard_maximum_message_size: 100000 B
     batching:
-      # The maximum size of a single blob (file) to store in the Data Product's input storage account.
+      # The maximum size of a single blob (file) to store in the Data Product's input storage account.
       maximum_blob_size: 128 MiB. Units are B, KiB, MiB, GiB, etc.
       # The maximum time to wait when no data is received before uploading pending batched data to the Data Product's input storage account. Examples: 30s, 10m, 1h, 1d.
       blob_rollover_period: 5m
@@ -149,16 +150,17 @@ source:
       # Only for use with password authentication. The name of the file containing the password in the secrets_directory folder
       secret_name: sftp-user-password
       # Only for use with private key authentication. The name of the file containing the SSH key in the secrets_directory folder
-      key_secret: sftp-user-ssh-key
+      key_secret_name: sftp-user-ssh-key
      # Optional. Only for use with private key authentication. The passphrase for the SSH key. This can be omitted if the key is not protected by a passphrase.
       passphrase_secret_name: sftp-user-ssh-key-passphrase
     filtering:
       # The path to a folder on the SFTP server that files will be uploaded to Azure Operator Insights from.
       base_path: /path/to/sftp/folder
       # Optional. A regular expression to specify which files in the base_path folder should be ingested. If not specified, the agent will attempt to ingest all files in the base_path folder (subject to exclude_pattern, settling_time and exclude_before_time).
-      include_pattern: "*\.csv$"
+      include_pattern: ".*\.csv$" # Only include files which end in ".csv"
       # Optional. A regular expression to specify any files in the base_path folder which should not be ingested. Takes priority over include_pattern, so files which match both regular expressions will not be ingested.
-      exclude_pattern: '\.backup$'
+      # The exclude_pattern can also be used to ignore whole directories, but the pattern must still match all files under that directory, e.g. `^excluded-dir/.*$` or `^excluded-dir/` but *not* `^excluded-dir$`.
+      exclude_pattern: "^\.staging/|\.backup$" # Exclude all file paths that start with ".staging/" or end in ".backup"
       # A duration, such as "10s", "5m", "1h". During an upload run, any files last modified within the settling time are not selected for upload, as they may still be being modified.
       settling_time: 1m
       # Optional. A datetime that adheres to the RFC 3339 format. Any files last modified before this datetime will be ignored.
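The include/exclude semantics in the filtering configuration above (exclude_pattern takes priority over include_pattern, and directory exclusions must still match the full file path) can be sketched in a few lines of Python. The `should_ingest` helper is hypothetical, written only to illustrate the documented behavior, and it assumes patterns are matched against the path relative to `base_path`; it is not the agent's actual matching code.

```python
import re

# Hypothetical helper mirroring the documented filtering rules:
# exclude_pattern wins over include_pattern, both matched (re.search)
# against the file path relative to base_path. Illustration only.
def should_ingest(relative_path, include_pattern=None, exclude_pattern=None):
    if exclude_pattern and re.search(exclude_pattern, relative_path):
        return False  # exclude wins even if include also matches
    if include_pattern:
        return bool(re.search(include_pattern, relative_path))
    return True  # no include_pattern: ingest everything not excluded

# Patterns from the sample config above.
include = r".*\.csv$"
exclude = r"^\.staging/|\.backup$"

print(should_ingest("reports/day1.csv", include, exclude))         # True
print(should_ingest(".staging/day1.csv", include, exclude))        # False: excluded directory
print(should_ingest("reports/day1.csv.backup", include, exclude))  # False: excluded suffix
```

Note why `^excluded-dir$` alone doesn't exclude a directory under these rules: the pattern is tested against file paths such as `excluded-dir/file.csv`, which `^excluded-dir$` never matches.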

articles/operator-insights/monitor-troubleshoot-ingestion-agent.md

Lines changed: 3 additions & 0 deletions

@@ -32,6 +32,8 @@ Metrics are reported in a simple human-friendly form.
 
 To collect a diagnostics package, SSH to the Virtual Machine and run the command `/usr/bin/microsoft/az-aoi-ingestion-gather-diags`. This command generates a date-stamped zip file in the current directory that you can copy from the system.
 
+If you have configured collection of logs through the Azure Monitor agent, you can view ingestion agent logs in the portal view of your Log Analytics workspace, and may not need to collect a diagnostics package to debug your issues.
+
 > [!NOTE]
 > Microsoft Support might request diagnostics packages when investigating an issue. Diagnostics packages don't contain any customer data or the value of any credentials.
 
@@ -117,6 +119,7 @@ Symptoms: No data appears in Azure Data Explorer. Logs of category `Ingestion` d
 
 - Check that the agent is running on all VMs and isn't reporting errors in logs.
 - Check that files exist in the correct location on the SFTP server, and that they aren't being excluded due to file source config (see [Files are missing](#files-are-missing)).
+- Ensure that the configured SFTP user can read all directories under the `base_path` that the file source config doesn't exclude.
 - Check the network connectivity and firewall configuration between the ingestion agent VM and the Data Product's input storage account.
 
 ### Files are missing
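The new troubleshooting bullet about directory read permissions can be checked ahead of time. Below is a minimal sketch: the `dirs_needing_exclusion` helper is hypothetical (not an agent tool) and checks local filesystem permissions, so in practice you would run it on the SFTP server as the same user the agent connects with.

```python
import os

# Hypothetical helper: walk base_path and list directories that the
# current user cannot read and descend into. Per the troubleshooting
# advice above, each such directory must either be made readable or be
# covered by the exclude_pattern configuration.
def dirs_needing_exclusion(base_path):
    blocked = []
    for root, dirs, _files in os.walk(base_path):
        for name in list(dirs):
            full = os.path.join(root, name)
            if not os.access(full, os.R_OK | os.X_OK):
                blocked.append(os.path.relpath(full, base_path))
                dirs.remove(name)  # can't descend into it anyway
    return blocked
```

For a fully readable tree this returns an empty list; any entries it does return are candidates for `exclude_pattern` entries like `^<dir>/`.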

articles/operator-insights/set-up-ingestion-agent.md

Lines changed: 13 additions & 4 deletions

@@ -139,6 +139,9 @@ On the SFTP server:
 
 1. Ensure port 22/TCP to the VM is open.
 1. Create a new user, or determine an existing user on the SFTP server that the ingestion agent should use to connect to the SFTP server.
+    - By default, the ingestion agent searches every directory under the base path, so this user must be able to read all of them. Any directories that the user does not have permission to access must be excluded using the `exclude_pattern` configuration.
+    > [!NOTE]
+    > Implicitly excluding directories by not specifying them in the included pattern is not sufficient to stop the agent from searching those directories. See [the configuration reference](ingestion-agent-configuration-reference.md) for more detail on excluding directories.
 1. Determine the authentication method that the ingestion agent should use to connect to the SFTP server. The agent supports:
     - Password authentication
     - SSH key authentication
@@ -277,7 +280,12 @@ The configuration you need is specific to the type of source and your Data Produ
 - `user`: the name of the user on the SFTP server that the agent should use to connect.
 - Depending on the method of authentication you chose in [Prepare the VMs](#prepare-the-vms), set either `password` or `private_key`.
   - For password authentication, set `secret_name` to the name of the file containing the password in the `secrets_directory` folder.
-  - For SSH key authentication, set `key_secret` to the name of the file containing the SSH key in the `secrets_directory` folder. If the private key is protected with a passphrase, set `passphrase_secret_name` to the name of the file containing the passphrase in the `secrets_directory` folder.
+  - For SSH key authentication, set `key_secret_name` to the name of the file containing the SSH key in the `secrets_directory` folder. If the private key is protected with a passphrase, set `passphrase_secret_name` to the name of the file containing the passphrase in the `secrets_directory` folder.
+  - All secret files should have permissions of `600` (`rw-------`) and an owner of `az-aoi-ingestion`, so that only the ingestion agent and privileged users can read them.
+    ```
+    sudo chmod 600 <secrets_directory>/*
+    sudo chown az-aoi-ingestion <secrets_directory>/*
+    ```
 
 For required or recommended values for other fields, refer to the documentation for your Data Product.
 
@@ -327,11 +335,12 @@ If you're running the ingestion agent on an Azure VM or on an on-premises VM con
 To collect ingestion agent logs, follow [the Azure Monitor documentation to install the Azure Monitor Agent and configure log collection](../azure-monitor/agents/data-collection-text-log.md).
 
 - These docs use the Az PowerShell module to create a logs table. Follow the [Az PowerShell module install documentation](/powershell/azure/install-azure-powershell) first.
-- The `YourOptionalColumn` section from the sample `$tableParams` JSON is unnecessary for the ingestion agent, and can be removed.
+- The `YourOptionalColumn` section from the sample `$tableParams` JSON is unnecessary for the ingestion agent, and can be removed.
 - When adding a data source to your data collection rule, add a `Custom Text Logs` source type, with file pattern `/var/log/az-aoi-ingestion/stdout.log`.
-- After adding the data collection rule, you can query these logs through the Log Analytics workspace. Use the following query to make them easier to work with:
+- We also recommend following [the documentation to add a `Linux Syslog` Data source](../azure-monitor/agents/data-collection-syslog.md) to your data collection rule, to allow for auditing of all processes running on the VM.
+- After adding the data collection rule, you can query the ingestion agent logs through the Log Analytics workspace. Use the following query to make them easier to work with:
   ```
-  RawAgentLogs_CL
+  <CustomTableName>
   | extend RawData = replace_regex(RawData, '\\x1b\\[\\d{1,4}m', '') // Remove any color tags
   | parse RawData with TimeGenerated: datetime ' ' Level ' ' Message // Parse the log lines into the TimeGenerated, Level and Message columns for easy filtering
   | order by TimeGenerated desc
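The Log Analytics query above strips ANSI color tags and then splits each log line into timestamp, level, and message columns. The same transformation can be sketched locally in Python; the `<timestamp> <level> <message>` line format and the sample log line are assumptions for illustration, not a guaranteed agent log format.

```python
import re

# Local sketch of what the Log Analytics query does: remove ANSI color
# escape sequences (ESC [ <digits> m), then split the line into the
# timestamp, level, and message fields.
def parse_agent_log_line(raw):
    cleaned = re.sub(r'\x1b\[\d{1,4}m', '', raw)
    timestamp, level, message = cleaned.split(' ', 2)
    return timestamp, level, message

# Invented example line with color tags around the timestamp.
line = '\x1b[32m2023-12-06T10:15:30Z\x1b[0m INFO Upload complete'
print(parse_agent_log_line(line))
# ('2023-12-06T10:15:30Z', 'INFO', 'Upload complete')
```

This mirrors the query's two steps: `replace_regex(..., '\\x1b\\[\\d{1,4}m', '')` corresponds to the `re.sub` call, and the `parse` operator corresponds to the split into three fields.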

0 commit comments