Conversation

@Fiery-Fenix
Contributor

Description

This is the first part of a set of Azure Resource Log parsing implementations in the azure_encoding extension.
This part is focused on generic Azure Resource Log parsing, without any Category-specific code; that will be presented in the following PRs.
Key points for this PR:

  • The code is a heavily refactored part of pkg/translator/azurelogs and includes a comparable benchmark so performance can be compared
  • The code is performance-optimized (structs, partial parsing, etc.) so it can handle large batches of logs
  • Log parsing was refactored so that any JSON schema can be parsed, because Azure Resource Logs do not always follow the common log schema. For example, Azure ServiceBus OperationalLogs shares only one field with that schema (category); all other fields have different names
  • The parser is Category-oriented: it allows a Category-specific set of Resource Attributes, Category-specific timestamp and log level parsers, and Category-specific field-to-attribute translators (see the sketch after this list)
  • The parser uses a relatively fresh SemConv version and tries to be as SemConv-compatible as possible
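For illustration, the Category-oriented layout can be sketched roughly like this (the names are illustrative, not the exact types used in this PR):

import (
	"encoding/json"

	"go.opentelemetry.io/collector/pdata/pcommon"
	"go.opentelemetry.io/collector/pdata/plog"
)

// categoryParser bundles everything that can vary per Azure log Category.
type categoryParser struct {
	// resourceAttributes holds the Category-specific Resource Attributes.
	resourceAttributes map[string]string
	// parseTimestamp handles the Category-specific timestamp field and format.
	parseTimestamp func(raw string) (pcommon.Timestamp, error)
	// parseSeverity maps Category-specific level values to severity numbers.
	parseSeverity func(raw string) plog.SeverityNumber
	// translateProperties maps Category-specific "properties" fields to log attributes.
	translateProperties func(props json.RawMessage, attrs pcommon.Map) error
}

// categoryParsers is keyed by the record's "category" field; unknown
// categories fall back to a generic parser.
var categoryParsers = map[string]categoryParser{}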

Link to tracking issue

Part of #41725.

Testing

Corresponding unit tests were added, as well as a couple of benchmark tests.

Documentation

An additional log-specific README was added.

Contributor

@constanca-m left a comment


Thank you for starting this @Fiery-Fenix! I would love for us to build this encoding extension with performance in mind from the start, since the logs translator made the mistake of not doing so. I left a couple of comments on the things I think we should improve.

// AttrPutMapIf is a helper function that sets a map attribute with the given key,
// trying to parse it from the raw value.
// If parsing fails, no attribute is set.
func AttrPutMapIf(attrs pcommon.Map, attrKey string, attrValue any) {
Contributor

any is very bad performance-wise, and given that an encoding is a hot path, this function would not work here.
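For context, a tiny benchmark along these lines (illustrative only, run with go test -bench . -benchmem) shows why: decoding into any builds a map[string]any with boxed values for every record, while a typed struct writes straight into its fields:

import (
	"encoding/json"
	"testing"
)

var sample = []byte(`{"category":"OperationalLogs","durationMs":12,"resultType":"Succeeded"}`)

// Decoding into any allocates a fresh map[string]any with boxed values per call.
func BenchmarkUnmarshalAny(b *testing.B) {
	for i := 0; i < b.N; i++ {
		var v any
		if err := json.Unmarshal(sample, &v); err != nil {
			b.Fatal(err)
		}
	}
}

// Decoding into a typed struct avoids the intermediate map entirely.
func BenchmarkUnmarshalStruct(b *testing.B) {
	type record struct {
		Category   string `json:"category"`
		DurationMs int64  `json:"durationMs"`
		ResultType string `json:"resultType"`
	}
	for i := 0; i < b.N; i++ {
		var r record
		if err := json.Unmarshal(sample, &r); err != nil {
			b.Fatal(err)
		}
	}
}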

Contributor Author

Nice catch, this function is intended to store a parsed JSON object into an attribute, not an any value; I'll fix it.
Please keep in mind that this function is only for data that we don't know how to properly parse, i.e. we are not supporting it at the moment or we don't have a defined structure for it (for example, it has a dynamic structure).
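Something like this is what I have in mind for the fix (a minimal sketch, assuming the helper takes the raw JSON bytes instead of any):

import (
	"encoding/json"

	"go.opentelemetry.io/collector/pdata/pcommon"
)

// AttrPutMapIf parses rawValue as a JSON object and, on success, stores it
// under attrKey as a map attribute. If parsing fails, no attribute is set.
func AttrPutMapIf(attrs pcommon.Map, attrKey string, rawValue json.RawMessage) {
	// map[string]any is unavoidable here: this path is only for payloads
	// with no known structure.
	var parsed map[string]any
	if err := json.Unmarshal(rawValue, &parsed); err != nil {
		return // not a JSON object, leave attrs untouched
	}
	// FromRaw converts the generic map into a pcommon map value.
	_ = attrs.PutEmpty(attrKey).FromRaw(parsed)
}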

Comment on lines +179 to +190
if err := gojson.Unmarshal(r.Properties, &properties); err != nil {
errs = append(errs, err)
// If failed - trying to parse using a primitive value
var val any
if err = gojson.Unmarshal(r.Properties, &val); err == nil {
// Parsed, put primitive value as "properties" attribute
if err = attrs.PutEmpty(attributesAzureProperties).FromRaw(val); err != nil {
errs = append(errs, err)
// All attempts above - failed, put unparsable properties to log.Body
body.SetStr(string(r.Properties))
}
}
Contributor

This is a performance issue as well: we have no idea what properties is, according to this code. Is this correct? We should use category to determine that, instead of unmarshaling to any.

@zmoog

we have no idea what properties is according to this code. Is this correct?

Yeah, in some cases properties is a string.

We need to double-check whether properties ever changes for the same category. It may change. For example, Azure Functions emits log events with different properties values for the same category, depending on the hosting plan (Azure Functions logs are the source of the most hostile categories in the ecosystem).

I'll look for sample logs with properties other than string.

Contributor

Sorry, I don't mean we need to structure it immediately. But using []byte/json.RawMessage is already better than any. I want us to be cautious and only use any for the very specific cases where we see no other option.
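Something along these lines is what I mean (purely illustrative, the names are not a proposal for this PR):

import (
	"encoding/json"

	"go.opentelemetry.io/collector/pdata/pcommon"
)

// azureLogRecord keeps Properties as raw JSON bytes, so nothing is decoded
// into map[string]any up front.
type azureLogRecord struct {
	Time       string          `json:"time"`
	Category   string          `json:"category"`
	Properties json.RawMessage `json:"properties"`
}

// appServiceHTTPProps is a hypothetical Category-specific payload.
type appServiceHTTPProps struct {
	CsMethod string `json:"CsMethod"`
	ScStatus int64  `json:"ScStatus"`
}

// putProperties decodes Properties only once the Category is known.
func putProperties(r azureLogRecord, attrs pcommon.Map) error {
	switch r.Category {
	case "AppServiceHTTPLogs":
		var p appServiceHTTPProps
		if err := json.Unmarshal(r.Properties, &p); err != nil {
			return err
		}
		attrs.PutStr("http.request.method", p.CsMethod)
		attrs.PutInt("http.response.status_code", p.ScStatus)
		return nil
	default:
		// Unsupported categories keep the raw bytes for a later fallback path.
		return nil
	}
}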

@zmoog

Here's an example from the FunctionAppLogs category: https://github.com/elastic/integrations/blob/main/packages/azure_functions/data_stream/functionapplogs/_dev/test/pipeline/test-azure-functions-invalid-json.log

In this first case, the properties field holds a string value (which happens to be invalid JSON, but that's a separate issue).

However, in another instance of that same category https://github.com/elastic/integrations/blob/main/packages/azure_functions/data_stream/functionapplogs/_dev/test/pipeline/test-azure-functions-raw.log, the properties value is an actual object.

The schema varies depending on the function's hosting plan.

Contributor Author

Please keep in mind that this function is used only when we couldn't correctly map to a defined structure, i.e. we haven't added support yet or it simply couldn't be parsed at all, as @zmoog mentioned.
It's last-resort code; all known "properties" should be mapped to a defined structure without using any. You'll see how it's implemented in the next PRs.
BTW, it's a refactored version of the existing code in pkg/translator/azurelogs.

// This will allow us to parse Azure Log Records in both formats:
// 1) As exported to Azure Event Hub, e.g. `{"records": [ {...}, {...} ]}`
// 2) As exported to Azure Blob Storage, e.g. `[ {...}, {...} ]`
rootPath, err := gojson.CreatePath(jsonPath)
Contributor

Interesting approach, I wasn't familiar with this. Did you benchmark it? Does it perform better than just checking the first bytes (a rough sketch follows the list)?

  • Given we only offer support for resource logs and logs from a storage file:
    • Resource logs -> have records in the first bytes
    • Storage -> simply has [
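Something as small as this is what I have in mind (a rough sketch, not a proposal for the exact API):

import "bytes"

// looksLikeBareArray reports whether the payload starts with '[', i.e. the
// bare-array shape written to Blob Storage; anything else would be treated
// as the {"records": [...]} envelope.
func looksLikeBareArray(payload []byte) bool {
	trimmed := bytes.TrimLeft(payload, " \t\r\n")
	return len(trimmed) > 0 && trimmed[0] == '['
}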

Contributor Author

I will try to implement a byte-checking approach for format auto-detection; it seems we can't rely on the data source for format detection, as @zmoog highlighted.

Comment on lines +29 to +44
| `time`, `timestamp` | `log.timestamp` |
| `resourceId` | `cloud.resource_id` (resource attribute) |
| `tenantId` | `azure.tenant.id` (resource attribute) |
| `location` | `cloud.region` (resource attribute) |
| `operationName` | `azure.operation.name` (log attribute) |
| `operationVersion` | `azure.operation.version` (log attribute) |
| `category`, `type` | `azure.category` (log attribute) |
| `resultType` | `azure.result.type` (log attribute) |
| `resultSignature` | `azure.result.signature` (log attribute) |
| `resultDescription` | `azure.result.description` (log attribute) |
| `durationMs` | `azure.duration` (log attribute) |
| `callerIpAddress` | `network.peer.address` (log attribute) |
| `correlationId` | `azure.correlation_id` (log attribute) |
| `identity` | `azure.identity` (log attribute) |
| `Level` | `log.SeverityNumber` |
| `properties` | see mapping for each Category below |

Good idea on showing if the field is for a resource or a log.

Nit: If we add it as a column, that might improve readability a bit.
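For example, something like this (just a sketch of the idea, using two rows from the table above):

| Azure field | Mapped to | Scope |
|---|---|---|
| `resourceId` | `cloud.resource_id` | resource attribute |
| `operationName` | `azure.operation.name` | log attribute |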


// This will allow us to parse Azure Log Records in both formats:
// 1) As exported to Azure Event Hub, e.g. `{"records": [ {...}, {...} ]}`
// 2) As exported to Azure Blob Storage, e.g. `[ {...}, {...} ]`
@zmoog

Which categories did you test that came as [ {...}, {...} ] on Blob Storage?

I recently collected sample logs on blob storage for the following categories:

  • kube-apiserver (AKS control plane logs)
  • Policy (activity logs from a subscription)
  • FlowLogFlowEvent (VNet flow logs)

kube-apiserver and Policy came to blob storage from diagnostic settings as newline-separated objects:

{}
{}
{}

FlowLogFlowEvent came from the VNet exporter (not diagnostic settings) as records, just like what we get on Event Hubs:

{"records": [ {...}, {...} ]}

So, my current understanding is that we cannot rely on the source to determine the format (DS, DCR, VNet, etc.). Ideally, the unmarshaler should accept both {"records": [ {...}, {...} ]} and the direct log {}.
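For illustration, a detection sketch that would cover the three shapes above (the names are mine, not a proposal for the actual implementation):

import (
	"bytes"
	"encoding/json"
	"errors"
)

// splitRecords auto-detects the payload shape and returns the individual
// records: a bare JSON array, a {"records": [...]} envelope, or one or more
// newline-delimited objects.
func splitRecords(data []byte) ([]json.RawMessage, error) {
	trimmed := bytes.TrimLeft(data, " \t\r\n")
	if len(trimmed) == 0 {
		return nil, errors.New("empty payload")
	}
	switch trimmed[0] {
	case '[':
		// Bare array: [ {...}, {...} ]
		var records []json.RawMessage
		err := json.Unmarshal(trimmed, &records)
		return records, err
	case '{':
		// Either a {"records": [...]} envelope or newline-delimited objects.
		var envelope struct {
			Records []json.RawMessage `json:"records"`
		}
		if err := json.Unmarshal(trimmed, &envelope); err == nil && envelope.Records != nil {
			return envelope.Records, nil
		}
		// Fall back to newline-delimited JSON objects (this also covers a single object).
		var records []json.RawMessage
		dec := json.NewDecoder(bytes.NewReader(trimmed))
		for dec.More() {
			var rec json.RawMessage
			if err := dec.Decode(&rec); err != nil {
				return nil, err
			}
			records = append(records, rec)
		}
		return records, nil
	default:
		return nil, errors.New("unrecognized payload format")
	}
}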

Contributor Author

Hm, that's interesting...
I was testing it on Application Gateway logs (ApplicationGatewayFirewallLog and ApplicationGatewayAccessLog).
I will try to implement some auto-detection functionality here, as it seems we can't really rely on the data source for format detection :(

Comment on lines +115 to +123
// Filter out categories based on provided configuration
if _, exclude := r.excludeCategories[logCategory]; exclude {
continue
}
if hasIncludes {
if _, include := r.includeCategories[logCategory]; !include {
continue
}
}

Interesting. I've never had to filter log categories. What's the specific driver here? (No objections to the option, though!)

Contributor Author

We faced this issue while using a shared Event Hub.
There was an already existing export flow to the Event Hub for SIEM purposes, but we don't need all the logs from it, only some specific categories.
Of course, this could be done with a filter processor later in the collector pipeline, but it's obviously very inefficient to parse logs and then discard them later, instead of simply discarding them before parsing.
