Commit e00e26e
Fleet Usage telemetry extension (elastic#145353)
## Summary
Closes elastic/ingest-dev#1261
For each requirement, I added a snippet of the telemetry that is now collected.
Please review and let me know if any changes are needed.
I also asked a few questions below. @jlind23 @kpollich
Item 6 is blocked by an [Elasticsearch
change](elastic/elasticsearch#91701) that gives
`kibana_system` the missing privilege to read `logs-elastic_agent*` indices.
Took inspiration for task versioning from
https://github.com/elastic/kibana/pull/144494/files#diff-0c7c49bf5c55c45c19e9c42d5428e99e52c3a39dd6703633f427724d36108186
- [x] 1. Elastic Agent versions
Versions of all running Elastic Agents: the `agent.version` field on
`.fleet-agents` documents
```
"agent_versions": [
  "8.6.0"
],
```
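As a rough illustration of how the `agent_versions` list could be derived, here is a minimal sketch that deduplicates `agent.version` over already-fetched `.fleet-agents` documents. The `FleetAgentDoc` type and `collectAgentVersions` helper are hypothetical names used only for this example, not code from this PR.

```typescript
// Hypothetical document shape: only the field named above is modelled.
interface FleetAgentDoc {
  agent?: { version?: string };
}

// Collect the distinct agent versions, skipping documents without one.
function collectAgentVersions(docs: FleetAgentDoc[]): string[] {
  const versions = new Set<string>();
  for (const doc of docs) {
    const version = doc.agent?.version;
    if (version) {
      versions.add(version);
    }
  }
  // Plain string sort, only to make the output deterministic.
  return Array.from(versions).sort();
}

const sampleAgentVersions = collectAgentVersions([
  { agent: { version: '8.6.0' } },
  { agent: { version: '8.6.0' } },
  { agent: { version: '8.5.1' } },
  {},
]);
console.log(sampleAgentVersions);
```

In practice a `terms` aggregation on `agent.version` would likely do the same work on the Elasticsearch side; the in-memory version is only meant to show the intended result.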
- [x] 2. Fleet server configuration
I think we can query `.fleet-policies` for documents where some `input` has `type:
'fleet-server'`, and also use the `Fleet Server Hosts`
settings that we define via saved objects in Fleet
```
"fleet_server_config": {
  "policies": [
    {
      "input_config": {
        "server": {
          "limits.max_agents": 10000
        },
        "server.runtime": "gc_percent:20"
      }
    }
  ]
}
```
- [x] 3. Number of policies
Count of documents in the `.fleet-policies` index.
To confirm, did we mean agent policies here?
```
"agent_policies": {
  "count": 7,
```
- [x] 4. Output type contained in those policies
Collecting this in TypeScript logic, querying the `.fleet-policies` index.
The alternative would be to write a Painless script (because
`outputs` is an object with dynamic keys, we can't run an aggregation
on it directly).
```
"agent_policies": {
  "output_types": [
    "elasticsearch"
  ]
}
```
Did we mean to collect just the types here, or other info as well, e.g.
output URLs?
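To show why dynamic keys are easy to handle in TypeScript even though they block a direct aggregation, here is a minimal sketch that walks the `outputs` object of each policy and gathers the distinct `type` values. The `PolicyDoc` shape (in particular the `data.outputs` path) and the `collectOutputTypes` helper are assumptions made for illustration, not the code in this PR.

```typescript
// Assumed shape: `outputs` is keyed by arbitrary output names.
interface PolicyDoc {
  data?: {
    outputs?: Record<string, { type?: string } | undefined>;
  };
}

// Gather distinct output types across all policies.
function collectOutputTypes(policies: PolicyDoc[]): string[] {
  const types = new Set<string>();
  for (const policy of policies) {
    const outputs = policy.data?.outputs ?? {};
    // The keys are dynamic, so iterate over whatever is present.
    for (const name of Object.keys(outputs)) {
      const type = outputs[name]?.type;
      if (type) {
        types.add(type);
      }
    }
  }
  return Array.from(types).sort();
}

const sampleOutputTypes = collectOutputTypes([
  { data: { outputs: { default: { type: 'elasticsearch' } } } },
  { data: { outputs: { 'my-logstash': { type: 'logstash' }, default: { type: 'elasticsearch' } } } },
  { data: {} },
]);
console.log(sampleOutputTypes);
```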
- [x] 5. Average number of checkin failures
We only have the most recent checkin status and timestamp on
`.fleet-agents`.
Do we mean to publish the total count of agents whose last checkin failed? E.g. 3
if 3 agents currently have a failed checkin status.
Or do we mean to publish specific info for all agents
(`last_checkin_status`, `last_checkin` time, `last_checkin_message`)?
Are `error` and `degraded` the only statuses that we want to send?
```
"agent_last_checkin_status": {
  "error": 0,
  "degraded": 0
},
```
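One possible reading of the "total count" option is sketched below: count agents by their most recent `last_checkin_status`, keeping only `error` and `degraded`. The `AgentCheckin` type and `countCheckinFailures` helper are hypothetical, and the lowercase normalization is a defensive assumption rather than documented behavior.

```typescript
type FailureStatus = 'error' | 'degraded';

// Hypothetical shape: only the status field mentioned above is modelled.
interface AgentCheckin {
  last_checkin_status?: string;
}

// Count agents whose most recent checkin reported a failure status.
function countCheckinFailures(agents: AgentCheckin[]): Record<FailureStatus, number> {
  const counts: Record<FailureStatus, number> = { error: 0, degraded: 0 };
  for (const agent of agents) {
    // Assumption: normalize case in case the stored value is not lowercase.
    const status = agent.last_checkin_status?.toLowerCase();
    if (status === 'error' || status === 'degraded') {
      counts[status] += 1;
    }
  }
  return counts;
}

const sampleCheckinCounts = countCheckinFailures([
  { last_checkin_status: 'error' },
  { last_checkin_status: 'online' },
  { last_checkin_status: 'DEGRADED' },
  { last_checkin_status: 'error' },
  {},
]);
console.log(sampleCheckinCounts);
```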
- [ ] 6. Top 3 most common errors in the Elastic Agent logs
Do we mean elastic-agent logs only, or fleet-server logs as well
(maybe separately)?
I found an alternative way to query the `message` field, using the `sampler` and
`categorize_text` aggregations:
```
GET logs-elastic_agent*/_search
{
  "size": 0,
  "query": {
    "bool": {
      "must": [
        {
          "term": {
            "log.level": "error"
          }
        },
        {
          "range": {
            "@timestamp": {
              "gte": "now-1h"
            }
          }
        }
      ]
    }
  },
  "aggregations": {
    "message_sample": {
      "sampler": {
        "shard_size": 200
      },
      "aggs": {
        "categories": {
          "categorize_text": {
            "field": "message",
            "size": 10
          }
        }
      }
    }
  }
}
```
Example response:
```
"aggregations": {
"message_sample": {
"doc_count": 112,
"categories": {
"buckets": [
{
"doc_count": 73,
"key": "failed to unenroll offline agents",
"regex": ".*?failed.+?to.+?unenroll.+?offline.+?agents.*?",
"max_matching_length": 36
},
{
"doc_count": 7,
"key": """stderr panic close of closed channel n ngoroutine running Stop ngithub.com/elastic/beats/v7/libbeat/cmd/instance Beat launch.func5 \n\t/go/src/github.com/elastic/beats/libbeat/cmd/instance/beat.go n
```
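Given a response shaped like the one above, the "top 3 most common errors" could be taken from the `categories` buckets roughly as follows. This is a sketch under assumptions: `CategoryBucket` models only the `key` and `doc_count` fields shown in the example response, and `topErrorMessages` is a hypothetical helper name.

```typescript
// Only the bucket fields visible in the example response are modelled.
interface CategoryBucket {
  key: string;
  doc_count: number;
}

// Return the keys of the `limit` largest buckets, most frequent first.
function topErrorMessages(buckets: CategoryBucket[], limit = 3): string[] {
  return buckets
    .slice() // copy so the caller's array is not reordered
    .sort((a, b) => b.doc_count - a.doc_count)
    .slice(0, limit)
    .map((bucket) => bucket.key);
}

const sampleTopErrors = topErrorMessages([
  { key: 'failed to unenroll offline agents', doc_count: 73 },
  { key: 'hypothetical message a', doc_count: 2 },
  { key: 'hypothetical message b', doc_count: 7 },
  { key: 'hypothetical message c', doc_count: 5 },
]);
console.log(sampleTopErrors);
```

Sorting explicitly avoids relying on the order in which the aggregation returns its buckets.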
- [x] 7. Number of checkin failures over the past period of time
I think this is almost the same as item 5. The difference is whether we report
only new failures that happened in the last hour, or all agents currently in a
failure state (a number that would keep increasing if agents stay
in the failed state).
Do we want these as 2 separate telemetry fields?
EDIT: removed the last1hr query and instead added a new field to report
agents enrolled per policy (top 10). See comments below.
```
"agent_checkin_status": {
  "error": 3,
  "degraded": 0
},
"agents_per_policy": [2, 1000],
```
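The `agents_per_policy` field could be computed along these lines. This is only a sketch: the `policy_id` field name, the `EnrolledAgent` type, the `agentsPerPolicy` helper, and the descending sort are all assumptions for illustration (the sample above does not appear to be sorted that way).

```typescript
// Hypothetical shape: assumes each agent document carries its policy id.
interface EnrolledAgent {
  policy_id?: string;
}

// Count agents per policy and keep the `limit` largest counts.
function agentsPerPolicy(agents: EnrolledAgent[], limit = 10): number[] {
  const counts = new Map<string, number>();
  for (const agent of agents) {
    if (!agent.policy_id) {
      continue; // agents without a policy are ignored
    }
    counts.set(agent.policy_id, (counts.get(agent.policy_id) ?? 0) + 1);
  }
  // Assumption: largest policies first; only the counts are reported.
  return Array.from(counts.values())
    .sort((a, b) => b - a)
    .slice(0, limit);
}

const samplePerPolicy = agentsPerPolicy([
  { policy_id: 'policy-a' },
  { policy_id: 'policy-b' },
  { policy_id: 'policy-a' },
  { policy_id: 'policy-a' },
  {},
]);
console.log(samplePerPolicy);
```

Reporting only counts, without policy ids or names, keeps the telemetry free of identifiers.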
- [x] 8. Number of Elastic Agents and number of Fleet Servers
This is already there in the existing telemetry:
```
"agents": {
  "total_enrolled": 0,
  "healthy": 0,
  "unhealthy": 0,
  "offline": 0,
  "total_all_statuses": 1,
  "updating": 0
},
"fleet_server": {
  "total_enrolled": 0,
  "healthy": 0,
  "unhealthy": 0,
  "offline": 0,
  "updating": 0,
  "total_all_statuses": 0,
  "num_host_urls": 1
},
```
### Checklist
- [ ] [Unit or functional
tests](https://www.elastic.co/guide/en/kibana/master/development-tests.html)
were updated or added to match the most common scenarios
Co-authored-by: Kibana Machine <[email protected]>

1 parent f1cdc08 commit e00e26e
10 files changed (+783, -204) under `x-pack/plugins/fleet/server`:
- collectors
- integration_tests
- services
- telemetry