[AWS Master] Kubernetes: add logging stack #1063
Conversation
Logging Stack Research
Candidates:
Useful links:
Victoria Logs
Loki
ELK
ELK vs Loki vs VictoriaLogs:
Summary: I spent some time configuring Loki until I realized that Loki (even when configured to use S3) still needs PVs (https://www.reddit.com/r/grafana/comments/1kvsgro/loki_with_s3_still_needs_pvcs_pvs_really/). I expect these PVs to require high IOPS, so we cannot use our on-prem distributed storage for them since our networking is slow. I stopped considering Loki at this point since I don't see how to make it highly available because of these Persistent Volumes. Maybe there is a way; I just stopped researching further.
I managed to deploy the Victoria Logs Helm chart in 5 minutes. It shipped Kubernetes logs via vector out of the box; it couldn't have been easier. I then spent 2 days trying to understand how to make it highly available (VictoriaMetrics/VictoriaLogs#33) and faced a few bugs / questions wrt. its Helm charts (VictoriaMetrics/helm-charts#2214, VictoriaMetrics/helm-charts#2219).
All in all, it seems to be a good solution that works out of the box, can be HA, is easy to maintain and is powerful (enough for us). The Helm charts are still WIP (as far as I can see), but I am fine with this (after my experience with the ELK Helm charts).
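For reference, a minimal sketch of what that quick deployment could look like, assuming the victoria-logs-single chart from the VictoriaMetrics Helm repo (https://victoriametrics.github.io/helm-charts/), installed with something like `helm install vlogs vm/victoria-logs-single -f values.yaml`. The value keys (`vector.enabled`, `server.retentionPeriod`, the PV size) are assumptions to verify against the values.yaml of the chart version actually used:

```yaml
# values.yaml -- hedged sketch for the victoria-logs-single chart.
# Key names are assumptions; check them against the chart's own values.yaml.
server:
  retentionPeriod: 30d        # how long VictoriaLogs keeps ingested logs
  persistentVolume:
    enabled: true
    size: 50Gi                # local storage for the built-in log database
vector:
  enabled: true               # ship Kubernetes container logs via vector out of the box
```

This only covers the single-node setup; the HA questions raised in VictoriaMetrics/VictoriaLogs#33 are a separate concern.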
ELK stores all the ingested logs in PVs, and allows moving the historical data to object storage via snapshots - https://www.elastic.co/docs/reference/elasticsearch/index-lifecycle-actions/ilm-searchable-snapshot .
Loki is very hard to manage compared to VictoriaLogs, since it consists of many interconnected micro-services with very complex configs, which are mostly undocumented and tend to break with every new release. See https://grafana.com/docs/loki/latest/get-started/architecture/.
VictoriaLogs stores data in a built-in database optimized for typical log workloads. ELK and Loki do exactly the same - they store data in built-in databases. The difference is that VictoriaLogs uses a more efficient database format, which needs less disk space, RAM and CPU compared to ELK and Loki. See https://itnext.io/how-do-open-source-solutions-for-logs-work-elasticsearch-loki-and-victorialogs-9f7097ecbc2f and https://itnext.io/why-victorialogs-is-a-better-alternative-to-grafana-loki-7e941567c4d5 for details. The best approach to selecting a logging solution is to configure and run multiple candidates on your particular production workload and then choose the one that handles that workload best. I recommend evaluating the official Helm charts for every tested solution:
Hi @valyala and thank you for your feedback. I have a question, however. When you say that ELK stores logs in PVs, you mean that it is Elasticsearch that uses the PVs, right? I am not aware of anything else using PVs in this stack (except Elasticsearch) 🤔
matusdrobuliak66 left a comment
Thanks! nice work
sanderegg left a comment
nice work.
So I understand you want to replace Graylog with this stack.
Are we also getting all the same logging facilities that we have in Graylog?
At least from what I know:
- the docker engine
- the machines' syslogs
mrnicegyu11 left a comment
Thanks a lot for this! If you gain more experience with configuring vector, talk to me in case you think it is "better" than fluentd for the docker-swarm use case ;)
very nice indeed
This is the full list actually:
- Machine syslog: yes --> https://vector.dev/docs/reference/configuration/sources/syslog/
- Docker engine (@sanderegg do you mean docker logs?) --> https://vector.dev/docs/reference/configuration/sources/docker_logs/
See all the sources we can scrape: https://vector.dev/docs/reference/configuration/sources/
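As a rough illustration (not the final config), a vector snippet wiring those two sources into VictoriaLogs. The in-cluster service name `victoria-logs-single-server`, the port and the insert path are assumptions to verify against the VictoriaLogs ingestion docs:

```yaml
# vector.yaml -- hedged sketch of the extra sources discussed above.
sources:
  host_syslog:
    type: syslog                 # machine syslog
    address: 0.0.0.0:514
    mode: udp
  docker:
    type: docker_logs            # logs from the local Docker daemon

sinks:
  victorialogs:
    type: elasticsearch          # VictoriaLogs accepts Elasticsearch-compatible bulk inserts
    inputs: [host_syslog, docker]
    endpoints: ["http://victoria-logs-single-server:9428/insert/elasticsearch/"]  # assumed service name/port
    api_version: v8
    healthcheck:
      enabled: false
```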
Robust, production-ready and highly available, but also powerful and flexible.
I believe so, yes vectordotdev/vector#4868
I agree. I don't see reasons atm why we would need Graylog.
What do these changes do?
Add a Highly Available Logging Stack to Kubernetes (scraping, storing and querying)
Minor:
Technology choice
Logging Backend / Frontend (read more in comments) --> Victoria Logs
Logging shipper (read more in comments) --> vector.dev
Next steps
We can use Victoria Logs Datasource to visualize and query logs in Grafana https://github.com/VictoriaMetrics/victorialogs-datasource
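A hedged provisioning sketch for that datasource; the plugin id `victoriametrics-logs-datasource` and the in-cluster URL are assumptions to verify against the victorialogs-datasource README:

```yaml
# grafana-datasources.yaml -- sketch; plugin id and URL are assumptions.
apiVersion: 1
datasources:
  - name: VictoriaLogs
    type: victoriametrics-logs-datasource
    access: proxy
    url: http://victoria-logs-single-server:9428
```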
Related issue/s
Related PR/s
Checklist
- Service is monitored (via prometheus and grafana): not applicable atm
- Service's Public URL is included in maintenance mode: not applicable atm
- Service's Public URL is included in testing mode: not applicable atm