-
Notifications
You must be signed in to change notification settings - Fork 178
chore: Redis Enterprise modern library usage #1538
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
chore: Redis Enterprise modern library usage #1538
Conversation
aalhour
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
First pass, no linting errors locally, a few typos and an issue with alerts, everything else is minor.
| redisEnterpriseNodes: | ||
| link.link.new('Redis Enterprise nodes', '/d/' + this.grafana.dashboards['redis-enterprise-node-overview.json'].uid), | ||
| redisEnterpriseDatabases: | ||
| link.link.new('Redis Enterprise databases', '/d/' + this.grafana.dashboards['redis-enterprise-database-overview.jsonn'].uid), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Typo? .jsonn --> .json
| refresh, | ||
| period, | ||
| ), | ||
| 'redis-enterprise-database-overview.jsonn': |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Typo? .jsonn --> .json
| clusterTotalRequests: { | ||
| name: 'Cluster total requests', | ||
| nameShort: 'Total requests', | ||
| type: 'raw', |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should this also be of type gauge?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nope at least as currently written I want this query with its aggregation to not be modified:
- counter: is wrapped into `<rangeFunction>(<base>[<interval>])`
- gauge: no transformation
- histogram: wrapped into `histogram_quantile(0.95, <aggFunction>(rate(<base>[<interval>])) by (le,<agg>))`
- info: no transformation
- raw: no transformation. You can write your own complex expression and make sure is kept as is.
Also, regardless of type (with the exception of `raw`), additional wrapper can be added when aggLevel `group` or `instance` is selected:
`<aggFunction> by (<agg>) (<base>)`
I don't want these to do further aggregations since all of the raw expressions are already doing aggregations
| clusterTotalMemoryUsed: { | ||
| name: 'Cluster total memory used', | ||
| nameShort: 'Total memory used', | ||
| type: 'raw', |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should this also be of type gauge?
| clusterTotalConnections: { | ||
| name: 'Cluster total connections', | ||
| nameShort: 'Total connections', | ||
| type: 'raw', |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should this also be of type gauge?
| databaseReadRequests: { | ||
| name: 'Database read requests', | ||
| nameShort: 'Read requests', | ||
| type: 'raw', |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should this also be of type gauge?
| databaseCacheHitRatio: { | ||
| name: 'Database cache hit ratio', | ||
| nameShort: 'Cache hit ratio', | ||
| type: 'raw', |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should this also be of type gauge?
| databaseMemoryUtilization: { | ||
| name: 'Database memory utilization', | ||
| nameShort: 'Memory utilization', | ||
| type: 'raw', |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should this also be of type gauge?
| expr: ||| | ||
| bdb_avg_latency / 1000 > %(alertsDatabaseHighLatencyMs)s | ||
| ||| % $._config, | ||
| bdb_avg_latency{%(filteringSelector)s} / 1000 > %(alertsDatabaseHighLatencyMs)s |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we divide by 1000 that would convert the value to seconds, but the alert threshold is in milliseconds - alertsDatabaseHighLatencyMs.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah this seems like a bug with the original implementation. Think it should be bdb_avg_latency * 1000 based off the context.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice catch! 🚀
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Awesome! Thank you!
| + g.panel.timeSeries.standardOptions.withUnit('percent'), | ||
|
|
||
| nodesNodeCPUUtilizationPanel: | ||
| commonlib.panels.generic.timeSeries.base.new( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What are your thoughts on how this would look versus the commonlib.panels.cpu.timeSeries.base.new? I see that the base is empty in the common lib but it has utilization by core and mode overrides.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good idea!
| + g.panel.barGauge.queryOptions.withTargets([ | ||
| signals.overview.nodeUp.asTarget(), | ||
| ]) | ||
| + g.panel.barGauge.panelOptions.withDescription('Up/down status for each node in the cluster.') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As I just realised and commented on in the couchDB PR, this is duplicating the descriptions unnecessarily. Should be imported by default I believe, but if not you can just call the signal's description directly, e.g. signals.overview.nodeUp.description
| local signals = this.signals, | ||
|
|
||
| // Overview panels - Status panels | ||
| overviewNodesUpPanel: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe it's just me, but I find this info a little cluttered (for this, the db up , and shards up panels)
Specifically here, the legends are a little cluttered with cluster - number, plus the value of 1 for each row, which doesn't really match the visualisation being somwhere not 100%
Could we drop the cluster prefix and make the instance name much clearer and consistent across?
Same thing for the absolute value, is it up or not, and if so the visualisation shouldn't be in-between :)
| clusterTotalMemoryUsed: { | ||
| name: 'Cluster total memory used', | ||
| nameShort: 'Total memory used', |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As this is a total value panel, we need to remember to stack the values as this could grow very hard to read with multiple DBs
Additionally, and this is just my opinion at this point, but given that the cardinality is per-database, it may be worth doing this one as a 12-width right-aligned table legend with min/max/avg etc values?
| clusterTotalConnections: { | ||
| name: 'Cluster total connections', |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same as above, both on stacking and table legend concerns
| + g.panel.timeSeries.panelOptions.withDescription('Ratio of database cache key hits against hits and misses.') | ||
| + g.panel.timeSeries.standardOptions.withUnit('percent'), | ||
|
|
||
| overviewClusterEvictionsVsExpiredObjectsPanel: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1 on potential illegible legend with many databases defined
| sources: { | ||
| prometheus: { | ||
| expr: 'node_avg_latency{%(queriesSelector)s}', | ||
| legendCustomTemplate: '{{ redis_cluster }} - node: {{ node }}', |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Another inconsistent handling of instance / node here with how the legend is written (as compared to the panels at the top row)
| panels.overviewClusterEvictionsVsExpiredObjectsPanel { gridPos+: { w: 8, h: 6 } }, | ||
| ]), | ||
|
|
||
| overviewNodesKPIsRow: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually thinking about it some more, for the overview page I think it's OK to sum it, and we could perhaps do a table on the nodes dashboard?
| + g.panel.timeSeries.panelOptions.withDescription('Memory utilization % for each node in the cluster.') | ||
| + g.panel.timeSeries.standardOptions.withUnit('percent'), | ||
|
|
||
| overviewNodeCPUUtilizationPanel: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This one shows legend in yet another way (without cluster this time), which I think looks a fair bit cleaner personally, but we'd want to be a bit more consistent, at least in the KPIs row.
| ]), | ||
|
|
||
| nodesMetricsRow: | ||
| g.panel.row.new('Node metrics') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Github is throwing a fit whenever I try to do multi-line select right now, but could you please give all the panels in this row a consistency pass on their legends (these ones just have the raw node label)
| ]), | ||
|
|
||
| databasesCRDBRow: | ||
| g.panel.row.new('Active-Active') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What does this name tell us?

This PR refactors the Redis-Enterprise mixin to use the signals API to create its dashboards and follow best practices using g.libsonnet and commonlib.
If you need access for testdata we can provide it at keithschmitty.grafana.net
Screenshots with minimal setup (no loadgen so some metrics will not be populated in these screenshots)
Screenshots
OverviewNodes


Database
Logs

The way my lab was configured didn't allow for logs but it should work