Conversation
Start the RFC for Resilient Deployment Signed-off-by: Jim Fitzpatrick <jfitzpat@redhat.com>
Signed-off-by: Jim Fitzpatrick <jfitzpat@redhat.com>
| This includes deploying multiply replicas of the data plane components, Authorino & Limitador. | ||
| In the case of Limitador, this also includes being able to persist counters. | ||
|
|
||
| As Kuadrant we want to provide a user experience that makes deploying a more resilient product possible. |
There was a problem hiding this comment.
would we call it an opinionated data plane configuration? To a certain extent we are setting it up based on our own understanding of these components and I would say common topology setups (AZ for example).
rfcs/0000-resilient_deployment.md
Outdated
| - PodDisruptionBudget | ||
| - Resource Limits | ||
| - Replicas | ||
| With Authoring and Limitador there are different ways of configuring some of the features. |
There was a problem hiding this comment.
| With Authoring and Limitador there are different ways of configuring some of the features. | |
| With Authorino and Limitador there are different ways of configuring some of the features. |
|
|
||
| ### New installations | ||
| This is by far the simplest configuration. | ||
| By default, the all resilience configuration will be blank. |
There was a problem hiding this comment.
It might not be blank for a new installation right? If I install the operator and then create the kuadrant CR for first time with the resilience block filled in?
There was a problem hiding this comment.
I still think this is a simple configuration as we are going from nothing to something well defined. Is this perhaps what you were trying to convey. No kuadrant CR means no existing configuration ?
rfcs/0000-resilient_deployment.md
Outdated
| Why we do this is to simplify the reconciliation of configuration. | ||
|
|
||
| ### SRE reaction events | ||
| From time to time SRE teams would need to change configurations for other to address issues. |
There was a problem hiding this comment.
| From time to time SRE teams would need to change configurations for other to address issues. | |
| From time to time SRE teams may need to change some configuration to address specific issues. |
rfcs/0000-resilient_deployment.md
Outdated
| From time to time SRE teams would need to change configurations for other to address issues. | ||
| We should allow the user configure the deployment to a state that is we regard as not resilient. | ||
| The one example that comes to mind is scaling replicas to zero for address some issue. | ||
| The user should be allowed to that, but the Kuadrant CR should state the configuration is below expected spec. |
There was a problem hiding this comment.
| The user should be allowed to that, but the Kuadrant CR should state the configuration is below expected spec. | |
| The user should be allowed to do that, but the Kuadrant CR should state the configuration is below expected spec. |
There was a problem hiding this comment.
I am not sure if this is needed. I think it would be reasonable to expect alerts to be set up based on the expected number of replicas and dashboards rather than needing this to expressed as a warning in the status
There was a problem hiding this comment.
In one sense, I agree, and in the other this is more for use. There has being a number time were I have seen an issue arise from testing (nightly, or RC), and the one of the first thing asked for is copy of the configure in question. I have never seen the alerts being asked for. I think both alerts, and good status are required. An alert will give information faster, but a status can give more detail.
| ### The user wants more | ||
| The user knows more about their deployment than we ever can. | ||
| We need to allow the user to modify the configuration to suit their needs. | ||
| When a user modifies the configuration, the kuadrant-operator should not reconcile back the changes. |
There was a problem hiding this comment.
This is true to a point. If they delete the PDB, I would expect we put it back for example. If we have an explicit value in the kuadrant CRD that should also be reconciled back IMO. but for this feature in general I agree. But I wanted to call out the deletion as I don't think we want a situation where a PDB is deleted by accident and in order to get it back I first have to remove the config from the Kuadrant CR
There was a problem hiding this comment.
That is good to call out.
| When a user modifies the configuration, the kuadrant-operator should not reconcile back the changes. | ||
|
|
||
| If the user changes are below the minimum spec that we state, this should be reflected in the status. | ||
| If the user changes the configuration at all it should be reflected in the status. |
There was a problem hiding this comment.
I wonder can we just generalise this in some way. I think given the user is making a change we don't have to say "its below minimum" I think we just want to reflect "custom configuration applied to limitador" for example ?
There was a problem hiding this comment.
If it was generalized, does the status lose meaning? I think the detail adds value here.
| If the user changes are below the minimum spec that we state, this should be reflected in the status. | ||
| If the user changes the configuration at all it should be reflected in the status. | ||
| However, when the spec is more than what we regard as minimum the status should be a highlight not warning or error. | ||
| How this is reflected in the status of the Kuadrant CR is unknown currently. |
There was a problem hiding this comment.
Yeah so I think perhaps (see above) we just aim to have a consistent status for when the configuration has been changed (whether below or above expectations) and a status for when it is as expected
| The authorino CR allows the setting of replicas `spec.replicas`. | ||
| This should be set to 2. | ||
| The user should have the power to modify this number to what they want. | ||
| If the number is less than the minimum we recommend (2), the kuadrant CR should report an error in the status. |
There was a problem hiding this comment.
see above. don't think we need a different status for each just a "config changed status"
rfcs/0000-resilient_deployment.md
Outdated
|
|
||
|
|
||
| ## spec.resilience.rateLimiting | ||
| For `spec.resilience.rateLimiting` we can make use of most features within the limitador CR, but will be requiring modifying the limitador deployment for the `topologySpreadConstraints`. |
There was a problem hiding this comment.
| For `spec.resilience.rateLimiting` we can make use of most features within the limitador CR, but will be requiring modifying the limitador deployment for the `topologySpreadConstraints`. | |
| For `spec.resilience.rateLimiting` we can make use of most features within the limitador CR, but it will be required to modify the limitador deployment for the `topologySpreadConstraints`. |
Address a number of issues raised during the review process Signed-off-by: Jim Fitzpatrick <jfitzpat@redhat.com>
| apiVersion: kuadrant.io/v1beta1 | ||
| kind: Kuadrant | ||
| spec: | ||
| resilience: |
There was a problem hiding this comment.
Given We have observability and mTLS. Should we add an option here for
resilience:
enable: true
I see it is discussed here: Kuadrant/kuadrant-operator#1329 (comment)
There was a problem hiding this comment.
I meant to responded sooner about this. For the enable field I agree to having it there, but what would be the behavior in this case. With in the resilience section there is 3 other fields. Two with are boolean values, and the third being a config object. One of the boolean fields requires the config object. So the question is, if the user only sets enable: true what is expected to happen?
There was a problem hiding this comment.
Given the storage config is needed. I see this as effectively a short cut to say the same thing as
authorization: True
rateLimiting: True
counterStorage: {"config"...}
So when they set the top level enable value to true, we should treat it as though they have set authorization and rateLimiting to true and so expect counterStorage to be explicitly set also: Valid would be:
resilience:
enable: true
counterStorage: {"config"....}
One other note here. Authorino can do more than just authorization so I am not sure we should name this authorization and perhaps the short hand auth is a better option
closes: #117