API Reference

Packages

grove.io/v1alpha1

Resource Types

AutoScalingConfig

AutoScalingConfig defines the configuration for the horizontal pod autoscaler.

Appears in:

Field Description Default Validation
minReplicas integer MinReplicas is the lower limit for the number of replicas for the target resource.
It will be used by the horizontal pod autoscaler to determine the minimum number of replicas to scale-in to.
maxReplicas integer MaxReplicas is the upper limit for the number of replicas to which the autoscaler can scale up.
It cannot be less than minReplicas.
metrics MetricSpec array Metrics contains the specifications of the metrics used to calculate the
desired replica count (the maximum replica count across all metrics will
be used). The desired replica count is calculated by multiplying the
ratio between the target value and the current value by the current
number of pods. Ergo, metrics used must decrease as the pod count is
increased, and vice versa. See the individual metric source types for
more information about how each type of metric must respond.
If not set, the default metric will be set to 80% average CPU utilization.
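
The replica calculation described above can be sketched as follows. This is a minimal illustration based on the formula in the Kubernetes HPA documentation, `desiredReplicas = ceil(currentReplicas * currentMetricValue / desiredMetricValue)`, not Grove's own code. The function name and the `(current, target)` tuple layout are assumptions made for this example, and the sketch ignores tolerance and stabilization behavior.

```python
import math

def desired_replicas(current_replicas, metrics, min_replicas, max_replicas):
    """metrics: list of (current_value, target_value) pairs, one per metric."""
    proposals = [
        math.ceil(current_replicas * current / target)
        for current, target in metrics
    ]
    # The largest proposal across all metrics wins, then it is clamped
    # to the [min_replicas, max_replicas] range.
    return max(min_replicas, min(max_replicas, max(proposals)))
```

For instance, 4 replicas at 90% CPU against an 80% target would propose 5 replicas, subject to the configured bounds.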

CliqueStartupType

Underlying type: string

CliqueStartupType defines the order in which each PodClique is started.

Validation:

  • Enum: [CliqueStartupTypeAnyOrder CliqueStartupTypeInOrder CliqueStartupTypeExplicit]

Appears in:

Field Description
CliqueStartupTypeAnyOrder CliqueStartupTypeAnyOrder defines that the cliques can be started in any order. This allows for concurrent starts of cliques.
This is the default CliqueStartupType.
CliqueStartupTypeInOrder CliqueStartupTypeInOrder defines that the cliques should be started in the order they are defined in the PodGang Cliques slice.
CliqueStartupTypeExplicit CliqueStartupTypeExplicit defines that the cliques should be started after the cliques defined in PodClique.StartsAfter have started.

ClusterTopology

ClusterTopology defines the topology hierarchy for the cluster. This resource is immutable after creation.

Field Description Default Validation
apiVersion string grove.io/v1alpha1
kind string ClusterTopology
metadata ObjectMeta Refer to Kubernetes API documentation for fields of metadata.
spec ClusterTopologySpec Spec defines the topology hierarchy specification.

ClusterTopologySpec

ClusterTopologySpec defines the topology hierarchy specification.

Appears in:

Field Description Default Validation
levels TopologyLevel array Levels is an ordered list of topology levels from broadest to narrowest scope.
The order in this list defines the hierarchy (index 0 = broadest level).
This field is immutable after creation.
MaxItems: 7
MinItems: 1
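
A rough sketch of how the `levels` constraints could be checked client-side before submitting a ClusterTopology. It assumes that breadth follows the order of the TopologyDomain enum and that a domain may appear only once; neither assumption is stated in this reference, and `validate_levels` is a hypothetical helper.

```python
# Assumed breadth order, taken from the TopologyDomain enum listing.
DOMAIN_ORDER = ["region", "zone", "datacenter", "block", "rack", "host", "numa"]

def validate_levels(levels):
    """levels: list of domain names, index 0 being the broadest."""
    errors = []
    if not 1 <= len(levels) <= 7:
        errors.append("levels must contain between 1 and 7 items")
    ranks = [DOMAIN_ORDER.index(d) for d in levels if d in DOMAIN_ORDER]
    if len(ranks) != len(levels):
        errors.append("unknown domain in levels")
    # Each level must be strictly narrower than the one before it.
    if any(a >= b for a, b in zip(ranks, ranks[1:])):
        errors.append("levels must run from broadest to narrowest without repeats")
    return errors
```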

ErrorCode

Underlying type: string

ErrorCode is a custom error code that uniquely identifies an error.

Appears in:

HeadlessServiceConfig

HeadlessServiceConfig defines the config options for the headless service.

Appears in:

Field Description Default Validation
publishNotReadyAddresses boolean PublishNotReadyAddresses if set to true will publish the DNS records of pods even if the pods are not ready.
If not set, it defaults to true.
true

LastError

LastError captures the last error observed by the controller when reconciling an object.

Appears in:

Field Description Default Validation
code ErrorCode Code is the error code that uniquely identifies the error.
description string Description is a human-readable description of the error.
observedAt Time ObservedAt is the time at which the error was observed.

LastOperationState

Underlying type: string

LastOperationState is a string alias for the state of the last operation.

Appears in:

Field Description
Processing LastOperationStateProcessing indicates that the last operation is in progress.
Succeeded LastOperationStateSucceeded indicates that the last operation succeeded.
Error LastOperationStateError indicates that the last operation completed with errors and will be retried.

LastOperationType

Underlying type: string

LastOperationType is a string alias for the type of the last operation.

Appears in:

Field Description
Reconcile LastOperationTypeReconcile indicates that the last operation was a reconcile operation.
Delete LastOperationTypeDelete indicates that the last operation was a delete operation.

PodClique

PodClique is a set of pods running the same image.

Field Description Default Validation
apiVersion string grove.io/v1alpha1
kind string PodClique
metadata ObjectMeta Refer to Kubernetes API documentation for fields of metadata.
spec PodCliqueSpec Spec defines the specification of a PodClique.
status PodCliqueStatus Status defines the status of a PodClique.

PodCliqueRollingUpdateProgress

PodCliqueRollingUpdateProgress provides details about the ongoing rolling update of the PodClique.

Appears in:

Field Description Default Validation
updateStartedAt Time UpdateStartedAt is the time at which the rolling update started.
updateEndedAt Time UpdateEndedAt is the time at which the rolling update ended.
It will be set to nil if the rolling update is still in progress.
podCliqueSetGenerationHash string PodCliqueSetGenerationHash is the PodCliqueSet generation hash corresponding to the PodCliqueSet spec that is being rolled out.
While the update is in progress PodCliqueStatus.CurrentPodCliqueSetGenerationHash will not match this hash. Once the update is complete the
value of this field will be copied to PodCliqueStatus.CurrentPodCliqueSetGenerationHash.
podTemplateHash string PodTemplateHash is the PodClique template hash corresponding to the PodClique spec that is being rolled out.
While the update is in progress PodCliqueStatus.CurrentPodTemplateHash will not match this hash. Once the update is complete the
value of this field will be copied to PodCliqueStatus.CurrentPodTemplateHash.
readyPodsSelectedToUpdate PodsSelectedToUpdate ReadyPodsSelectedToUpdate captures the pod names of ready Pods that are either currently being updated or have been previously updated.

PodCliqueScalingGroup

PodCliqueScalingGroup is the schema for a scaling group, which is used to scale a group of PodCliques together. An instance of this custom resource will be created for every pod clique scaling group defined as part of a PodCliqueSet.

Field Description Default Validation
apiVersion string grove.io/v1alpha1
kind string PodCliqueScalingGroup
metadata ObjectMeta Refer to Kubernetes API documentation for fields of metadata.
spec PodCliqueScalingGroupSpec Spec is the specification of the PodCliqueScalingGroup.
status PodCliqueScalingGroupStatus Status is the status of the PodCliqueScalingGroup.

PodCliqueScalingGroupConfig

PodCliqueScalingGroupConfig is a group of PodCliques that are scaled together. Each member's PodClique.Replicas will be computed as the product of PodCliqueScalingGroupConfig.Replicas and PodCliqueTemplateSpec.Spec.Replicas. NOTE: If a PodCliqueScalingGroupConfig is defined, then an individual AutoScalingConfig cannot be defined for its member PodCliques.

Appears in:

Field Description Default Validation
name string Name is the name of the PodCliqueScalingGroupConfig. This should be unique within the PodCliqueSet.
It allows consumers to give a semantic name to a group of PodCliques that needs to be scaled together.
cliqueNames string array CliqueNames is the list of names of the PodCliques that are part of the scaling group.
replicas integer Replicas is the desired number of replicas for the scaling group at template level.
This allows one to control the replicas of the scaling group at startup.
If not specified, it defaults to 1.
1
minAvailable integer MinAvailable serves two purposes:
Gang Scheduling:
It defines the minimum number of replicas that are guaranteed to be gang scheduled.
Gang Termination:
It defines the minimum requirement of available replicas for a PodCliqueScalingGroup.
Violation of this threshold for a duration beyond TerminationDelay will result in termination of the PodCliqueSet replica that it belongs to.
Default: If not specified, it defaults to 1.
Constraints:
MinAvailable cannot be greater than Replicas.
If ScaleConfig is defined then its MinAvailable should not be less than ScaleConfig.MinReplicas.
1
scaleConfig AutoScalingConfig ScaleConfig is the horizontal pod autoscaler configuration for the pod clique scaling group.
topologyConstraint TopologyConstraint TopologyConstraint defines topology placement requirements for PodCliqueScalingGroup.
Must be equal to or stricter than parent PodCliqueSet constraints.
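
The replica product and the `minAvailable <= replicas` constraint described for PodCliqueScalingGroupConfig can be illustrated with a small sketch. The helper names and the clique names in the usage note are hypothetical; only the arithmetic and the constraint come from the text above.

```python
def member_clique_replicas(group_replicas, clique_template_replicas):
    """Map each member clique name to its total pod count.

    clique_template_replicas: dict of clique name -> replicas in its template.
    """
    return {
        name: group_replicas * per_replica
        for name, per_replica in clique_template_replicas.items()
    }

def check_min_available(replicas=1, min_available=1):
    # Both fields default to 1 when not specified.
    if min_available > replicas:
        raise ValueError("minAvailable cannot be greater than replicas")
```

With a scaling group of 3 replicas whose member cliques declare 2 and 4 replicas, the member PodCliques would end up with 6 and 12 pods respectively.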

PodCliqueScalingGroupReplicaRollingUpdateProgress

PodCliqueScalingGroupReplicaRollingUpdateProgress provides details about the rolling update progress of ready replicas of PodCliqueScalingGroup that have been selected for update.

Appears in:

Field Description Default Validation
current integer Current is the index of the PodCliqueScalingGroup replica that is currently being updated.
completed integer array Completed is the list of indices of PodCliqueScalingGroup replicas that have been updated to the latest PodCliqueSet spec.

PodCliqueScalingGroupRollingUpdateProgress

PodCliqueScalingGroupRollingUpdateProgress provides details about the ongoing rolling update of the PodCliqueScalingGroup.

Appears in:

Field Description Default Validation
updateStartedAt Time UpdateStartedAt is the time at which the rolling update started.
updateEndedAt Time UpdateEndedAt is the time at which the rolling update ended.
podCliqueSetGenerationHash string PodCliqueSetGenerationHash is the PodCliqueSet generation hash corresponding to the PodCliqueSet spec that is being rolled out.
While the update is in progress PodCliqueScalingGroupStatus.CurrentPodCliqueSetGenerationHash will not match this hash. Once the update is complete the
value of this field will be copied to PodCliqueScalingGroupStatus.CurrentPodCliqueSetGenerationHash.
updatedPodCliques string array UpdatedPodCliques is the list of PodClique names that have been updated to the latest PodCliqueSet spec.
readyReplicaIndicesSelectedToUpdate PodCliqueScalingGroupReplicaRollingUpdateProgress ReadyReplicaIndicesSelectedToUpdate provides the rolling update progress of ready replicas of PodCliqueScalingGroup that have been selected for update.
PodCliqueScalingGroup replicas that are either pending or unhealthy will be force updated and the update will not wait for these replicas to become ready.
Ready replicas are updated one at a time: once the chosen replica has been updated and becomes ready again, the next ready replica is chosen for update.
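
The selection rule above, where pending or unhealthy replicas are force updated and ready replicas are updated sequentially, might be sketched like this. Processing replicas in ascending index order is an assumption made for the example, and `plan_update` is a hypothetical name.

```python
def plan_update(replicas):
    """replicas: dict of replica index -> True if ready, False if pending/unhealthy.

    Returns (forced, sequential): indices updated immediately without waiting,
    and indices updated one at a time, each waiting for the previous to be ready.
    """
    forced = sorted(i for i, ready in replicas.items() if not ready)
    sequential = sorted(i for i, ready in replicas.items() if ready)
    return forced, sequential
```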

PodCliqueScalingGroupSpec

PodCliqueScalingGroupSpec is the specification of the PodCliqueScalingGroup.

Appears in:

Field Description Default Validation
replicas integer Replicas is the desired number of replicas for the PodCliqueScalingGroup.
If not specified, it defaults to 1.
1
minAvailable integer MinAvailable specifies the minimum number of ready replicas required for a PodCliqueScalingGroup to be considered operational.
A PodCliqueScalingGroup replica is considered "ready" when its associated PodCliques have sufficient ready or starting pods.
If MinAvailable is breached, it will be used to signal that the PodCliqueScalingGroup is no longer operating with the desired availability.
MinAvailable cannot be greater than Replicas. If ScaleConfig is defined then its MinAvailable should not be less than ScaleConfig.MinReplicas.
It serves two main purposes:
1. Gang Scheduling: MinAvailable defines the minimum number of replicas that are guaranteed to be gang scheduled.
2. Gang Termination: MinAvailable is used as a lower bound below which a PodGang becomes a candidate for Gang termination.
If not specified, it defaults to 1.
1
cliqueNames string array CliqueNames is the list of PodClique names that are configured in the
matching PodCliqueScalingGroup in PodCliqueSet.Spec.Template.PodCliqueScalingGroupConfigs.

PodCliqueScalingGroupStatus

PodCliqueScalingGroupStatus is the status of the PodCliqueScalingGroup.

Appears in:

Field Description Default Validation
replicas integer Replicas is the observed number of replicas for the PodCliqueScalingGroup.
scheduledReplicas integer ScheduledReplicas is the number of replicas that are scheduled for the PodCliqueScalingGroup.
A replica of PodCliqueScalingGroup is considered "scheduled" when at least MinAvailable number
of pods in each constituent PodClique has been scheduled.
0
availableReplicas integer AvailableReplicas is the number of PodCliqueScalingGroup replicas that are available.
A PodCliqueScalingGroup replica is considered available when all constituent PodCliques have
PodClique.Status.ReadyReplicas greater than or equal to PodClique.Spec.MinAvailable
0
updatedReplicas integer UpdatedReplicas is the number of PodCliqueScalingGroup replicas that correspond with the latest PodCliqueSetGenerationHash. 0
selector string Selector is the selector used to identify the pods that belong to this scaling group.
observedGeneration integer ObservedGeneration is the most recent generation observed by the controller.
lastErrors LastError array LastErrors captures the last errors observed by the controller when reconciling the PodCliqueScalingGroup.
conditions Condition array Conditions represents the latest available observations of the PodCliqueScalingGroup by its controller.
currentPodCliqueSetGenerationHash string CurrentPodCliqueSetGenerationHash establishes a correlation to PodCliqueSet generation hash indicating
that the spec of the PodCliqueSet at this generation is fully realized in the PodCliqueScalingGroup.
rollingUpdateProgress PodCliqueScalingGroupRollingUpdateProgress RollingUpdateProgress provides details about the ongoing rolling update of the PodCliqueScalingGroup.

PodCliqueSet

PodCliqueSet is a set of PodGangs. It specifies how to spread and manage a gang of pods and how to monitor their status.

Field Description Default Validation
apiVersion string grove.io/v1alpha1
kind string PodCliqueSet
metadata ObjectMeta Refer to Kubernetes API documentation for fields of metadata.
spec PodCliqueSetSpec Spec defines the specification of the PodCliqueSet.
status PodCliqueSetStatus Status defines the status of the PodCliqueSet.

PodCliqueSetReplicaRollingUpdateProgress

PodCliqueSetReplicaRollingUpdateProgress captures the progress of a rolling update for a specific PodCliqueSet replica.

Appears in:

Field Description Default Validation
replicaIndex integer ReplicaIndex is the replica index of the PodCliqueSet that is being updated.
updateStartedAt Time UpdateStartedAt is the time at which the rolling update started for this PodCliqueSet replica index.

PodCliqueSetRollingUpdateProgress

PodCliqueSetRollingUpdateProgress captures the progress of a rolling update of the PodCliqueSet.

Appears in:

Field Description Default Validation
updateStartedAt Time UpdateStartedAt is the time at which the rolling update started for the PodCliqueSet.
updateEndedAt Time UpdateEndedAt is the time at which the rolling update ended for the PodCliqueSet.
updatedPodCliqueScalingGroups string array UpdatedPodCliqueScalingGroups is a list of PodCliqueScalingGroup names that have been updated to the desired PodCliqueSet generation hash.
updatedPodCliques string array UpdatedPodCliques is a list of PodClique names that have been updated to the desired PodCliqueSet generation hash.
currentlyUpdating PodCliqueSetReplicaRollingUpdateProgress CurrentlyUpdating captures the progress of the PodCliqueSet replica that is currently being updated.

PodCliqueSetSpec

PodCliqueSetSpec defines the specification of a PodCliqueSet.

Appears in:

Field Description Default Validation
replicas integer Replicas is the number of desired replicas of the PodCliqueSet. 0
template PodCliqueSetTemplateSpec Template describes the template spec for PodGangs that will be created in the PodCliqueSet.

PodCliqueSetStatus

PodCliqueSetStatus defines the status of a PodCliqueSet.

Appears in:

Field Description Default Validation
observedGeneration integer ObservedGeneration is the most recent generation observed by the controller.
conditions Condition array Conditions represents the latest available observations of the PodCliqueSet by its controller.
lastErrors LastError array LastErrors captures the last errors observed by the controller when reconciling the PodCliqueSet.
replicas integer Replicas is the total number of PodCliqueSet replicas created.
updatedReplicas integer UpdatedReplicas is the number of replicas that have been updated to the desired revision of the PodCliqueSet. 0
availableReplicas integer AvailableReplicas is the number of PodCliqueSet replicas that are available.
A PodCliqueSet replica is considered available when all standalone PodCliques within that replica
have MinAvailableBreached condition = False AND all PodCliqueScalingGroups (PCSG) within that replica
have MinAvailableBreached condition = False.
0
hpaPodSelector string Selector is the label selector that determines which pods are part of the PodGang.
PodGang is a unit of scale and this selector is used by HPA to scale the PodGang based on metrics captured for the pods that match this selector.
podGangStatuses PodGangStatus array PodGangStatuses captures the status for all the PodGangs that are part of the PodCliqueSet.
currentGenerationHash string CurrentGenerationHash is a hash value generated out of a collection of fields in a PodCliqueSet.
Since only a subset of fields is taken into account when generating the hash, not every change in the PodCliqueSetSpec will
be accounted for when generating this hash value. A field in PodCliqueSetSpec is included if a change to it triggers
a rolling update of PodCliques and/or PodCliqueScalingGroups.
A rolling update is triggered only if this value is not nil and the newly computed hash value differs from the persisted
CurrentGenerationHash value.
rollingUpdateProgress PodCliqueSetRollingUpdateProgress RollingUpdateProgress represents the progress of a rolling update.
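
The `currentGenerationHash` behavior can be sketched as below. The hashing scheme (SHA-256 over sorted JSON) and the choice of tracked fields are assumptions for illustration only; this reference does not say how Grove computes the hash or which fields it includes.

```python
import hashlib
import json

def generation_hash(spec, tracked_fields):
    """Hash only the fields whose change should trigger a rolling update."""
    subset = {k: spec[k] for k in tracked_fields if k in spec}
    payload = json.dumps(subset, sort_keys=True).encode()
    return hashlib.sha256(payload).hexdigest()

def needs_rolling_update(persisted_hash, new_hash):
    # No update while no hash has been persisted yet (nil), or when unchanged.
    return persisted_hash is not None and persisted_hash != new_hash
```

Because only a subset of fields feeds the hash, a change to an untracked field leaves the hash unchanged and would not start a rolling update.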

PodCliqueSetTemplateSpec

PodCliqueSetTemplateSpec defines a template spec for a PodGang. A PodGang does not have a RestartPolicy field because the restart policy is predefined: If the number of pods in any of the cliques falls below the threshold, the entire PodGang will be restarted. The threshold is determined by either:

  • The value of "MinReplicas", if specified in the ScaleConfig of that clique, or
  • The "Replicas" value of that clique

Appears in:

Field Description Default Validation
cliques PodCliqueTemplateSpec array Cliques is a slice of cliques that make up the PodGang. There should be at least one PodClique.
cliqueStartupType CliqueStartupType StartupType defines the type of startup dependency amongst the cliques within a PodGang.
If it is not defined then default of CliqueStartupTypeAnyOrder is used.
CliqueStartupTypeAnyOrder Enum: [CliqueStartupTypeAnyOrder CliqueStartupTypeInOrder CliqueStartupTypeExplicit]
priorityClassName string PriorityClassName is the name of the PriorityClass to be used for the PodCliqueSet.
If specified, indicates the priority of the PodCliqueSet. "system-node-critical" and
"system-cluster-critical" are two special keywords which indicate the
highest priorities with the former being the highest priority. Any other
name must be defined by creating a PriorityClass object with that name.
If not specified, the pod priority will be default or zero if there is no default.
headlessServiceConfig HeadlessServiceConfig HeadlessServiceConfig defines the config options for the headless service.
If present, create headless service for each PodGang.
topologyConstraint TopologyConstraint TopologyConstraint defines topology placement requirements for PodCliqueSet.
terminationDelay Duration TerminationDelay is the delay after which the gang termination will be triggered.
A gang is a candidate for termination if the number of running pods falls below a threshold for any PodClique.
If a PodGang remains a candidate past TerminationDelay then it will be terminated. The delay gives the
kube-scheduler additional time to re-schedule enough pods in the PodGang to bring the total number of
running pods back above the threshold.
Defaults to 4 hours.
podCliqueScalingGroups PodCliqueScalingGroupConfig array PodCliqueScalingGroupConfigs is a list of scaling groups for the PodCliqueSet.
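
The `terminationDelay` semantics above amount to a time-window check. A minimal sketch, assuming the controller records when a gang first became a termination candidate (`breached_since` is a hypothetical name for that timestamp):

```python
from datetime import datetime, timedelta

DEFAULT_TERMINATION_DELAY = timedelta(hours=4)  # documented default

def should_terminate(breached_since, now, delay=DEFAULT_TERMINATION_DELAY):
    """breached_since: when the gang became a candidate, or None if it is healthy."""
    if breached_since is None:
        return False
    # Terminate only once the gang has stayed a candidate past the delay.
    return now - breached_since > delay
```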

PodCliqueSpec

PodCliqueSpec defines the specification of a PodClique.

Appears in:

Field Description Default Validation
roleName string RoleName is the name of the role that this PodClique will assume.
podSpec PodSpec Spec is the spec of the pods in the clique.
replicas integer Replicas is the number of replicas of the pods in the clique. It cannot be less than 1.
minAvailable integer MinAvailable serves two purposes:
1. It defines the minimum number of pods that are guaranteed to be gang scheduled.
2. It defines the minimum requirement of available pods in a PodClique. Violation of this threshold will result in termination of the PodGang that it belongs to.
If MinAvailable is not set, then it will default to the template Replicas.
startsAfter string array StartsAfter provides you a way to explicitly define the startup dependencies amongst cliques.
If CliqueStartupType in PodGang has been set to 'CliqueStartupTypeExplicit', then StartsAfter can be used to create an ordered start amongst PodCliques.
A forest of DAGs can be defined to model any start order dependencies. If more than one PodClique is defined and StartsAfter is not set for any of them,
then their startup order is arbitrary and must not be relied upon.
Validations:
1. If StartsAfter has been defined and one or more cycles are detected in the DAGs, it will be flagged as a validation error.
2. If StartsAfter is defined and does not identify any PodClique, it will be flagged as a validation error.
autoScalingConfig AutoScalingConfig ScaleConfig is the horizontal pod autoscaler configuration for a PodClique.
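
The two `startsAfter` validations listed above (no cycles, no references to unknown cliques) can be sketched with a depth-first search. This is an illustrative stand-in, not the webhook's actual implementation; the input shape and error messages are assumptions.

```python
def validate_starts_after(starts_after):
    """starts_after: dict of clique name -> list of clique names it starts after."""
    errors = []
    for name, deps in starts_after.items():
        for dep in deps:
            if dep not in starts_after:
                errors.append(f"{name}: unknown clique {dep!r}")

    state = {}  # clique name -> "visiting" or "done"

    def has_cycle(name):
        if state.get(name) == "done":
            return False
        if state.get(name) == "visiting":
            return True  # reached a clique that is still on the DFS stack
        state[name] = "visiting"
        found = any(has_cycle(d) for d in starts_after[name] if d in starts_after)
        state[name] = "done"
        return found

    if any(has_cycle(n) for n in starts_after):
        errors.append("cycle detected in startsAfter dependencies")
    return errors
```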

PodCliqueStatus

PodCliqueStatus defines the status of a PodClique.

Appears in:

Field Description Default Validation
observedGeneration integer ObservedGeneration is the most recent generation observed by the controller.
lastErrors LastError array LastErrors captures the last errors observed by the controller when reconciling the PodClique.
replicas integer Replicas is the total number of non-terminated Pods targeted by this PodClique.
readyReplicas integer ReadyReplicas is the number of ready Pods targeted by this PodClique. 0
updatedReplicas integer UpdatedReplicas is the number of Pods that have been updated and are at the desired revision of the PodClique. 0
scheduleGatedReplicas integer ScheduleGatedReplicas is the number of Pods that have been created with one or more scheduling gate(s) set.
Sum of ReadyReplicas and ScheduleGatedReplicas will always be <= Replicas.
0
scheduledReplicas integer ScheduledReplicas is the number of Pods that have been scheduled by the kube-scheduler. 0
hpaPodSelector string Selector is the label selector that determines which pods are part of the PodClique.
PodClique is a unit of scale and this selector is used by HPA to scale the PodClique based on metrics captured for the pods that match this selector.
conditions Condition array Conditions represents the latest available observations of the clique by its controller.
currentPodCliqueSetGenerationHash string CurrentPodCliqueSetGenerationHash establishes a correlation to PodCliqueSet generation hash indicating
that the spec of the PodCliqueSet at this generation is fully realized in the PodClique.
currentPodTemplateHash string CurrentPodTemplateHash establishes a correlation to PodClique template hash indicating
that the spec of the PodClique at this template hash is fully realized in the PodClique.
rollingUpdateProgress PodCliqueRollingUpdateProgress RollingUpdateProgress provides details about the ongoing rolling update of the PodClique.

PodCliqueTemplateSpec

PodCliqueTemplateSpec defines a template spec for a PodClique.

Appears in:

Field Description Default Validation
name string Name must be unique within a PodCliqueSet and is used to denote a role.
Once set it cannot be updated.
More info: https://kubernetes.io/docs/concepts/overview/working-with-objects/names#names
labels object (keys:string, values:string) Labels is a map of string keys and values that can be used to organize and categorize
(scope and select) objects. May match selectors of replication controllers
and services.
More info: https://kubernetes.io/docs/concepts/overview/working-with-objects/labels
annotations object (keys:string, values:string) Annotations is an unstructured key value map stored with a resource that may be
set by external tools to store and retrieve arbitrary metadata. They are not
queryable and should be preserved when modifying objects.
More info: https://kubernetes.io/docs/concepts/overview/working-with-objects/annotations
topologyConstraint TopologyConstraint TopologyConstraint defines topology placement requirements for PodClique.
Must be equal to or stricter than parent resource constraints.
spec PodCliqueSpec Specification of the desired behavior of a PodClique.
More info: https://git.k8s.io/community/contributors/devel/sig-architecture/api-conventions.md#spec-and-status

PodGangPhase

Underlying type: string

PodGangPhase represents the phase of a PodGang.

Validation:

  • Enum: [Pending Starting Running Failed Succeeded]

Appears in:

Field Description
Pending PodGangPending indicates that the pods in a PodGang have not yet been taken up for scheduling.
Starting PodGangStarting indicates that the pods are bound to nodes by the scheduler and are starting.
Running PodGangRunning indicates that all the pods in a PodGang are running.
Failed PodGangFailed indicates that one or more pods in a PodGang have failed.
This is a terminal state and is typically used for batch jobs.
Succeeded PodGangSucceeded indicates that all the pods in a PodGang have succeeded.
This is a terminal state and is typically used for batch jobs.

PodGangStatus

PodGangStatus defines the status of a PodGang.

Appears in:

Field Description Default Validation
name string Name is the name of the PodGang.
phase PodGangPhase Phase is the current phase of the PodGang. Enum: [Pending Starting Running Failed Succeeded]
conditions Condition array Conditions represents the latest available observations of the PodGang by its controller.

PodsSelectedToUpdate

PodsSelectedToUpdate captures the current and previous set of pod names that have been selected for update in a rolling update.

Appears in:

Field Description Default Validation
current string Current captures the current pod name that is a target for update.
completed string array Completed captures the pod names that have already been updated.

TopologyConstraint

TopologyConstraint defines topology placement requirements.

Appears in:

Field Description Default Validation
packDomain TopologyDomain PackDomain specifies the topology domain for grouping replicas.
Controls placement constraint for EACH individual replica instance.
Must be one of: region, zone, datacenter, block, rack, host, numa
Example: "rack" means each replica is independently placed within one rack.
Note: Does NOT constrain all replicas to the same rack together.
Different replicas can be in different topology domains.
Enum: [region zone datacenter block rack host numa]
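
Other parts of this reference require a child constraint to be "equal to or stricter than" its parent. One plausible reading, assumed here and not confirmed by this document, is that "stricter" means a narrower domain in the order of the TopologyDomain enum. Under that assumption the check is a simple index comparison:

```python
# Assumed ordering from broadest to narrowest, following the enum listing.
DOMAINS = ["region", "zone", "datacenter", "block", "rack", "host", "numa"]

def is_equal_or_stricter(child, parent):
    """True if the child's packDomain is the same as or narrower than the parent's."""
    return DOMAINS.index(child) >= DOMAINS.index(parent)
```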

TopologyDomain

Underlying type: string

TopologyDomain represents a level in the cluster topology hierarchy.

Appears in:

Field Description
region TopologyDomainRegion represents the region level in the topology hierarchy.
zone TopologyDomainZone represents the zone level in the topology hierarchy.
datacenter TopologyDomainDataCenter represents the datacenter level in the topology hierarchy.
block TopologyDomainBlock represents the block level in the topology hierarchy.
rack TopologyDomainRack represents the rack level in the topology hierarchy.
host TopologyDomainHost represents the host level in the topology hierarchy.
numa TopologyDomainNuma represents the numa level in the topology hierarchy.

TopologyLevel

TopologyLevel defines a single level in the topology hierarchy. It maps a platform-agnostic domain to a platform-specific node label key, giving workload operators a consistent way to reference topology levels when defining TopologyConstraints.

Appears in:

Field Description Default Validation
domain TopologyDomain Domain is a platform provider-agnostic level identifier.
Must be one of: region, zone, datacenter, block, rack, host, numa
Enum: [region zone datacenter block rack host numa]
Required: {}
key string Key is the node label key that identifies this topology domain.
Must be a valid Kubernetes label key (qualified name).
Examples: "topology.kubernetes.io/zone", "kubernetes.io/hostname"
MaxLength: 63
MinLength: 1
Pattern: ^(([A-Za-z0-9][-A-Za-z0-9_.]*)?[A-Za-z0-9]/)?([A-Za-z0-9][-A-Za-z0-9_.]*)?[A-Za-z0-9]$
Required: {}
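
The `key` validation rules above (length bounds plus the pattern) can be reproduced locally to pre-check values. The sketch below uses the pattern exactly as listed; `is_valid_key` is a hypothetical helper name.

```python
import re

# Pattern copied from the validation column for the key field.
KEY_PATTERN = re.compile(
    r"^(([A-Za-z0-9][-A-Za-z0-9_.]*)?[A-Za-z0-9]/)?"
    r"([A-Za-z0-9][-A-Za-z0-9_.]*)?[A-Za-z0-9]$"
)

def is_valid_key(key):
    # MinLength: 1, MaxLength: 63, and the whole string must match the pattern.
    return 1 <= len(key) <= 63 and KEY_PATTERN.fullmatch(key) is not None
```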

operator.config.grove.io/v1alpha1

AuthorizerConfig

AuthorizerConfig defines the configuration for the authorizer admission webhook.

Appears in:

Field Description Default Validation
enabled boolean Enabled indicates whether the authorizer is enabled.
exemptServiceAccountUserNames string array ExemptServiceAccountUserNames is a list of service account usernames that are exempt from authorizer checks.
Each service account username in ExemptServiceAccountUserNames should be of the following format:
system:serviceaccount:<namespace>:<service-account-name>. ServiceAccounts are represented in this
format when checking the username in authenticationv1.UserInfo.Name.
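
Kubernetes represents a service account as a username of the form `system:serviceaccount:<namespace>:<name>`. The sketch below builds such a username and checks it against an exemption list; the helper names and the namespace and account used in the usage note are made up for illustration.

```python
def service_account_username(namespace, name):
    """Build the username Kubernetes assigns to a service account."""
    return f"system:serviceaccount:{namespace}:{name}"

def is_exempt(username, exempt_usernames):
    """True if the requesting username is in the exemption list."""
    return username in set(exempt_usernames)
```

For example, a hypothetical account `ci-bot` in namespace `tools` would be listed as `system:serviceaccount:tools:ci-bot`.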

CertProvisionMode

Underlying type: string

CertProvisionMode defines how webhook certificates are provisioned.

Validation:

  • Enum: [auto manual]

Appears in:

Field Description
auto CertProvisionModeAuto enables automatic certificate generation and management via cert-controller.
cert-controller automatically generates self-signed certificates and stores them in the Secret.
manual CertProvisionModeManual expects certificates to be provided externally (e.g., by cert-manager, cluster admin).

ClientConnectionConfiguration

ClientConnectionConfiguration defines the configuration for constructing a client.

Appears in:

Field Description Default Validation
qps float QPS controls the number of queries per second allowed for this connection.
burst integer Burst allows extra queries to accumulate when a client is exceeding its rate.
contentType string ContentType is the content type used when sending data to the server from this client.
acceptContentTypes string AcceptContentTypes defines the Accept header sent by clients when connecting to the server,
overriding the default value of 'application/json'. This field will control all connections
to the server used by a particular client.

ControllerConfiguration

ControllerConfiguration defines the configuration for the controllers.

Appears in:

Field Description Default Validation
podCliqueSet PodCliqueSetControllerConfiguration PodCliqueSet is the configuration for the PodCliqueSet controller.
podClique PodCliqueControllerConfiguration PodClique is the configuration for the PodClique controller.
podCliqueScalingGroup PodCliqueScalingGroupControllerConfiguration PodCliqueScalingGroup is the configuration for the PodCliqueScalingGroup controller.

DebuggingConfiguration

DebuggingConfiguration defines the configuration for debugging.

Appears in:

Field Description Default Validation
enableProfiling boolean EnableProfiling enables profiling via host:port/debug/pprof/ endpoints.

LeaderElectionConfiguration

LeaderElectionConfiguration defines the configuration for the leader election.

Appears in:

Field Description Default Validation
enabled boolean Enabled specifies whether leader election is enabled. Set this
to true when running replicated instances of the operator for high availability.
leaseDuration Duration LeaseDuration is the duration that non-leader candidates will wait
after observing a leadership renewal until attempting to acquire
leadership of the occupied but un-renewed leader slot. This is effectively the
maximum duration that a leader can be stopped before it is replaced
by another candidate. This is only applicable if leader election is
enabled.
renewDeadline Duration RenewDeadline is the interval between attempts by the acting leader to
renew its leadership before it stops leading. This must be less than or
equal to the lease duration.
This is only applicable if leader election is enabled.
retryPeriod Duration RetryPeriod is the duration leader elector clients should wait
between attempting acquisition and renewal of leadership.
This is only applicable if leader election is enabled.
resourceLock string ResourceLock determines which resource lock to use for leader election.
This is only applicable if leader election is enabled.
resourceName string ResourceName determines the name of the resource that leader election
will use for holding the leader lock.
This is only applicable if leader election is enabled.
resourceNamespace string ResourceNamespace determines the namespace in which the leader
election resource will be created.
This is only applicable if leader election is enabled.
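The timing fields above constrain one another, so a quick sanity check before deploying a configuration can be useful. The following is a minimal sketch, not Grove's actual validation. It checks the documented rule that renewDeadline must not exceed leaseDuration. The retryPeriod check is an assumption added for illustration, and the function name and the use of plain seconds are hypothetical.

```python
from typing import List


def validate_leader_election(lease_duration_s: float,
                             renew_deadline_s: float,
                             retry_period_s: float) -> List[str]:
    """Return a list of human-readable problems; an empty list means the values look consistent."""
    errors = []
    if min(lease_duration_s, renew_deadline_s, retry_period_s) <= 0:
        errors.append("all durations must be positive")
    # Documented rule: renewDeadline must be less than or equal to leaseDuration.
    if renew_deadline_s > lease_duration_s:
        errors.append("renewDeadline must be less than or equal to leaseDuration")
    # Assumption (not stated in this reference): at least one retry should fit
    # inside the renew deadline, so retryPeriod should be the smaller value.
    if retry_period_s >= renew_deadline_s:
        errors.append("retryPeriod should be shorter than renewDeadline")
    return errors


# Example values only; they are not claimed to be Grove's defaults.
problems = validate_leader_election(lease_duration_s=15, renew_deadline_s=10, retry_period_s=2)
```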

LogFormat

Underlying type: string

LogFormat defines the format of the log.

Appears in:

Field Description
json LogFormatJSON is the JSON log format.
text LogFormatText is the text log format.

LogLevel

Underlying type: string

LogLevel defines the log level.

Appears in:

Field Description
debug DebugLevel is the debug log level, i.e. the most verbose.
info InfoLevel is the default log level.
error ErrorLevel is a log level where only errors are logged.

NetworkAcceleration

NetworkAcceleration defines the configuration for network acceleration features.

Appears in:

Field Description Default Validation
autoMNNVLEnabled boolean AutoMNNVLEnabled indicates whether automatic MNNVL (Multi-Node NVLink) support is enabled.
When enabled, the operator will automatically create and manage ComputeDomain resources
for GPU workloads. If the cluster doesn't have the NVIDIA DRA driver installed,
the operator will exit with a non-zero exit code.
Default: false

PodCliqueControllerConfiguration

PodCliqueControllerConfiguration defines the configuration for the PodClique controller.

Appears in:

Field Description Default Validation
concurrentSyncs integer ConcurrentSyncs is the number of workers the controller uses to process events concurrently.

PodCliqueScalingGroupControllerConfiguration

PodCliqueScalingGroupControllerConfiguration defines the configuration for the PodCliqueScalingGroup controller.

Appears in:

Field Description Default Validation
concurrentSyncs integer ConcurrentSyncs is the number of workers the controller uses to process events concurrently.

PodCliqueSetControllerConfiguration

PodCliqueSetControllerConfiguration defines the configuration for the PodCliqueSet controller.

Appears in:

Field Description Default Validation
concurrentSyncs integer ConcurrentSyncs is the number of workers the controller uses to process events concurrently.

SchedulerConfiguration

SchedulerConfiguration configures the scheduler profiles and designates which one is the default.

Appears in:

Field Description Default Validation
profiles SchedulerProfile array Profiles is the list of scheduler profiles. Each profile has a backend name and optional config.
The kube-scheduler backend is always enabled; use the profile name "kube-scheduler" to configure it or to make it the default.
Valid profile names: "kube-scheduler", "kai-scheduler". Use defaultProfileName to designate the default backend; if it is not set, it defaults to "kube-scheduler".
defaultProfileName string DefaultProfileName is the name of the default scheduler profile. If unset, it defaults to "kube-scheduler".
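The defaulting rules described above can be sketched as a small function. This is an illustration, not the operator's implementation. The dict layout mirrors the field names in this table. Representing the always-enabled kube-scheduler backend as an appended profile entry is an assumption made for clarity, and the function name is hypothetical.

```python
VALID_PROFILE_NAMES = ("kube-scheduler", "kai-scheduler")


def default_scheduler_config(config: dict) -> dict:
    """Return a copy of `config` with the documented defaults applied."""
    profiles = [dict(p) for p in config.get("profiles", [])]
    for profile in profiles:
        if profile.get("name") not in VALID_PROFILE_NAMES:
            raise ValueError(f"invalid scheduler profile name: {profile.get('name')!r}")
    # kube-scheduler is always enabled; make that explicit if it was omitted
    # (assumption: modeled here as an extra profile entry without config).
    if "kube-scheduler" not in [p["name"] for p in profiles]:
        profiles.append({"name": "kube-scheduler"})
    # If defaultProfileName is unset, it falls back to "kube-scheduler".
    default_name = config.get("defaultProfileName") or "kube-scheduler"
    if default_name not in [p["name"] for p in profiles]:
        raise ValueError(f"defaultProfileName {default_name!r} is not a configured profile")
    return {"profiles": profiles, "defaultProfileName": default_name}


defaulted = default_scheduler_config({"profiles": [{"name": "kai-scheduler"}]})
```

In this example only the KAI profile is listed and no default is named, so the result keeps kai-scheduler as an available backend while kube-scheduler becomes the default.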

SchedulerName

Underlying type: string

SchedulerName defines the name of the scheduler backend (used in OperatorConfiguration scheduler.profiles[].name).

Appears in:

Field Description
kai-scheduler SchedulerNameKai is the KAI scheduler backend.
kube-scheduler SchedulerNameKube is the profile name for the Kubernetes default scheduler in OperatorConfiguration.

SchedulerProfile

SchedulerProfile defines a scheduler backend profile with optional backend-specific config.

Appears in:

Field Description Default Validation
name SchedulerName Name is the scheduler profile name. Valid values: "kube-scheduler", "kai-scheduler".
For the Kubernetes default scheduler use "kube-scheduler"; Pod.Spec.SchedulerName will be set to "default-scheduler".
Validation: Enum: [kai-scheduler kube-scheduler], Required
config RawExtension Config holds backend-specific options. The operator unmarshals it into the config type for this backend (see backend config types).
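Note that the profile name is not always the value written to the Pod. The sketch below illustrates the mapping described for the name field. The "kube-scheduler" case follows the text above. Passing "kai-scheduler" through unchanged is an assumption, and the function name is hypothetical.

```python
def pod_scheduler_name(profile_name: str) -> str:
    """Map a scheduler profile name to the value expected in Pod.Spec.SchedulerName."""
    if profile_name == "kube-scheduler":
        # Documented: the Kubernetes default scheduler is addressed as "default-scheduler".
        return "default-scheduler"
    if profile_name == "kai-scheduler":
        # Assumption: the KAI backend uses its profile name unchanged.
        return profile_name
    raise ValueError(f"unknown scheduler profile: {profile_name!r}")


name_for_default = pod_scheduler_name("kube-scheduler")
```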

Server

Server contains information for HTTP(S) server configuration.

Appears in:

Field Description Default Validation
bindAddress string BindAddress is the IP address on which the server listens on the specified port.
port integer Port is the port on which to serve requests.

ServerConfiguration

ServerConfiguration defines the configuration for the HTTP(S) servers.

Appears in:

Field Description Default Validation
webhooks WebhookServer Webhooks is the configuration for the HTTP(S) webhook server.
healthProbes Server HealthProbes is the configuration for serving the healthz and readyz endpoints.
metrics Server Metrics is the configuration for serving the metrics endpoint.

TopologyAwareSchedulingConfiguration

TopologyAwareSchedulingConfiguration defines the configuration for topology-aware scheduling.

Appears in:

Field Description Default Validation
enabled boolean Enabled indicates whether topology-aware scheduling is enabled.
levels TopologyLevel array Levels is an ordered list of topology levels from broadest to narrowest scope.
Used to create/update the TopologyAwareScheduling CR at operator startup.

WebhookServer

WebhookServer defines the configuration for the HTTP(S) webhook server.

Appears in:

Field Description Default Validation
bindAddress string BindAddress is the IP address on which the server listens on the specified port.
port integer Port is the port on which to serve requests.
serverCertDir string ServerCertDir is the directory containing the server certificate and key.
secretName string SecretName is the name of the Kubernetes Secret containing webhook certificates.
The Secret must contain tls.crt, tls.key, and ca.crt. Default: grove-webhook-server-cert
certProvisionMode CertProvisionMode CertProvisionMode controls how webhook certificates are provisioned. Default: auto. Validation: Enum: [auto manual]
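When certificates are supplied externally (manual mode), it can help to confirm that the Secret carries every required key before starting the operator. The following is a minimal sketch that models the Secret's data as a plain dict. The key names come from the secretName description above, while the helper function is hypothetical.

```python
REQUIRED_CERT_KEYS = ("tls.crt", "tls.key", "ca.crt")


def missing_cert_keys(secret_data: dict) -> list:
    """Return the required keys that are absent or empty in the Secret's data."""
    return [key for key in REQUIRED_CERT_KEYS if not secret_data.get(key)]


# Example Secret data with the CA bundle missing (placeholder values, not real certificates).
missing = missing_cert_keys({"tls.crt": "<cert>", "tls.key": "<key>"})
```

An empty result means the Secret has the shape the webhook server expects; it does not verify that the certificates themselves are valid.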