A simple yet comprehensive tool to automatically delete and report on Nutanix Kubernetes Platform (NKP) clusters that do not meet specified criteria. Useful for cleaning up resources and managing costs in a lab/demo environment, similar to common "cloud cleaner" tools. Available as an NKP Catalog Application, Helm Chart and Container Image.
- 🚀 Simple "one-click" catalog installation and tight integration with NKP features
- 📋 Flexible rulesets and custom criteria
- 🔔 Notifications (Slack currently supported, more to come)
- 📈 Trend analysis, compliance monitoring and historical data tracking
- 📊 Built-in web dashboard and administration console
- 🔥 Prometheus metrics, NKP monitoring integration and Grafana dashboard
- 🖥️ Also runs as a standalone console application
> **Note:** This is a personal project and is not supported/endorsed by, or otherwise connected to, Nutanix.
See the documentation at docs/nkp.md for details on how to deploy the application as an NKP catalog application, running inside the NKP Management Cluster itself. This is the recommended way to run the application as it includes the scheduled tasks, web interface and analytics with no further configuration needed.

You can, however, run the application from a Docker container or directly from the CLI. These options are discussed below.
- Any cluster without an `expires` label will be deleted.
- Any cluster that is older than the value specified in the `expires` label will be deleted.
  - This label takes values of the format `n<unit>`, where unit is one of:
    - `h` - Hours
    - `d` - Days
    - `w` - Weeks
    - `y` - Years
  - For example, `12h`, `2d`, `1w`, `1y`
- A set of additional labels and acceptable regex patterns can be provided. Any cluster without matching labels will be deleted.
  - For example, the default Helm Chart configuration defines a required `owner` label. Any cluster without an `owner` label will be deleted (see the labeling sketch after this list).
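To satisfy these rules, clusters need the labels applied before the cleaner runs. A minimal sketch, assuming your workload clusters are standard CAPI `Cluster` resources (the cluster and namespace names here are hypothetical):

```shell
# Label a workload cluster to expire in 2 days and record an owner
# ("demo-cluster" and "demo-workspace" are placeholders):
kubectl label clusters.cluster.x-k8s.io demo-cluster \
  --namespace demo-workspace \
  expires=2d owner=alice
```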
> **Note:** The default for both the CLI tool and the NKP Application is to run in "dry-run" mode, which just shows what would be deleted. To actually delete the clusters you must pass the `--delete` flag to the `delete-clusters` command, or explicitly enable the `deletion.delete` value in the Helm chart / NKP application.
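For instance, a minimal sketch of a dry-run followed by a real deletion run from the CLI:

```shell
# Dry-run (the default): only report what would be deleted
nkp-cluster-cleaner delete-clusters

# Actually delete clusters matching the deletion criteria
nkp-cluster-cleaner delete-clusters --delete
```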
The management cluster is always excluded from deletion, and a configuration file can be provided with lists of regex patterns matching namespaces or cluster names that should be excluded. For example:
```yaml
excluded_namespace_patterns:
  - ^default$
  - .*-prod$
protected_cluster_patterns:
  - ^production-.*
  - .*-prod$
  - critical-.*
```
In addition to the core `expires` label, any number of additional required labels can be defined in the configuration file. These are defined as a list with the following keys:

- `name`: Name of the label
- `description`: Optional description of the label
- `regex`: Optional regex to validate the label value against. If omitted, any value is accepted.

A cluster will be marked for deletion if:

- Any of these labels is not present,
- Or its value is in an incorrect format.
Some examples are provided in the example configuration file:
```yaml
extra_labels:
  # Cluster owner. Note no regex is provided, so any value is accepted.
  - description: Cluster owner identifier
    name: owner
  # A numeric cost centre ID
  - description: Numeric cost centre ID
    name: cost_centre
    regex: "^([0-9]+)$"
  # A project ID
  - description: Project identifier (alphanumeric with hyphens)
    name: project
    regex: "^[a-zA-Z0-9-]+$"
  # An environment type which must be one of 4 values
  - description: Environment type
    name: environment
    regex: "^(dev|test|staging|prod)$"
```
These can also be viewed in the Web UI, along with the other matching rules and the list of clusters scheduled for deletion.
Although the preferred method of deployment and configuration is as an NKP application, you can still run the tool from the CLI or container image:
```text
Usage: nkp-cluster-cleaner [OPTIONS] COMMAND [ARGS]...

  NKP Cluster Cleaner - Delete CAPI clusters based on label criteria.

Options:
  --version  Show the version and exit.
  --help     Show this message and exit.

Commands:
  collect-analytics  Collect analytics snapshot for historical tracking...
  delete-clusters    Delete CAPI clusters that match deletion criteria.
  generate-config    Generate an example configuration file.
  list-clusters      List CAPI clusters that match deletion criteria.
  notify             Send notifications for clusters approaching deletion.
  serve              Start the web server for the cluster cleaner UI.
```
- You must pass in a valid `kubeconfig` context with admin privileges to the NKP management cluster. This can be done by e.g. setting the `KUBECONFIG` environment variable or using the `--kubeconfig` parameter to commands.
- To pass in a custom configuration file, use the `--config /path/to/config.yaml` argument to any command. A sample configuration file can be created with `nkp-cluster-cleaner generate-config /path/to/config.yaml` (both options are combined in the sketch below).
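Putting those together, a minimal sketch (paths are placeholders):

```shell
# Generate a starter configuration file:
nkp-cluster-cleaner generate-config ./config.yaml

# List clusters matching the deletion criteria, using an explicit
# kubeconfig and the generated configuration:
nkp-cluster-cleaner list-clusters \
  --kubeconfig ~/.kube/config \
  --config ./config.yaml
```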
As this tool is intended to be used as a container inside a Kubernetes deployment, you can pass configuration values using environment variables as well as the CLI flags documented with the `--help` flag.

Each variable accepted is simply the flag name, converted to uppercase and with dash characters changed to underscores. For example:
| CLI flag example | Environment variable equivalent |
|---|---|
| `--config` | `CONFIG` |
| `--critical-threshold` | `CRITICAL_THRESHOLD` |
| `--slack-icon-emoji` | `SLACK_ICON_EMOJI` |
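As a sketch, these two invocations are equivalent ways of pointing the tool at an external Redis host (the hostname is a placeholder):

```shell
# Via the CLI flag:
nkp-cluster-cleaner serve --redis-host redis.example.com

# Via the equivalent environment variable:
REDIS_HOST=redis.example.com nkp-cluster-cleaner serve
```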
There is a bundled web interface that displays the cluster deletion status, protection rules, analytics and general configuration. Start the built-in Flask-based webserver with the `serve` command, which takes the usual arguments to specify port, bind host and so on:
```text
Usage: nkp-cluster-cleaner serve [OPTIONS]

  Start the web server for the cluster cleaner UI.

Options:
  --config PATH          Path to configuration file for protection rules
  --kubeconfig PATH      Path to kubeconfig file (default: ~/.kube/config or
                         $KUBECONFIG)
  --host TEXT            Host to bind to (default: 127.0.0.1)
  --port INTEGER         Port to bind to (default: 8080)
  --debug                Enable debug mode
  --prefix TEXT          URL prefix for all routes (e.g., /foo for
                         /foo/clusters)
  --redis-password TEXT  Redis password for authentication
  --redis-username TEXT  Redis username for authentication
  --redis-db INTEGER     Redis database number (default: 0)
  --redis-port INTEGER   Redis port (default: 6379)
  --redis-host TEXT      Redis host (default: redis)
  --no-redis             Do not connect to Redis and disable notification
                         history/analytics
  --help                 Show this message and exit.
```
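For example, a quick sketch of running the UI locally without a Redis instance (which disables notification history and analytics):

```shell
nkp-cluster-cleaner serve --host 0.0.0.0 --port 8080 --no-redis
```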
The NKP Cluster Cleaner includes an analytics dashboard that provides historical tracking, trend analysis, and reporting capabilities. It uses a Redis-based data collector that creates periodic snapshots of cluster state.

Data is collected by running `nkp-cluster-cleaner collect-analytics`. If you deploy the NKP application or Helm Chart into your cluster, it will automatically configure a CronJob to collect data, along with a bundled Valkey server.

Historical data is stored with a configurable retention period. The default is to store data for 90 days, but you can change this by passing the `--keep-days` argument to the `collect-analytics` command.
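For example, a sketch of a manual collection run with a shorter retention window:

```shell
# Take a snapshot and retain 30 days of history:
nkp-cluster-cleaner collect-analytics --keep-days 30
```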
This tool uses a Redis datastore for analytics data and for tracking notifications. If you deploy from the Helm chart or NKP application, this is automatically deployed and configured for you. If you want to use an alternative Redis/Valkey service or are not using the Helm chart, you can provide connection details when running various commands:
| Argument | Type | Description |
|---|---|---|
| `--redis-host` | TEXT | Redis host (default: `redis`) |
| `--redis-port` | INTEGER | Redis port (default: `6379`) |
| `--redis-db` | INTEGER | Redis database number (default: `0`) |
| `--redis-password` | TEXT | Redis password for authentication |
| `--redis-username` | TEXT | Redis username for authentication |
Prometheus metrics for all collected analytics data are exposed under the `/metrics` endpoint. A ServiceMonitor can be created using the Helm chart for automatic discovery and incorporation of the data into the Prometheus stack used by NKP.
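With the web server running locally on its defaults, a quick sketch of checking the endpoint:

```shell
curl -s http://127.0.0.1:8080/metrics
```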
A sample Grafana dashboard is provided that can be integrated into the NKP Grafana stack. For more information, see the Helm Chart documentation.
- The container image can be pulled from GitHub Container Registry: `ghcr.io/markround/nkp-cluster-cleaner:<TAG>` (see the example below)
- Available tags are:
  - Branch (e.g. `main`, `feature/xxx`, etc.)
  - Release tag (e.g. `0.14.0`)
  - Latest released version (`latest`)
- Full list on the packages page
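For example, pulling the most recent release:

```shell
docker pull ghcr.io/markround/nkp-cluster-cleaner:latest
```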
The `ENTRYPOINT` for the container is the application itself, so you only need to pass in the arguments. Any additional configuration files can be provided as volume mounts. For example, to list clusters with a custom configuration file and your default `kubeconfig`, you'd run something like:
```shell
docker run --rm \
  -v ~/.kube/config:/app/config/kubeconfig:ro \
  -v ./my-config.yaml:/app/config/config.yaml:ro \
  ghcr.io/markround/nkp-cluster-cleaner:latest \
  list-clusters \
  --kubeconfig /app/config/kubeconfig \
  --config /app/config/config.yaml
```
To run from a source checkout instead, install the dependencies and then the package in editable mode:

```shell
pip install -r requirements.txt
pip install -e .
```
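Once installed, the `nkp-cluster-cleaner` entry point is available directly:

```shell
nkp-cluster-cleaner --help
```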