| 1 | +--- |
| 2 | +layout: blog |
| 3 | +title: 'Kubernetes v1.31: Accelerating Cluster Performance with Consistent Reads from Cache' |
| 4 | +date: 2024-08-15 |
| 5 | +slug: consistent-read-from-cache-beta |
| 6 | +author: > |
| 7 | + Marek Siarkowicz (Google) |
| 8 | +--- |
| 9 | + |
| 10 | +Kubernetes is renowned for its robust orchestration of containerized applications, |
| 11 | +but as clusters grow, the demands on the control plane can become a bottleneck. |
| 12 | +A key challenge has been serving strongly consistent reads, |
| 13 | +which until now required resource-intensive quorum reads from the etcd datastore. |
| 14 | + |
| 15 | +Today, the Kubernetes community is excited to announce a major improvement: |
| 16 | +_consistent reads from cache_, graduating to Beta in Kubernetes v1.31. |
| 17 | + |
| 18 | +### Why consistent reads matter |
| 19 | + |
| 20 | +Consistent reads are essential for ensuring that Kubernetes components have an accurate view of the latest cluster state. |
| 21 | +This guarantee underpins the accuracy and reliability of Kubernetes operations, |
| 22 | +enabling components to make informed decisions based on up-to-date information. |
| 23 | +In large-scale clusters, fetching and processing this data can be a performance bottleneck, |
| 24 | +especially for requests that involve filtering results. |
| 25 | +While Kubernetes can filter data by namespace directly within etcd, |
| 26 | +any other filtering by labels or field selectors requires the entire dataset to be fetched from etcd and then filtered in-memory by the Kubernetes API server. |
| 27 | +This is particularly impactful for components like the kubelet, |
| 28 | +which only needs to list pods scheduled to its node, but previously required the API server and etcd to process all pods in the cluster. |
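To make that cost concrete, here is a minimal Python sketch of selector filtering done in memory. It is not Kubernetes code: the `Pod` dataclass, the `list_pods_for_node` helper, and the cluster size are hypothetical and chosen only for illustration. The point is that every pod has to be read and examined even when only a handful match.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Pod:
    # Hypothetical, heavily simplified stand-in for a Pod object.
    name: str
    namespace: str
    node_name: str


def list_pods_for_node(all_pods, node_name):
    """Filter in memory, as a stand-in for a `spec.nodeName` field selector.

    Returns the matching pods and the number of objects that were examined.
    """
    examined = 0
    matches = []
    for pod in all_pods:
        examined += 1
        if pod.node_name == node_name:
            matches.append(pod)
    return matches, examined


# 1,000 pods spread evenly over 100 nodes (made-up numbers).
pods = [Pod(f"pod-{i}", "default", f"node-{i % 100}") for i in range(1000)]
matches, examined = list_pods_for_node(pods, "node-7")
print(len(matches), "matches out of", examined, "pods examined")
```

In this toy setup a single node's request touches all 1,000 objects to return 10 of them, which is the pattern that becomes expensive when the full dataset has to come from etcd first.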
| 29 | + |
| 30 | +### The breakthrough: Caching with confidence |
| 31 | + |
| 32 | +Kubernetes has long used a watch cache to optimize read operations. |
| 33 | +The watch cache stores a snapshot of the cluster state and receives updates through etcd watches. |
| 34 | +However, until now, it couldn't serve consistent reads directly, as there was no guarantee the cache was sufficiently up-to-date. |
| 35 | + |
| 36 | +The _consistent reads from cache_ feature addresses this by leveraging etcd's |
| 37 | +[progress notifications](https://etcd.io/docs/v3.5/dev-guide/interacting_v3/#watch-progress) |
| 38 | +mechanism. |
| 39 | +These notifications inform the watch cache about how current its data is compared to etcd. |
| 40 | +When a consistent read is requested, the system first checks if the watch cache is up-to-date. |
| 41 | +If the cache is not up-to-date, the system requests progress notifications from etcd until the cache is confirmed to be sufficiently fresh. |
| 42 | +Once ready, the read is efficiently served directly from the cache, |
| 43 | +which can significantly improve performance, |
| 44 | +particularly when the read would otherwise require fetching a lot of data from etcd. |
| 45 | +This enables requests that filter data to be served from the cache, |
| 46 | +with only minimal metadata needing to be read from etcd. |
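The flow described above can be sketched as a small Python simulation. This is a toy model under stated assumptions, not the API server's implementation: `ToyEtcd`, `ToyWatchCache`, and their methods are hypothetical names, and a progress notification is assumed to simply report the store's current revision once all earlier events have been delivered.

```python
class ToyEtcd:
    """Hypothetical stand-in for etcd: tracks only a revision counter."""

    def __init__(self):
        self.revision = 0

    def write(self):
        # Every write bumps the store-wide revision.
        self.revision += 1
        return self.revision


class ToyWatchCache:
    """Hypothetical stand-in for the watch cache."""

    def __init__(self, etcd):
        self.etcd = etcd
        self.revision = 0  # newest revision the cache is known to reflect

    def on_progress_notify(self):
        # Assumption: the notification tells the cache it has seen
        # everything up to the store's current revision.
        self.revision = self.etcd.revision

    def consistent_read(self):
        # Cheap metadata read: which revision must the cache have reached?
        required = self.etcd.revision
        notifications = 0
        while self.revision < required:
            # Cache is behind: ask for progress until it has caught up.
            self.on_progress_notify()
            notifications += 1
        # The read can now be served from the cache at `required`.
        return required, notifications


etcd = ToyEtcd()
cache = ToyWatchCache(etcd)
for _ in range(3):
    etcd.write()

first = cache.consistent_read()   # cache was behind, needed a notification
second = cache.consistent_read()  # cache already fresh, served immediately
print(first, second)
```

The only data fetched from the store in this sketch is the revision number; the objects themselves come from the cache, which mirrors the "minimal metadata" point above.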
| 47 | + |
| 48 | +**Important Note:** To benefit from this feature, your Kubernetes cluster must be running etcd version 3.4.31+ or 3.5.13+. |
| 49 | +For older etcd versions, Kubernetes will automatically fall back to serving consistent reads directly from etcd. |
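As a rough illustration of the version requirement in the note above, the following Python helper checks an etcd version string against the stated minimums. The function name and the `MIN_PATCH` table are hypothetical, and the handling of minor lines other than 3.4 and 3.5 is an assumption of this sketch (it conservatively reports them as unsupported), not a statement about what Kubernetes does.

```python
import re

# Minimum patch release per minor line, taken from the note above.
MIN_PATCH = {(3, 4): 31, (3, 5): 13}


def supports_consistent_reads_from_cache(version):
    """Return True if a version string such as "3.5.13" meets the minimum."""
    match = re.match(r"v?(\d+)\.(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"unrecognized version: {version!r}")
    major, minor, patch = (int(part) for part in match.groups())
    minimum = MIN_PATCH.get((major, minor))
    if minimum is None:
        # Assumption: minor lines not named in the note are treated
        # as unsupported by this sketch.
        return False
    return patch >= minimum


for candidate in ["3.4.30", "3.4.31", "3.5.12", "3.5.13"]:
    print(candidate, supports_consistent_reads_from_cache(candidate))
```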
| 50 | + |
| 51 | +### Performance gains you'll notice |
| 52 | + |
| 53 | +This seemingly simple change has a profound impact on Kubernetes performance and scalability: |
| 54 | + |
| 55 | +* **Reduced etcd Load:** Kubernetes v1.31 can offload work from etcd, |
| 56 | + freeing up resources for other critical operations. |
| 57 | +* **Lower Latency:** Serving reads from cache is significantly faster than fetching |
| 58 | + and processing data from etcd. This translates to quicker responses for components, |
| 59 | + improving overall cluster responsiveness. |
| 60 | +* **Improved Scalability:** Large clusters with thousands of nodes and pods will |
| 61 | + see the most significant gains, as the reduction in etcd load allows the |
| 62 | + control plane to handle more requests without sacrificing performance. |
| 63 | + |
| 64 | +**5k Node Scalability Test Results:** In recent scalability tests on 5,000 node |
| 65 | +clusters, enabling consistent reads from cache delivered impressive improvements: |
| 66 | + |
| 67 | +* **30% reduction** in kube-apiserver CPU usage |
| 68 | +* **25% reduction** in etcd CPU usage |
| 69 | +* **Up to 3x reduction** (from 5 seconds to 1.5 seconds) in 99th percentile pod LIST request latency |
| 70 | + |
| 71 | +### What's next? |
| 72 | + |
| 73 | +With the graduation to beta, consistent reads from cache are enabled by default, |
| 74 | +offering a seamless performance boost to all Kubernetes users running a supported |
| 75 | +etcd version. |
| 76 | + |
| 77 | +Our journey doesn't end here. The Kubernetes community is actively exploring |
| 78 | +pagination support in the watch cache, which will unlock even more performance |
| 79 | +optimizations in the future. |
| 80 | + |
| 81 | +### Getting started |
| 82 | + |
| 83 | +Upgrading to Kubernetes v1.31 and ensuring you are using etcd version 3.4.31+ or |
| 84 | +3.5.13+ is the easiest way to experience the benefits of consistent reads from |
| 85 | +cache. |
| 86 | +If you have any questions or feedback, don't hesitate to reach out to the Kubernetes community. |
| 87 | + |
| 88 | +**Let us know how** _consistent reads from cache_ **transforms your Kubernetes experience!** |
| 89 | + |
| 90 | +Special thanks to @ah8ad3 and @p0lyn0mial for their contributions to this feature! |