As an agile open source project, Kubernetes continues to evolve, as does the cloud computing landscape. Keeping up with the latest versions isn’t practical for many organizations, and there are good reasons to not keep up with the very latest version, particularly in the first few weeks after a release. Nevertheless, it’s not a great idea to get too far behind, not only because you may miss out on important security, compatibility, and performance updates, but also because support for older versions ends. If you’re using Amazon Elastic Kubernetes Service (EKS), for example, standard support for 1.24 ended January 31, 2024, 1.25 ends May 1, 2024, followed by 1.26 on June 11, and 1.27 on July 24.
EKS is a managed Kubernetes service from Amazon that many organizations use to deploy, manage, and scale containerized applications. This guide walks through the steps you’ll need to take to upgrade your EKS clusters. It includes guidance on when and how to complete these upgrades as well as tools that can make it easier for you to upgrade safely and securely.
The Kubernetes community follows an N-2 support policy, which means that they provide security fixes and bug patches for the three most recent minor versions. They release new minor versions about three times a year, and minor versions are usually under standard support (for Amazon EKS) for the first 14 months after release. After that, the minor release enters extended support for the next 12 months (at an additional cost per cluster hour). Once the extended support period ends, your EKS cluster is automatically upgraded to the oldest currently supported extended version. This scenario is far from ideal as it gives you little option regarding your upgrade process and you’ll still need to upgrade to a newer release in the near future.
Here’s the schedule for the next few Kubernetes versions:
Source: Amazon EKS Kubernetes versions
For most organizations, expect to assess new Kubernetes releases regularly. Many teams manage multiple versions in different environments. For example, you may test out a new version in your development environment for at least a week or two, and follow that process for test and staging environments. Before pushing a new version to production, make sure you have at least a week of data, so you know you won’t run into unexpected snags on go-live.
Each Kubernetes version includes the control plane and the data plane; make sure that both your control plane and your data plane are running the same Kubernetes minor version. Kubernetes allows some skew between versions, but support varies by Kubernetes component and for different cluster development tools.
For upgrade purposes, this is the correct order for upgrades. However, we recommend that your dev/stage/test clusters all look as close to production as possible for typical day-to-day operations.
You’ll want to upgrade your development environment first. This ensures you are keeping up with the latest K8s updates. If you encounter critical issues with the latest version, you can identify problems quickly and find solutions before pushing the latest EKS version to staging.
The next environment to upgrade is often your staging environment. This is where any remaining issues that haven’t been fixed in the development environment should be caught. This is the last step before production, so it’s often best to allow a “soak time” for changes here — we allow one to two weeks.
The goal is to keep your production version aligned as closely to staging as possible. This makes your developers’ lives easier because they don’t need to worry about maintaining code for too many versions. After the agreed upon “soak time,” there should be very little risk in upgrading the production environment, so upgrade it promptly. Don’t fall into the trap of not completing the upgrade cycle because you’re worried about moving it to production.
Some practitioners recommend not installing the latest minor version until patch .2. In other words, they might suggest waiting to install the latest Kubernetes version, such as 1.30.0, until 1.30.2 is available. From there, you can begin the upgrade process, moving from dev to staging and then to production. This recommendation stems from years of experience — by the .2 version, extensive testing is complete and major issues have already been discovered and resolved. Often, once you have completed the dev upgrade and rolled it out to staging, the .3 release is available.
EKS customers are responsible for initiating upgrades for the cluster control plane and data plane. While AWS handles control plane upgrades, you are responsible for the data plane, including Fargate pods and add-ons.
EKS supports in-place cluster upgrades, which preserves resource and configuration consistency. It minimizes user disruption and retains information about existing workloads and resources. You can only upgrade one minor version at a time.
If you need to make multiple version updates, you’ll have to do sequential upgrades. This approach can increase the risk of downtime. Consider evaluating a blue/green cluster upgrade strategy in this case, where one environment (blue) runs the current Kubernetes version and another environment (green) runs the new Kubernetes version.
AWS manages the EKS control plane upgrade process to ensure a seamless transition from one Kubernetes version to the next. These are the steps AWS goes through to upgrade the EKS control plane:
To upgrade an EKS cluster, we recommend you go through the following steps:
EKS Kubernetes version documentation provides a detailed list of changes for each version, which you should use to build a checklist for each upgrade. For guidance on specific EKS version upgrades, check the documentation to identify important changes and considerations for each version.
Before you begin a cluster upgrade, make sure you understand what versions of Kubernetes components are in use. Inventory your cluster components, particularly the ones that interact with the Kubernetes API directly. Your typical cluster includes multiple workloads that rely on the Kubernetes API, which provide important functionalities. These cluster components typically include:
Make sure you check for any other workloads or add-ons that interact directly with the Kubernetes API. You can sometimes identify critical cluster components by looking at namespaces that end in *-system. Next, refer to the documentation of those critical components to evaluate version compatibility and whether there are any prerequisites for upgrading. Some components may require you to make updates or adjust your configuration before you upgrade your cluster.
Here are some common add-ons (linked to upgrade documentation):
Some add-ons, such as the VPC CNI plugin and kube-proxy, can be installed via Amazon EKS Add-ons, which provides an alternative to add-on management through the EKS API. You might consider managing those addons this way, as this approach enables you to update add-on versions with a single command. For example:
aws eks update-addon —cluster-name my-cluster —addon-name vpc-cni —addon-version version-number \
--service-account-role-arn arn:aws:iam::111122223333:role/role-name —configuration-values '{}' —resolve-conflicts PRESERVE
To check whether you have any EKS Add-ons, type:
aws eks list-addons --cluster-name <cluster name> --output table
You’ll see output that looks similar to this:
— — — — — — — — —
| ListAddons |
+----------------+
|| addons ||
|+--------------+|
|| coredns ||
|| kube-proxy ||
|| vpc-cni ||
|+--------------+|
Note: Amazon does not automatically upgrade EKS Add-ons during a control plane upgrade. You must initiate EKS add-on updates and select the version you want to update to. Make sure you pick a compatible version from all available versions using this guidance on add-on version compatibility. Remember, you can only upgrade Amazon EKS Add-ons one minor version at a time.
AWS requires several specific resources in your account to upgrade a control plane, including:
Make sure your subnets have enough IP addresses to upgrade the cluster:
CLUSTER=<cluster name>
aws ec2 describe-subnets --subnet-ids \
$(aws eks describe-cluster --name ${CLUSTER} \
--query 'cluster.resourcesVpcConfig.subnetIds' \
--output text) \
--query 'Subnets[*].[SubnetId,AvailabilityZone,AvailableIpAddressCount]' \
--output table
----------------------------------------------------
| DescribeSubnets |
+---------------------------+--------------+-------+
| subnet-0ce25bacdb030ce4f | us-west-2a | 8136 |
| subnet-0c173097d592e96e4 | us-west-2c | 8051 |
| subnet-06a36d93ad471d420 | us-west-2b | 8127 |
+---------------------------+--------------+-------+
(You can use the VPC CNI Metrics Helper to create a CloudWatch dashboard for virtual private cloud (VPC) metrics.)
The cloud native ecosystem continues to expand and mature, so it’s unsurprising that there are a lot of open source tools available to help teams navigate Kubernetes upgrades. Here are a few options you can use to help you with your EKS upgrades, with some examples and descriptions.
Pluto is an open source tool from Fairwinds that looks for the use of deprecated apiVersions. Pluto supports scanning a live cluster, manifest files, and helm charts. It also provides a GitHub Action that you can include in your CI process. Pluto will tell you whether you can upgrade safely against API paths, checking to see whether you are calling deprecated or removed API paths in your configuration or Helm charts. You can run Pluto against local files using the command:
pluto detect-files
You can also check Helm using the command:
pluto detect-helm -owide
It’s pretty easy to add this to CI; this is helpful for people who manage many clusters.
$ pluto detect-all-in-cluster -o wide 2>/dev/null
NAME NAMESPACE KIND VERSION REPLACEMENT DEPRECATED DEPRECATED IN REMOVED REMOVED IN
testing/viahelm viahelm Ingress networking.k8s.io/v1beta1 networking.k8s.io/v1 true v1.19.0 true v1.22.0
webapp default Ingress networking.k8s.io/v1beta1 networking.k8s.io/v1 true v1.19.0 true v1.22.0
eks.privileged PodSecurityPolicy policy/v1beta1 true v1.21.0 false v1.25.0
This combines all available in-cluster detections, showing results from Helm releases and API resources.
NAME KIND VERSION REPLACEMENT REMOVED DEPRECATED REPL AVAIc
eks.privileged PodSecurityPolicy policy/v1beta1 false true true
Once you identify which workloads and manifests need updating, you may find that you need to change the resource version in your manifest files (for example, change networking.k8s.io/v1beta1 to networking.k8s.io/v1). This may require you to update the resource specification as well. You may need to do additional research, depending on which resource you are replacing.
If a resource type is remaining the same and only the API version needs to be updated, you can use the kubectl-convert
command to convert your manifest files automatically. For example, if you want to convert an older Deployment to apps/v1, type the command:
kubectl-convert -f <file> --output-version <group>/<version>
Refer to install kubectl convert plugin on the Kubernetes website if you would like more information.
Nova is another open source utility from Fairwinds that helps you check your Helm releases to see if there are upgrades needed. Typically, the CNI and other dependencies are installed with Helm. Nova is a fast method you can use to ensure you are running the latest version. As always, check the patch notes to verify support for the version you are targeting.
Install the golang binary and run it against your cluster.
$ go install github.com/fairwindsops/nova@latest
$ nova find
Release Name Installed Latest Old Deprecated
============ ========= ====== === ==========
cert-manager v0.11.0 v0.15.2 true false
insights-agent 0.21.0 0.21.1 true false
grafana 2.1.3 3.1.1 true false
metrics-server 2.8.8 2.11.1 true false
nginx-ingress 1.25.0 1.40.3 true false
To check for outdated container images instead of helm releases:
$ nova find --container
Container Name Current Version Old Latest Latest Minor Latest Patch
============== =============== === ====== ============= =============
k8s.gcr.io/coredns/coredns v1.8.0 true v1.8.6 v1.8.6 v1.8.6
k8s.gcr.io/etcd 3.4.13-0 true 3.5.3-0 3.4.13-0 3.4.13-0
k8s.gcr.io/kube-apiserver v1.21.1 true v1.23.6 v1.23.6 v1.21.12
k8s.gcr.io/kube-controller-manager v1.21.1 true v1.23.6 v1.23.6 v1.21.12
k8s.gcr.io/kube-proxy v1.21.1 true v1.23.6 v1.23.6 v1.21.12
k8s.gcr.io/kube-scheduler v1.21.1 true v1.23.6 v1.23.6 v1.21.12
Officially called KubePug/Deprecations
, this open source tool is designed to help users evaluate the health and performance of their K8s clusters. It functions as a kubectl plugin and includes these capabilities:
kubeconfig
or the active cluster.Run the following command to install kubepug as a Krew plugin:
kubectl krew install deprecations
eksup is a command line interface (CLI) that is designed to provide users with comprehensive information and tools to prepare clusters for an upgrade. It can help streamline the upgrade process by providing relevant insights and actions.
A CLI to aid in upgrading Amazon EKS clusters
Usage: eksup <COMMAND>
Commands:
analyze Analyze an Amazon EKS cluster for potential upgrade issues
create Create artifacts using the analysis data
help Print this message or the help of the given subcommand(s)
Options:
-h, --help Print help
-V, --version Print version
GoNoGo is another open source tool from Fairwinds. It helps you define and discover whether an add-on installed with Helm is safe to upgrade.
gonogo --help
The Kubernetes Add-On Upgrade Validation Bundle is a spec that can be used to define and then discover if an add-on upgrade is safe to perform.
Usage:
gonogo [flags]
gonogo [command]
Available Commands:
check Check for Helm releases that can be updated
completion Generate the autocompletion script for the specified shell
help Help about any command
version Prints the current version of the tool.
Flags:
-h, --help help for gonogo
-v, --v Level number for the log level verbosity
Use "gonogo [command] --help" for more information about a command.
Another community supported open source tool you can use is Velero, which enables you to create backups of existing clusters and then apply the backups to a new cluster. AWS resources, including IAM, are not included in a Velero backup, so you will need to recreate them.
PodDisruptionBudgets
and topologySpreadConstraints
To make sure your workloads remain available during a data plane upgrade, you need to configure PodDisruptionBudgets
and topologySpreadConstraints
appropriately. Keep in mind that not all workloads demand the same level of availability, so assess your workload’s scale and requirements.
If workloads are distributed across multiple Availability Zones and hosts with topology spreads, that improves the likelihood that migrations to the new data plane will happen without disruptions.
This is an example of a workload configuration that guarantees 80% of replicas are consistently available, spreading replicas across zones and hosts efficiently:
# Source: basic-demo/templates/deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
name: demo-basic-demo
labels:
app.kubernetes.io/name: basic-demo
app.kubernetes.io/instance: demo
spec:
selector:
matchLabels:
app.kubernetes.io/name: basic-demo
app.kubernetes.io/instance: demo
template:
metadata:
labels:
app.kubernetes.io/name: basic-demo
app.kubernetes.io/instance: demo
spec:
topologySpreadConstraints:
- maxSkew: 1
topologyKey: kubernetes.io/hostname
whenUnsatisfiable: ScheduleAnyway
- maxSkew: 1
topologyKey: zone
whenUnsatisfiable: DoNotSchedule
containers:
- name: basic-demo
image: "quay.io/fairwinds/docker-demo:1.2.0"
imagePullPolicy: Always
ports:
- name: http
containerPort: 8080
protocol: TCP
# Source: basic-demo/templates/pdb.yaml
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
name: demo-basic-demo
spec:
minAvailable: 80%
selector:
matchLabels:
app.kubernetes.io/name: basic-demo
app.kubernetes.io/instance: demo
The AWS Resilience Hub now includes EKS as a supported resource. This provides a single place where you can define, validate, and track the resilience of your applications. This helps you avoid unnecessary downtime caused by infrastructure, software, or operational disruptions.
Managed Node Groups and Karpenter both simplify node upgrades, taking different approaches. Managed node groups automate node provisioning and lifecycle management, which means you can create, automatically update, or terminate nodes with a single operation.
Karpenter creates new nodes automatically using the latest compatible EKS Optimized Amazon Machine Image (AMI). When EKS releases updated EKS Optimized AMIs or you upgrade the cluster, Karpenter starts using these images automatically. It also uses Node Expiry to update nodes. You can configure Karpenter to use custom AMIs, but keep in mind that if you do, you’re responsible for the version of kubelet.
Self-managed node groups are Amazon Elastic Compute Cloud (EC2) instances that were deployed in your account and attached to the cluster outside of the EKS service. Usually, these node groups are deployed and managed by some form of automation tooling, such as eksctl, kOps, and EKS Blueprints. Refer to your tools documentation to upgrade self-managed node groups.
Unsurprisingly, new versions of Kubernetes introduce significant changes to your Amazon EKS cluster. Remember that once you upgrade a cluster, you can’t downgrade it. And you can only create new clusters for Kubernetes versions that are currently supported by EKS. If you are concerned about this risk, you may want to consider backing up the cluster before an upgrade.
Although you may feel like you only have time to focus on the current version of Kubernetes, it’s important to monitor for new releases and identify significant changes. For example, the most important change for migrating from 1.23 to 1.24 was the removal of the Dockershim from the kubelet. Dockershim was an adapter of sorts between Kubernetes and Docker. This code, embedded in the kubelet to allow the kubelet to talk to the docker daemon (even though the docker daemon was not compliant with the Open Container Initiative (OCI)) was removed in 1.24. This means the kubelet now communicates directly with the container runtime using the container runtime interface (CRI) when launching and managing containers on the nodes. EKS AMIs only have containerd as the runtime as of version 1.24. Preparing for substantial changes like these requires additional time and planning.
Review all of the documented modifications for the version you plan to upgrade to, noting any required upgrade procedures. Make sure you also pay attention to requirements or processes tailored specifically to Amazon EKS managed clusters (check the Kubernetes changelog). This approach will help you have a smoother upgrade process and minimize potential disruptions.
Below is a list of some of the most well-known changes (many of which are breaking) in Kubernetes versions, starting with v1.24. This is not a complete list; please refer to the release notes for your version of Kubernetes.
Kubernetes v1.24
Kubernetes v1.25
Kubernetes v1.26
Kubernetes v1.27
Kubernetes v1.28
Kubernetes v1.29
Hopefully, the information outlined in this guide is useful to you. Consistently upgrading Kubernetes requires research and effort; you need to ensure that you have time to test your environments with each minor release. If you follow these steps, you should be in good shape to undertake upgrading EKS clusters. If you need help with your next EKS upgrade, reach out. Our team has the Kubernetes expertise to make the upgrade easy for you and make your Kubernetes infrastructure more efficient at the same time, saving you time and money.
*** This is a Security Bloggers Network syndicated blog from Fairwinds | Blog authored by Stevie Caldwell. Read the original post at: https://www.fairwinds.com/blog/guide-securely-upgrading-eks-clusters