Photo by Hamish Weir on Unsplash

Updates:

2020/08/27: Git repo updated with an option for private-subnets-only support.

Introduction

We define Auto Fleet Spotting as a way to provide Auto Scaling support for a fleet of Spot Instances on AWS EKS.

This implementation is based on the official upstream Terraform AWS EKS implementation and was extended to provide an easy way to deploy EKS clusters with Kubernetes 1.17.9 in any region with Auto Fleet Spotting support.

With this simple, Docker-based AWS EKS deployment automation you'll not only save time on EKS deployments and enforce some security best practices, you can also reduce your Kubernetes costs by up to 90%.

I'm not joking: we're using this implementation for developing and running our own Kubernautic Platform, which is the basis of our Rancher Shared as a Service offering.

We're using the same implementation for other dedicated Kubernetes clusters on AWS EKS, which run with a 98% share of Spot versus On-Demand Instances and scale roughly from 6 to 40 Spot Instances during the day. Here is the Grafana view of the node count of a real-world project over the last 7 days.

TL;DR

If you'd like to try this implementation, all you need to do is clone the extended AWS Terraform EKS repo and run the docker/eks container to deploy the latest EKS Kubernetes version, 1.17.9 (at the time of writing, 2020/08/16).
If you'd like to build your own image, please refer to this section of the README file in the repo.

    # Do this at your own risk, if you're willing to trust me :-)
    $ git clone https://github.com/arashkaffamanesh/terraform-aws-eks.git && cd terraform-aws-eks
    $ docker run -it --rm -v "$HOME/.aws/:/root/.aws" -v "$PWD:/tmp" kubernautslabs/eks -c "cd /tmp; ./2-deploy-eks-cluster.sh"

Run EKS without the Docker Image

If these Prerequisites are met on your machine, you can run the deployment script directly, without the Docker container:

    $ ./2-deploy-eks-cluster.sh

Would you like to have a cup of coffee, tea or a cold German beer?

The installation should take about 15 minutes, time enough to enjoy a delicious cup of coffee, tea or a cold German beer ;-)

Let me describe what the deployment scripts mainly do, and how you can extend the cluster with a new worker node group for On-Demand or additional Spot Instances and deploy additional add-ons like Prometheus, HashiCorp Vault, Tekton for CI/CD and GitOps, HelmOps, etc.

The first script, `./1-clone-base-module.sh`, copies the base folder from the root of the repo to a new folder named after the cluster name you provide when running the script, and places a README file in the cluster folder with the steps needed to run the EKS deployment step by step.

The second script, `2-deploy-eks-cluster.sh`, makes life easier by automating all the steps needed to deploy an EKS cluster.
The main work of the deployment script is to ask you for the cluster name and region, apply a few sed operations to substitute values in the configuration files (the S3 bucket name, the cluster name and the region), create a key pair and the S3 bucket, run `make apply` to deploy the cluster with Terraform, and finally deploy the cluster autoscaler by setting the autoscaling group name in cluster-autoscaler-asg-name.yml.

Why use S3?

The S3 bucket is very useful if you want to work with your team to extend the cluster: it stores the cluster state file `terraform.tfstate`. The bucket needs a unique name, which is set to the cluster name in the backend.tf file:

https://medium.com/media/cc96ff9a3a086aaf41b1262a673d7070/href

One of the essential parts of getting autoscaling to work with Spot Instances is to define the right IAM Policy Document and attach it to the IAM role used by the worker node group's autoscaling group:

https://medium.com/media/4a26caa805dd79cdb52ed85ad56fa92a/href

How does the whole thing work?

Each cluster may have one or more Spot Instance or On-Demand worker groups, sometimes also referred to as node pools.
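
To make the substitution step of the deployment script more concrete, here is a minimal sketch of how a cluster name and region could be written into an S3 backend definition with sed. It is not the repo's actual script: the placeholder tokens `BUCKET_PLACEHOLDER` and `REGION_PLACEHOLDER`, the `render_backend` helper and the example values are assumptions made for illustration, and the template only stands in for the real backend.tf.

```shell
#!/bin/sh
# Illustrative sketch only, not the script shipped in the repo.
# BUCKET_PLACEHOLDER and REGION_PLACEHOLDER are hypothetical tokens.
set -eu

# Replace the placeholder tokens in the text read from stdin.
# $1 = cluster name (reused as the S3 bucket name), $2 = AWS region.
# Assumes neither value contains "/" or "&", which are special to sed.
render_backend() {
  sed -e "s/BUCKET_PLACEHOLDER/$1/g" \
      -e "s/REGION_PLACEHOLDER/$2/g"
}

# Stand-in for backend.tf: an S3 backend that stores terraform.tfstate.
template='terraform {
  backend "s3" {
    bucket = "BUCKET_PLACEHOLDER"
    key    = "terraform.tfstate"
    region = "REGION_PLACEHOLDER"
  }
}'

# Example values; the real script asks for them interactively.
rendered=$(printf '%s\n' "$template" | render_backend "my-eks-cluster" "eu-central-1")
printf '%s\n' "$rendered"
```

Filtering from stdin to stdout keeps the sketch independent of the differing in-place (`-i`) behaviour of GNU and BSD sed; the result could then be written to backend.tf in the cluster folder before running `make apply`.
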
The initial installation deploys only one worker group, named spot-1, which we can easily extend after the initial deployment by running make apply again.

If you'd like to add a new On-Demand worker group, you have to extend the `main.tf` file in your cluster module (in Terraform, a folder is called a module) as shown below and run make plan followed by make apply:

    $ make plan
    $ make apply

https://medium.com/media/6a117e59f75001d9794bd3edfed50df8/href

Using Private Subnets

I extended the VPC module definition to make use of private subnets with a larger /20 network address range, which allows us to use up to 4,091 IPs per subnet for our pods (a /20 holds 4,096 addresses, and AWS reserves five in every subnet).

https://medium.com/media/98f725c41df79cb9caeace191ca538eb/href

Add-Ons

Some additional components, which we refer to as add-ons, are provided in the addons folder in the repo. Each add-on should have a README file explaining how to use it.

Testing

To test how the autoscaling of Spot Instances works, you can deploy the provided nginx-to-scaleout.yaml from the addons folder, scale it to 10 replicas and watch the Spot Instances scale out, then decrease the number of replicas to 1 and watch them scale back in (scale-in may take 5 minutes or longer, or may not happen at all if something is not configured properly!).

    k create -f ../addons/nginx-to-scaleout.yaml
    k scale deployment nginx-to-scaleout --replicas=10
    k get pods -w
    k get nodes

Conclusion

Running an EKS cluster with autoscaling support for Spot or On-Demand Instances is a matter of running a single docker run command, even if you don't have terraform, the aws-iam-authenticator or the kubectl-terraform plugin installed.
All needed configuration is provided as code; we didn't even have to log in to the AWS web console to set up the EKS cluster and configure the permissions for the cluster autoscaler for our worker groups.

This is true Infrastructure as Code, achieved with a few lines of code and minor adaptations to the upstream Terraform EKS implementation to run EKS with autoscaling support for Spot and On-Demand Instances.

We are using this implementation with Kubernetes 1.15 today. Since an in-place migration directly from 1.15 to 1.17 doesn't work, we're going to use blue-green deployments with traffic routing to migrate our workloads and the users of our Kubernautic Platform to a new environment, where users will be able to use a full-fledged, fully isolated virtual Kubernetes cluster running in a Stateful Namespace. Stay tuned!

Questions?

Join us on the Kubernauts Slack channel or add a comment here.

We're hiring!

We are looking for engineers who love to work in Open Source communities like Kubernetes, Rancher, Docker, etc.

If you wish to work on such projects and get involved in building the NextGen Kubernautic Platform, please visit our job offerings page or ping us on Twitter.

Related resources

Terraform AWS EKS
Terraform Provider EKS

Hey Docker run EKS, with Auto Fleet Spotting Support, please! was originally published in Kubernauts on Medium.
Source: blog.kubernauts.io