How to Set Up a Kubernetes Cluster with Terraform on AWS

Estimated Time: 60–90 minutes (including provisioning wait times)


Overview

Kubernetes has become the de facto standard for container orchestration, and Terraform is the industry-leading Infrastructure as Code (IaC) tool. In this tutorial, you’ll provision a production-grade Amazon EKS (Elastic Kubernetes Service) cluster on AWS using Terraform — from scratch. By the end, you’ll have a fully functional cluster with worker nodes, a VPC, and `kubectl` access configured on your local machine.

We’ll cover:

  • What you need before starting (tools, accounts, IAM permissions)
  • Setting up the Terraform project structure
  • Creating the VPC and networking layer
  • Provisioning the EKS control plane
  • Adding managed node groups
  • Configuring `kubectl` and verifying the cluster
  • A troubleshooting section for the most common pitfalls

    Prerequisites

    Make sure you have the following before you begin:

    | Tool / Account | Purpose | Get It |
    |---|---|---|
    | **AWS Account** | Cloud provider for provisioning EC2, VPC, and EKS | [aws.amazon.com](https://aws.amazon.com) |
    | **AWS CLI** (v2+) | Authenticate Terraform and `kubectl` with AWS | `brew install awscli`, or the official v2 installer from the AWS docs (distro `apt` packages are often v1) |
    | **Terraform** (v1.5+) | Infrastructure as Code engine | [terraform.io/downloads](https://developer.hashicorp.com/terraform/downloads) |
    | **kubectl** (v1.29–v1.31) | Kubernetes command-line tool; keep it within one minor version of the cluster (1.30 here) | `brew install kubectl`, or `apt install kubectl` after adding the Kubernetes apt repository |
    | **AWS CLI v2 token helper** (or **aws-iam-authenticator**) | Authenticate `kubectl` with EKS | `aws eks get-token` ships with AWS CLI v2; `aws eks update-kubeconfig` wires it into your kubeconfig |
    | **SSH key pair (optional)** | Debug worker nodes via SSH | Created in AWS EC2 console |

    IAM Permissions Required (attach to your user or role):

    {
      "Version": "2012-10-17",
      "Statement": [
        {
          "Effect": "Allow",
          "Action": [
            "ec2:*",
            "eks:*",
            "iam:*",
            "autoscaling:*",
            "cloudformation:*",
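            "kms:*",
            "logs:*"
          ],
          "Resource": "*"
        }
      ]
    }

    **Production note:** Restrict these permissions to specific resources and use conditions in a real environment; the wildcard policy above is for learning purposes only. `logs:*` is included because the EKS module creates a CloudWatch log group for control plane logs by default.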


    Step 1 — Configure AWS Credentials

    Authenticate your CLI session so Terraform can make API calls on your behalf.

    Option A — Profile (recommended):

    aws configure --profile nova-tech-lab

    You’ll be prompted for:

    AWS Access Key ID: AKIA...
    AWS Secret Access Key: wJalrX...
    Default region name: us-east-1
    Default output format: json

    Option B — Environment variables (CI/CD friendly):

    export AWS_ACCESS_KEY_ID="AKIA..."
    export AWS_SECRET_ACCESS_KEY="wJalrX..."
    export AWS_DEFAULT_REGION="us-east-1"

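    Verify your setup. If you chose Option A, select the named profile first so that both the AWS CLI and Terraform pick it up:

    export AWS_PROFILE=nova-tech-lab   # Option A only
    aws sts get-caller-identity
    # -> { "UserId": "AIDA...", "Account": "123456789012", "Arn": "arn:aws:iam::123456789012:user/your-user" }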

    Step 2 — Create the Terraform Project Structure

    Create a clean directory and initialize the project:

    mkdir -p ~/projects/eks-terraform && cd ~/projects/eks-terraform
    touch main.tf variables.tf outputs.tf terraform.tfvars

    Your layout will look like this:

    eks-terraform/
    ├── main.tf              # Core infrastructure (VPC, EKS, node groups)
    ├── variables.tf          # Input variables
    ├── terraform.tfvars      # Variable values (keep out of version control)
    └── outputs.tf            # Useful output values (kubeconfig, cluster name)

    Step 3 — Declare Providers and Backend

    Open `main.tf` in your editor and add the provider configuration:

    terraform {
      required_version = ">= 1.5"
    
      required_providers {
        aws = {
          source  = "hashicorp/aws"
          version = "~> 5.0"
        }
        kubernetes = {
          source  = "hashicorp/kubernetes"
          version = "~> 2.23"
        }
      }
    
      # Optional: state stored locally. Replace with S3 backend for teams.
      backend "local" {
        path = "terraform.tfstate"
      }
    }
    
    provider "aws" {
      region = var.aws_region
    }
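
For team use, the local backend above can be swapped for remote state. The following is a minimal sketch, assuming an S3 bucket and a DynamoDB table that already exist; the names `example-terraform-state` and `example-terraform-locks` are placeholders, not resources this tutorial creates. It would replace the `backend "local"` block, since a configuration can declare only one backend.

```hcl
terraform {
  backend "s3" {
    bucket         = "example-terraform-state"         # placeholder: your existing bucket
    key            = "eks-terraform/terraform.tfstate" # path of the state object inside the bucket
    region         = "us-east-1"                       # backend blocks cannot reference variables
    dynamodb_table = "example-terraform-locks"         # placeholder: table with a "LockID" string hash key
    encrypt        = true                              # server-side encryption for the state object
  }
}
```

After changing the backend, `terraform init -migrate-state` moves any existing local state to the new location.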

    In `variables.tf`:

    variable "aws_region" {
      description = "AWS region to deploy resources"
      type        = string
      default     = "us-east-1"
    }
    
    variable "cluster_name" {
      description = "Name of the EKS cluster"
      type        = string
      default     = "nova-tech-eks"
    }
    
    variable "cluster_version" {
      description = "Kubernetes version for the cluster"
      type        = string
      default     = "1.30"
    }
    
    variable "instance_types" {
      description = "EC2 instance types for node group"
      type        = list(string)
      default     = ["t3.medium"]
    }
    
    variable "desired_node_count" {
      description = "Desired number of worker nodes"
      type        = number
      default     = 2
    }
    
    variable "min_node_count" {
      description = "Minimum number of worker nodes"
      type        = number
      default     = 1
    }
    
    variable "max_node_count" {
      description = "Maximum number of worker nodes"
      type        = number
      default     = 4
    }

    In `terraform.tfvars`:

    aws_region      = "us-east-1"
    cluster_name    = "nova-tech-eks"
    cluster_version = "1.30"
    instance_types  = ["t3.medium"]

    Step 4 — Create a Custom VPC for EKS

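    EKS needs a VPC with subnets in at least two Availability Zones. We’ll use the widely used community module `terraform-aws-modules/vpc/aws` (maintained by the terraform-aws-modules project, not by AWS). Add this to `main.tf`: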

    module "vpc" {
      source  = "terraform-aws-modules/vpc/aws"
      version = "5.8.1"
    
      name = "${var.cluster_name}-vpc"
      cidr = "10.0.0.0/16"
    
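      # Must be zones in var.aws_region that are enabled for your account
      azs             = ["us-east-1a", "us-east-1b", "us-east-1c"]
      private_subnets = ["10.0.1.0/24", "10.0.2.0/24", "10.0.3.0/24"]
      public_subnets  = ["10.0.101.0/24", "10.0.102.0/24", "10.0.103.0/24"]
    
      # With single_nat_gateway and one_nat_gateway_per_az both false, the module
      # creates one NAT gateway per private subnet (three here). Set
      # single_nat_gateway = true to cut cost in a dev environment.
      enable_nat_gateway     = true
      single_nat_gateway     = false
      one_nat_gateway_per_az = false
      enable_vpn_gateway     = false
      enable_dns_hostnames   = true
      enable_dns_support     = true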
    
      public_subnet_tags = {
        "kubernetes.io/role/elb" = "1"
      }
    
      private_subnet_tags = {
        "kubernetes.io/role/internal-elb" = "1"
      }
    
      tags = {
        Environment = "dev"
        Project     = var.cluster_name
      }
    }

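    **Why this matters:** When you create a Kubernetes `Service` of type `LoadBalancer`, the load balancer controller uses these tags to discover subnets: internet-facing load balancers go in subnets tagged `kubernetes.io/role/elb`, internal ones in subnets tagged `kubernetes.io/role/internal-elb`.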
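
The VPC module above hard-codes `us-east-1` zone names, so it breaks if `var.aws_region` changes. One way to avoid that, sketched here under the assumption that the region has at least three usable zones, is to look the zones up with the AWS provider's `aws_availability_zones` data source. The local name `azs` is introduced for illustration only.

```hcl
# Look up the zones this account can use in the provider's region.
data "aws_availability_zones" "available" {
  state = "available"

  # Skip Local Zones and Wavelength Zones, which require explicit opt-in.
  filter {
    name   = "opt-in-status"
    values = ["opt-in-not-required"]
  }
}

locals {
  # First three zone names; slice() fails if the region offers fewer than three.
  azs = slice(data.aws_availability_zones.available.names, 0, 3)
}

# Then, inside the module "vpc" block, replace the hard-coded list with:
#   azs = local.azs
```

The subnet CIDR lists still need one entry per zone, so keep them at three entries each if you take three zones.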


    Step 5 — Provision the EKS Cluster (Control Plane)

    Still in `main.tf`, add the EKS module:

    module "eks" {
      source  = "terraform-aws-modules/eks/aws"
      version = "20.8.5"
    
      cluster_name    = var.cluster_name
      cluster_version = var.cluster_version
    
      vpc_id     = module.vpc.vpc_id
      subnet_ids = module.vpc.private_subnets
    
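      cluster_endpoint_public_access  = true
      cluster_endpoint_private_access = false
    
      # Module v20 no longer grants the identity that creates the cluster admin
      # access by default; without this, `kubectl` calls in Step 9 are rejected.
      enable_cluster_creator_admin_permissions = true
    
      # Control plane security group: allow all traffic from the VPC CIDR
      cluster_security_group_additional_rules = {
        ingress_vpc_all = {
          description = "Allow all ingress from the VPC CIDR"
          protocol    = "-1"
          from_port   = 0
          to_port     = 0
          type        = "ingress"
          cidr_blocks = ["10.0.0.0/16"]
        }
      }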
    
      # Enable EKS-managed add-ons
      cluster_addons = {
        coredns = {
          most_recent = true
        }
        kube-proxy = {
          most_recent = true
        }
        vpc-cni = {
          most_recent = true
        }
      }
    }
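
The configuration above leaves the API endpoint open to the whole internet (still protected by IAM authentication). If you want to narrow that, a possible sketch follows. It assumes you know the address range you connect from; `203.0.113.0/24` is a documentation placeholder, not a real recommendation. These lines would replace the two `cluster_endpoint_*` arguments already in the block, because an argument cannot be set twice.

```hcl
# Fragment: these arguments go inside the existing module "eks" { ... } block.

# Keep the public endpoint, but only accept connections from known addresses.
cluster_endpoint_public_access       = true
cluster_endpoint_public_access_cidrs = ["203.0.113.0/24"] # placeholder range; use your own egress IPs

# Let nodes in the private subnets reach the API server from inside the VPC,
# so they do not depend on the public allow-list above.
cluster_endpoint_private_access = true
```

Enabling private access matters here: with a restricted public endpoint and no private endpoint, node traffic leaving through the NAT gateways would have to be on the allow-list too.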

    Step 6 — Add Managed Node Groups

    Worker nodes run your container workloads. Add the following inside the `module "eks" { }` block from Step 5:

      eks_managed_node_groups = {
        main = {
          desired_size = var.desired_node_count
          min_size     = var.min_node_count
          max_size     = var.max_node_count
    
          instance_types = var.instance_types
    
          # Default EKS-optimized AMI, with a larger encrypted gp3 root volume
          block_device_mappings = {
            xvda = {
              device_name = "/dev/xvda"
              ebs = {
                volume_size           = 40
                volume_type           = "gp3"
                encrypted             = true
                delete_on_termination = true
              }
            }
          }
    
          tags = {
            "kubernetes.io/cluster/${var.cluster_name}" = "owned"
          }
        }
      }

    The full EKS module now looks like this (skeleton):

    module "eks" {
      source  = "terraform-aws-modules/eks/aws"
      version = "20.8.5"
    
      cluster_name    = var.cluster_name
      cluster_version = var.cluster_version
    
      vpc_id     = module.vpc.vpc_id
      subnet_ids = module.vpc.private_subnets
    
      cluster_endpoint_public_access           = true
      enable_cluster_creator_admin_permissions = true # creator gets cluster admin (off by default in v20)
    
      cluster_security_group_additional_rules = { ... }
      cluster_addons                          = { ... }
      eks_managed_node_groups                 = { main = { ... } }
    }

    Step 7 — Configure Data Sources and Outputs

    In `outputs.tf`, add values you’ll need after provisioning:

    output "cluster_endpoint" {
      description = "Endpoint for your EKS Kubernetes API"
      value       = module.eks.cluster_endpoint
    }
    
    output "cluster_name" {
      description = "EKS cluster name"
      value       = module.eks.cluster_name
    }
    
    output "cluster_certificate_authority_data" {
      description = "Base64-encoded certificate data required to communicate with the cluster"
      value       = module.eks.cluster_certificate_authority_data
    }
    
    output "region" {
      description = "AWS region"
      value       = var.aws_region
    }
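
An extra output can save some typing in Step 9. This is an optional sketch; the output name `configure_kubectl` is made up for illustration, and the command string simply assembles the same `aws eks update-kubeconfig` call used later in this tutorial.

```hcl
output "configure_kubectl" {
  description = "Command that writes a kubeconfig entry for this cluster"
  value       = "aws eks update-kubeconfig --region ${var.aws_region} --name ${module.eks.cluster_name}"
}
```

After an apply, `terraform output -raw configure_kubectl` prints the command ready to copy and run.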

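    Optionally, add a data source in `main.tf` to look up the account and identity Terraform is running as. Nothing in this configuration depends on it (the Kubernetes provider does not use it), but it is handy for building ARNs later: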

    data "aws_caller_identity" "current" {}
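
The `kubernetes` provider is declared in `required_providers` but never configured. If you later want Terraform to manage Kubernetes objects in this cluster, a common pattern is sketched below. It assumes the AWS CLI is on the `PATH` of the machine running Terraform and that the caller has access to the cluster; nothing in the rest of this tutorial requires this block.

```hcl
provider "kubernetes" {
  host                   = module.eks.cluster_endpoint
  cluster_ca_certificate = base64decode(module.eks.cluster_certificate_authority_data)

  # Fetch a short-lived token on every run instead of storing one in state.
  exec {
    api_version = "client.authentication.k8s.io/v1beta1"
    command     = "aws"
    args        = ["eks", "get-token", "--cluster-name", module.eks.cluster_name]
  }
}
```

Using `exec` rather than a static token avoids expired credentials during long applies, at the cost of needing the AWS CLI wherever Terraform runs.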

    Step 8 — Deploy the Infrastructure

    Now the exciting part — apply the Terraform plan:

    cd ~/projects/eks-terraform
    
    terraform init
    # -> Initializing modules...
    # -> Terraform has been successfully initialized!
    
    terraform plan -out=tfplan
    # Review the output. Expect roughly 80 resources to be created; the exact count varies with module versions.

    If everything looks good, apply the plan you just reviewed:

    terraform apply tfplan

    This will take 15–25 minutes (EKS control plane provisioning is the bottleneck). Grab a coffee. Terraform will show progress as resources are created:

    module.vpc.aws_vpc.this[0]: Creating...
    module.vpc.aws_subnet.private[0]: Creation complete...
    ...
    module.eks.aws_eks_cluster.this[0]: Still creating... [10m0s elapsed]
    ...
    Apply complete! Resources: 84 added, 0 changed, 0 destroyed.

    Step 9 — Configure kubectl

    Once the apply is complete, configure `kubectl` to talk to your new cluster:

    aws eks update-kubeconfig \
      --region "$(terraform output -raw region)" \
      --name "$(terraform output -raw cluster_name)"

    Expected output:

    Added new context arn:aws:eks:us-east-1:123456789012:cluster/nova-tech-eks to /home/user/.kube/config

    Test connectivity:

    kubectl cluster-info
    # -> Kubernetes control plane is running at https://...
    # -> CoreDNS is running at https://...
    
    kubectl get nodes
    # -> NAME                          STATUS   ROLES    AGE   VERSION
    # -> ip-10-0-1-xx.ec2.internal    Ready    <none>   5m    v1.30.0-eks-...
    # -> ip-10-0-2-xx.ec2.internal    Ready    <none>   5m    v1.30.0-eks-...

    Step 10 — Deploy a Smoke-Test Application

    Verify that the cluster can schedule pods and expose services:

    kubectl create deployment nginx-test \
      --image=nginx:alpine \
      --replicas=3
    
    kubectl expose deployment nginx-test \
      --type=LoadBalancer \
      --port=80 \
      --target-port=80
    
    kubectl get pods -w   # watches until interrupted; press Ctrl+C once all pods are Running
    # -> nginx-test-xxxxx-xxxxx   1/1   Running
    # -> nginx-test-xxxxx-xxxxx   1/1   Running
    # -> nginx-test-xxxxx-xxxxx   1/1   Running

    Once the load balancer is provisioned (usually 1–2 minutes; the DNS name can take a little longer to resolve):

    kubectl get svc nginx-test
    # -> NAME         TYPE           CLUSTER-IP   EXTERNAL-IP                   PORT(S)        AGE
    # -> nginx-test   LoadBalancer   172.20.x.x   a1234-....elb.amazonaws.com   80:31234/TCP   2m
    
    curl http://a1234-....elb.amazonaws.com
    # -> HTML page containing "Welcome to nginx!"

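    Clean up the test resources when done. Delete the Service first: its load balancer was created by Kubernetes, not Terraform, and if left behind it will block VPC deletion in Step 11.

    kubectl delete svc nginx-test
    kubectl delete deployment nginx-test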

    Step 11 — Clean Up (Avoid Surprise Bills)

    To destroy everything and avoid ongoing AWS charges:

    terraform destroy -auto-approve

    This tears down the node groups, the EKS control plane, the VPC, and all associated resources. Confirm you see:

    Destroy complete! Resources: 84 destroyed.

    ⚠️ **Important:** If you skip `terraform destroy`, this setup keeps billing for the EKS control plane, the `t3.medium` nodes, and the NAT gateways: roughly **$0.25–$0.40/hour** (~$200–$300/month) in `us-east-1`, before data transfer and load balancer charges.


    Troubleshooting

    Here are the most common issues you’ll encounter and how to fix them.

    1. “Error creating EKS cluster: UnauthorizedException”

    Cause: Your IAM user/role doesn’t have sufficient permissions.

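    Fix: Make sure the identity running Terraform has the permissions from the IAM policy in the Prerequisites section. Note that `AmazonEKSClusterPolicy` is meant for the cluster’s service role; attaching it to your user does not grant permission to create clusters. Check what is attached with:

    aws iam list-attached-user-policies --user-name YOUR_USER
    aws iam list-user-policies --user-name YOUR_USER   # inline policies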

    2. “Timeout waiting for EKS cluster to become ready”

    Cause: EKS control plane provisioning is slow, or the VPC configuration is wrong (missing DNS hostnames, no NAT gateway for private subnets).

    Fix:

  • Ensure your Terraform VPC has `enable_dns_hostnames = true` and `enable_dns_support = true`.
  • Private subnets **must** have a route to a NAT Gateway so that nodes can reach the cluster’s public endpoint and pull container images.
  • Increase Terraform timeouts: `cluster_timeouts = { create = "45m" }` inside the EKS module.

    3. “Node group creation failed: Unhealthy nodes”

    Cause: The worker nodes can’t register with the EKS control plane.

    Common checks:

  • The node group’s security group allows outbound traffic (port 443) to the EKS endpoint.
  • The node instance role has the `AmazonEKSWorkerNodePolicy`, `AmazonEKS_CNI_Policy`, and `AmazonEC2ContainerRegistryReadOnly` policies attached.
  • The node group uses an EKS-optimized AMI (the module handles this automatically).
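
    # Check node status
    kubectl describe node <node-name>
    
    # Find the node group's full name (the module appends a generated suffix to "main")
    aws eks list-nodegroups --cluster-name nova-tech-eks
    
    # Inspect node group status and health issues
    aws eks describe-nodegroup \
      --cluster-name nova-tech-eks \
      --nodegroup-name <nodegroup-name>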

    4. “kubectl: connect: connection refused”

    Cause: The EKS endpoint is not publicly accessible, or your `kubeconfig` is stale.

    Fix:

    # Re-generate kubeconfig
    aws eks update-kubeconfig --region us-east-1 --name nova-tech-eks
    
    # Verify public endpoint
    aws eks describe-cluster --region us-east-1 --name nova-tech-eks \
      --query "cluster.resourcesVpcConfig.endpointPublicAccess"
    # -> true

    5. “NoCredentialProviders: no valid providers in chain”

    Cause: AWS credentials are not configured in your environment.

    Fix:

    aws configure
    # OR
    export AWS_ACCESS_KEY_ID="..."
    export AWS_SECRET_ACCESS_KEY="..."

    6. “Failed to create EBS volume: the availability zone does not exist”

    Cause: You specified an AZ that isn’t enabled in your AWS account.

    Fix: Check enabled AZs:

    aws ec2 describe-availability-zones --region us-east-1

    Then update your `main.tf` to use only enabled zones in the VPC module.


    Next Steps

    Now that your EKS cluster is up and running, here’s what you can do next:

    | Task | Suggested Tool / Approach |
    |---|---|
    | Deploy a real application | `kubectl apply -f deployment.yaml` |
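    | Install Ingress Controller | `helm install ingress-nginx ingress-nginx/ingress-nginx` (after `helm repo add ingress-nginx https://kubernetes.github.io/ingress-nginx`) |
    | Add monitoring | `helm install prometheus prometheus-community/kube-prometheus-stack` (after `helm repo add prometheus-community https://prometheus-community.github.io/helm-charts`) |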
    | Enable cluster autoscaling | Deploy the **Cluster Autoscaler** or **Karpenter** |
    | Store Terraform state remotely | Use an S3 backend with DynamoDB locking |
    | Set up CI/CD | Use GitHub Actions or GitLab CI to run `terraform apply` on merge |


    Conclusion

    You’ve just provisioned a fully functional Kubernetes cluster on AWS EKS using Terraform, with:

  • ✅ A properly tagged VPC with public and private subnets
  • ✅ A managed EKS control plane (version 1.30)
  • ✅ A managed node group with minimum and maximum size bounds (automatic scaling still requires Cluster Autoscaler or Karpenter)
  • ✅ `kubectl` connectivity and a smoke-tested deployment

    This pattern is a solid starting point. The same Terraform code can be adapted for staging, QA, and production environments by parameterizing variables and swapping in an S3 backend; for production, also tighten the IAM policy and restrict access to the public API endpoint.
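
Parameterizing per environment could look like the sketch below. The file name `staging.tfvars` and all values are illustrative assumptions; only the variable names come from `variables.tf` in this tutorial.

```hcl
# staging.tfvars (hypothetical file; values are illustrative)
aws_region         = "us-east-1"
cluster_name       = "nova-tech-eks-staging"
cluster_version    = "1.30"
instance_types     = ["t3.large"]
desired_node_count = 3
min_node_count     = 2
max_node_count     = 6
```

You would select it with `terraform plan -var-file=staging.tfvars`. Each environment also needs its own state (for example a separate backend key or workspace), and the VPC CIDR, zone list, and `Environment` tag in `main.tf` are still hard-coded, so they would need variables of their own before the environments could coexist.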

    Next time you need an EKS cluster, you can have one running in under 30 minutes — fully automated, version-controlled, and repeatable.


    *Tutorial by the Nova Tech Cloud Team. We help companies build, secure, and scale cloud infrastructure. [Contact us](https://nova-tech.cloud/contact) for consulting, training, or managed DevOps services.*
