Creating the AWS EC2 Environment (Overview)

Overview

This guide explains how to set up a 2-node K3s cluster on AWS, using one m5.xlarge instance for the management (control plane) node and one g5.2xlarge instance for the GPU worker node, both running Ubuntu 22.04. We will provision an Amazon EFS file system and configure Let’s Encrypt certificates, ensuring that the cluster can be stopped, started, and scaled up or down at any time without breaking inbound HTTPS connections.

Key Components

  1. AWS VPC

    • Create an isolated Virtual Private Cloud (VPC) for security and separation from other environments.
    • This ensures each customer or project has its own dedicated network.
  2. Security Group

    • Open external ports for HTTP and HTTPS (ports 80 and 443).
    • Allow internal communication on ports required by K3s, Grafana, and Prometheus (commonly 6443, 3000, 9090, 9093, etc.).
    • Keep the cluster traffic isolated to only the ports it needs.
  3. SSH Key Pairs

    • Generate new key pairs for the EC2 instances.
    • Important for secure remote administration of each node.
  4. EC2 Instances

    • One m5.xlarge for the management/control-plane node.
    • One g5.2xlarge for the GPU worker node (if GPU workloads are required).
    • Both run Ubuntu 22.04.
    • After launch, update the OS with sudo apt update && sudo apt upgrade, but do not perform a release upgrade to a newer Ubuntu version (e.g., do not move from Ubuntu 22.04 to 22.10).
  5. EFS File System

    • Provides a shared file system accessible by all cluster nodes.
    • Must be mounted automatically at every reboot or instance stop/start.
  6. Domain Name & Certificates

    • Use Let’s Encrypt for TLS certificates on your K3s Ingress (or any exposed HTTPS services).
    • Because public IP addresses change when stopping/starting EC2 instances, leverage AWS Elastic IPs or Route53 with a custom domain (or subdomain).
    • Ensures your browser trusts the certificates to avoid security warnings.
  7. Network Connectivity

    • All nodes must be able to reach each other privately (within the VPC).
    • The management node must be publicly reachable for HTTP/HTTPS and any other required endpoints.
  8. High Availability & Scalability

    • Ability to add and remove worker nodes anytime.
    • Instructions for converting the single control-plane node into a High-Availability control plane by adding additional management nodes—either at initial setup or post-deployment.

Installation Flow

Below is the recommended order of operations for a successful deployment:

  1. Set Up or Log Into Your AWS Account

    • Make sure you have the correct IAM permissions to create VPCs, security groups, and EC2 instances.
  2. Create a Dedicated VPC

    • Configure subnets, internet gateway, and route tables as needed for both public and private access.
    • Keep resources isolated from other projects or customers.
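The VPC step above can be sketched with AWS CLI commands roughly as follows. The CIDR blocks and Name tags are illustrative assumptions; adjust them for your environment.

```shell
# Create the dedicated VPC (10.0.0.0/16 is an assumed CIDR)
VPC_ID=$(aws ec2 create-vpc --cidr-block 10.0.0.0/16 \
  --query 'Vpc.VpcId' --output text)
aws ec2 create-tags --resources "$VPC_ID" --tags Key=Name,Value=k3s-vpc

# Public subnet for the cluster nodes
SUBNET_ID=$(aws ec2 create-subnet --vpc-id "$VPC_ID" --cidr-block 10.0.1.0/24 \
  --query 'Subnet.SubnetId' --output text)

# Internet gateway plus a default route so the subnet is publicly reachable
IGW_ID=$(aws ec2 create-internet-gateway \
  --query 'InternetGateway.InternetGatewayId' --output text)
aws ec2 attach-internet-gateway --internet-gateway-id "$IGW_ID" --vpc-id "$VPC_ID"
RTB_ID=$(aws ec2 create-route-table --vpc-id "$VPC_ID" \
  --query 'RouteTable.RouteTableId' --output text)
aws ec2 create-route --route-table-id "$RTB_ID" \
  --destination-cidr-block 0.0.0.0/0 --gateway-id "$IGW_ID"
aws ec2 associate-route-table --route-table-id "$RTB_ID" --subnet-id "$SUBNET_ID"
```

A single public subnet keeps this walkthrough simple; production setups often add a private subnet for worker nodes.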
  3. Create a Security Group

    • Allow inbound ports for SSH (22), HTTP (80), and HTTPS (443).
    • Enable internal ports for K3s communication, Grafana, Prometheus, and any other internal services.
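A possible security group setup, assuming the VPC from the previous step and the port list given above (6443 for the K3s API, 3000 for Grafana, 9090 for Prometheus, 9093 for Alertmanager). Internal ports are restricted to the VPC CIDR rather than opened to the world.

```shell
SG_ID=$(aws ec2 create-security-group --group-name k3s-sg \
  --description "K3s cluster" --vpc-id "$VPC_ID" \
  --query 'GroupId' --output text)

# Publicly reachable ports: SSH, HTTP, HTTPS
for PORT in 22 80 443; do
  aws ec2 authorize-security-group-ingress --group-id "$SG_ID" \
    --protocol tcp --port "$PORT" --cidr 0.0.0.0/0
done

# Internal cluster ports, limited to traffic from inside the VPC
for PORT in 6443 3000 9090 9093; do
  aws ec2 authorize-security-group-ingress --group-id "$SG_ID" \
    --protocol tcp --port "$PORT" --cidr 10.0.0.0/16
done
```

In a hardened setup you would restrict SSH to your own IP range rather than 0.0.0.0/0.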
  4. Generate and Store SSH Keys

    • Use AWS Cloud Shell (or your local machine) to create a new key pair.
    • Upload or reference these keys when launching EC2 instances.
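From AWS CloudShell, key creation looks roughly like this; `k3s-key` is an assumed name.

```shell
# Create the key pair and save the private key locally
aws ec2 create-key-pair --key-name k3s-key \
  --query 'KeyMaterial' --output text > k3s-key.pem
chmod 400 k3s-key.pem   # SSH refuses keys with loose permissions
```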
  5. Launch Your EC2 Instances

    • One m5.xlarge (management node).
    • One g5.2xlarge (GPU worker node).
    • Use Ubuntu 22.04 AMIs.
    • Attach an IAM role if necessary for EFS or Route53 integration (optional but recommended).
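A sketch of the launch step, assuming the subnet, security group, and key pair created earlier. The SSM parameter path shown is the one Canonical publishes for current Ubuntu 22.04 AMIs; verify it resolves in your region.

```shell
# Look up a current Ubuntu 22.04 AMI for the region
AMI_ID=$(aws ssm get-parameter \
  --name /aws/service/canonical/ubuntu/server/22.04/stable/current/amd64/hvm/ebs-gp2/ami-id \
  --query 'Parameter.Value' --output text)

# Management / control-plane node
aws ec2 run-instances --image-id "$AMI_ID" --instance-type m5.xlarge \
  --key-name k3s-key --subnet-id "$SUBNET_ID" --security-group-ids "$SG_ID" \
  --associate-public-ip-address \
  --tag-specifications 'ResourceType=instance,Tags=[{Key=Name,Value=k3s-mgmt}]'

# GPU worker node
aws ec2 run-instances --image-id "$AMI_ID" --instance-type g5.2xlarge \
  --key-name k3s-key --subnet-id "$SUBNET_ID" --security-group-ids "$SG_ID" \
  --associate-public-ip-address \
  --tag-specifications 'ResourceType=instance,Tags=[{Key=Name,Value=k3s-gpu-worker}]'
```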
  6. Initial Instance Configuration

    • SSH into each instance using the new key pair.
    • Run sudo apt update && sudo apt upgrade -y to install the latest patches.
    • (Do not upgrade to a new Ubuntu major release.)
  7. Set Up EFS

    • Create a new EFS file system (EFS storage is elastic and grows as needed; plan for roughly 100GB for this deployment).
    • Mount EFS on both EC2 nodes and ensure it auto-mounts on reboot (e.g., using /etc/fstab).
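The EFS step can be sketched as below. Note the mount target must allow NFS (TCP 2049) from the nodes, so either add that rule to the security group or use a dedicated one; the region in the mount hostname is an assumption.

```shell
FS_ID=$(aws efs create-file-system --performance-mode generalPurpose \
  --tags Key=Name,Value=k3s-efs --query 'FileSystemId' --output text)

# One mount target in the cluster subnet
aws efs create-mount-target --file-system-id "$FS_ID" \
  --subnet-id "$SUBNET_ID" --security-groups "$SG_ID"

# On each node (after installing nfs-common or amazon-efs-utils),
# add an /etc/fstab entry so the mount survives reboots:
echo "$FS_ID.efs.us-east-1.amazonaws.com:/ /mnt/efs nfs4 nfsvers=4.1,_netdev 0 0" \
  | sudo tee -a /etc/fstab
sudo mkdir -p /mnt/efs && sudo mount -a
```

The `_netdev` option tells the OS to wait for networking before attempting the mount at boot.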
  8. Set Up DNS & Certificates

    • Reserve Elastic IP(s) for your management node (and any additional nodes you want publicly accessible).
    • (Optional) Configure a domain or subdomain in AWS Route53.
    • Install Let’s Encrypt (e.g., via Certbot or a K3s Ingress Controller that supports ACME) to generate trusted certificates for the cluster Ingress.
    • Confirm that your browser trusts the site via HTTPS without warnings.
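The Elastic IP portion might look like this; `MGMT_INSTANCE_ID`, `HOSTED_ZONE_ID`, and `cluster.example.com` are placeholders for your own values.

```shell
# Allocate an Elastic IP and pin it to the management node, so the
# public address survives instance stop/start cycles
ALLOC_ID=$(aws ec2 allocate-address --domain vpc \
  --query 'AllocationId' --output text)
aws ec2 associate-address --allocation-id "$ALLOC_ID" \
  --instance-id "$MGMT_INSTANCE_ID"

# (Optional) Point a Route53 A record at the Elastic IP
aws route53 change-resource-record-sets --hosted-zone-id "$HOSTED_ZONE_ID" \
  --change-batch '{"Changes":[{"Action":"UPSERT","ResourceRecordSet":
    {"Name":"cluster.example.com","Type":"A","TTL":300,
     "ResourceRecords":[{"Value":"203.0.113.10"}]}}]}'
```

Replace 203.0.113.10 with the address returned by the allocation. Because the Elastic IP is stable, Let's Encrypt renewals keep working across stop/start cycles.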
  9. Scale the Cluster (Optional)

    • Add worker nodes by repeating the Launch & Join procedure.
    • Remove worker nodes gracefully when no longer needed.
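The Launch & Join procedure for a worker uses the standard K3s install script; `MGMT_PRIVATE_IP` and `NODE_TOKEN` are placeholders you fill in from your own cluster.

```shell
# On the management node, read the cluster join token:
sudo cat /var/lib/rancher/k3s/server/node-token

# On each new worker, install the K3s agent and join the cluster:
curl -sfL https://get.k3s.io | K3S_URL="https://${MGMT_PRIVATE_IP}:6443" \
  K3S_TOKEN="${NODE_TOKEN}" sh -

# To remove a worker gracefully, from the management node:
kubectl drain <node-name> --ignore-daemonsets --delete-emptydir-data
kubectl delete node <node-name>
```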
  10. Enable High Availability (Optional)

    • When ready, add further management EC2 nodes to convert K3s to HA mode (with embedded etcd, an odd number of server nodes, e.g., three, is recommended for quorum).
    • This can be done at initial installation or anytime later.
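Joining an additional server node uses the same install script in server mode. Note this assumes the first server was started with embedded etcd enabled (`--cluster-init`); again, `MGMT_PRIVATE_IP` and `NODE_TOKEN` are placeholders.

```shell
# First server must have been started with embedded etcd, e.g.:
#   curl -sfL https://get.k3s.io | sh -s - server --cluster-init

# Additional management nodes then join as servers:
curl -sfL https://get.k3s.io | sh -s - server \
  --server "https://${MGMT_PRIVATE_IP}:6443" \
  --token "${NODE_TOKEN}"
```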
  11. Test the Setup

    • Confirm all nodes can ping each other.
    • Ensure EFS mounts persist after reboots.
    • Validate Let’s Encrypt certificates.
    • Test scaling in/out and confirm the cluster sees new/removed nodes.
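The verification checklist above can be run roughly as follows; `cluster.example.com` is a placeholder for your domain.

```shell
# All nodes visible and Ready
kubectl get nodes -o wide

# EFS mount persists after a reboot (run on each node)
mount | grep /mnt/efs

# TLS certificate served without warnings (look for the Let's Encrypt issuer)
curl -vI https://cluster.example.com 2>&1 | grep -i issuer
```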

Additional Considerations

  • IAM Roles & Permissions:
    Depending on your organization’s policies, you might need to create or attach IAM roles to allow EC2 to access EFS or Route53.
  • Automating with Infrastructure as Code:
    You can streamline repetitive tasks and improve reusability by using AWS CloudFormation, Terraform, or similar tooling, but this guide focuses on AWS CloudShell terminal commands for clarity.
  • Monitoring & Logging:
    Consider setting up AWS CloudWatch for logs or metrics, as well as Prometheus and Grafana dashboards once the K3s cluster is up.

Prerequisites

This guide uses AWS commands to create the environment. You will need:

  • An AWS account whose user or role has permissions to create VPCs, subnets, internet gateways, route tables, security groups, key pairs, EC2 instances, and EFS resources.
  • AWS CloudShell, or a local machine with the AWS CLI installed and credentials properly configured.