Adding New Nodes to your K3s cluster¶
This guide outlines the process of adding new nodes to your existing K3s cluster using the AWS CloudShell interface. We'll cover the steps necessary to create and provision an AWS EC2 instance and then connect it to your K3s cluster.
Prerequisites¶
- An existing and functional K3s cluster.
- AWS account with appropriate permissions.
- AWS CloudShell access configured with your AWS credentials.
- Familiarity with basic Linux commands and AWS concepts.
Step 1: Set Up Environment Variables in AWS CloudShell¶
To simplify the AWS CLI commands, we'll define several environment variables within your CloudShell session. You can paste the variable assignments directly into your CloudShell terminal, or save them in a file named EnvVars and then source that file. Make sure to replace the example values with your actual configuration. For example:
# AWS Information
export REGION="us-east-2" # AWS Default Region
export CIDR_VPC="10.0.0.0/24" # VPC CIDR
export CIDR_SUBNET="10.0.0.0/24" # Subnet CIDR
export SSH_KEY_NAME="MVAI-SSH-Key" # SSH Key Pair Name
export SG_NAME="MVAIsg" # Security Group Name
export VPC_NAME="MemVergeAI-VPC" # VPC Name
export SUBNET_NAME="MemVergeAI-Subnet" # Subnet Name
export RT_NAME="MemVergeAI-RouteTable" # Routing Table Name
export IG_NAME="MemVergeAI-IGW" # Internet Gateway Name
export FILE_SYSTEM_NAME="MemVergeAI-EFS" # EFS File System Name
export VPC_ID="vpc-01bdeafcc0ce883e5" # VPC ID
export SUBNET_ID="subnet-01f24fb72235228ed" # Subnet ID
export IGW_ID="igw-06ffc82ccff0bf75f" # Internet Gateway ID
export RT_ID="rtb-05e6d9dcaf649f7ff" # Routing Table ID
export SG_ID="sg-00dbfae93e065b028" # Security Group ID
export AMI_ID="ami-0c3b809fcf2445b6a" # AMI Image ID for Ubuntu 22.04
export FILE_SYSTEM_ID="fs-06089fdf3a7751a5f" # EFS File System ID
export EFS_DNSNAME="fs-06089fdf3a7751a5f.efs.us-east-2.amazonaws.com" # EFS File System DNS Fully Qualified Name
# GPU Worker Node Info
export INSTANCE_TYPE="g5.2xlarge"
export INSTANCE_NAME="MemVergeAI-GPU-Worker02"
Source the environment variables:
Verify that the variables are set correctly:
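The two steps above might look like the following in CloudShell. This is a minimal sketch that assumes the file is named EnvVars and sits in the current directory; check_vars is a hypothetical helper written for this guide, not part of the AWS CLI.

```shell
# Load the variable assignments if the EnvVars file exists (assumed file name and location).
if [ -f ./EnvVars ]; then
  . ./EnvVars
fi

# check_vars (hypothetical helper): print each named variable and
# return non-zero if any of them is unset or empty.
check_vars() {
  missing=0
  for name in "$@"; do
    eval "value=\${$name:-}"
    if [ -z "$value" ]; then
      echo "MISSING: $name"
      missing=1
    else
      echo "$name=$value"
    fi
  done
  return "$missing"
}

# Check the variables that the later steps rely on.
check_vars REGION VPC_ID SUBNET_ID SG_ID AMI_ID SSH_KEY_NAME INSTANCE_TYPE INSTANCE_NAME \
  || echo "Set the missing variables before continuing."
```

Any name reported as MISSING should be added to EnvVars and the file sourced again before moving on.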
Step 2: Create and Configure the EC2 Instance¶
Use the AWS CLI to launch a new EC2 instance with the defined parameters. This command creates a g5.2xlarge instance in your VPC, assigns it to your security group, associates it with your SSH key, tags it with a name, and stores the new instance ID in WORKER_INSTANCE_ID.
# Launch the instance using the variables from Step 1 and capture its ID.
WORKER_INSTANCE_ID=$(aws ec2 run-instances \
  --image-id "$AMI_ID" \
  --count 1 \
  --region "$REGION" \
  --instance-type "$INSTANCE_TYPE" \
  --key-name "$SSH_KEY_NAME" \
  --security-group-ids "$SG_ID" \
  --subnet-id "$SUBNET_ID" \
  --associate-public-ip-address \
  --tag-specifications "ResourceType=instance,Tags=[{Key=Name,Value=$INSTANCE_NAME}]" \
  --block-device-mappings '[{"DeviceName":"/dev/sda1","Ebs":{"VolumeSize":60,"VolumeType":"gp3"}}]' \
  --query 'Instances[0].InstanceId' \
  --output text)
echo "New GPU worker instance ID: $WORKER_INSTANCE_ID"
Explanation:
- --image-id: Specifies the AMI to use (Ubuntu 22.04 in this example).
- --count 1: Launches one instance.
- --instance-type: Sets the instance type (g5.2xlarge in this example).
- --key-name: Associates the instance with your SSH key for secure access.
- --security-group-ids: Assigns the instance to your existing security group.
- --subnet-id: Launches the instance in your specified subnet.
- --associate-public-ip-address: Requests a public IP address for the instance.
- --tag-specifications: Adds a Name tag to the instance for identification.
- --region: Specifies the AWS region.
- --block-device-mappings: Sets the OS boot volume size to 60 GiB and the volume type to gp3.
Get the Instance ID & IP Address¶
After running the run-instances command, note the Instance ID from the output. We'll use this to retrieve the public IP address. Alternatively, go to the AWS EC2 console and find your newly created instance.
Example:
aws ec2 describe-instances \
--filters "Name=vpc-id,Values=$VPC_ID" \
--query "Reservations[].Instances[].{ID:InstanceId,Name:Tags[?Key=='Name']|[0].Value,State:State.Name,PublicIP:PublicIpAddress,PrivateIP:PrivateIpAddress}" \
--output table
Example:
----------------------------------------------------------------------------------------------------
| DescribeInstances |
+---------------------+---------------------------+-------------+----------------+-----------------+
| ID | Name | PrivateIP | PublicIP | State |
+---------------------+---------------------------+-------------+----------------+-----------------+
| i-0770b293b7b6383e0| MemVergeAI-Management01 | 10.0.0.156 | 3.20.192.186 | running |
| i-02a83e5064fccd806| MemVergeAI-GPU-Worker01 | 10.0.0.9 | 3.128.242.144 | running |
| i-04d3204eab2b5eb6c| MemVergeAI-GPU-Worker02 | 10.0.0.51 | 18.117.72.15 | running |
+---------------------+---------------------------+-------------+----------------+-----------------+
Continue once all instances are in the running state.
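Instead of re-running describe-instances until the state changes, the AWS CLI waiter can block until the instance is running. A minimal sketch, assuming WORKER_INSTANCE_ID and REGION are still set from the earlier steps; wait_for_running is a hypothetical helper name.

```shell
# wait_for_running (hypothetical helper): block until the given instance
# reaches the "running" state, using the built-in AWS CLI waiter.
wait_for_running() {
  aws ec2 wait instance-running --region "$REGION" --instance-ids "$1"
}

# Only call AWS when Step 2 has populated WORKER_INSTANCE_ID.
if [ -n "${WORKER_INSTANCE_ID:-}" ]; then
  wait_for_running "$WORKER_INSTANCE_ID" \
    && echo "Instance $WORKER_INSTANCE_ID is running."
fi
```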
Save the public IP address to the NEW_NODE_IP variable:
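One way to do this is to query the address directly by instance ID. The sketch below assumes WORKER_INSTANCE_ID and REGION are set in the current CloudShell session; get_public_ip is a hypothetical helper name.

```shell
# get_public_ip (hypothetical helper): print the public IPv4 address
# of the instance whose ID is passed as the first argument.
get_public_ip() {
  aws ec2 describe-instances \
    --region "$REGION" \
    --instance-ids "$1" \
    --query 'Reservations[0].Instances[0].PublicIpAddress' \
    --output text
}

# Only call AWS when the instance ID from Step 2 is available.
if [ -n "${WORKER_INSTANCE_ID:-}" ]; then
  NEW_NODE_IP=$(get_public_ip "$WORKER_INSTANCE_ID")
  export NEW_NODE_IP
  echo "New node public IP: $NEW_NODE_IP"
fi
```

If you noted the address from the table output instead, a plain `export NEW_NODE_IP="<public-ip>"` has the same effect.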
SSH to the new Instance¶
SSH to the new host.
Example:
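A minimal sketch of the login, assuming the private key for the key pair has been copied to ~/.ssh/ in CloudShell under the key pair name with a .pem extension (the path is an assumption; adjust SSH_KEY_FILE to wherever your key lives). ssh_to_node is a hypothetical helper name.

```shell
# Path to the private key file (assumed location; override SSH_KEY_FILE if yours differs).
SSH_KEY_FILE="${SSH_KEY_FILE:-$HOME/.ssh/${SSH_KEY_NAME:-MVAI-SSH-Key}.pem}"

# ssh_to_node (hypothetical helper): log in as the default Ubuntu user.
ssh_to_node() {
  ssh -i "$SSH_KEY_FILE" "ubuntu@$1"
}

# Connect only when the public IP has been saved in NEW_NODE_IP.
if [ -n "${NEW_NODE_IP:-}" ]; then
  ssh_to_node "$NEW_NODE_IP"
fi
```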
Renaming AWS EC2 Hostnames (Optional)¶
The default hostnames that AWS assigns are not intuitive for the MemVerge.ai cluster. You can rename your EC2 instances to more descriptive hostnames, such as mvai-nvgpu02, which makes the cluster easier to manage.
1. Update the hostname on each instance
   SSH into each EC2 instance and run the following commands:
   Replace "new-hostname" with your desired hostname (e.g., mvai-mgmt, mvai-nvgpu02).
2. Update the /etc/hosts file
   Edit the /etc/hosts file and add a line with the new hostname below the default 127.0.0.1 localhost line:
3. Update DNS settings (Optional)
   If you're using Amazon Route 53 or another DNS service, update the DNS records to reflect the new hostnames.
4. Reboot the host:
5. When the system boots, verify the new hostname is correct:
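The rename steps above can be sketched as follows. This is an illustration rather than the guide's exact commands: it assumes a systemd-based Ubuntu host (for hostnamectl) and assumes the /etc/hosts entry takes the form "127.0.0.1 new-hostname". hosts_with_hostname and NEW_HOSTNAME are hypothetical names.

```shell
# hosts_with_hostname (hypothetical helper): print the hosts file with a line
# for the new hostname added directly below the "127.0.0.1 localhost" line.
# $1 = new hostname, $2 = hosts file (defaults to /etc/hosts).
hosts_with_hostname() {
  awk -v name="$1" '
    { print }
    $1 == "127.0.0.1" && $2 == "localhost" && !added {
      print "127.0.0.1 " name   # assumed entry format
      added = 1
    }
  ' "${2:-/etc/hosts}"
}

# Set NEW_HOSTNAME (e.g. mvai-nvgpu02) before running; nothing happens otherwise.
NEW_HOSTNAME="${NEW_HOSTNAME:-}"
if [ -n "$NEW_HOSTNAME" ]; then
  sudo hostnamectl set-hostname "$NEW_HOSTNAME"
  hosts_with_hostname "$NEW_HOSTNAME" > /tmp/hosts.new
  sudo cp /tmp/hosts.new /etc/hosts
  sudo reboot
fi
```

After the reboot, running `hostname` on the instance should print the new name.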
Updating /etc/hosts on All Nodes¶
To ensure proper communication between nodes in your cluster, you must add the hostnames and IP addresses of all nodes to the /etc/hosts file on each system. This step is crucial when you are not using DNS for hostname resolution. If you use DNS, this step is not required; ensure your DNS entries are correct instead.
1. Gather the private IP addresses and hostnames of all nodes in your cluster using ip a.
2. SSH into each node (management and worker nodes). The default user for Ubuntu Linux is ubuntu:
3. On each node, edit the /etc/hosts file:
4. Add entries for all nodes in your cluster. The format is:
   <private-ip> <hostname>
   For example, add these lines:
   # MemVerge.ai Cluster IP Addresses and Hostnames
   10.0.0.156 mvai-mgmt
   10.0.0.9 mvai-nvgpu01
   10.0.0.51 mvai-nvgpu02
   Add an entry for each node in your cluster, including the node you're currently editing.
5. Save the file and exit the editor.
6. Repeat steps 2-5 for each node in your cluster.
7. Verify the changes by pinging other nodes using their hostnames:
   Ensure that each node can ping all other nodes using their hostnames.
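The verification step can be scripted so that every hostname is checked in one pass. A minimal sketch, assuming the Linux iputils version of ping (for the -c and -W options); check_cluster_hosts is a hypothetical helper name.

```shell
# check_cluster_hosts (hypothetical helper): ping each hostname once and
# report the result; returns non-zero if any host is unreachable.
check_cluster_hosts() {
  failed=0
  for host in "$@"; do
    # -c 1: send a single probe; -W 2: wait at most two seconds for a reply.
    if ping -c 1 -W 2 "$host" >/dev/null 2>&1; then
      echo "OK   $host"
    else
      echo "FAIL $host"
      failed=1
    fi
  done
  return "$failed"
}
```

For the example cluster, `check_cluster_hosts mvai-mgmt mvai-nvgpu01 mvai-nvgpu02` prints one OK or FAIL line per host. Run it on every node, since each node has its own /etc/hosts file.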
By adding these entries to /etc/hosts on all systems, you ensure that each node can resolve the hostnames of other nodes in the cluster. This is crucial for Kubernetes and other cluster components to communicate properly.
Remember to update the /etc/hosts file on all nodes whenever you add or remove nodes from your cluster. While this manual process works well for smaller, static clusters, using DNS is generally preferred for larger or more dynamic environments.
Step 3: Join the New Node to the K3s Cluster¶
Get K3s Server Token and Address. On your K3s management server node, retrieve the K3s server token and server address:
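On a default K3s server installation the node token is stored in /var/lib/rancher/k3s/server/node-token, and the API server listens on port 6443. The sketch below assumes those defaults and assumes the server's hostname is resolvable from the new node; print_join_info is a hypothetical helper name.

```shell
# Default location of the node token on a K3s server (override if your install differs).
K3S_TOKEN_FILE="${K3S_TOKEN_FILE:-/var/lib/rancher/k3s/server/node-token}"

# print_join_info (hypothetical helper): print the two values the new node needs.
print_join_info() {
  echo "K3S_URL=https://$(hostname):6443"
  echo "K3S_TOKEN=$(sudo cat "$K3S_TOKEN_FILE")"
}

# Only run on a host where K3s is installed (the management server).
if command -v k3s >/dev/null 2>&1; then
  print_join_info
fi
```

Treat the token as a secret: anyone who has it can join a node to the cluster.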
On the new node instance, run the following command, replacing <K3S_URL> with the K3s server address (e.g., https://mvai-mgmt:6443) and <K3S_TOKEN> with the node token:
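The standard K3s install script joins a host as an agent when K3S_URL and K3S_TOKEN are set in its environment. A minimal sketch using that script, assuming the new node has outbound internet access to https://get.k3s.io; join_k3s_cluster is a hypothetical helper name.

```shell
# join_k3s_cluster (hypothetical helper): install the K3s agent and join it
# to the server. $1 = server URL, $2 = node token.
join_k3s_cluster() {
  curl -sfL https://get.k3s.io | K3S_URL="$1" K3S_TOKEN="$2" sh -
}

# Run the join only when both values have been provided in the environment.
if [ -n "${K3S_URL:-}" ] && [ -n "${K3S_TOKEN:-}" ]; then
  join_k3s_cluster "$K3S_URL" "$K3S_TOKEN"
fi
```

For example, export K3S_URL="https://mvai-mgmt:6443" and K3S_TOKEN with the token retrieved from the management server before running the block.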
Check Node Status. On your K3s management server, verify that the new node has joined the cluster and is in the Ready state:
Example:
$ kubectl get nodes
NAME STATUS ROLES AGE VERSION
mvai-mgmt Ready control-plane,master 23h v1.31.6+k3s1
mvai-nvgpu01 Ready <none> 23h v1.31.6+k3s1
mvai-nvgpu02 Ready <none> 22s v1.31.6+k3s1
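A new node can take a short while to report Ready. Rather than re-running kubectl get nodes by hand, kubectl wait can block until the condition is met. A minimal sketch; wait_node_ready is a hypothetical helper name and the 120-second default timeout is an arbitrary choice.

```shell
# wait_node_ready (hypothetical helper): block until the named node reports
# the Ready condition. $1 = node name, $2 = timeout (default 120s, arbitrary).
wait_node_ready() {
  kubectl wait --for=condition=Ready "node/$1" --timeout="${2:-120s}"
}
```

On the management server, `wait_node_ready mvai-nvgpu02` returns once the node is Ready or fails when the timeout expires.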
Mounting the EFS Volume on Management and GPU Worker Nodes¶
1. Install NFS Utilities
   On Ubuntu 22.04, the EFS mount requires nfs-common:
2. Create a Mount Directory
   Create a local mount point (e.g., /mnt/efs) on each node:
3. Determine the EFS Mount Endpoint
   3.1. Using EFS DNS Name
   By default, Amazon EFS provides a DNS name in the format:
   <file-system-id>.efs.<region>.amazonaws.com
   For instance, if your $FILE_SYSTEM_ID is fs-06089fdf3a7751a5f and your $REGION is us-east-2, the EFS endpoint would be:
   fs-06089fdf3a7751a5f.efs.us-east-2.amazonaws.com
   3.2. Optional: Using the Mount Target IP
   As shown in your creation output, the IpAddress might be 10.0.0.190. You can mount using that IP directly, but it’s generally better to rely on the DNS name for high availability and automatic failover between Availability Zones.
4. Mount the EFS File System
   Use the following command example to mount EFS on each node. Replace the DNS name with the one determined in the previous step:
   Replace:
   - fs-06089fdf3a7751a5f.efs.us-east-2.amazonaws.com with your actual EFS DNS endpoint.
   - /mnt/efs with the directory you wish to mount on, if different.
   Tip: Confirm the mount is successful:
   You should see an entry similar to:
   fs-06089fdf3a7751a5f.efs.us-east-2.amazonaws.com:/ nfs4 … /mnt/efs
5. Persist the Mount in /etc/fstab
   To ensure the EFS file system automatically remounts after a reboot or an instance stop/start, add an /etc/fstab entry on each node:
   echo "fs-06089fdf3a7751a5f.efs.us-east-2.amazonaws.com:/ /mnt/efs nfs4 defaults,_netdev 0 0" | sudo tee -a /etc/fstab
   - _netdev ensures the system knows this mount requires a network connection before mounting.
   - You can add additional options (e.g., rsize=1048576, wsize=1048576) if needed, but the above defaults typically suffice.
   Once added, test the fstab entry by unmounting and remounting:
   If successful, EFS should remount without errors. Use df -hT | grep efs to confirm the file system is mounted.
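The mount procedure above can be sketched end to end as follows. This is an illustration under stated assumptions, not the guide's exact commands: it assumes FILE_SYSTEM_ID and REGION are exported on the node itself (the Step 1 variables live only in CloudShell), and the nfsvers=4.1 mount option is an assumption. efs_endpoint, fstab_line, and mount_efs are hypothetical helper names.

```shell
# efs_endpoint (hypothetical helper): build the EFS DNS name.
# $1 = file system ID, $2 = region.
efs_endpoint() {
  echo "$1.efs.$2.amazonaws.com"
}

# fstab_line (hypothetical helper): print the /etc/fstab entry.
# $1 = EFS DNS name, $2 = mount point.
fstab_line() {
  echo "$1:/ $2 nfs4 defaults,_netdev 0 0"
}

# mount_efs (hypothetical helper): install the NFS client, create the
# mount point, mount the file system, and show the resulting mount.
mount_efs() {
  endpoint="$1"
  mount_dir="${2:-/mnt/efs}"
  sudo apt-get update && sudo apt-get install -y nfs-common
  sudo mkdir -p "$mount_dir"
  sudo mount -t nfs4 -o nfsvers=4.1 "$endpoint:/" "$mount_dir"   # assumed option
  df -hT | grep "$mount_dir"
}

# Run only when the EFS details are available on this node.
if [ -n "${FILE_SYSTEM_ID:-}" ] && [ -n "${REGION:-}" ]; then
  EFS_DNSNAME=$(efs_endpoint "$FILE_SYSTEM_ID" "$REGION")
  mount_efs "$EFS_DNSNAME" /mnt/efs
  # Append the fstab entry only if one for this endpoint is not already present.
  grep -q "^$EFS_DNSNAME:/" /etc/fstab \
    || fstab_line "$EFS_DNSNAME" /mnt/efs | sudo tee -a /etc/fstab
  # Test the fstab entry by unmounting and remounting everything listed in fstab.
  sudo umount /mnt/efs && sudo mount -a
fi
```

Checking for an existing entry before appending keeps /etc/fstab free of duplicates if the block is run more than once on the same node.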
Verify the Node is Available in MemVerge.ai¶
Log in to the MemVerge.ai Management UI console and verify that the new node appears in the nodes list: https://mvai-mgmt/dashboard/nodes
Summary¶
Congratulations! You have successfully added a new node to the K3s cluster, and the node should now appear in the MemVerge.ai UI.