AWS Public Sector Blog

How to put a supercomputer in the hands of every scientist

The Amazon Web Services (AWS) Cloud gives you access to virtually unlimited infrastructure and many infrastructure choices suitable for high performance computing (HPC) workloads. HPC on AWS helps remove long queues and waiting times, so you don't have to choose between availability and performance.

Read on to learn how to use AWS ParallelCluster to set up and manage an HPC cluster in a flexible, elastic, and repeatable way. See how you can make your favorite HPC application readily available in that cluster. As an example, we use the open-source application Palabos (AGPLv3 license).

Figure 1. The AWS ParallelCluster architecture.

For this walkthrough, have the following available:

  • An AWS account
  • Web browser (e.g., Chrome, Edge, Safari, or Firefox)
  • Optional: Basic Linux knowledge

Step A

Create an Amazon Elastic Compute Cloud (Amazon EC2) Instance; the service provides secure, resizable compute capacity in the cloud.

1. In the AWS Management Console search bar, type Amazon EC2.

2. Choose EC2 to open the EC2 dashboard and navigate to Instances. Select the Launch Instance button.

Figure 2. In the EC2 dashboard, under Instances, select the Launch Instance button.

3. On the left side menu, select Community AMIs. In the search bar, enter aws-parallelcluster-2.9.1-amzn2-hvm-x86 and press Enter. When the AMI appears, choose Select.

4. Pick the c5.2xlarge instance type, then choose Next: Configure Instance Details.

5. In the Network section, select the same VPC ID and same Subnet ID as your AWS Cloud9 instance (created in Step C)—an AWS-provided integrated development environment (IDE) that acts as a managed terminal in the cloud.

6. Choose Next: Add Storage. Select Add New Volume and in Size type 100. In Volume Type, select General Purpose SSD.

Figure 3. When you select Add New Volume, enter 100 under Size and select General Purpose SSD under Volume Type.

7. Choose Next: Add Tags.

8. Choose Add Tag. For Key, enter Name; for Value, enter My HPC Blog Instance. This will be the name of your instance.

9. Choose Next: Configure Security Groups.

10. Select the Create a new security group check box and, if desired, change the security group name. The type should be SSH, protocol TCP, port range 22, and the source 0.0.0.0/0. (Note: 0.0.0.0/0 allows SSH from any IP address; for tighter security, restrict the source to your own IP.)

11. Choose Review and Launch. Ignore any warning messages.

12. On the review page, choose Launch.

13. A new window opens, prompting you to select an existing key pair. Open the drop-down list and select Create a new key pair. For Key pair name, type KeyPairHPCBlog. Select Download Key Pair and then Launch Instance. Be sure to store the key file in a known folder, since you use this key to connect to the instance later.

Figure 4. In the Key pair name, type KeyPairHPCBlog and then select Download Key Pair and then Launch Instance.

Step B

Now we prepare the volume and install the software on the instance.

1. Go to the EC2 dashboard. Select the instance you just created by using the checkbox next to the instance name.

Figure 5. Select the checkbox next to the instance name.

2. Select Actions then Connect.

3. On the Connect window, select the SSH client tab. There you find the IP address and the SSH command to connect to your instance.

Figure 6. Select the SSH client tab to find the IP address and SSH command to connect to your instance.

4. Open a terminal window on your laptop or in AWS Cloud9. Connect to the instance using ssh, specifying the key file downloaded in the previous step and the instance's IP address.

ssh -i "KeyPairHPCBlog.pem" ec2-user@instanceIP

5. List the volumes of your instance by using the command lsblk. Write down the name of your 100GB volume (e.g., xvdb).

Figure 7. List the volumes of your instance. Note the name of your 100GB volume; here it is xvdb.

6. Create a file system on the volume by using the command mkfs -t. Specify your volume name.

sudo mkfs -t xfs /dev/volumename

7. Create a mount point directory for the volume by using the mkdir command. The mount point is where the volume is located in the file system tree and where you read and write files after you mount the volume (e.g., /apps).

sudo mkdir /apps

8. Use the following command to mount the volume at the directory you created in the previous step.

sudo mount /dev/volumename /apps

9. Use the lsblk command again to verify your volume and the /apps partition.

Figure 8. Use the lsblk command to verify your volume.
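Steps 5 through 9 can also be scripted. The following is a minimal sketch that picks the 100G device out of lsblk-style output; the sample here-string is a hypothetical stand-in for real output of lsblk, and the device names are illustrative:

```shell
#!/bin/bash
# Sketch: find the 100G data volume from device/size columns like those
# shown in Figure 7, so the mkfs/mkdir/mount commands of steps 6-8
# could use $volume directly. On the instance, replace the sample with
# the output of: lsblk -o NAME,SIZE -dn
sample="xvda 8G
xvdb 100G"
volume=$(echo "$sample" | awk '$2 == "100G" {print $1}')
echo "/dev/$volume"   # prints /dev/xvdb
```

On a real instance you would then pass "/dev/$volume" to the mkfs and mount commands instead of typing the device name by hand.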

10. Add permission for your ec2-user to write to /apps.

sudo chown -R ec2-user /apps
sudo chgrp -R ec2-user /apps

11. Change your directory to /apps by using the command cd /apps.

  • Proceed with the software installation. For this example, we use Palabos: download a source package, build it, and run it.
  • Download the software by executing the wget command as follows:

wget https://gitlab.com/unigespc/palabos/-/archive/v2.2.1/palabos-v2.2.1.tar.gz

  • When the download finishes, decompress the file by executing tar:

tar xvfz palabos-v2.2.1.tar.gz
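If you script the download-and-extract steps, the source directory can be derived from the archive name with shell parameter expansion. This assumes the versioned naming convention of the GitLab archive shown above:

```shell
#!/bin/bash
# The archive palabos-v2.2.1.tar.gz extracts to palabos-v2.2.1/;
# strip the .tar.gz suffix to get the directory for later cd commands.
tarball="palabos-v2.2.1.tar.gz"
srcdir="${tarball%.tar.gz}"
echo "$srcdir"   # prints palabos-v2.2.1
```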

12. Go to the following directory, then run the cmake and make commands to compile our example.

Here we use the Boussinesq approximation showcase (see the Guide to Palabos), which takes approximately 1 hour 32 minutes on 20 nodes (c5.18xlarge). Concerned about the cost? Select a smaller showcase.

cd palabos-v2.2.1/examples/showCases/boussinesqThermal3d/build
cmake ..
make

13. To visualize our results in the cloud, we are going to download open-source Paraview (permissive BSD license). Use the following command:

wget -O - -q "https://www.paraview.org/paraview-downloads/download.php?submit=Download&version=v5.8&type=binary&os=Linux&downloadFile=ParaView-5.8.1-MPI-Linux-Python3.7-64bit.tar.gz" >> ParaView-5.8.1-MPI-Linux-Python3.7-64bit.tar.gz

  • Now extract the archive using:

tar xvfz ParaView-5.8.1-MPI-Linux-Python3.7-64bit.tar.gz

Note: Paraview is placed in the same file system as Palabos and is therefore backed up by the same EBS snapshot (see steps 14-15 below).

14. Create a snapshot of your volume that contains a copy of our software ready for use.

  • In the EC2 dashboard, select Elastic Block Store and then select Volumes. Search for your volume.

15. Select your volume and choose Actions > Create Snapshot. Enter a description for your snapshot and then select Create Snapshot. Write down the snapshot ID for later use. See more in the AWS documentation.

Figure 9. Under Actions, select Create Snapshot.

Step C

Install the AWS Command Line Interface (AWS CLI). We recommend creating an AWS Cloud9 environment, as some packages come pre-installed.

1. In the AWS Management Console, use the search bar to locate AWS Cloud9.

Figure 10. Locate AWS Cloud9 by using the AWS Management Console search bar.

2. Choose Create Environment. Name your environment MyCloud9HPCBlogEnv. Select Next Step. Leave the default options and select Next Step.

3. Review and select Create Environment. This takes a few minutes to set up.

Figure 11. The AWS Cloud9 interface.

4. On the terminal window, install the AWS CLI by running the following command:

pip3 install awscli --upgrade --user

5. Upload your key file to AWS Cloud9 (File > Upload Local Files). Then run the following command in the terminal to restrict its permissions.

$ chmod 400 KeyPairHPCBlog.pem

6. Use Amazon Simple Storage Service (Amazon S3) to store the software. Amazon S3 is an object storage service that offers a secure, scalable, and durable way to store data for use cases such as websites, data lakes, and archives, as well as HPC application input data and results. In a terminal window, run the following command (S3 bucket names are globally unique, so substitute your own bucket name):

$ aws s3api create-bucket --bucket myhpcblogs3bucket --region us-east-1

  • You can see that the API responded with the name of our created bucket.

Figure 12. The API responds with the name of your created bucket; here, it is myhpcblogs3bucket.

  • Create a file called postInstall.sh to install ImageMagick (a prerequisite for Paraview) with the following content:

#!/bin/bash
sudo yum install -y ImageMagick

  • Run the following terminal command to upload this file to your bucket:

aws s3 cp postInstall.sh s3://myhpcblogs3bucket/postInstall.sh

Step D

Install and set up AWS ParallelCluster, a no-cost, open-source cluster management and deployment tool. It leverages AWS CloudFormation to provision all the resources needed for your HPC cluster in an automated and secure manner:

sudo pip3 install aws-parallelcluster --upgrade

  • Configure AWS ParallelCluster.

pcluster configure

  • Use default values through the guided setup.
  • Open your ~/.parallelcluster/config file and modify it to define the Amazon Elastic Block Store (Amazon EBS) volume settings for the snapshot created in Step B, which contains our software, and the S3 bucket settings if you created the optional bucket.

The following configuration file creates a graphics head node (g3.4xlarge) and grabs the postInstall script from the S3 bucket. The guided setup creates a VPC and subnet for us. We also specify the snapshot ID, the instance type of the compute nodes, and the number of nodes.

Note: Substitute your own values for the placeholder entries below (vpc-xxx, subnet-xxx, snap-xxx, and your bucket name).

[aws]
aws_region_name = us-east-1

[aliases]
ssh = ssh {CFN_USER}@{MASTER_IP} {ARGS}

[global]
cluster_template = default
update_check = true
sanity_check = true

[cluster default]
key_name = KeyPairHPCBlog
base_os = alinux2
scheduler = slurm
s3_read_resource = arn:aws:s3:::myhpcblogs3bucket/*
post_install = s3://myhpcblogs3bucket/postInstall.sh
post_install_args = 'R curl wget'
master_instance_type = g3.4xlarge
vpc_settings = default
queue_settings = compute
ebs_settings = custom
dcv_settings = dcv

[vpc default]
vpc_id = vpc-xxx
master_subnet_id = subnet-xxx

[ebs custom]
shared_dir = Palabos
ebs_snapshot_id = snap-xxx
volume_type = gp2

[queue compute]
enable_efa = true
disable_hyperthreading = true
placement_group = DYNAMIC
compute_resource_settings = default

[compute_resource default]
instance_type = c5n.18xlarge
max_count = 10
initial_count = 1
min_count = 0

[dcv dcv]
enable = master
port = 8443
access_from = 0.0.0.0/0

  • Now check that your Amazon EBS snapshot is ready and create your first HPC cluster with the name myhpcblogcluster by running the following command:

pcluster create myhpcblogcluster

  • After this, we are ready to connect to our master node using SSH and submit our HPC job to the cluster using Slurm.

Step E

Submit your application.

1. Go to the EC2 dashboard. Select the Master instance (the cluster head node). Connect to it using SSH and your key.

2. Create a file called slurm-job.sh by using the following command:

nano slurm-job.sh

3. Copy and paste the following content.

This specifies a Slurm batch job using 10 compute nodes, with 36 tasks on each node, launched with mpirun. Adjust these numbers if you plan to run a smaller showcase model.

#!/bin/bash
#SBATCH --job-name=PALABOS
#SBATCH --output=PALABOS_%j.out
#SBATCH --nodes=10 # Number of nodes
#SBATCH --ntasks=360 # Number of MPI ranks
#SBATCH --ntasks-per-node=36 # Number of MPI ranks per node
#SBATCH --cpus-per-task=1 # Number of OpenMP threads for each MPI process/rank
date;hostname;pwd
#export OMP_NUM_THREADS=$SLURM_CPUS_PER_TASK
cd /Palabos/palabos-v2.2.1/examples/showCases/boussinesqThermal3d
mpirun -np 360 ./rayleighBenard3D 1000
date

 Save the file.
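The rank counts in the script above are linked: --ntasks must equal --nodes times --ntasks-per-node, and the same total is passed to mpirun -np. A quick sanity check of that arithmetic:

```shell
#!/bin/bash
# Verify the Slurm rank arithmetic: total MPI ranks = nodes * ranks per node.
nodes=10
ntasks_per_node=36
total=$(( nodes * ntasks_per_node ))
echo "$total"   # prints 360, matching --ntasks and mpirun -np
```

If you shrink the job to, say, 2 nodes, keep all three values consistent (--nodes=2, --ntasks=72, mpirun -np 72).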

4. Run the following command to submit the job:

sbatch slurm-job.sh

If the submission was successful, the system will display a jobid.
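If you want to script the submission, the job ID can be captured from sbatch's standard "Submitted batch job <id>" output line. A sketch, using a hypothetical sample line in place of real sbatch output:

```shell
#!/bin/bash
# sbatch prints "Submitted batch job <id>" on success; the 4th field is
# the job id, reusable for scancel and for the PALABOS_<jobid>.out file.
# The sample line below stands in for: submit_output=$(sbatch slurm-job.sh)
submit_output="Submitted batch job 1234"
jobid=$(echo "$submit_output" | awk '{print $4}')
echo "$jobid"   # prints 1234
```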

5. You can use the following commands to get information about your cluster:

sinfo
squeue

6. You can track the progress by reading the .out file with the following command:

cat PALABOS_jobid.out

You see the date printed when the job finishes. If you decide to cancel the job, use the jobid (returned by the squeue command) with the following command:

scancel jobid

7. We use DCV to visualize the results. Type this command in AWS Cloud9:

pcluster dcv connect -k ~/environment/KeyPairHPCBlog.pem myhpcblogcluster

Open the returned URL in a browser and you see a Linux desktop. Select Activities and then Files, go to the Palabos folder, and open Paraview.

8. In Paraview, go to File and then Open. Search for the results in the Palabos folder, e.g., palabos-v2.2.1/examples/showCases/boussinesqThermal3d/tmp. Select the .vti group files. In the Properties section, select Apply, then select temperature and volume. You now see the volume on the screen.

Figure 13. The Paraview interface showing the results of the boussinesqThermal3d example.

Now, you can bring up an HPC cluster, prepare, install, and run applications, and visualize the results—all without needing a single piece of hardware beyond your laptop.

To avoid incurring future charges, we recommend deleting the resources created in this exercise that make up the cluster. This makes the cluster ephemeral.

1. Cluster: From your AWS Cloud9 Environment (or your PC), stop and delete the cluster by using the following commands:

pcluster stop myhpcblogcluster
pcluster delete myhpcblogcluster

2. EC2 instances: Go to the AWS Management Console. Select your preparation instance (the EC2 instance we used to install Palabos and Paraview and to create the EBS snapshot). Under Actions, select Terminate. This deletes the instance permanently, with no possibility of rolling back.

3. Volumes and snapshots: Go to the EC2 dashboard. On the left menu, locate the Elastic Block Store section and select Volumes. When the page loads, search for your volume, then select Actions, then Delete Volume. Repeat this process under Snapshots to delete the snapshot.

4. AWS Cloud9: Go to the AWS Cloud9 dashboard. Select your environment. Select the Delete button.

5. Amazon S3: Go to the Amazon S3 dashboard. Select your bucket. First empty the bucket by selecting the Empty button, then delete it.

Conclusion

We just showed you how to bring up an HPC cluster; prepare, install, and run applications; and visualize the results, all without needing a single piece of hardware. You can use the same steps to build and install any application, "save it" in an EBS snapshot, and start and stop a cluster multiple times.

This allows you to create ephemeral application-specific clusters just the way you need them when you need them and therefore use the cloud for all your HPC needs.

Learn more about HPC on AWS and read more stories on HPC on the AWS Public Sector Blog.

Subscribe to the AWS Public Sector Blog newsletter to get the latest in AWS tools, solutions, and innovations from the public sector delivered to your inbox, or contact us.