In this tutorial I will explain how to set up an MPI cluster in Ubuntu from scratch. In this case I will use four Odroid C2 boards with Ubuntu 22.04 LTS, but this configuration should apply to any other SoC (System on Chip) and even to more recent Ubuntu versions.
I did this project during my second year in Computer Engineering, so I will try to focus this tutorial on people who are facing this kind of scenario for the first time, trying to explain in detail even those things that may be trivial for experienced users. I hope this will make it easier for beginners ;-)
Getting Started
Installing the OS
First, you must flash the microSD card of your SoC with the version of Ubuntu you want to use. The full installation tutorial can be found on the official Odroid website. In summary, you will need to follow these steps:
- Download the ISO image: in my case, I will use the Ubuntu MATE 22.04 image, which includes a graphical interface.
- Install Etcher: this tool will allow you to flash your microSD card with the downloaded ISO.
- Boot the system: after inserting the card, the system will boot the OS automatically (without any additional configuration process). In the case of Ubuntu MATE 22.04, the login credentials will be odroid/odroid.
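Etcher is the easiest option, but if you prefer the command line, the image can also be written with dd. This is only a minimal sketch: the image file name below is a placeholder for whatever you downloaded, and /dev/sdX must be replaced with your card's device (check it with lsblk first, since writing to the wrong device will destroy its data):
# Write the downloaded image to the microSD card (decompress it first if it comes compressed)
sudo dd if=ubuntu-mate-22.04-odroid-c2.img of=/dev/sdX bs=4M status=progress conv=fsync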
Configuring the cluster network
Once all our SoCs are flashed, they must be connected to the same local network. In my case I connected them all by cable to a shared switch. If you cannot connect them by cable, connect them to the same wireless network instead, but this is not recommended since it will hurt the performance of your MPI applications.
After booting all your SoCs, each device will receive a private IP address via DHCP. Now we need to record the IP address of each device, which you can check using the ifconfig command. In my case I obtained the following information:
Hostname | IP address
---|---
odroid1 | 192.168.0.101
odroid2 | 192.168.0.102
odroid3 | 192.168.0.103
odroid4 | 192.168.0.104
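Before going any further, it is worth checking that the boards can actually reach each other. For example, from odroid1 (using my addresses from the table above; replace them with yours):
# Send three pings to odroid2; you should see replies with low latency
ping -c 3 192.168.0.102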
In addition, we will install and enable an SSH server on each device with the following commands:
sudo apt-get install openssh-server
sudo systemctl enable ssh
sudo systemctl start ssh
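If you want to make sure the server is running before moving on, you can check its status:
# Should report the service as active (running)
systemctl status ssh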
With our SSH service ready and the IP addresses obtained, we can now work from just one of our SoCs (or from another computer) by connecting to the others through a remote SSH terminal. I recommend working this way from now on, since moving cables and peripherals around is not practical. To connect to another device, we will use the following command:
ssh [username]@[ip]
Here, username is the name of the user we want to connect as (in my case it is odroid for all devices) and ip is the IP address of the device we want to connect to. After running the command, you will be asked for that user's password and then logged into the remote device.
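For example, to open a session on my second board from the first one (values taken from my table above):
ssh odroid@192.168.0.102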
Configuring the cluster nodes
There are some remaining steps to finish the initial configuration. First, we must configure our /etc/hosts file so that each device can refer to the other nodes of the cluster by hostname instead of by IP address. In my case, and following the table shown before, I added the following entries at the end of the /etc/hosts file on each device:
192.168.0.101 odroid1
192.168.0.102 odroid2
192.168.0.103 odroid3
192.168.0.104 odroid4
From now on, we will be able to connect to any node with:
ssh [username]@[hostname]
instead of specifying its IP address.
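To check that the new names resolve correctly, you can ping any node by hostname:
# If /etc/hosts is set up properly, this resolves to 192.168.0.102
ping -c 1 odroid2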
After that, we must set a hostname for each device. In my case I will keep the hostnames already listed above (but yours do not necessarily have to be the same). To do this, run the following command on each node:
sudo hostnamectl set-hostname [hostname]
Finally, we must create the user that will run the MPI applications. This is needed because MPI launches processes on the other nodes over SSH, and that user must have the same name on all devices. In my case I will use the mpiuser user. To create it, run the following command (just set a password when prompted; you can skip all the other fields):
sudo adduser mpiuser
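You can confirm that the account was created on each node with:
# Prints the UID, GID and groups of mpiuser if it exists
id mpiuser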
Configuring the SSH keys
MPI requires connecting to the different nodes via SSH in order to work. However, each time we connect we are prompted for a password. Therefore, we are going to create SSH keys on each node and distribute them to the rest of the nodes in the cluster, so we will be able to connect between all the devices without using the password. To do so, follow the steps below in each node:
- Log in as the mpiuser user:
su - mpiuser
- Generate an SSH key pair (you can accept the default location and leave the passphrase empty):
ssh-keygen -t rsa
- Copy the public key to the authorized_keys file of all the devices in the MPI cluster:
ssh-copy-id odroid1
ssh-copy-id odroid2
ssh-copy-id odroid3
ssh-copy-id odroid4
Note that there is no need to specify the username, since all the devices have the mpiuser user and we are already logged in as it.
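To confirm that the keys were distributed correctly, an SSH command against any node should now run without asking for a password. For example, as mpiuser:
# Should print "odroid2" immediately, with no password prompt
ssh odroid2 hostname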
Configuring NFS
In order to run an application, MPI needs the executable to be available on all nodes and, to keep things simple, at the same path on each of them. This is achieved by creating a directory shared between the nodes via NFS. To do this we will need to configure a master node and the different slave nodes.
We start by installing NFS on each node:
sudo apt-get install nfs-common
Then, we have to create the shared folder on each node and transfer its ownership to the mpiuser user:
sudo mkdir -p /home/mpiuser/nfs_shared
sudo chown mpiuser:mpiuser /home/mpiuser/nfs_shared
Configuring the NFS master node
The master node will store the contents of the shared directory in its own storage, and the rest of the nodes will access it through NFS. So, first of all, select which node will act as the master node. After that, install the NFS server package on it and enable it, just as we did before with SSH:
sudo apt-get install nfs-kernel-server
sudo systemctl enable nfs-kernel-server
sudo systemctl start nfs-kernel-server
In order to allow the directory to be shared, we must then add the following entry to the /etc/exports file on the master node:
/home/mpiuser/nfs_shared *(rw,sync,no_subtree_check)
Finally, we have to apply the changes and restart the NFS service:
sudo exportfs -a
sudo systemctl restart nfs-kernel-server
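You can verify that the directory is being exported with the following commands (the second one, run from any other node, is provided by the nfs-common package we installed earlier):
# On the master node: list the active exports
sudo exportfs -v
# On any slave node: list the directories exported by odroid1
showmount -e odroid1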
Configuring the NFS slave nodes
The slave nodes will access the shared directory through NFS. To do this, we must add the following entry to the /etc/fstab file on the slave nodes:
odroid1:/home/mpiuser/nfs_shared /home/mpiuser/nfs_shared nfs defaults 0 0
Note that I am using the odroid1 node as the master node, but you can select any other node and adapt the entry accordingly.
Finally, we have to mount the shared directory by reloading the /etc/fstab file:
sudo mount -a
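A quick way to check that the share works is to create a file on the master node and look for it from a slave node:
# On the master node, as mpiuser
touch /home/mpiuser/nfs_shared/test.txt
# On any slave node: the file should appear here too
ls /home/mpiuser/nfs_shared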
Configuring MPI
Now that we have configured the cluster, we can install the MPI library. In this case, we will use MPICH, but you can also try to install OpenMPI or another implementation. To install the mpich package, we will use the following command:
sudo apt-get install build-essential mpich
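To verify the installation, you can ask the MPI wrappers for their versions:
# Shows the underlying compiler used by the MPI wrapper
mpicc --version
# Shows the build details of the MPICH (Hydra) process launcher
mpirun --version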
And that's it, we are done with our configuration. Now we can move on to the final step, which is to test the cluster.
Running your first MPI application
At this point, you should be able to connect between nodes, share files via NFS, and compile and run MPI applications. Now we are going to try our cluster by running a simple MPI application.
Remember to log in as the mpiuser user and to work inside the nfs_shared directory, since that is the path shared by all nodes.
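In practice, that means starting each working session like this on the node you launch from:
# Switch to the MPI user and move to the shared directory
su - mpiuser
cd ~/nfs_shared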
We will use the following hello.c program to test our installation:
#include <stdio.h>
#include <mpi.h>
int main(int argc, char** argv) {
// Initialize the MPI environment
MPI_Init(NULL, NULL);
// Get the number of processes
int world_size;
MPI_Comm_size(MPI_COMM_WORLD, &world_size);
// Get the rank of the process
int world_rank;
MPI_Comm_rank(MPI_COMM_WORLD, &world_rank);
// Get the name of the processor
char processor_name[MPI_MAX_PROCESSOR_NAME];
int name_len;
MPI_Get_processor_name(processor_name, &name_len);
// Print off a hello world message
printf("Hello world from processor %s, rank %d out of %d processors\n",
processor_name, world_rank, world_size);
// Finalize the MPI environment.
MPI_Finalize();
}
After creating the file, we can compile it with the following command:
mpicc -o hello hello.c
And finally, we can run it with the following command, which launches one process on each node:
mpirun -np 4 -hosts odroid1,odroid2,odroid3,odroid4 ./hello
The output should be similar to the following:
Hello world from processor odroid1, rank 0 out of 4 processors
Hello world from processor odroid2, rank 1 out of 4 processors
Hello world from processor odroid3, rank 2 out of 4 processors
Hello world from processor odroid4, rank 3 out of 4 processors
Note: if you use OpenMPI instead of MPICH and request more processes than the physical cores it detects, you will have to add the --use-hwthread-cpus parameter when calling mpirun, otherwise you will get an error.
Running an MPI application using a machinefile
We can also run the application using a machinefile. This file allows us to specify the number of processes to run on each node in a more readable and reusable way. To do this, we will create a file called machinefile with the following content:
odroid1:1
odroid2:1
odroid3:1
odroid4:1
And then we can run the application with the following command, obtaining the same result as the previous one:
mpirun -f machinefile -np 4 ./hello
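If you later want to use all four cores of each Odroid C2, you can raise the process count per host in the machinefile (same MPICH hostfile format as above):
odroid1:4
odroid2:4
odroid3:4
odroid4:4
and then launch 16 processes in total:
mpirun -f machinefile -np 16 ./hello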
Conclusion
In this tutorial, we have seen how to configure a basic cluster able to run MPI applications. I hope this tutorial has been useful to you and that you have been able to configure your own cluster!
If you have any questions or suggestions, please do not hesitate to contact me :-)