Example: Simulation on GPU cluster
==================================
This example describes how to set up the environment to run APPFL with either gRPC or MPI on a GPU cluster for simulation, which is useful for benchmarking different FL algorithms on various datasets. In this example, we partition the CIFAR10 dataset in a non-independent and identically distributed (non-IID) manner and train a ResNet-18 model using federated learning.

gRPC Simulation on Polaris Cluster
----------------------------------
.. note::

    This section is based on the Polaris supercomputer at the Argonne Leadership Computing Facility (ALCF), which uses the Portable Batch System (PBS) as its job scheduler.

Loading Modules
~~~~~~~~~~~~~~~
Most HPC clusters use modules to manage the software environment, and the module configuration may vary from cluster to cluster. On the Polaris supercomputer, load the necessary modules with the following commands:

.. code-block:: bash

    module use /soft/modulefiles
    module load conda
    module save

Creating Conda Environment and Installing APPFL
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Now, we can create a conda environment and install APPFL.
.. code-block:: bash

    conda create -n appfl python=3.10 # or conda create -p /path/to/env python=3.10
    conda activate appfl
    git clone https://github.com/APPFL/APPFL.git
    cd APPFL
    pip install -e ".[examples]"
    cd examples

Creating Batch Script
~~~~~~~~~~~~~~~~~~~~~
The Polaris supercomputer uses the PBS workload manager for job management. Below is an example batch script for the gRPC simulation on the Polaris cluster, which launches one server and two clients. Before submitting, add your project name after ``#PBS -A``, add the name of your conda environment after ``conda activate``, and complete the ``cd`` command with the path to your APPFL repository.

.. code-block:: bash
    :caption: submit.sh

    #!/bin/bash
    #PBS -A
    #PBS -q debug
    #PBS -l walltime=00:15:00
    #PBS -l nodes=1:ppn=64
    #PBS -l filesystems=home:eagle:grand
    #PBS -m bae

    # Set proxy
    export HTTP_PROXY="http://proxy.alcf.anl.gov:3128"
    export HTTPS_PROXY="http://proxy.alcf.anl.gov:3128"
    export http_proxy="http://proxy.alcf.anl.gov:3128"
    export https_proxy="http://proxy.alcf.anl.gov:3128"
    export ftp_proxy="http://proxy.alcf.anl.gov:3128"
    export no_proxy="admin,polaris-adminvm-01,localhost,*.cm.polaris.alcf.anl.gov,polaris-*,*.polaris.alcf.anl.gov,*.alcf.anl.gov"

    # Load modules and activate conda environment
    module use /soft/modulefiles
    module load conda
    conda activate
    cd /APPFL/examples

    # Launch the server in the background
    python ./grpc/run_server.py --config ./resources/configs/cifar10/server_fedavg.yaml &
    sleep 20
    echo "Server is ready"

    # Launch the clients in the background and wait for all processes to finish
    python ./grpc/run_client.py --config ./resources/configs/cifar10/client_1.yaml &
    python ./grpc/run_client.py --config ./resources/configs/cifar10/client_2.yaml &
    wait

.. note::

    On Polaris, it is important to set the proxy environment variables to access the internet from the cluster.

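The fixed ``sleep 20`` in the script above assumes that the server is ready within 20 seconds. As a hedged alternative, the sketch below polls a TCP port until it accepts connections. The helper name ``wait_for_port`` and the port in the usage comment are assumptions made for illustration; use the address from your own server configuration file. The helper relies on bash's ``/dev/tcp`` redirection, so it needs bash rather than a plain POSIX shell.

.. code-block:: bash

    #!/bin/bash
    # wait_for_port HOST PORT [TIMEOUT_SECONDS]
    # Returns 0 as soon as HOST:PORT accepts a TCP connection,
    # or 1 if it is still unreachable after TIMEOUT_SECONDS (default: 60).
    wait_for_port() {
        local host="$1" port="$2" timeout="${3:-60}"
        local deadline=$((SECONDS + timeout))
        while (( SECONDS < deadline )); do
            # Opening /dev/tcp/HOST/PORT succeeds only if something is listening.
            # The subshell closes the descriptor again when it exits.
            if (exec 3<>"/dev/tcp/${host}/${port}") 2>/dev/null; then
                return 0
            fi
            sleep 1
        done
        return 1
    }

    # Possible use in the batch script instead of "sleep 20"
    # (port 50051 is a placeholder, not taken from this tutorial):
    # wait_for_port localhost 50051 60 && echo "Server is ready"

Polling keeps the job from idling longer than necessary and fails visibly if the server never comes up, at the cost of needing to know the server port in the script.
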
Submit the script with the following command.

.. code-block:: bash

    qsub submit.sh

Two output files, ``submit.sh.o{job_id}`` and ``submit.sh.e{job_id}``, are generated when the script starts to run. You can follow the output in real time by running one of the following commands.

.. code-block:: bash

    tail -f -n 10 submit.sh.o{job_id}
    # or
    tail -f -n 10 submit.sh.e{job_id}

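To avoid copying the job ID by hand, the numeric ID can be taken from the identifier that ``qsub`` prints on submission. This sketch assumes that the identifier has the form ``<number>.<server>``, as PBS typically prints, and that the output files are named after the numeric part. The helper name ``pbs_numeric_id`` and the sample identifier are made up for illustration.

.. code-block:: bash

    #!/bin/bash
    # Print the numeric part of a PBS job identifier such as "1234567.server".
    pbs_numeric_id() {
        local full_id="$1"
        # Strip everything from the first dot onward.
        printf '%s\n' "${full_id%%.*}"
    }

    # Hypothetical identifier, used only to show the parsing step.
    pbs_numeric_id "1234567.example-pbs-server"

    # On the cluster (commented out because it needs a PBS installation):
    # job_id=$(pbs_numeric_id "$(qsub submit.sh)")
    # tail -f -n 10 "submit.sh.o${job_id}"

Note that the output files only exist once the job has started, so the ``tail`` command may need to be repeated while the job is still queued.
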
MPI Simulation on Delta Cluster
-------------------------------
.. note::

    This section is based on the Delta supercomputer at the National Center for Supercomputing Applications (NCSA), which uses Slurm as its job scheduler.

Loading Modules
~~~~~~~~~~~~~~~
Most HPC clusters use modules to manage the software environment, and the module configuration may vary from cluster to cluster. On the Delta supercomputer, the following modules are loaded.

.. code-block:: bash

    1) gcc/11.4.0 2) openmpi/4.1.6 3) cuda/11.8.0 4) cue-login-env/1.0 5) slurm-env/0.1 6) default-s11 7) anaconda3_gpu/23.9.0

Run ``module save`` to save the current module configuration.

.. code-block:: bash

    module save

Creating Conda Environment and Installing APPFL
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Now, we can create a conda environment and install APPFL.
.. code-block:: bash

    conda create -n appfl python=3.10 # or conda create -p /path/to/env python=3.10
    conda activate appfl
    git clone --single-branch --branch main https://github.com/APPFL/APPFL.git
    cd APPFL
    pip install -e ".[mpi,examples]"
    cd examples

Creating Batch Script
~~~~~~~~~~~~~~~~~~~~~
The Delta supercomputer uses the Slurm workload manager for job management. Below is an example batch script that launches six MPI processes, one server and five clients. Before submitting, add your project name after ``--account=`` and complete the ``cd`` command with the path to your APPFL repository.

.. code-block:: bash
    :caption: submit.sh

    #!/bin/bash
    #SBATCH --mem=150g                  # required amount of memory
    #SBATCH --nodes=1                   # number of required nodes
    #SBATCH --ntasks-per-node=6         # number of tasks per node [SHOULD BE EQUAL TO THE NUMBER OF CLIENTS+1]
    #SBATCH --cpus-per-task=1           # <- match to OMP_NUM_THREADS
    #SBATCH --partition=gpuA40x4        # <- or one of: gpuA100x4 gpuA40x4 gpuA100x8 gpuMI100x8
    #SBATCH --account=                  # <- add your project name
    #SBATCH --job-name=APPFL-test       # job name
    #SBATCH --time=00:15:00             # dd-hh:mm:ss for the job
    #SBATCH --gpus-per-node=1
    #SBATCH --gpu-bind=none

    # Activate conda environment
    source ~/.bashrc
    conda activate appfl
    cd /examples

    # Launch the experiment; -np must match --ntasks-per-node
    mpiexec -np 6 python ./mpi/run_mpi.py --server_config ./resources/configs/cifar10/server_fedcompass.yaml \
        --client_config ./resources/configs/cifar10/client_1.yaml

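Because the number of MPI processes must equal the number of clients plus one, it can help to derive it in one place instead of hard-coding ``6`` twice. The sketch below prefers ``SLURM_NTASKS``, which Slurm normally exports inside a job, and otherwise falls back to a client count. ``NUM_CLIENTS`` and ``mpi_world_size`` are hypothetical names introduced here, not part of APPFL.

.. code-block:: bash

    #!/bin/bash
    NUM_CLIENTS=5   # assumption: five clients, matching --ntasks-per-node=6

    # Print the number of MPI processes to launch: one server plus the clients.
    mpi_world_size() {
        local num_clients="$1"
        # Prefer the task count allocated by Slurm when it is available.
        echo "${SLURM_NTASKS:-$((num_clients + 1))}"
    }

    echo "Launching $(mpi_world_size "$NUM_CLIENTS") MPI processes"

    # In the batch script, the launch line could then read (commented out here):
    # mpiexec -np "$(mpi_world_size "$NUM_CLIENTS")" python ./mpi/run_mpi.py \
    #     --server_config ./resources/configs/cifar10/server_fedcompass.yaml \
    #     --client_config ./resources/configs/cifar10/client_1.yaml

With this approach, changing ``--ntasks-per-node`` in the header is enough to change the number of simulated clients.
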
The script can be submitted to the cluster using the following command.

.. code-block:: bash

    sbatch submit.sh

If the submission succeeds, the command prints the job ID.

.. code-block:: bash

    Submitted batch job {job_id}

The output file ``slurm-{job_id}.out`` is generated when the script starts to run, and you can follow the output in real time by running the following command.

.. code-block:: bash

    tail -f -n 10 slurm-{job_id}.out

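The job ID can also be captured in a script rather than read from the terminal. The sketch below extracts it from the ``Submitted batch job {job_id}`` message printed by ``sbatch``. The helper name ``slurm_job_id`` and the sample ID are illustrative only.

.. code-block:: bash

    #!/bin/bash
    # Extract the job ID from sbatch's "Submitted batch job <id>" message.
    # Prints nothing if the message does not have that form.
    slurm_job_id() {
        awk '/^Submitted batch job/ { print $4 }' <<< "$1"
    }

    # Hypothetical message, used only to show the parsing step.
    slurm_job_id "Submitted batch job 123456"

    # On the cluster (commented out because it needs a Slurm installation):
    # job_id=$(slurm_job_id "$(sbatch submit.sh)")
    # tail -f -n 10 "slurm-${job_id}.out"

Slurm also offers ``sbatch --parsable``, which prints the job ID without the surrounding text; the parsing helper is mainly useful when the plain message has already been captured.
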
Multi-GPU Training
------------------
APPFL supports distributed data parallelism (DDP) for multi-GPU training. To enable DDP, users only need to specify the device as a comma-separated list of CUDA devices in the client configuration file, for example (``examples/resources/configs/cifar10/client_1_multigpu.yaml``):

.. code-block:: yaml

    client_id: "Client1"
    train_configs:
      # Device
      device: "cuda:0,cuda:1,cuda:2,cuda:3"
      ...

.. note::

    When using multi-GPU training, make sure that both the training and validation batch sizes are divisible by the number of GPUs.

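A quick pre-flight check can catch a mismatched batch size before the job is queued. The sketch below counts the entries in a comma-separated device string and tests divisibility. The function names and the batch size of 128 are hypothetical, so substitute the values from your own client configuration.

.. code-block:: bash

    #!/bin/bash
    # Count the devices in a comma-separated string such as "cuda:0,cuda:1".
    count_devices() {
        local -a devices
        IFS=',' read -ra devices <<< "$1"
        echo "${#devices[@]}"
    }

    # batch_divisible BATCH_SIZE DEVICE_STRING
    # Returns 0 if BATCH_SIZE is divisible by the number of devices, 1 otherwise.
    batch_divisible() {
        local batch_size="$1" num_devices
        num_devices=$(count_devices "$2")
        (( num_devices > 0 && batch_size % num_devices == 0 ))
    }

    device="cuda:0,cuda:1,cuda:2,cuda:3"
    if batch_divisible 128 "$device"; then
        echo "batch size 128 works with $(count_devices "$device") GPUs"
    else
        echo "batch size 128 is not divisible by $(count_devices "$device")" >&2
    fi

The same check should be run for both the training and the validation batch size.
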
The batch script below runs multi-GPU training on the Delta cluster using MPI. It requests four GPUs on the node. Before submitting, add your project name after ``--account=`` and complete the ``cd`` command with the path to your APPFL repository.

.. code-block:: bash
    :caption: submit.sh

    #!/bin/bash
    #SBATCH --mem=150g                  # required amount of memory
    #SBATCH --nodes=1                   # number of required nodes
    #SBATCH --ntasks-per-node=6         # number of tasks per node [SHOULD BE EQUAL TO THE NUMBER OF CLIENTS+1]
    #SBATCH --cpus-per-task=1           # <- match to OMP_NUM_THREADS
    #SBATCH --partition=gpuA40x4        # <- or one of: gpuA100x4 gpuA40x4 gpuA100x8 gpuMI100x8
    #SBATCH --account=                  # <- add your project name
    #SBATCH --job-name=APPFL-test       # job name
    #SBATCH --time=00:15:00             # dd-hh:mm:ss for the job
    #SBATCH --gpus-per-node=4
    #SBATCH --gpu-bind=none

    # Activate conda environment
    source ~/.bashrc
    conda activate appfl
    cd /examples

    # Launch the experiment
    mpiexec -np 6 python ./mpi/run_mpi.py --server_config ./resources/configs/cifar10/server_fedcompass.yaml \
        --client_config ./resources/configs/cifar10/client_1_multigpus.yaml

The batch script below runs multi-GPU training on the Polaris cluster using MPI. Before submitting, add your project name after ``#PBS -A``, add the name of your conda environment after ``conda activate``, and complete the ``cd`` command with the path to your APPFL repository.

.. code-block:: bash
    :caption: submit.sh

    #!/bin/bash
    #PBS -A
    #PBS -q debug
    #PBS -l walltime=00:15:00
    #PBS -l nodes=1:ppn=64
    #PBS -l filesystems=home:eagle:grand
    #PBS -m bae

    # Load modules and activate conda environment
    module use /soft/modulefiles
    module load conda
    conda activate
    cd /APPFL/examples

    # Launch the experiment
    mpiexec -np 6 python ./mpi/run_mpi.py --server_config ./resources/configs/cifar10/server_fedcompass.yaml \
        --client_config ./resources/configs/cifar10/client_1_multigpus.yaml