Example: Scaling Test of APPFL on a GPU Cluster

In this tutorial, we describe how to run federated learning (FL) experiments using APPFL on GPU clusters to simulate large-scale FL scenarios with hundreds or even thousands of clients. This is particularly useful for testing the scalability and performance of FL algorithms in a distributed environment.

MPI Simulation Scripts

We provide an MPI simulation launch script for running large-scale FL experiments, examples/mpi/run_mpi_scaling.py, which takes two important configuration parameters:

  • gpu_per_node: Number of GPUs available on each compute node (default is 4).

  • clients_per_gpu: Number of clients sharing a single GPU (default is 1).
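To make the effect of these two parameters concrete, here is a minimal sketch of how client processes could be mapped onto GPUs. The function name assign_gpu and the fill order (clients fill one GPU completely before moving to the next) are assumptions made for illustration; the actual placement logic in run_mpi_scaling.py may differ.

```python
def assign_gpu(client_index: int, gpu_per_node: int = 4, clients_per_gpu: int = 1) -> tuple[int, int]:
    """Return (node_index, local_gpu_index) for a zero-based client index.

    Hypothetical helper, not part of APPFL. It assumes clients are packed
    contiguously: the first `clients_per_gpu` clients share GPU 0 of node 0,
    the next `clients_per_gpu` clients share GPU 1, and so on.
    """
    if client_index < 0:
        raise ValueError("client_index must be non-negative")
    if gpu_per_node < 1 or clients_per_gpu < 1:
        raise ValueError("gpu_per_node and clients_per_gpu must be at least 1")

    # Number of clients that fit on one node.
    clients_per_node = gpu_per_node * clients_per_gpu
    # Which client node this client lands on.
    node_index = client_index // clients_per_node
    # Which GPU on that node the client shares.
    local_gpu = (client_index % clients_per_node) // clients_per_gpu
    return node_index, local_gpu


if __name__ == "__main__":
    # With 4 GPUs per node and 8 clients per GPU, each node hosts 32 clients.
    for idx in (0, 7, 8, 31, 32, 511):
        print(idx, assign_gpu(idx, gpu_per_node=4, clients_per_gpu=8))
```

Under this assumed layout, 4 GPUs per node with 8 clients per GPU places 32 clients on each node, so 512 clients occupy 16 client nodes.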

Example Script to Launch MPI Simulation

Below is an example PBS script that launches a scaling test with 512 clients, where each GPU is shared by 8 clients. The script assumes a GPU cluster with 17 nodes, each with 4 GPUs, and runs the simulation using MPI. Specifically:

  • In line 5, we allocate 1 node with 1 MPI slot for the FL server, and 16 nodes with 32 MPI slots each for the FL clients.

  • Each client node is allocated 32 MPI slots because each node has 4 GPUs and each GPU is shared by 8 clients, giving 4 × 8 = 32 clients per node.

  • In line 17, we launch the MPI simulation with 513 processes (1 server + 512 clients) using the mpiexec command, with the $PBS_NODEFILE environment variable specifying the nodes allocated to the job.

 1#!/bin/bash
 2#PBS -A <project_name>
 3#PBS -q <queue_name>
 4#PBS -l walltime=01:00:00
 5#PBS -l select=1:ncpus=64:mpiprocs=1+16:ncpus=64:mpiprocs=32
 6#PBS -l filesystems=home:eagle:grand
 7
 8module load conda
 9conda activate /eagle/tpc/zilinghan/conda_envs/appfl
10cd /eagle/tpc/zilinghan/appfl/APPFL/examples
11
12export OMP_NUM_THREADS=1
13export OPENBLAS_NUM_THREADS=1
14export MKL_NUM_THREADS=1
15export NUMEXPR_NUM_THREADS=1
16
17mpiexec -np 513 --hostfile $PBS_NODEFILE python ./mpi/run_mpi_scaling.py --clients_per_gpu 8
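The resource arithmetic behind the script above can be reproduced for other client counts. The sketch below is a hypothetical helper (plan_job is not part of APPFL) that derives the PBS select string and the mpiexec process count from the number of clients. It assumes one dedicated server node with a single MPI slot, and 64 CPUs per node as in the example script.

```python
import math


def plan_job(num_clients: int, gpu_per_node: int = 4, clients_per_gpu: int = 1, ncpus: int = 64) -> dict:
    """Compute PBS and mpiexec settings for a scaling test.

    Hypothetical helper for illustration only. Assumes one dedicated
    server node with 1 MPI slot, plus enough client nodes to host
    `num_clients` clients at `gpu_per_node * clients_per_gpu` per node.
    """
    if min(num_clients, gpu_per_node, clients_per_gpu) < 1:
        raise ValueError("all counts must be at least 1")

    clients_per_node = gpu_per_node * clients_per_gpu
    # Round up so a partially filled last node is still allocated.
    client_nodes = math.ceil(num_clients / clients_per_node)
    select = (
        f"1:ncpus={ncpus}:mpiprocs=1"
        f"+{client_nodes}:ncpus={ncpus}:mpiprocs={clients_per_node}"
    )
    return {
        "client_nodes": client_nodes,
        "total_nodes": client_nodes + 1,  # client nodes plus the server node
        "select": select,
        "np": num_clients + 1,  # one MPI process per client plus the server
    }


if __name__ == "__main__":
    plan = plan_job(512, gpu_per_node=4, clients_per_gpu=8)
    print(f"#PBS -l select={plan['select']}")
    print(f"mpiexec -np {plan['np']} ...")
```

For 512 clients at 8 clients per GPU, this yields the same select string as line 5 and the same process count as line 17. When the client count is not a multiple of the clients per node, the last node is only partly used.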