Simulating PPFL

This package lets users simulate PPFL either on a single machine or on an HPC cluster.

Note

Running PPFL on multiple heterogeneous machines is described in Training PPFL.

This section describes how to simulate PPFL with a given model and datasets. For simulation, we assume that test_data is available to validate the trained model.

Serial run

A serial run begins with a single call to the package's serial API function, sketched below.
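The following is a minimal sketch, assuming the package is importable as appfl and exposes a run_serial entry point; the import path and exact signature are assumptions, so consult the package's API reference:

    import torch
    from omegaconf import DictConfig

    # Assumed import path; the serial entry point may live elsewhere.
    from appfl.run_serial import run_serial

    def simulate_serial(cfg: DictConfig,
                        model: torch.nn.Module,
                        train_data: list,
                        test_data) -> None:
        # A single process plays the server and every client in turn.
        # train_data is assumed to hold one dataset per client, and
        # test_data is used to validate the trained global model.
        run_serial(cfg, model, train_data, test_data)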

A few remarks:

  • The parameter cfg: DictConfig holds the configuration of the run. See How to set configuration for details.

  • The parameters model, train_data, and test_data must be provided by the user; see User-defined model and User-defined dataset.

Parallel run with MPI

We can parallelize the PPFL simulation by using MPI through the mpi4py package. Two API functions, run_server and run_client, need to be called, as sketched below.
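The following is a minimal sketch, assuming run_server and run_client are importable from an appfl.run_mpi module and accept the configuration, communicator, model, and data as shown; the import path and any arguments beyond those named in this section are assumptions:

    import torch
    from mpi4py import MPI
    from omegaconf import DictConfig

    # Assumed import path; check the package's API reference.
    from appfl.run_mpi import run_server, run_client

    def simulate_mpi(cfg: DictConfig,
                     model: torch.nn.Module,
                     train_data: list,
                     test_data) -> None:
        comm = MPI.COMM_WORLD
        rank = comm.Get_rank()
        num_clients = comm.Get_size() - 1  # process 0 is the server

        if rank == 0:
            # The server aggregates client updates and validates on test_data.
            run_server(cfg, comm, model, num_clients, test_data)
        else:
            # Every other process trains a client on its share of train_data.
            run_client(cfg, comm, model, num_clients, train_data)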

The server and the clients are started by run_server and run_client, respectively, each of which takes an MPI communicator (e.g., MPI.COMM_WORLD in this example) as an argument.

Note

We assume that MPI process 0 runs the server, and the other processes run clients.

Note

mpiexec may need an additional argument to enable CUDA support: --mca opal_cuda_support 1
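For example, a simulation with one server and four clients might be launched as follows, where the script name simulate_mpi.py is hypothetical:

    mpiexec -np 5 --mca opal_cuda_support 1 python simulate_mpi.py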