Simulating PPFL
This package lets users simulate PPFL on either a single machine or an HPC cluster.
Note
Running PPFL on multiple heterogeneous machines is described in Training PPFL.
We describe how to simulate PPFL with a given model and datasets. For simulation, we assume that test_data is available to validate the trained model.
Serial run
Serial runs begin simply by calling the following API function.
Some remarks are made as follows:
Parameter cfg: DictConfig holds the configuration of the run. See How to set configuration for details.
Parameters model, train_data, and test_data should be given by users; see User-defined model and User-defined dataset.
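To make concrete what a serial run means, the following is a minimal sketch, not the package's API. It assumes a toy one-parameter model y = w * x and plain federated averaging; the names local_update and simulate_serial are hypothetical. A single process visits each simulated client in turn and then averages their updates.

```python
# Toy illustration only: every name here is hypothetical and unrelated to the
# package's own functions.

def local_update(w, data, lr=0.1):
    """One gradient step on a client's data for the model y = w * x."""
    # Mean gradient of the squared error (w * x - y) ** 2 over the client's samples.
    grad = sum(2 * (w * x - y) * x for x, y in data) / len(data)
    return w - lr * grad


def simulate_serial(client_data, rounds=50, w0=0.0):
    """Simulate federated averaging in one process, one client after another."""
    w = w0
    for _ in range(rounds):
        # Serial run: clients are processed sequentially, not in parallel.
        updates = [local_update(w, data) for data in client_data]
        # The "server" step: average the clients' updated weights.
        w = sum(updates) / len(updates)
    return w


# Two simulated clients whose data follow y = 2 * x.
clients = [[(1.0, 2.0), (2.0, 4.0)], [(3.0, 6.0)]]
trained_w = simulate_serial(clients)
```

In this sketch the server and all clients share one process, which is why no communication layer is needed.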
Parallel run with MPI
We can parallelize the PPFL simulation by using MPI through the mpi4py package.
The following two API functions need to be called for parallelization.
The server and the clients are started by run_server and run_client, respectively, each of which takes an MPI communicator (e.g., MPI.COMM_WORLD in this example) as an argument.
Note
We assume that MPI process 0 runs the server, and the other processes run clients.
Note
mpiexec may need an additional argument to use CUDA: --mca opal_cuda_support 1