Simulating PPFL¶
This package allows users to simulate PPFL on either a single machine or an HPC cluster.
Note
Running PPFL on multiple heterogeneous machines is described in Training PPFL.
We describe how to simulate PPFL with a given model and datasets. For simulation, we assume that test_data
is available to validate the trained model.
Serial run¶
A serial run begins with a single call to the package's serial-run API function.
A few remarks:
The parameter cfg: DictConfig reads the configuration of runs; see How to set configuration for details about configuration.
The parameters model, train_data, and test_data should be given by users; see User-defined model and User-defined dataset.
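To illustrate what a serial simulation does conceptually, the sketch below runs a federated-averaging loop over clients in a single process. All names (local_update, fedavg) and the toy least-squares model are illustrative assumptions, not the package's actual API or implementation:

```python
# Minimal serial FedAvg sketch (illustrative only; not the package's API).
# Each "client" holds its own data and trains locally; the server then
# averages the client models to form the next global model.

def local_update(weights, data, lr=0.1):
    """One gradient-descent step on a 1-D least-squares model y = w * x."""
    w = weights
    grad = sum(2 * (w * x - y) * x for x, y in data) / len(data)
    return w - lr * grad

def fedavg(train_data, rounds=50, w0=0.0):
    """Serially simulate every client each round, then average their models."""
    w = w0
    for _ in range(rounds):
        client_models = [local_update(w, d) for d in train_data]
        w = sum(client_models) / len(client_models)  # server-side averaging
    return w

# Two clients whose data both follow y = 3x, so the global model
# should converge toward w = 3.
train_data = [[(1.0, 3.0), (2.0, 6.0)], [(0.5, 1.5), (1.5, 4.5)]]
w_final = fedavg(train_data)
print(round(w_final, 2))  # 3.0
```

Because everything runs in one process, a serial simulation is easy to debug, but each round takes time proportional to the number of clients; the MPI run below removes that bottleneck.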
Parallel run with MPI¶
We can parallelize the PPFL simulation by using MPI through the mpi4py package.
Two API functions need to be called for parallelization: the server starts with run_server
and each client with run_client
, where an MPI communicator (e.g., MPI.COMM_WORLD
) is given as an argument.
Note
We assume that MPI process 0 runs the server, and the other processes run clients.
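The rank convention in the note above can be sketched as follows. Plain Python stands in for a real MPI communicator, and the functions client_step and server_aggregate are hypothetical placeholders for local training and server-side averaging, not the package's code:

```python
# Illustrative sketch of the server/client split (not the package's
# implementation): with a communicator of size N, rank 0 plays the server
# and ranks 1..N-1 play the clients.

def client_step(rank, global_model):
    """Placeholder for local training: each client perturbs the global model."""
    return global_model + 0.1 * rank

def server_aggregate(updates):
    """Rank 0 averages the client updates (FedAvg-style aggregation)."""
    return sum(updates) / len(updates)

comm_size = 4                  # as if launched with: mpiexec -n 4
global_model = 1.0
# Ranks 1..3 act as clients; rank 0 acts as the server.
updates = [client_step(r, global_model) for r in range(1, comm_size)]
global_model = server_aggregate(updates)
print(global_model)  # approximately 1.2
```

In the real MPI run, the gather of client updates and the broadcast of the new global model happen through the communicator passed to run_server and run_client, rather than through a Python list.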
Note
mpiexec
may need an additional argument to enable CUDA support: --mca opal_cuda_support 1