4. Configuration

Configuring and Setting Up the Federated Learning Environment

Understanding the Configuration File

Cifer's FedLearn reads its configuration from a config.json file located in the root directory of the project. This file contains all the settings needed for both local testing and distributed federated learning.

To locate the file:

bash
cd path/to/cifer
ls config.json

Configuration Parameters

The config.json file includes the following key parameters:

  • server_address: The gRPC server address and port (e.g., "localhost:8080").

  • num_rounds: Number of federated learning rounds.

  • strategy: The federated learning strategy (e.g., "FedAvg" for Federated Averaging).

  • grpc_max_message_length: Maximum gRPC message size in bytes.

  • mode: "local" for single-machine simulation, "collaborative" for distributed setup.

  • num_clients: Number of simulated clients (in local mode).

  • min_num_clients: Minimum number of clients required for a round (in collaborative mode).

  • min_sample_size: Minimum number of samples required from each client.

  • model_name: Name of the model to be used.

  • dataset: Dataset identifier (e.g., "mnist"), used in local mode.

  • local_epochs: Number of local training epochs per round.

  • batch_size: Batch size for local training.

  • learning_rate: Learning rate for local training.

  • data_dir: Directory containing the dataset.

  • output_dir: Directory for saving output files.
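
Because the file is plain JSON, it can be sanity-checked before a run. The sketch below is not part of Cifer's API; `load_config` is a hypothetical helper, and the key names are taken from the parameter list above:

```python
import json

# Hypothetical pre-flight check -- not part of Cifer's API.
# The key names come from the parameter list above; extend as needed.
REQUIRED = {"server_address", "num_rounds", "strategy", "mode"}

def load_config(path="config.json"):
    with open(path) as f:
        config = json.load(f)
    missing = REQUIRED - config.keys()
    if missing:
        raise ValueError(f"config.json is missing keys: {sorted(missing)}")
    return config
```

Running a check like this before starting the server or a client catches a truncated or mistyped configuration file early.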

Local Mode Configuration

For local testing and development, use these settings:

json
{
  "server_address": "localhost:8080",
  "num_rounds": 10,
  "strategy": "FedAvg",
  "grpc_max_message_length": 104857600,
  "mode": "local",
  "num_clients": 3,
  "local_epochs": 5,
  "batch_size": 32,
  "learning_rate": 0.01,
  "model_name": "simple_cnn",
  "dataset": "mnist",
  "data_dir": "./data",
  "output_dir": "./output"
}

Collaborative Mode Configuration

For distributed federated learning, modify the configuration as follows:

json
{
  "server_address": "0.0.0.0:8080",
  "num_rounds": 50,
  "strategy": "FedAvg",
  "grpc_max_message_length": 209715200,
  "mode": "collaborative",
  "min_num_clients": 3,
  "min_sample_size": 1000,
  "model_name": "resnet18",
  "local_epochs": 5,
  "batch_size": 64,
  "learning_rate": 0.001,
  "data_dir": "./local_data",
  "output_dir": "./collaborative_output"
}
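
The two modes expect different keys: the local example uses num_clients, while the collaborative example uses min_num_clients and min_sample_size. A mode-aware check can be sketched as follows; `check_mode_keys` is a hypothetical helper, and the per-mode key sets are inferred from the example configurations above, not from Cifer's source:

```python
# Hypothetical mode-aware validation. The per-mode key sets below are
# inferred from the example configurations in this guide.
MODE_KEYS = {
    "local": {"num_clients"},
    "collaborative": {"min_num_clients", "min_sample_size"},
}

def check_mode_keys(config):
    mode = config.get("mode")
    if mode not in MODE_KEYS:
        raise ValueError(f"unknown mode: {mode!r}")
    missing = MODE_KEYS[mode] - config.keys()
    if missing:
        raise ValueError(f"{mode} mode requires: {sorted(missing)}")
    return True
```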

Setting Up the Server

  1. Choose a machine with good network connectivity and sufficient resources.

  2. Update the config.json on the server with appropriate settings (use the collaborative mode configuration as a starting point).

  3. Start the server:

python
from cifer import fedlearn

# Initialize and start FedLearn server
fedlearn.start_server('config.json')

Setting Up Clients

For each participating client:

  1. Install Cifer and ensure access to the local dataset.

  2. Update the config.json on each client, setting server_address to the server's reachable hostname or IP address and port (not the 0.0.0.0 bind address).

  3. Start the client:

python
from cifer import fedlearn

# Initialize and start FedLearn client
fedlearn.start_client('config.json')
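
When many client machines are involved, step 2 above can be scripted. A sketch, assuming one config.json per machine; `point_client_at_server` is a hypothetical helper, not part of Cifer:

```python
import json

# Hypothetical helper: rewrite a client's config.json so it connects to
# the coordinating server rather than the server's 0.0.0.0 bind address.
def point_client_at_server(path, server_host, port=8080):
    with open(path) as f:
        config = json.load(f)
    config["server_address"] = f"{server_host}:{port}"
    with open(path, "w") as f:
        json.dump(config, f, indent=2)
    return config["server_address"]
```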

Network and Security Considerations

  • Ensure the server's firewall allows incoming connections on the specified port.

  • If using secure gRPC, generate and distribute SSL/TLS certificates, and update configurations accordingly.
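
A quick way to verify the firewall point above is a plain TCP probe from a client machine to the configured server_address. This checks socket-level reachability only, not that a FedLearn server (or a valid TLS setup) is actually listening; `can_reach` is a hypothetical helper:

```python
import socket

# Hypothetical connectivity probe: TCP-level reachability of
# server_address ("host:port"). Does not validate gRPC or TLS.
def can_reach(server_address, timeout=3.0):
    host, port = server_address.rsplit(":", 1)
    try:
        with socket.create_connection((host, int(port)), timeout=timeout):
            return True
    except OSError:
        return False
```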

Data Preparation and Model Selection

  • Ensure each client has access to its local dataset in the specified data_dir.

  • Verify data format consistency across all clients.

  • Choose an appropriate model_name in the configuration, or implement and specify a custom model.
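
The first two points can be partially automated with a pre-flight check on each client. The sketch below assumes a one-file-per-sample layout under data_dir, which may not match your dataset format; `check_local_data` is a hypothetical helper:

```python
import os

# Hypothetical pre-flight data check: the dataset directory must exist
# and hold at least min_sample_size samples. Assumes one file per sample;
# adapt the counting logic to your actual dataset format.
def check_local_data(data_dir, min_sample_size):
    if not os.path.isdir(data_dir):
        raise FileNotFoundError(f"data_dir not found: {data_dir}")
    n = sum(len(files) for _, _, files in os.walk(data_dir))
    if n < min_sample_size:
        raise ValueError(
            f"only {n} samples in {data_dir}, need {min_sample_size}")
    return n
```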

Monitoring and Logging

  • Monitor server output for overall progress and issues.

  • Check individual client logs for local training progress and client-specific problems.

Troubleshooting Common Issues

  • Connection Problems: Verify network settings and firewall configurations.

  • Data Issues: Ensure correct data format and sufficient samples on all clients.

  • Resource Constraints: Adjust batch_size and local_epochs if clients experience memory issues or slow processing.

  • Model Compatibility: Verify that the chosen model is compatible with the dataset and task.

Participant Roles and Responsibilities

In a federated learning environment, different participants have distinct roles and responsibilities:

1. Server Administrator:

  • Set up and maintain the central server

  • Configure global parameters (rounds, minimum clients, etc.)

  • Monitor the overall federated learning process

  • Ensure security and privacy protocols are followed

  • Manage model aggregation and distribution

2. Client Participants:

  • Prepare and maintain local datasets

  • Ensure local system meets hardware and software requirements

  • Run client software and participate in training rounds

  • Protect the privacy of local data

  • Report any issues to the server administrator

3. Data Scientists/ML Engineers:

  • Design and implement the shared model architecture

  • Define evaluation metrics for the federated model

  • Analyze aggregated results and refine the model as needed

  • Collaborate with server administrator on hyperparameter tuning

4. Privacy Officers (if applicable):

  • Review and approve the federated learning setup

  • Ensure compliance with data protection regulations

  • Monitor the process for potential privacy risks

  • Advise on implementing additional privacy-preserving techniques

5. IT Support:

  • Assist with network configuration and troubleshooting

  • Ensure proper firewall settings and network security

  • Help with software installation and updates on client machines

6. Project Manager:

  • Coordinate between different participants

  • Ensure timely execution of federated learning rounds

  • Manage communication between technical and non-technical stakeholders

By clearly defining these roles and responsibilities, organizations can ensure a smooth and effective federated learning process while maintaining data privacy and security.
