A study of model aggregator algorithms in Federated Learning

What is Federated learning (FL)? 

Federated learning is distributed model training at the edge. The principle of this approach is to aggregate models learned on distributed clients in order to obtain a new, more general model. The resulting model is then redistributed to the clients for further training.

Why does FL struggle with data heterogeneity? Each client in the federation trains its model on its own local data, which is not independent and identically distributed (non-IID) across clients. Because of this, the next version of the aggregated model might have reduced accuracy when it is deployed back in client settings.

Goals of a federated learning model aggregator? The heterogeneity of devices and users challenges ML with a double objective:
• Generalization: achieving the overall target accuracy in federated learning settings (pervasive devices).
• Personalization: achieving the target accuracy on each client in AI tasks, with already seen data or new data.

Strategies adopted in model aggregation algorithms? The key point in federated learning is the way specialized models are aggregated at the server. In the case of deep learning, two families of algorithms implementing different strategies are considered:
• First strategy: emphasize generalization. The aggregation algorithm considers the local models and builds a new model that potentially calls into question all layers and all weights associated with the neurons. This approach is exemplified by FedAvg and FedMA.
• Second strategy: focus more on client specialization. Here the algorithm does not question certain parts of the local models. Specifically, only the base layers of the local models are sent to the server for generalization, while the last layers are kept unchanged. This approach is exemplified by the FedPer algorithm.

Which model aggregation algorithms are available/used?

Federated Averaging (FedAvg):
• Starts with a random initialization of a neural model (the server is in charge of coordinating and managing model transfers with client devices).
• The resulting model is sent to the clients, which start local training from it.
• When on-device training is finished, the weights of the local models are sent to the server.
• Aggregation is done by weighted averaging, so clients with more data influence the newly aggregated model more strongly.
• FedAvg is, however, a naive form of aggregation: its coordinate-wise averaging may lead to sub-optimal solutions. With non-IID data, neurons at the same coordinate may have been adapted for entirely different purposes because of each client's specialization.
• Averaging neurons that are drastically different degrades the results.
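
The weighted averaging step can be sketched in a few lines of plain Python. This is only an illustration under simplifying assumptions: each client model is flattened into a single list of weights, and the function name `fedavg` and the sample numbers are made up for the example.

```python
def fedavg(client_weights, client_sizes):
    """Coordinate-wise average of client weights, weighted by local data size."""
    total = sum(client_sizes)
    n_params = len(client_weights[0])
    aggregated = [0.0] * n_params
    for weights, size in zip(client_weights, client_sizes):
        for i, w in enumerate(weights):
            # A client holding more samples contributes more to each coordinate.
            aggregated[i] += w * size / total
    return aggregated


# Two hypothetical clients; the second holds three times more data.
clients = [[1.0, 2.0], [3.0, 4.0]]
sizes = [100, 300]
global_model = fedavg(clients, sizes)
print(global_model)
```

Note that the average is taken position by position, which is exactly why it breaks down when the neurons at one coordinate have learned different things on different clients.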

Federated Learning with Personalization Layers (FedPer):
• Similar to FedAvg in the way it computes new weights in the aggregated model.
• However, it differs strongly in the parts of the model that are considered during aggregation.
• Clients only communicate the base layers of the neural model to the server and retain the other layers locally.
• The underlying idea is that the base layers deal with "representation learning" and can be advantageously shared by clients through aggregation.
• The upper layers are more concerned with decision making, which is more specific to each client.
• FedPer clients handle varied inputs better (in the base layers) while being able to specialize in their particular data (in the upper layers).
• However, the aggregated model on the server side is only partial and is not usable for decision making because of the missing layers.
• FedPer can be seen as an adaptation of the transfer learning methodology to a federated learning scheme. Studies have shown that it can surpass centralized learning and the FedAvg approach in the human activity recognition (HAR) field.
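
A rough sketch of one FedPer-style round is given below. It assumes each model is a dictionary mapping a layer name to a flat list of weights; the layer names `base` and `head`, the function `fedper_round` and the numbers are hypothetical and only serve the illustration.

```python
def fedper_round(client_models, client_sizes, base_layers):
    """Average only the base layers; every client keeps its own upper layers."""
    total = sum(client_sizes)

    # Server side: weighted average restricted to the shared base layers.
    server_base = {}
    for name in base_layers:
        averaged = [0.0] * len(client_models[0][name])
        for model, size in zip(client_models, client_sizes):
            for i, w in enumerate(model[name]):
                averaged[i] += w * size / total
        server_base[name] = averaged

    # Client side: adopt the new base layers, leave personalization layers as is.
    updated_clients = []
    for model in client_models:
        new_model = dict(model)
        for name in base_layers:
            new_model[name] = list(server_base[name])
        updated_clients.append(new_model)
    return server_base, updated_clients


clients = [
    {"base": [0.0, 2.0], "head": [1.0]},
    {"base": [4.0, 6.0], "head": [-1.0]},
]
server_base, clients_after = fedper_round(clients, [1, 1], base_layers=["base"])
print(server_base)       # the server model has no "head" entry
print(clients_after[0])  # shared base, client-specific head
```

The server dictionary never contains the upper layers, which mirrors the point above that the server-side model is only partial.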

Federated Matched Averaging (FedMA):
• FedMA modifies the neural model architecture by incorporating a layer-wise aggregation process in which similar neurons can be fused and new ones can be added.
• This approach treats the number of nodes in a layer as a sub-problem to solve rather than a hyper-parameter to be set, and it has been extended to CNNs and RNNs.
• FedMA considers that the neurons in an NN layer are permutation invariant (reordering the neurons within a layer, together with the corresponding weights of the next layer, will not change the NN output).
• The algorithm's central intuition is that all clients can contain neurons that are similar and should be merged together. All neurons in the same cluster are averaged to produce a global neuron.
• To find out which neurons can be fused, the algorithm uses a 2D permutation matrix that is computed iteratively, from the lowest layer to the highest.
• Experiments with deep CNN and LSTM architectures show that the FedMA algorithm outperforms FedAvg on CV datasets.
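
The idea of matching neurons before averaging them can be shown with a toy example. The sketch below is not the FedMA matching procedure itself: it swaps in a simple greedy nearest-neighbour matching between two clients, assumes both layers have the same number of neurons, and uses made-up helper names (`greedy_match`, `matched_average`).

```python
import math


def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def greedy_match(reference, other):
    """For each reference neuron, pick the closest still-unused neuron in `other`."""
    unused = set(range(len(other)))
    permutation = []
    for ref_neuron in reference:
        best = min(unused, key=lambda j: euclidean(ref_neuron, other[j]))
        permutation.append(best)
        unused.remove(best)
    return permutation


def matched_average(reference, other):
    """Average each reference neuron with its matched neuron, not its coordinate."""
    permutation = greedy_match(reference, other)
    return [
        [(r + o) / 2 for r, o in zip(ref_neuron, other[j])]
        for ref_neuron, j in zip(reference, permutation)
    ]


# Both clients learned the same two neurons, but stored them in swapped order.
client_a = [[1.0, 0.0], [0.0, 1.0]]
client_b = [[0.0, 1.0], [1.0, 0.0]]

naive = [[(x + y) / 2 for x, y in zip(na, nb)] for na, nb in zip(client_a, client_b)]
matched = matched_average(client_a, client_b)
print(naive)    # coordinate-wise averaging blurs both neurons
print(matched)  # matching first recovers the original neurons
```

Because the layer is permutation invariant, reordering client_b's neurons before averaging costs nothing, while skipping that step wipes out what both clients had learned.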

Federated Distance (FedDist):
• A novel neuron-matching federated learning algorithm based on a Euclidean distance dissimilarity measure.
• This algorithm includes some elements of FedAvg and FedMA. It is called Federated Distance (FedDist) for its emphasis on computing distances between neurons at the same coordinates when comparing client and server models.
• FedDist recognizes, like FedMA, that some client models may diverge because of heterogeneous, non-IID data. This results in neurons that cannot be matched with neurons from other models (because their weights are too far apart).
• FedDist also recognizes that the model structure is relatively stable, which means that neurons with the same coordinates play a similar role. This view provides an opportunity to build a coordinate-wise approach.
• FedDist identifies diverging neurons using Euclidean distances. These neurons, which are specific to certain clients, are added to the aggregated model as new neurons.
• This neuron-adding scheme can lead to larger models that are able to generalize better. As new neurons are added to a layer, a layer-wise training round is added in order to allow the neurons in the next layers to adjust to the new incoming neurons and weights.
• To do that, the layer with the new neurons and the layers below it are frozen, and the subsequent layers are trained. When all layers have been treated, a new aggregated model is computed:
    ◦ A global server model is computed from all clients.
    ◦ Outliers are identified using the Euclidean distance and added to the aggregated model.
• A penalty function has also been implemented to raise the threshold as training continues, to prevent the never-ending addition of new neurons. If a neuron in any of the clients has an individual distance above the threshold, it is added to the server model. The process is performed layer-wise. At each communication round, it is performed on the first layer.
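
The outlier step for a single layer can be sketched as follows. Several things here are assumptions made for the illustration: the server neuron is an unweighted mean of the client neurons at the same coordinate, the threshold is passed in as a fixed number (how it is derived and raised by the penalty function is not covered), and the name `feddist_layer` is hypothetical. The layer-wise retraining of the following layers is not shown.

```python
import math


def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def feddist_layer(client_layers, threshold):
    """Aggregate one layer and append client neurons that diverge too much."""
    n_clients = len(client_layers)
    n_neurons = len(client_layers[0])

    # Step 1: coordinate-wise server neurons (unweighted mean, an assumption).
    server = []
    for k in range(n_neurons):
        neurons_at_k = [layer[k] for layer in client_layers]
        server.append([sum(ws) / n_clients for ws in zip(*neurons_at_k)])

    # Step 2: a client neuron too far from the server neuron at the same
    # coordinate is treated as client-specific and added as a new neuron.
    new_neurons = []
    for layer in client_layers:
        for k in range(n_neurons):
            if euclidean(layer[k], server[k]) > threshold:
                new_neurons.append(list(layer[k]))

    return server + new_neurons


# Three hypothetical clients; the third has a diverging second neuron.
layers = [
    [[1.0, 1.0], [0.0, 0.0]],
    [[1.0, 1.0], [0.0, 0.0]],
    [[1.0, 1.0], [3.0, 3.0]],
]
aggregated = feddist_layer(layers, threshold=2.0)
print(len(aggregated))  # the layer grew from 2 to 3 neurons
print(aggregated[-1])   # the diverging neuron kept as is
```

Raising `threshold` from one communication round to the next, as the penalty function does, makes it progressively harder for a neuron to qualify as an outlier, so the layer stops growing.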

Reference: https://arxiv.org/abs/2110.10223 Happy reading.
