Federated Machine Learning

Machine learning has been around for a long time. Although AI research dates back to the 1950s, AI became popular and widely accessible only recently. Advances in computer hardware and the availability of enormous amounts of data changed the way people looked at AI, marking the transition from the so-called AI winter to the modern era.
Have you ever wondered why, when we type something on our phone or search for something on the web, we see suggestions based on what we have searched before or on our personal data? We know AI is behind it. But how is that possible? Are these devices sending our personal data out? Before we address these questions, let’s take a look at a typical end-to-end machine learning pipeline.

End-to-end machine learning pipeline

Data collection/acquisition

Data from a variety of sources like user data from a website or sensor data from a machine is collected and stored.

Data aggregation

The data collected from different sources is then merged, or aggregated, to create a training database. In other words, data from a variety of locations is concatenated and stored on a server (in the cloud or on-premises).

Machine learning model development

Once the training database is ready, a machine learning algorithm is chosen based on the requirements, and the model is trained on the training database.
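
The first three steps can be sketched in a few lines of Python. This is only a toy illustration, not a real pipeline: the device names, the readings, and the helper functions `aggregate` and `fit_slope` are all made up for this example, and the "model" is a single slope fitted by least squares.

```python
# Hypothetical (feature, label) readings collected from three devices.
device_data = {
    "device_a": [(1.0, 2.1), (2.0, 3.9)],
    "device_b": [(3.0, 6.2), (4.0, 8.1)],
    "device_c": [(5.0, 9.8)],
}


def aggregate(per_device):
    """Merge every device's records into one central training set."""
    merged = []
    for records in per_device.values():
        merged.extend(records)
    return merged


def fit_slope(dataset):
    """Least-squares slope for the toy model y = w * x (no intercept)."""
    numerator = sum(x * y for x, y in dataset)
    denominator = sum(x * x for x, _ in dataset)
    return numerator / denominator


# Data aggregation: all raw records end up on the central server.
training_set = aggregate(device_data)

# Model development: train one model on the merged data.
w = fit_slope(training_set)
print(len(training_set), "records, fitted slope:", round(w, 3))
```

Note that every raw record has to reach the server before training can start, which is exactly the property the rest of this article questions.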

Model deployment

The trained model is then sent out to the edge devices or exposed to applications via an API so that it is accessible to clients.

This approach has two obvious limitations:

• Moving data from the source (edge devices) to the central server (the machine where training happens) may require significant network bandwidth. This makes frequent retraining tedious, as data has to be moved whenever training is to be performed.

• Data from a variety of devices or users is stored at a single location, which may raise privacy and security concerns.

To overcome these challenges in an elegant manner, a new framework for machine learning model development was introduced: federated learning.

Federated Machine Learning

Federated learning is a distributed machine learning approach that enables training of a machine learning model on a large corpus of decentralized data. It is desirable when the dataset is huge and spread across different devices. Unlike the traditional ML paradigm, federated learning moves the training process from a single central server out to the edge devices. Let’s look at how federated learning works. The following are the typical steps involved in a federated machine learning pipeline:
Fig 1. Federated learning approach

Machine learning model deployment on edge devices

The most recent version of the machine learning model is deployed on the edge devices to perform specific tasks, such as predicting the next word in a sentence or recommending specific products. These models rely on data from within the device, such as historical data, to make predictions.

Training of the machine learning model on edge devices

Along with making predictions on the current data, a parallel process stores the data so it can be used for model training. The model is then trained on this dataset on the edge device periodically. Since the dataset is relatively small (it comes from a single device), the training process doesn’t take much time and doesn’t require advanced hardware such as a GPU.
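
To make the on-device training step concrete, here is a minimal sketch in plain Python. It assumes a tiny linear model `y = w * x + b` trained with stochastic gradient descent; the function names, the learning rate, and the stored history are hypothetical choices for illustration and not what any particular device actually runs.

```python
def train_locally(weights, local_data, learning_rate=0.01, epochs=20):
    """Run a few epochs of SGD for y = w * x + b on one device's own data."""
    w, b = weights
    for _ in range(epochs):
        for x, y in local_data:
            error = (w * x + b) - y
            # Gradient step for the squared error of a single example.
            w -= learning_rate * error * x
            b -= learning_rate * error
    return (w, b)


def mean_squared_error(weights, data):
    """Average squared prediction error of the model on the given data."""
    w, b = weights
    return sum(((w * x + b) - y) ** 2 for x, y in data) / len(data)


# Hypothetical history stored on the device by the parallel process.
local_data = [(1.0, 3.1), (2.0, 4.9), (3.0, 7.2), (4.0, 9.0)]

# The model most recently received from the server (made-up starting point).
global_weights = (0.0, 0.0)

local_weights = train_locally(global_weights, local_data)
print("error before:", mean_squared_error(global_weights, local_data))
print("error after: ", mean_squared_error(local_weights, local_data))
```

The dataset is only a handful of points, so the loop finishes almost instantly on a CPU, which matches the point about not needing a GPU.
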
The trained models from the edge devices are then sent to the cloud data centers at regular intervals. An update is pushed once the training process is complete and sufficient bandwidth is available to send the model to the cloud. The trained models received from millions of edge devices are then averaged to create a consolidated model, which is pushed back to the edge devices.
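
The averaging step can be sketched as follows. The description above only says the models are averaged; weighting each device by the number of examples it trained on is an assumption made here (it is one common choice), and `federated_average` and the sample updates are hypothetical names and values for illustration.

```python
def federated_average(client_updates):
    """Combine client models into one consolidated model.

    client_updates is a list of (weights, num_samples) pairs, where weights
    is a list of floats. Each client is weighted by its sample count
    (an assumption; a plain unweighted mean is the other obvious option).
    """
    total_samples = sum(num_samples for _, num_samples in client_updates)
    if total_samples == 0:
        raise ValueError("no training samples reported by any client")

    dimension = len(client_updates[0][0])
    averaged = [0.0] * dimension
    for weights, num_samples in client_updates:
        for i, value in enumerate(weights):
            averaged[i] += value * num_samples / total_samples
    return averaged


# Hypothetical updates from three devices: (model weights, local sample count).
updates = [
    ([1.0, 2.0], 10),
    ([3.0, 4.0], 30),
    ([5.0, 6.0], 60),
]

consolidated_model = federated_average(updates)
print(consolidated_model)
```

Only the weight vectors and sample counts reach the server in this sketch; the data each device trained on never appears in `updates`.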

Advantages of Federated Machine Learning

1. From a big data perspective, federated learning is much more scalable than the traditional pipeline, as it reduces the dependence on high network bandwidth and on advanced centralized hardware.

2. The most important highlight of this approach is the privacy and security it provides. Since only the trained models from individual devices are sent out, the raw user data remains on the edge devices.

3. It requires less communication bandwidth, since trained models rather than raw data are transmitted.
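
A quick back-of-the-envelope calculation shows where the bandwidth saving comes from. All the numbers below are invented for illustration (record count, record size, parameter count, and 4 bytes per parameter), so treat this as a sketch of the reasoning rather than a measurement.

```python
def raw_data_upload_bytes(num_records, bytes_per_record):
    """Bytes a device would upload if it sent its raw data to the server."""
    return num_records * bytes_per_record


def model_upload_bytes(num_parameters, bytes_per_parameter=4):
    """Bytes a device uploads when it sends only its trained model."""
    return num_parameters * bytes_per_parameter


# Hypothetical device: 50,000 stored records of about 2 KB each.
raw_bytes = raw_data_upload_bytes(num_records=50_000, bytes_per_record=2_000)

# Hypothetical model: one million 32-bit parameters.
model_bytes = model_upload_bytes(num_parameters=1_000_000)

print("raw data upload:", raw_bytes / 1_000_000, "MB")
print("model upload:   ", model_bytes / 1_000_000, "MB")
print("model is smaller:", model_bytes < raw_bytes)
```

The comparison can go the other way: for a large model and a device holding very little data, the model upload would be the bigger of the two, so the saving depends on the sizes involved.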

Application of Federated Machine Learning

Federated machine learning can be used in a variety of industries, such as finance, marketing, and healthcare. Because of the security it provides, it is used extensively when the data involved is extremely sensitive, such as medical records. A team of researchers from MIT CSAIL, Harvard Medical School, and Tsinghua University has successfully demonstrated the use of federated machine learning to analyze electronic medical records for predicting patient mortality and length of hospital stay.
