- 00:00Hello and welcome. In the last video I mentioned that an essential difficulty in federated learning is how to minimize
- 00:10the number of communication rounds, even if doing so increases the amount of local computation.
- 00:16This trade-off is worth considering for the sake of the overall model optimization.
- 00:21For example, suppose the original federated learning algorithm needs 1,000 iterations to train a model.
- 00:28A better algorithm makes the SGD
- 00:31on the central server converge faster by generating better gradients locally, so that it needs, for instance, only 100 iterations.
- 00:42So the communication cost is reduced by a factor of ten, which is the goal of the current algorithm design. The local computation is performed
- 00:52while the device, such as a mobile phone, is charging, so it will not cause too much inconvenience to the user.
- 01:01Next I would like to introduce to you the Federated Averaging method, first proposed by Google.
- 01:11Let's shortly recap how the traditional distributed learning method works. First, at the edge
- 01:18node, it receives the up-to-date model weights from the central server.
- 01:25Then it uses the local data and the obtained weights to compute the gradients.
- 01:31It then passes the gradients to the central server
- 01:35for a global update.
- 01:38The central server receives the gradients from all the edge nodes, then sums them up and updates the weights according to the formula
- 01:48we actually learned before, θ ← θ − η·Σᵢ gᵢ. Theta represents the weight parameters,
- 01:53and eta is the learning rate.
- 01:55After this weight update, we have just finished the current iteration. It may require many iterations to get the model to converge.
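To make one communication round concrete, here is a minimal sketch in Python with NumPy; the least-squares toy model and the function names are my own assumptions for illustration, not code from the lecture.

```python
import numpy as np

def local_gradient(weights, x, y):
    """Edge node: gradient of a least-squares loss on the local data (toy model)."""
    return x.T @ (x @ weights - y) / len(y)

def server_step(weights, edge_batches, lr=0.05):
    """Central server: sum the gradients from all edge nodes and apply
    one SGD step, theta <- theta - eta * sum_i g_i."""
    grads = [local_gradient(weights, x, y) for x, y in edge_batches]
    return weights - lr * np.sum(grads, axis=0)
```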
- 02:06Now let's take a look at the Federated Averaging method.
- 02:11Federated Averaging is a communication-efficient algorithm requiring fewer communication rounds to make the model converge.
- 02:20First, the edge node worker will also get the weights from the central server.
- 02:26Then it will repeat the following steps: first, use the local data and the received weights to compute the gradient; then
- 02:36use the gradient to update the weights. We call this step
- 02:40the local update.
- 02:42It will generally continue to update for several epochs, and each epoch needs to traverse all the local data.
- 02:50Google's paper suggests repeating the local update one to five times.
- 02:56This way, the local weight parameters are already different from those received from the central server.
- 03:05After the local updates,
- 03:06the new weight parameters are sent to the server.
- 03:10Note that what is sent to the central server here is no longer the gradient.
- 03:17Why do we do that?
- 03:18Because by doing so, we can accumulate more weight changes in one communication round, not just a single gradient-descent step. After the central
- 03:27server has received the weights of all edge nodes, it does not need to perform gradient descent.
- 03:34Instead, it performs a simple calculation, a weight average,
- 03:39to obtain the new model weights.
- 03:41Now we can see why the algorithm is called
- 03:45Federated Averaging.
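A rough sketch of these steps, reusing the toy least-squares model from above (the helper names, epoch count, and learning rate are illustrative assumptions, not the paper's code):

```python
import numpy as np

def local_update(weights, x, y, lr=0.05, epochs=5):
    """Edge node: several epochs of gradient descent on the local data,
    then return the updated weights instead of a single gradient."""
    w = weights.copy()
    for _ in range(epochs):
        w -= lr * (x.T @ (x @ w - y) / len(y))
    return w

def fedavg_round(weights, edge_batches, lr=0.05, epochs=5):
    """Central server: no gradient descent, just an average of the locally
    updated weights, weighted by each node's data size."""
    updated = [local_update(weights, x, y, lr, epochs) for x, y in edge_batches]
    sizes = np.array([len(y) for _, y in edge_batches], dtype=float)
    return np.average(updated, axis=0, weights=sizes)
```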
- 03:49The paper reports
- 03:51the following experimental comparison results of SGD
- 03:55and Federated Averaging:
- 03:59the horizontal axis is the number of communication rounds, and the vertical axis
- 04:04denotes the classification accuracy.
- 04:07You can see that Federated Averaging converges faster and has higher accuracy with the same communication overhead.
- 04:15This is exactly the goal of Federated Averaging, which is to achieve a faster convergence rate with fewer
- 04:26communication rounds. Federated Averaging allows the edge
- 04:31nodes to do a lot of local computation in exchange for more efficient communication.
- 04:39So suppose we now require Federated Averaging and SGD
- 04:43to do the same amount of computation at the edge
- 04:46node; then the convergence speed of Federated Averaging will be slower than that of SGD.
- 04:54We know that in the scenario of federated learning the computational cost is relatively small, while the communication cost
- 05:03is high.
- 05:05So the Federated Averaging algorithm is very practical.
- 05:09Another point from the figure is that with Federated Averaging one can use a higher learning rate.
- 05:16The result with a learning rate of 0.25 is better than that with 0.05.
- 05:24On the contrary, SGD
- 05:25with higher learning rates results in worse learning curves.
- 05:35According to the previous example, we can find that the content communicated between edge nodes and the central server in federated
- 05:44learning is mainly the model weights
- 05:47and the gradient information, which is computed at the edge
- 05:52node. User data does not leave the local device, but the question is: are the weight and gradient information secure enough?
- 06:02Let's take a look at the formula of the gradient computation. We know that the gradient is the derivative of the loss function with
- 06:12respect to a weight parameter. Here x
- 06:16is a sample
- 06:17and y represents its label. A mathematical transformation of the local data obtains the gradient.
- 06:25Therefore, the gradient carries certain information from the training data.
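Written out, this is the standard formulation (with loss function L and model f; the notation is mine, the slide itself is not quoted):

```latex
g \;=\; \nabla_{\theta}\, L\bigl(f(x;\theta),\, y\bigr)
  \;=\; \frac{\partial L\bigl(f(x;\theta),\, y\bigr)}{\partial \theta}
```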
- 06:32So if the gradient contains information about the training data, we can somehow reverse-engineer the data from the gradient.
- 06:40Unfortunately, the literature shows that training data information can be severely leaked through the model weights
- 06:50and gradients. In another paper, the authors devise a new set of attacks to compromise inference data privacy in collaborative
- 07:03AI systems, where a deep network model and the corresponding inference task are split and distributed to different nodes.
- 07:13So one malicious participant can accurately recover an arbitrary input fed into the system, even with no access
- 07:24to the other participants' data or computations. The input can also be recovered
- 07:31by querying the system's prediction APIs.
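To illustrate the reverse-engineering idea, here is a heavily simplified gradient-matching sketch in the spirit of "deep leakage from gradients"-style attacks; the toy linear model, random seed, and optimizer settings are my own assumptions, not the papers' code:

```python
import torch

# Toy setup: a linear model whose true gradient the attacker has observed.
torch.manual_seed(0)
w = torch.randn(4, 1)                              # shared model weights
x_true, y_true = torch.randn(1, 4), torch.randn(1, 1)
true_grad = x_true.t() @ (x_true @ w - y_true)     # gradient leaked to the attacker

# Attacker: optimize dummy data so its gradient matches the observed one.
x_dummy = torch.randn(1, 4, requires_grad=True)
y_dummy = torch.randn(1, 1, requires_grad=True)
opt = torch.optim.Adam([x_dummy, y_dummy], lr=0.1)
for _ in range(500):
    opt.zero_grad()
    dummy_grad = x_dummy.t() @ (x_dummy @ w - y_dummy)
    loss = ((dummy_grad - true_grad) ** 2).sum()   # gradient-matching objective
    loss.backward()
    opt.step()
# x_dummy now approximates x_true, up to the ambiguities of this toy problem.
```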
- 07:37Another paper demonstrates that federated learning updates leak unintended information about the participants' training data,
- 07:46and it develops several inference attacks to exploit this leakage.
- 07:52It shows that adversarial participants can infer the presence of exact data points,
- 08:00for example specific locations, in another participant's training data.
- 08:05In other words, membership inference. Second,
- 08:10the authors show how the adversary can infer properties that hold only for a subset of the training data.
- 08:17For example, the adversary can infer when a specific person first appears in the photos used to train a binary gender classifier.
- 08:28In this way, by using the gradient information as features to train a binary classifier,
- 08:34one can further infer other properties like age, race, disease, etcetera.
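A minimal sketch of this "gradients as features" idea; the synthetic gradient vectors and the logistic-regression probe are stand-ins I chose for illustration, not the attack from the paper:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Synthetic stand-ins: observed gradient updates, flattened into feature vectors,
# labeled by whether the batch that produced them had some sensitive property.
rng = np.random.default_rng(0)
grads_with = rng.normal(0.3, 1.0, size=(200, 64))     # batches WITH the property
grads_without = rng.normal(0.0, 1.0, size=(200, 64))  # batches WITHOUT it
X = np.vstack([grads_with, grads_without])
y = np.array([1] * 200 + [0] * 200)

# The adversary's property classifier: gradient features in, sensitive property out.
probe = LogisticRegression(max_iter=1000).fit(X, y)
print("training accuracy of the property probe:", probe.score(X, y))
```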
- 08:41These are all very sensitive pieces of personal information that must be protected.
- 08:48So, there are many ways to obtain data information through the model weights or gradient information.
- 08:55This video only introduced
- 08:57the simplest ones. Privacy-protection algorithms are indeed
- 09:01an important research direction in federated learning.
- 09:07Hence, an efficient defense is also critical. For example, adding noise to the gradients strengthens the protection;
- 09:16this method is called differential privacy.
- 09:20However, this sort of method secures the information but also decreases the accuracy and convergence speed of the model.
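A minimal sketch of such gradient perturbation, in the style of DP-SGD; the clipping bound and noise scale are illustrative assumptions, and a real deployment would calibrate the noise to a privacy budget:

```python
import numpy as np

def privatize_gradient(grad, clip_norm=1.0, noise_std=0.5, rng=None):
    """Clip the gradient's L2 norm, then add Gaussian noise before sharing it.
    A larger noise_std gives stronger protection but hurts accuracy and
    convergence speed, which is exactly the trade-off mentioned above."""
    rng = rng or np.random.default_rng()
    norm = np.linalg.norm(grad)
    clipped = grad * min(1.0, clip_norm / (norm + 1e-12))
    return clipped + rng.normal(0.0, noise_std * clip_norm, size=grad.shape)
```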
- 09:30Therefore, the privacy protection of federated learning is still a challenging task. Besides that, ensuring the robustness
- 09:38of learning by combating various
- 09:42security attacks is also a good research direction. Take, for example, the poisoning attack.
- 09:49Here, an attacker pretends to be a participant, joins the federation, and inserts poisoned data.
- 09:58This kind of poisoned data will disrupt the other participants and make the entire learning task fail.
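As a toy illustration of such a malicious participant, here is a simple label-flipping variant of the local update sketched earlier (my own example, not from the lecture):

```python
import numpy as np

def poisoned_local_update(weights, x, y, lr=0.05, epochs=5):
    """Malicious edge node: trains on flipped targets so that the weights it
    sends back pull the global average in the wrong direction."""
    y_poisoned = -y            # flip regression targets (a classifier attacker would flip labels)
    w = weights.copy()
    for _ in range(epochs):
        w -= lr * (x.T @ (x @ w - y_poisoned) / len(y))
    return w                   # the server averages this in like any honest update
```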
- 10:07For example, this kind of attack can make the global model commit a specific mistake by adding customized adversarial examples
- 10:17or leaving an attack backdoor. Overall, how to defend against privacy leakage and improve the robustness are
- 10:27essential research directions in federated learning.
- 10:33Thank you for watching.