- 00:00Hello and welcome. In the last video I mentioned that an essential difficulty in federated learning is how to minimize
- 00:10the number of communication rounds, even if doing so increases the amount of local computation.
- 00:16This trade-off is worth considering for the sake of the overall model optimization.
- 00:21For example, suppose the original federated learning algorithm needs 1,000 iterations to train a model.
- 00:28A better algorithm makes the SGD
- 00:31on the central server converge faster by generating better gradients locally, so that it needs, for instance, only 100 iterations.
- 00:42So the communication cost is reduced by a factor of ten, which is the goal of the current algorithm design. The local computation is performed
- 00:52while the device, such as a mobile phone, is charging, so it will not cause too much inconvenience to the user.
- 01:01Next I would like to introduce to you the Federated Averaging method, first proposed by Google.
- 01:11Let's shortly recap how the traditional distributed learning method works. First, at the edge
- 01:18node, it receives the up-to-date model weights from the central server.
- 01:25Then it uses the local data and the obtained weights to compute the gradients.
- 01:31It then passes the gradients to the central server
- 01:35for a global update.
- 01:38The central server receives the gradients from all the edge nodes, then sums them up and updates the weights according to the formula
- 01:48we actually learned before, θ ← θ − η·Σᵢ gᵢ. Theta represents the weight parameters,
- 01:53and eta is the learning rate.
- 01:55After this weight update, we have just finished the current iteration. It may require many iterations to get the model to converge.
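To make one communication round concrete, here is a minimal sketch in Python with NumPy; the least-squares toy model and the function names are my own assumptions for illustration, not code from the lecture.

```python
import numpy as np

def local_gradient(weights, x, y):
    """Edge node: gradient of a least-squares loss on the local data (toy model)."""
    return x.T @ (x @ weights - y) / len(y)

def server_step(weights, edge_batches, lr=0.05):
    """Central server: sum the gradients from all edge nodes and apply
    one SGD step, theta <- theta - eta * sum_i g_i."""
    grads = [local_gradient(weights, x, y) for x, y in edge_batches]
    return weights - lr * np.sum(grads, axis=0)
```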
- 02:06Now let's take a look at the Federated Averaging method.
- 02:11Federated Averaging is a communication-efficient algorithm requiring fewer communication rounds to make the model converge.
- 02:20First, the edge node worker will also get the weights from the central server.
- 02:26Then it will repeat the following steps: first, use the local data and the received weights to compute the gradient; then
- 02:36use the gradient to update the weights. We call this step
- 02:40the local update.
- 02:42It will generally continue to update for several epochs, and each epoch needs to traverse all the local data.
- 02:50Google's paper suggests repeating the local update one to five times.
- 02:56This way, the local weight parameters are already different from those received from the central server.
- 03:05After the local updates,
- 03:06the new weight parameters are sent to the server.
- 03:10Note that what is sent to the central server here is no longer the gradient.
- 03:17Why do we do that?
- 03:18Because by doing so, we can accumulate more weight changes in one communication round, not just a single gradient-descent step. After the central
- 03:27server has received the weights of all edge nodes, it does not need to perform gradient descent.
- 03:34Instead, it performs a simple calculation, a weight average,
- 03:39to obtain the new model weights.
- 03:41Now we can see why the algorithm is called
- 03:45Federated Averaging.
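A rough sketch of these steps, reusing the toy least-squares model from above (the helper names, epoch count, and learning rate are illustrative assumptions, not the paper's code):

```python
import numpy as np

def local_update(weights, x, y, lr=0.05, epochs=5):
    """Edge node: several epochs of gradient descent on the local data,
    then return the updated weights instead of a single gradient."""
    w = weights.copy()
    for _ in range(epochs):
        w -= lr * (x.T @ (x @ w - y) / len(y))
    return w

def fedavg_round(weights, edge_batches, lr=0.05, epochs=5):
    """Central server: no gradient descent, just an average of the locally
    updated weights, weighted by each node's data size."""
    updated = [local_update(weights, x, y, lr, epochs) for x, y in edge_batches]
    sizes = np.array([len(y) for _, y in edge_batches], dtype=float)
    return np.average(updated, axis=0, weights=sizes)
```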
- 03:49The paper reports
- 03:51the following experimental comparison results of SGD
- 03:55and Federated Averaging:
- 03:59the horizontal axis is the number of communication rounds, and the vertical axis
- 04:04denotes the classification accuracy.
- 04:07You can see that Federated Averaging converges faster and has higher accuracy with the same communication overhead.
- 04:15This is exactly the goal of Federated Averaging, which is to achieve a faster convergence rate with fewer
- 04:26communication rounds. Federated Averaging allows the edge
- 04:31nodes to do a lot of local computation in exchange for more efficient communication.
- 04:39So suppose we now require Federated Averaging and SGD
- 04:43to do the same amount of computation at the edge
- 04:46node; then the convergence speed of Federated Averaging will be slower than that of SGD.
- 04:54We know that in the scenario of federated learning the computational cost is relatively small, while the communication cost
- 05:03is high.
- 05:05So the Federated Averaging algorithm is very practical.
- 05:09Another point from the figure is that with Federated Averaging one can use a higher learning rate.
- 05:16The result with a learning rate of 0.25 is better than that with 0.05.
- 05:24On the contrary, SGD
- 05:25with higher learning rates results in worse learning curves.
- 05:35According to the previous example, we can find that the content communicated between edge nodes and the central server in federated
- 05:44learning is mainly the model weights
- 05:47and the gradient information, which is computed at the edge
- 05:52node. User data does not leave the local device, but the question is: are the weight and gradient information secure enough?
- 06:02Let's take a look at the formula of the gradient computation. We know that the gradient is the derivative of the loss function with
- 06:12respect to a weight parameter. Here x
- 06:16is a sample
- 06:17and y represents its label. A mathematical transformation of the local data obtains the gradient.
- 06:25Therefore, the gradient carries certain information from the training data.
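Written out, this is the standard formulation (with loss function L and model f; the notation is mine, the slide itself is not quoted):

```latex
g \;=\; \nabla_{\theta}\, L\bigl(f(x;\theta),\, y\bigr)
  \;=\; \frac{\partial L\bigl(f(x;\theta),\, y\bigr)}{\partial \theta}
```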
- 06:32So if the gradient contains information about the training data, we can somehow reverse-engineer the data from the gradient.
- 06:40Unfortunately, the literature shows that training data information can be severely leaked through the model weights
- 06:50and gradients. In another paper, the authors devise a new set of attacks to compromise inference data privacy in collaborative
- 07:03AI systems, where a deep network model and the corresponding inference task are split and distributed to different nodes.
- 07:13So one malicious participant can accurately recover an arbitrary input fed into the system, even with no access
- 07:24to the other participants' data or computations. The input can also be recovered
- 07:31by querying the system's prediction APIs.
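To illustrate the reverse-engineering idea, here is a heavily simplified gradient-matching sketch in the spirit of "deep leakage from gradients"-style attacks; the toy linear model, random seed, and optimizer settings are my own assumptions, not the papers' code:

```python
import torch

# Toy setup: a linear model whose true gradient the attacker has observed.
torch.manual_seed(0)
w = torch.randn(4, 1)                              # shared model weights
x_true, y_true = torch.randn(1, 4), torch.randn(1, 1)
true_grad = x_true.t() @ (x_true @ w - y_true)     # gradient leaked to the attacker

# Attacker: optimize dummy data so its gradient matches the observed one.
x_dummy = torch.randn(1, 4, requires_grad=True)
y_dummy = torch.randn(1, 1, requires_grad=True)
opt = torch.optim.Adam([x_dummy, y_dummy], lr=0.1)
for _ in range(500):
    opt.zero_grad()
    dummy_grad = x_dummy.t() @ (x_dummy @ w - y_dummy)
    loss = ((dummy_grad - true_grad) ** 2).sum()   # gradient-matching objective
    loss.backward()
    opt.step()
# x_dummy now approximates x_true, up to the ambiguities of this toy problem.
```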
- 07:37Another paper demonstrates that federated learning updates leak unintended information about the participants' training data,
- 07:46and it develops several inference attacks to exploit this leakage.
- 07:52It shows that adversarial participants can infer the presence of exact data points,
- 08:00for example specific locations, in another participant's training data.
- 08:05In other words, membership inference. Second,
- 08:10the authors show how the adversary can infer properties that hold only for a subset of the training data.
- 08:17For example, the adversary can infer when a specific person first appears in the photos used to train a binary gender classifier.
- 08:28In this way, by using the gradient information as features to train a binary classifier,
- 08:34one can further infer other properties like age, race, disease, etcetera.
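A minimal sketch of this "gradients as features" idea; the synthetic gradient vectors and the logistic-regression probe are stand-ins I chose for illustration, not the attack from the paper:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Synthetic stand-ins: observed gradient updates, flattened into feature vectors,
# labeled by whether the batch that produced them had some sensitive property.
rng = np.random.default_rng(0)
grads_with = rng.normal(0.3, 1.0, size=(200, 64))     # batches WITH the property
grads_without = rng.normal(0.0, 1.0, size=(200, 64))  # batches WITHOUT it
X = np.vstack([grads_with, grads_without])
y = np.array([1] * 200 + [0] * 200)

# The adversary's property classifier: gradient features in, sensitive property out.
probe = LogisticRegression(max_iter=1000).fit(X, y)
print("training accuracy of the property probe:", probe.score(X, y))
```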
- 08:41These are all very sensitive pieces of personal information that must be protected.
- 08:48So, there are many ways to obtain data information through the model weights or gradient information.
- 08:55This video only introduced
- 08:57the simplest ones. Privacy-protection algorithms are indeed
- 09:01an important research direction in federated learning.
- 09:07Hence, an efficient defense is also critical. For example, adding noise to the gradients strengthens the protection;
- 09:16this method is called differential privacy.
- 09:20However, this sort of method secures the information but also decreases the accuracy and convergence speed of the model.
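A minimal sketch of such gradient perturbation, in the style of DP-SGD; the clipping bound and noise scale are illustrative assumptions, and a real deployment would calibrate the noise to a privacy budget:

```python
import numpy as np

def privatize_gradient(grad, clip_norm=1.0, noise_std=0.5, rng=None):
    """Clip the gradient's L2 norm, then add Gaussian noise before sharing it.
    A larger noise_std gives stronger protection but hurts accuracy and
    convergence speed, which is exactly the trade-off mentioned above."""
    rng = rng or np.random.default_rng()
    norm = np.linalg.norm(grad)
    clipped = grad * min(1.0, clip_norm / (norm + 1e-12))
    return clipped + rng.normal(0.0, noise_std * clip_norm, size=grad.shape)
```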
- 09:30Therefore, the privacy protection of federated learning is still a challenging task. Besides that, ensuring the robustness
- 09:38of learning by combating various
- 09:42security attacks is also a good research direction. Take, for example, the poisoning attack.
- 09:49Here, an attacker pretends to be a participant, joins the federation, and inserts poisoned data.
- 09:58This kind of poisoned data will disrupt the other participants and make the entire learning task fail.
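As a toy illustration of such a malicious participant, here is a simple label-flipping variant of the local update sketched earlier (my own example, not from the lecture):

```python
import numpy as np

def poisoned_local_update(weights, x, y, lr=0.05, epochs=5):
    """Malicious edge node: trains on flipped targets so that the weights it
    sends back pull the global average in the wrong direction."""
    y_poisoned = -y            # flip regression targets (a classifier attacker would flip labels)
    w = weights.copy()
    for _ in range(epochs):
        w -= lr * (x.T @ (x @ w - y_poisoned) / len(y))
    return w                   # the server averages this in like any honest update
```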
- 10:07For example, this kind of attack can make the global model commit a specific mistake by adding customized adversarial examples
- 10:17or leaving an attack backdoor. Overall, how to defend against privacy leakage and improve the robustness are
- 10:27essential research directions in federated learning.
- 10:33Thank you for watching.