
Federated deep learning offers new approach to model training

Training deep learning models puts a massive strain on enterprise infrastructure, but federated learning, which trains models on endpoint devices, could lessen some of the demand.

Training machine learning and deep learning models requires massive compute resources, but a new approach called federated learning is emerging as a way to train models for AI over distributed clients, thereby reducing the drag on enterprise infrastructure.

In one example, engineers at Google working on the company's Android mobile platform used federated deep learning to improve the performance of speech recognition and predictive text applications for phones in a way that reduces privacy concerns, increases model performance and reduces communication overhead.

"Federated learning is a new frontier in AI where you leverage the massive compute available in the form of distributed devices, thereby allowing for learning to be local, private and yet relevant," said Ramya Ravichandar, vice president of product management at FogHorn Systems, an IoT platform provider.

Traditional AI approaches require powerful central compute resources, where training data from all sources is aggregated and the models are trained. In federated learning, developers distribute the learning process across the compute power of individual devices. And, because data never leaves the device that created it, federated learning can help support user privacy.
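As a rough illustration of that split, here is a minimal sketch that assumes a one-weight linear model and plain federated averaging. The function names `local_update` and `federated_round` and all of the data are hypothetical and are not taken from any system described in this article. Each device trains on its own samples and returns only a weight, which a server then averages.

```python
# Illustrative only: a one-weight model y = w * x trained with federated averaging.
# All names and data below are hypothetical.

def local_update(w, data, lr=0.01, steps=20):
    """Run a few gradient-descent steps on one device's private data."""
    for _ in range(steps):
        # Gradient of the mean squared error for y = w * x
        grad = sum(2 * (w * x - y) * x for x, y in data) / len(data)
        w -= lr * grad
    return w  # only the weight leaves the device, never the data


def federated_round(global_w, devices):
    """Send the global weight to each device and average what comes back."""
    local_weights = [local_update(global_w, data) for data in devices]
    return sum(local_weights) / len(local_weights)


# Three devices, each holding private samples of roughly y = 3x
devices = [
    [(1.0, 3.1), (2.0, 5.9)],
    [(1.5, 4.4), (3.0, 9.2)],
    [(0.5, 1.6), (2.5, 7.4)],
]

w = 0.0
for _ in range(30):
    w = federated_round(w, devices)
```

After 30 rounds the shared weight settles close to 3, even though the server never sees any of the raw (x, y) pairs.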

Federated learning can be applied to deep learning problems or more traditional machine learning problems, said Mike Lee Williams, research engineer at Cloudera Fast Forward Labs.

"Deep learning is cool and can be more accurate than traditional machine learning, but you need a good reason to use it because it introduces engineering complexity and may require specialized hardware."

A new paradigm

Most existing AI tools train on data in the cloud and then push better algorithms to devices. With federated deep learning, there's no need for training data to be sent in all its completeness to a central data store, Ravichandar said.

Approaches can vary in terms of which subset of edge devices to include in the training updates, how to capture those updates, when to trigger retraining and when to push out the new model to all users. The right choices depend on the use cases and applications involved.
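The first of those choices, which devices take part in a round, can be sketched as follows. The eligibility fields (`charging`, `on_wifi`) and the function name `select_clients` are assumptions made for illustration, not a description of how any particular vendor selects devices.

```python
import random

# Illustrative only: field names and the eligibility rule are hypothetical.

def select_clients(clients, fraction, rng):
    """Pick a random subset of eligible edge devices for one training round."""
    eligible = [c for c in clients if c["charging"] and c["on_wifi"]]
    if not eligible:
        return []
    k = max(1, int(len(eligible) * fraction))
    return rng.sample(eligible, min(k, len(eligible)))


# Twelve made-up devices with different charging and network states
clients = [
    {"id": i, "charging": i % 2 == 0, "on_wifi": i % 3 != 0}
    for i in range(12)
]

rng = random.Random(0)  # seeded so the example is repeatable
selected = select_clients(clients, fraction=0.5, rng=rng)
```

Here four devices are eligible and half of them are sampled for the round. A different application might filter on entirely different conditions.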

Figure: Example of the deep learning process. Most deep learning applications are developed following this process.

Ravichandar sees federated learning emerging in the mobile market predominantly where the endpoints are high-end phones with AI chips, and a lot of the techniques out there are focused on this segment. When one considers the industrial sectors, where the endpoints are network gateways or programmable logic controllers used in manufacturing, then the kinds of machine learning models that use federated learning will vary greatly.

In industrial settings, the volume of participating devices may not be as high as in mobile consumer applications. Also, compliance and regulations around factories and plants are extremely rigid. So, automated deployment of models -- one of the main appeals of federated learning -- without adequate testing and certification is not possible.

ByteLake, an AI consultancy based in Poland, recently released a proof of concept for the manufacturing industry in concert with Lenovo for predictive maintenance. With federated learning, ByteLake and Lenovo can monitor such filters in real time and still aggregate the findings across the whole factory or even beyond that. They are also working on algorithms for telecom.

"Federated learning changes the way we design AI solutions," said Marcin Rojek, co-founder of ByteLake. His team must think about where and how they collect and initially process the data, where the training shall happen and which edge devices should get some level of intelligence.

"The key challenge is that, in many cases, we are not just trying to find a cat or a dog in a picture, but we rather need to deep dive into mathematical equations of neural networks to understand how we can deploy the underlying algorithms across the network of various devices," Rojek said.

Federating across parties

There are single-party and multiparty approaches for implementing federated learning algorithms, said Sai Sri Sathya, CEO of an AI startup. In a single-party system, only one entity is involved in governance of the distributed data capture and flow system. This could take several forms, such as a smartphone or IoT app, network devices, distributed data warehouses or machines used by employees.

Models are trained in a federated manner on data that has the same structure across all client devices, and in most cases, each data point is unique to the device or user. For example, a music recommendation engine, which recommends music on a specific app, can be federated this way.

In a multiparty system, two or more organizations or franchisees form an alliance to train a shared model on their individual data sets through federated learning.

"Keeping data private is the major value addition of federated learning here for each of the participating entities to achieve a common goal," Sathya said. This is because the data never leaves its original location or gets combined with data from the other entity.

A neutral third party could be involved in providing the infrastructure to aggregate model weights and establish trust among the clients. For example, multiple banks could train a common fraud detection model without sharing their sensitive customer data with each other through federated learning.
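A minimal sketch of what such an aggregator might compute, assuming each party reports a list of model weights plus the number of records it trained on. The `aggregate` function and the bank figures are hypothetical. The aggregator sees only weights and counts, never customer records.

```python
# Illustrative only: weighted averaging of model weights by a neutral aggregator.

def aggregate(updates):
    """Average weight vectors, weighting each party by its sample count.

    updates: list of (weights, n_samples) tuples, one per participating party.
    """
    total = sum(n for _, n in updates)
    dim = len(updates[0][0])
    return [sum(w[i] * n for w, n in updates) / total for i in range(dim)]


# Two made-up banks report locally trained fraud-model weights
bank_updates = [
    ([0.20, -1.00], 1000),  # smaller bank, 1,000 training records
    ([0.40, -0.80], 3000),  # larger bank, 3,000 training records
]

shared_weights = aggregate(bank_updates)
```

The shared model comes out near [0.35, -0.85], pulled toward the bank that trained on more records. Weighting by record count is one common design choice; an alliance could also agree to weight every member equally.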

Challenges to federated deep learning

Because the success of federated learning really depends on the volume of participating devices, the biggest hurdle is the orchestration of all the actors, FogHorn's Ravichandar said.

AI engineers also must consider the manner and frequency with which models are shared, Cloudera's Williams said. This involves balancing the relative importance of things like privacy and limited compute resources on edge devices.

Also, the tool sets are new. "If you're looking at it because you have legal obligations to preserve privacy, then you need to tread a little carefully and make sure you understand the guarantees and risks," Williams said.

Another big challenge is that real-world distributed systems are complicated and can be fragile, Williams said. AI developers must build a system that can handle the inevitable problems in this context, such as edge devices that are slow to respond (some people have slow phones) or fail altogether (some people turn their phones off at night). Williams believes the frameworks will eventually enable machine learning engineers to hide some of these challenges and focus on the data, in the same way a tool like Spark does in the data center. "But we're not there just yet," Williams said.
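One simple way to tolerate slow or missing devices is to set a deadline and a minimum number of participants for each round. The sketch below assumes a single-weight model, and the names `collect_updates`, `deadline_s` and `min_clients` are hypothetical rather than drawn from any framework.

```python
# Illustrative only: tolerate stragglers and dropouts in one training round.

def collect_updates(responses, deadline_s, min_clients):
    """Average the updates that arrived in time, or skip the round.

    responses: list of (client_id, seconds_taken, weight); seconds_taken is
    None when the device never answered.
    """
    on_time = [w for _, t, w in responses if t is not None and t <= deadline_s]
    if len(on_time) < min_clients:
        return None  # too few participants; keep the previous global model
    return sum(on_time) / len(on_time)


responses = [
    ("phone-a", 4.2, 1.0),
    ("phone-b", 58.0, 5.0),   # slow phone, misses the deadline
    ("phone-c", None, None),  # switched off overnight
    ("phone-d", 6.5, 2.0),
]

new_weight = collect_updates(responses, deadline_s=30, min_clients=2)
```

Only the two on-time phones contribute, so the round still completes. Raising `min_clients` to 3 would make the function return `None` and leave the previous model in place.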

Building a federated data pipeline

Another challenge is that local training of supervised models requires labeled data that isn't available or is difficult to produce in many cases.

"A good way to tackle this challenge is by defining the federated learning problem and designing [a] data pipeline such that labels are captured in an implicit way," Sathya said.

For example, a user's interaction data from actions taken or events triggered could be used as feedback on model performance.
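A minimal sketch of implicit labeling, assuming a predictive text feature that logs whether each suggestion was accepted. The event fields and the `label_from_events` function are hypothetical.

```python
# Illustrative only: derive training labels from on-device interaction events.

def label_from_events(events):
    """Turn logged suggestion events into (context, suggestion, label) examples."""
    examples = []
    for event in events:
        if event["type"] != "suggestion_shown":
            continue  # ignore events that carry no feedback signal
        label = 1 if event["accepted"] else 0
        examples.append((event["context"], event["suggestion"], label))
    return examples


events = [
    {"type": "suggestion_shown", "context": "see you", "suggestion": "soon", "accepted": True},
    {"type": "app_opened"},
    {"type": "suggestion_shown", "context": "thank", "suggestion": "them", "accepted": False},
]

training_examples = label_from_events(events)
```

The user never labels anything explicitly. Accepting or ignoring a suggestion supplies the label, and the resulting examples stay on the device for local training.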

Ultimately, developers need to be mindful of finding the right use case for federated deep learning. Because models take longer to converge in a federated setup than in the traditional central training approach, reliability issues may arise due to connectivity problems, variations in app usage patterns and irregular or missed updates. Sathya said that federated learning should be considered only when the size of the data and the cost of aggregating it from distributed sources are especially high.

Sathya believes the mindset of centrally aggregating data for competitive advantage, which has developed through the age of big data, could be a major hurdle slowing the widespread adoption of federated learning.

"Effective data protection policies and appropriate incentives and business models around decentralizing data can tackle these issues and develop the federated AI ecosystem," he said.
