AI applications often benefit from fundamentally different architectures than those used by traditional enterprise apps. And vendors are turning somersaults to provide these new components.
"The computing field is experiencing a near-Cambrian-like event as the surging interest in enterprise AI fuels innovations that make it easier to adopt and scale AI," said Keith Strier, global and Americas AI leader, advisory services at EY.
"Investors are pouring capital into ventures that reduce the complexity of AI, while more established infrastructure providers are upgrading their offerings from chips and storage to networking and cloud services to accelerate deployment."
The challenge for CIOs, he said, will be matching AI use cases to the type of artificial intelligence architecture best suited for the job.
Because AI is math at an enormous scale, it calls for a different set of technical and security requirements than traditional enterprise workloads, Strier said. Maximizing the value of AI use cases hinges, in part, on vendors being able to provide economical access to the technical infrastructure, cloud and related AI services that make these advanced computations possible.
But that is already happening, he said, and more advances in artificial intelligence architectures are on the horizon. Increased flexibility, power and speed in compute architectures will be catalyzed not only by the small band of high-performance computing firms at the forefront of the field, he said, but also from the broader HPC ecosystem that includes the chip- and cloud-service startups battling to set the new gold standard for AI computations.
As the bar lowers for entry-level AI projects, adoption will go up and the network effect will kick in, creating yet more innovation and business benefit for everyone -- enterprises and vendors alike, he said.
In the meantime, CIOs can give their enterprises a leg up by becoming familiar with the challenges associated with building an artificial intelligence architecture for enterprise use.
One key element of the transition from traditional compute architectures to AI architectures has been the rise of GPUs, field-programmable gate arrays (FPGAs) and special-purpose AI chips. The adoption of GPU- and FPGA-based architectures enables new levels of performance and flexibility in compute and storage systems, allowing solution providers to offer a variety of advanced services for AI and machine learning applications.
"These are chip architectures that offload many of the more advanced functions [such as AI training] and can then deliver a streamlined compute and storage stack that delivers unmatched performance and efficiency," said Surya Varanasi, co-founder and CTO of Vexata Inc., a data management solutions provider.
But new chips only get enterprises so far in being able to capitalize on artificial intelligence. Finding the best architecture for AI workloads involves a complicated calculus involving data bandwidth and latency. Faster networks are key. But many AI algorithms also must wait a full cycle to queue up the next set of data, so latency becomes a factor.
Another issue is that data must traverse multiple protocols to cross server boundaries or go between servers and storage. Data engineers can reduce these protocol crossings by finding better ways to enable data locality, so one server can process larger chunks of data without waiting for others. Some cost savings have been demonstrated through better integration between GPUs and storage. Other vendors are looking at how to make it easier to architect AI servers for composability so the same servers can be reused across multiple workloads.
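The payoff of data locality can be illustrated with a minimal sketch. The function and data below are hypothetical, not from any vendor's tooling; they simply compare how many bytes cross a server boundary under a naive assignment versus a locality-aware one.

```python
# Hypothetical sketch: estimate cross-server data movement with and
# without locality-aware chunk placement. All names are illustrative.

def bytes_moved(chunks, placement):
    """Sum the bytes that must cross a server boundary.

    chunks: list of (chunk_size_bytes, home_server) pairs
    placement: server assigned to process each chunk, in order
    """
    return sum(size for (size, home), worker in zip(chunks, placement)
               if worker != home)

# Four 1 GB chunks, stored two per server (servers 0 and 1).
chunks = [(1 << 30, 0), (1 << 30, 0), (1 << 30, 1), (1 << 30, 1)]

naive = [1, 0, 0, 1]   # assignment that ignores where the data lives
local = [0, 0, 1, 1]   # locality-aware: process each chunk where it sits

print(bytes_moved(chunks, naive))  # 2 GB crosses the network
print(bytes_moved(chunks, local))  # 0 bytes move
```

The point is not the arithmetic but the design choice: scheduling work where the data already resides turns a network transfer into a local read.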
Bringing NVMe to AI workloads
Many GPU-based solutions are based on direct-attached storage (DAS) deployment models, which makes AI's distributed training and inferencing very difficult to do. As a result, staging and management of these deep learning data pipelines can become complex, time-consuming tasks.
This bottleneck is being addressed with non-volatile memory express, or NVMe, which was originally designed to provide better connectivity between solid-state drives (SSDs) and traditional enterprise servers. Now, it is being baked into new I/O fabrics to improve AI workloads.
The thinking is that NVMe over Fabrics (NVMeF), as these interfaces are called, will help reduce the overhead in converting between network protocols and in managing the idiosyncrasies of each type of SSD. This could allow CIOs to justify the cost of AI apps that use larger data sets.
There are risks with NVMeF, starting with the high cost of investing in the bleeding edge. Plus, the industry has not settled on a vendor-neutral approach to NVMeF yet, which means CIOs also need to be wary of vendor lock-in as they choose a product.
But the incorporation of NVMeF could be an important step in optimizing an enterprise's artificial intelligence architecture, according to Varanasi.
"Even though deployments of NVMe over fabric architecture may take another 12 to 18 months to become mainstream, the core elements are in place and early adopters are seeing promising results," Varanasi said.
CIOs who want to push the envelope of AI applications may want to consider building a shared storage pool optimized for AI workloads using NVMeF, if it provides a competitive advantage in the short run as a replacement for existing storage networking equipment. But it may be more cost-effective to wait for the dust to settle on NVMeF interoperability.
Reducing data movement
One big consideration for CIOs in planning for the various stages of the AI pipeline is the cost of moving data. From ingesting and transforming data to using it to train the algorithms, AI projects require a tremendous amount of data and data processing.
The resources, both in hardware and people, required to manage these data requirements, as well as the time it takes to move the data, can make an AI project cost-prohibitive. If CIOs can find ways to avoid the movement of data between stages, there is a high probability they can develop a viable AI infrastructure that can meet business demands, said Haris Pozidis, Ph.D., manager, storage accelerator technology at IBM Research. Vendors are working on the problem.
For example, IBM has been experimenting with various hardware and software optimizations to reduce data movement for large-scale AI applications at its Zurich labs. These optimizations have boosted performance by 46 times for a popular click analytics advertising test case. This work uses techniques like distributed training and GPU acceleration and improves support for sparse data structures, Pozidis said.
Parallelism is another important component for accelerating AI workloads. Distributed training requires changes at both hardware and software levels for efficient processing of algorithms across GPUs in parallel. The IBM researchers built a prototype of a data-parallel framework that enables them to scale out and train on massive data sets that exceed the memory capacity of a single machine. This is crucial for large-scale applications. Data movement was reduced through a new framework optimized for communication-efficient training, which respects data locality.
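The core idea behind communication-efficient, data-parallel training can be sketched in a few lines. This is not IBM's framework; it is a toy linear-regression example, with illustrative names, showing that each worker computes a gradient on its local shard and only the small gradients, never the raw samples, are exchanged and averaged.

```python
# Toy data-parallel training sketch (illustrative, not a real framework):
# each "worker" computes a gradient on its local shard; the gradients
# are averaged (an all-reduce), so raw data never leaves its server.

def local_gradient(w, shard):
    """Gradient of squared error for the model y = w * x on one shard."""
    g = 0.0
    for x, y in shard:
        g += 2 * (w * x - y) * x
    return g / len(shard)

def train(shards, w=0.0, lr=0.05, steps=200):
    for _ in range(steps):
        grads = [local_gradient(w, s) for s in shards]  # done in parallel
        w -= lr * sum(grads) / len(grads)               # all-reduce step
    return w

# Samples of y = 3x split across two servers; the true weight is
# recovered without ever gathering the raw data in one place.
shards = [[(1, 3), (2, 6)], [(3, 9), (4, 12)]]
print(round(train(shards), 3))  # converges to 3.0
```

Real frameworks add compression, asynchrony and topology-aware communication on top of this pattern, but the data-movement saving comes from the same place: gradients are far smaller than training data.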
At the hardware level, the IBM researchers are experimenting with innovations in interconnectivity between GPU, CPU and memory components within servers and between servers and storage using NVMeF.
"Different AI workloads are limited by different bottlenecks on the network, by memory bandwidth, or CPU-to-GPU bandwidth. By addressing all parts of a system with more efficient interconnects and protocols, one paves the way for the development of faster AI applications," Pozidis said.
Today, most AI workloads use a preconfigured database optimized for a specific hardware architecture. The market is going toward software-enabled hardware that will allow organizations to intelligently allocate processing across GPUs and CPUs depending on the task at hand, said Chad Meley, vice president of analytic products and solutions at Teradata.
Part of the challenge is that enterprises use multiple compute engines to access multiple storage options. Large enterprises tend to store frequently accessed, high-value data such as customer, financials, supply chain, product and the like in high-performing, high I/O environments, while less frequently accessed big data sets such as sensor readings, web and rich media are stored in cheaper cloud object storage.
One of the goals of composable computing is to use containerization to spin up computer instances such as SQL engines, Graph engines, machine learning engines and deep learning engines that can access data spread across these different storage options. The ability to run multiple analytical compute engines enables the use of ensemble models that incorporate insights across engines and, typically, lead to more effective results.
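At its simplest, an ensemble across engines just combines each engine's score for the same record. The sketch below is purely illustrative; the engine names and scores are made up, and real systems would weight or stack the scores rather than average them.

```python
# Illustrative ensemble: average each record's score across several
# hypothetical analytical engines. Engine names and values are made up.

def ensemble(scores_by_engine):
    """Average each record's scores across all engines, record by record."""
    engines = list(scores_by_engine.values())
    n = len(engines)
    return [sum(vals) / n for vals in zip(*engines)]

scores = {
    "sql_engine":   [0.2, 0.9],   # one score per record
    "graph_engine": [0.4, 0.7],
    "dl_engine":    [0.3, 0.8],
}
print(ensemble(scores))  # roughly [0.3, 0.8]
```

Even this naive averaging shows why composability matters: the combination step only works if every engine can reach the same underlying data, wherever it is stored.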
IT vendors like Dell Technologies, Hewlett Packard Enterprise and, now, Liqid are looking at moving past traditional architectures that assign workloads at the compute-box level. Instead, the goal is to assign AI workloads across a more granular mix of CPUs, GPUs, memory and storage. This transition requires the adoption of new networking components that improve speed and reduce latency when connecting these different components.
For example, many cloud data centers use Ethernet, which has a latency of about 15 microseconds, for connecting compute boxes and storage appliances. InfiniBand, which is championed in many converged infrastructure solutions, can reduce this latency to 1.5 microseconds. Liqid has created a set of tools for interconnecting different boxes via PCI Express (PCIe), which can drop latency to 150 nanoseconds.
Going forward, some vendors are suggesting putting more memory right next to the GPUs used for heavy lifting, connected over even faster interconnects like the DDR4 interface commonly used for RAM, which has a latency as low as 14 nanoseconds. But this only works over short distances of a few inches.
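A quick back-of-envelope calculation, using only the latency figures cited above and ignoring bandwidth entirely, shows why these interconnect choices dominate workloads made up of many small transfers:

```python
# Back-of-envelope comparison using the latency figures cited above.
# Bandwidth is deliberately ignored to isolate per-transfer latency cost.

LATENCY_S = {
    "Ethernet":   15e-6,   # ~15 microseconds
    "InfiniBand": 1.5e-6,  # ~1.5 microseconds
    "PCIe":       150e-9,  # ~150 nanoseconds
    "DDR4":       14e-9,   # ~14 nanoseconds (short reach only)
}

def latency_cost(interconnect, transfers):
    """Total seconds spent purely on per-transfer latency."""
    return LATENCY_S[interconnect] * transfers

# One million small transfers, a plausible count for a chatty AI pipeline.
for name in LATENCY_S:
    print(f"{name:10s} {latency_cost(name, 1_000_000):.3f} s")
```

For a million small transfers, the same workload pays roughly 15 seconds of pure latency over Ethernet but a fraction of a second over PCIe, which is the gap composable architectures are chasing.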
Malo Marrec, co-founder and product leader at ClusterOne, an AI management service, said more work is required to bring composability to AI workloads on the software side. Although enterprises have begun experimenting with using Docker and Kubernetes for bringing composability to AI workloads, applying these to GPUs is still relatively immature.
"Generally speaking, running GPU workloads and monitoring them is not trivial," Marrec said. "There is no good solution that addresses monitoring in an integrated way."
Bringing storage to the GPU
Another approach lies in using GPUs to preprocess data to reduce the data required for a particular type of analysis and to help organize and label this data. This makes it easier to stage the appropriate subset of this data close to the multiple GPUs involved in AI processing, which, in turn, allows the algorithm to work from memory rather than pulling the data from storage devices over slower networks.
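The label-then-filter staging step described above can be sketched in a few lines. The function and data here are hypothetical; a real pipeline would run the labeling pass on the GPU, but the shape of the logic is the same.

```python
# Hypothetical staging step: label raw records, then keep only the
# subset a particular analysis needs, so it fits near the GPUs in memory.

def stage_subset(records, wanted_label, labeler):
    """Label each record and keep only those matching wanted_label."""
    return [r for r in records if labeler(r) == wanted_label]

raw = [-3, 7, 0, 12, -1, 5]
labeler = lambda x: "positive" if x > 0 else "other"

staged = stage_subset(raw, "positive", labeler)
print(staged)                                # [7, 12, 5]
print(len(staged), "of", len(raw), "records staged")
```

Cutting the working set before staging is what lets the downstream algorithm run from memory instead of repeatedly pulling data over slower storage networks.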
"The mentality that sees [storage, compute and memory] as separate components of a solution, which is a historically enforced view, is the reason we are struggling to scale efficiently," said Alex St. John, CTO and founder of Nyriad Ltd., a storage software vendor spun out of research on the Square Kilometer Array (SKA) Telescope, the world's largest radio telescope. The bigger the data gets, the less practical it becomes to move it somewhere else to process it.
Indeed, a key constraint for the SKA Telescope was the tremendous amount of power required for processing 160 TB of radio signal data in real time. A key element of the solution they came up with was to move away from the RAID storage systems commonly used in data centers to a parallel cluster file system like BeeGFS that simplifies the ability to stage data for particular AI workloads.
As CIOs are putting together a strategy for an artificial intelligence architecture that would best serve their use cases, usability is an important consideration. If developers, data engineers and DevOps teams can understand the new technology quickly, they can put more focus on building the right business logic rather than dealing with deployment and data pipeline issues.
Another key consideration is the amount of time, effort and support that the organization needs to put into merging the new AI architecture into an existing ecosystem.
"CIOs must weigh the investment of finite resources before implementing new infrastructures and plan for a heavy front-end workload," said Asaf Somekh, founder and CEO at Iguazio.