Microsoft made competition in the supercomputing market a little more interesting by debuting a machine custom-built to train large AI models that work with its Azure cloud platform.
The ultimate goal of the Microsoft supercomputer, co-developed with San Francisco-based OpenAI, is not just to create large AI models but also to make training optimization tools available to developers and to provide them with supercomputing resources through Azure AI services and GitHub. Making the tools more readily available is not entirely altruistic on Microsoft's part: wider access makes it easier for developers, data scientists and business users to take advantage of Microsoft's AI at Scale initiative, the company said.
Among the tools included in the AI at Scale initiative are the Microsoft Turing models, which the company uses to improve language understanding tasks across its Bing, Office and Dynamics product lines. A few months ago, Microsoft released to researchers what it claims is the world's largest publicly available AI language model, Microsoft Turing Natural Language Generation.
AI supercomputers a burgeoning field
Microsoft is hardly the only major vendor merging AI initiatives with supercomputers. A year ago, HPE purchased supercomputer pioneer Cray for $1.3 billion, hoping not only to bring high-performance computing to commercial enterprises but also to become more competitive in pursuing opportunities in the AI and machine learning markets. IBM, which has two of the three fastest supercomputers in the world with Summit and Sierra, is using both machines to go after similar opportunities.
Similarly, Microsoft's major cloud archrival, AWS, has expressed interest in competing in the supercomputer market.
This concerted push into the high-performance computing (HPC) market by top-tier players over the past couple of years is due to two obvious factors, according to Steve Conway, senior adviser in Hyperion Research's HPC Market Dynamics practice: The market is now big enough to care about, and the raw compute power of supercomputers is essential to accelerating the development of AI-related technologies.
"This is not a tiny market anymore -- we are projecting it to be $46 billion by 2024," Conway said. "Supercomputers have become indispensable for developing AI, self-driving cars and precision medicine."
But the new Microsoft supercomputer won't necessarily compete with the supercomputers from IBM and HPE-Cray, according to Conway, because those machines are built to solve different problems. For communications-intensive projects, in which large amounts of data travel back and forth among processors, users would likely turn to supercomputers from IBM and HPE, Conway said.
"If you are designing cars or forecasting weather, you wouldn't use something like OpenAI," he said. "You would use a supercomputer with lots of processors, with fast communications among the processors."
But for projects that can be broken down into independent problems with no dependencies on one another, and that don't need fast networking, the OpenAI system would be a better fit.
"If what you are doing is adding up results like a payroll in a business, and each function can run on a different processor core, then OpenAI is a good solution," Conway said.
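To make Conway's distinction concrete, here is a minimal Python sketch of the kind of workload he describes as a good fit. It assumes hypothetical payroll records and function names (`gross_pay`, `total_payroll`) invented for illustration, and it runs on one machine rather than on a supercomputer. Each employee's pay is computed without reading any other record, so the tasks can be handed to separate workers that never exchange data until the final sum.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical payroll records: (employee_id, hours_worked, hourly_rate)
EMPLOYEES = [
    ("e1", 40, 25.0),
    ("e2", 35, 30.0),
    ("e3", 45, 20.0),
]


def gross_pay(record):
    """Compute one employee's pay; it reads no other employee's data."""
    _, hours, rate = record
    return hours * rate


def total_payroll(records, workers=4):
    """Fan independent pay calculations out to workers, then sum them."""
    # Each task needs nothing from any other task, so workers never
    # exchange data while they run. Threads keep the sketch simple; for
    # CPU-heavy work in CPython, ProcessPoolExecutor offers the same
    # map() interface and spreads the tasks across processor cores.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        pays = list(pool.map(gross_pay, records))
    # The only coordination step is combining the partial results.
    return sum(pays)


if __name__ == "__main__":
    print(total_payroll(EMPLOYEES))  # 2950.0
```

A communications-intensive job such as a weather model differs in that each step needs results from neighboring tasks, which is where the fast interconnects Conway mentions start to matter.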
Looking ahead at supercomputers
While launching into the supercomputer market could give Microsoft's overall AI initiative a boost, one consultant said Microsoft still trails a few competitors, such as Google, in general AI innovation. The best way for Microsoft to catch up, he said, is through a series of acquisitions of smaller AI companies.
"Microsoft has made some acquisitions in this [AI] space, but they are still playing catch-up," said Frank Dzubeck, president of Communications Network Architects in Washington, D.C. "They are still focusing on application-specific algorithms for certain industries. They have made some headway but aren't there yet where the Googles of the world are."
There will be a "changing of the guard" in the AI market, Dzubeck said, led by a raft of both known and unknown fledgling AI companies, similar to what happened in social networking 10 to 15 years ago. It is from among these companies that Microsoft, through acquisitions, will grow its fortunes in the AI market, he predicted.
The new Microsoft supercomputer, as a single system, has 285,000 CPU cores, 10,000 GPUs and 400 gigabits per second of network connectivity for each GPU server. If it were compared with other machines on the "TOP500" list of the world's most powerful supercomputers, it would rank in the top five, Microsoft said.
However, Microsoft's performance figures are not based on the same benchmarks used to rank the supercomputers atop the TOP500 list, Hyperion's Conway said, so claiming that it is one of the five fastest supercomputers is misleading.
In addition to the new system, Microsoft announced that it will soon begin open sourcing its Microsoft Turing models, along with instructions for training them in Azure Machine Learning, the same set of tools Microsoft uses to improve language understanding. The company also showed off a new version of DeepSpeed, an open source deep learning library for PyTorch that reduces the computing power needed for large distributed model training. The new library reportedly lets developers train models 15 times larger and 10 times faster than they could without DeepSpeed.
Lastly, Microsoft announced that it has added support for distributed training to ONNX Runtime, an open source library that allows AI models to be ported across hardware platforms and operating systems.