00:00 Jack Guedj: Hello, and a warm welcome to the virtual Flash Memory Summit, November 2020. My name is Jack Guedj. I am the CEO of Numem, a provider of smart memory IP cores based on MRAM. Today, I want to talk about the MRAM-based DNN accelerator for the high-performance space computing program. As we know, the amount of data being captured and transmitted continues to accelerate, and it is forecast to reach 175 zettabytes by 2025. Those numbers are so large that even we memory experts cannot grasp them. To give an idea, a zettabyte is the entire Library of Congress 70 million times, so 175 zettabytes represents roughly 12.25 billion times the entire Library of Congress. As such, the model of piping all the data up to the cloud and doing all the processing there as one unified process, convenient and efficient as it is, is no longer practical.
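The Library of Congress comparison can be checked with a line of arithmetic. This sketch takes the speaker's conversion factor (one zettabyte is about 70 million Libraries of Congress) as given; it is the talk's figure, not an independently verified one.

```python
# Conversion factor quoted in the talk, taken as given:
# one zettabyte is roughly 70 million Libraries of Congress.
LIBRARIES_PER_ZETTABYTE = 70_000_000


def libraries_of_congress(zettabytes: float) -> float:
    """Express a data volume in Libraries of Congress."""
    return zettabytes * LIBRARIES_PER_ZETTABYTE


forecast_2025_zb = 175  # forecast quoted in the talk
print(f"{libraries_of_congress(forecast_2025_zb):,.0f} Libraries of Congress")
```

The result is about 12.25 billion, which is the figure the comparison is meant to convey.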
01:37 JG: Why? It simply takes a very large amount of bandwidth, and a space application is the acid test. Data is collected by sensors or by an autonomous rover, which may have upwards of 20 cameras, and transmitted to the spacecraft or the space station, then back to Earth to the servers where it can be accessed locally. This traditional model of piping everything to the cloud consumes tons of bandwidth and means very slow data transfer, and for data analytics in deep space, communication from the spacecraft to Earth may take over 20 minutes. That delays actions and decisions. God forbid you are on a space walk in a difficult spot, need some analytics done, and have to wait 20 minutes for a response. Definitely not practical.
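For a sense of scale, signal travel time alone puts a floor under that delay. The sketch below computes one-way light time for an assumed distance of 360 million km, an illustrative value within the roughly 55 to 400 million km range of Earth-Mars separations; it ignores relay hops and processing time, and a question-and-answer exchange pays the delay twice.

```python
SPEED_OF_LIGHT_M_PER_S = 299_792_458  # exact, by definition of the metre


def one_way_delay_minutes(distance_km: float) -> float:
    """Light travel time over a distance, ignoring relays and processing."""
    return distance_km * 1_000 / SPEED_OF_LIGHT_M_PER_S / 60


# Assumed, illustrative Earth-Mars distance (not a figure from the talk).
delay = one_way_delay_minutes(360e6)
print(f"one way: {delay:.1f} min, round trip: {2 * delay:.1f} min")
```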
03:10 JG: Security is also an issue: as more and more data is piped bidirectionally, it becomes more exposed to cyber attacks. So, like most systems, and especially for NASA, distributed processing is the way to go. Intelligence would be built in at the sensor, the rover, or whatever the equipment is, and more intelligence would be built into the spacecraft, with the ability to keep operating under an intermittent connection or no connection at all. Faster data analytics could mean saving lives or saving very expensive space equipment. It also brings better power efficiency and better security, not only because less data is transmitted, but also because with local intelligence you can fight cyber attacks more efficiently. And then there is scalability: you can add sensors without bogging down the entire system.
04:47 JG: So, it is very relevant, and clearly the acid test of the requirement for distributed systems. Let's take a look at the requirements of this DNN accelerator program for NASA.
In general, NASA is looking for high performance within the size, weight and power constraints of space missions, which they call SWaP, and their goal is to shrink that SWaP. Data retention is important in case of power loss or any kind of environmental instability. Being a space application, it also requires high endurance and a wide temperature range. Radiation tolerance matters too, as does the ability to do autonomous data analysis, for faster response times and adaptive learning right at the spacecraft or even at the intelligent sensor. Security matters more and more: cyberthreats are obviously increasing, and more data analytics brings the ability to understand what's going on and take action right there at the spacecraft. As for applications, we talked about intelligent sensors; the self-driving rover is definitely somewhat comparable to a self-driving car, albeit in a different environment.
06:50 JG: There are obviously not as many rovers on the Moon or Mars as there are cars here, so the requirements are slightly different, but there is the same need to capture and process a lot of data. Then there is deep space with or without humans, all kinds of monitoring, wearables for astronauts, and, as we discussed, cybersecurity.
So, let's look at those requirements in terms of this project, the MRAM-based DNN accelerator, and at MRAM's role in achieving those goals. First, in terms of power, MRAM provides lower-power solutions, with standby power more than 20 times lower. MRAM is also smaller. For AI processing, smaller usually does not mean shrinking the device; it means the ability to put more memory on chip. More memory on chip means less back and forth to the DRAM, and a DRAM access costs roughly 57 times more power than an internal memory access. So overall, MRAM lowers power both inside the chip and by reducing data transfer back and forth to an external DRAM.
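To see why keeping more data on chip matters, here is a rough relative-energy model built on the talk's figure of about 57 times the energy for a DRAM access versus an internal memory access. The access counts and off-chip fractions are hypothetical illustrations, not measurements.

```python
DRAM_COST_RATIO = 57.0  # DRAM access vs. on-chip access, ratio quoted in the talk


def relative_energy(total_accesses: int, offchip_fraction: float) -> float:
    """Energy in units of one on-chip access, given the share that goes to DRAM."""
    if not 0.0 <= offchip_fraction <= 1.0:
        raise ValueError("offchip_fraction must be between 0 and 1")
    offchip = total_accesses * offchip_fraction
    onchip = total_accesses - offchip
    return onchip + offchip * DRAM_COST_RATIO


# Hypothetical workload of one million memory accesses.
baseline = relative_energy(1_000_000, 0.30)      # small on-chip memory: 30% to DRAM
more_on_chip = relative_energy(1_000_000, 0.05)  # larger on-chip memory: 5% to DRAM
print(f"energy reduced by {baseline / more_on_chip:.1f}x")
```

Because each off-chip access weighs 57 times as much, total energy is dominated by the off-chip share even when most accesses stay on chip.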
08:45 JG: It is non-volatile, so if power goes off or there are intermittent power failures, the data, the code and the coefficients are all retained. That avoids data loss and also allows very quick recovery from a power loss.
MRAM has relatively high endurance: at Numem we have tested in excess of 10^9 cycles, and in some tests we've seen up to 10^12. So we fully expect that over time it will reach the 10^12 range or beyond. It also does relatively well in terms of temperature and is capable of the wide range from -40 °C to 125 °C.
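To put the endurance figures in context, the sketch below estimates how long a single cell lasts at a constant write rate. The once-per-second rewrite rate is a hypothetical worst case for on-board adaptive learning, and no wear leveling is assumed.

```python
SECONDS_PER_YEAR = 365.25 * 24 * 3600


def years_to_wear_out(endurance_cycles: float, writes_per_second: float) -> float:
    """Years until one cell reaches its rated write cycles at a constant rate."""
    return endurance_cycles / writes_per_second / SECONDS_PER_YEAR


# Hypothetical workload: the same weight cell rewritten once per second,
# with no wear leveling spreading writes across cells.
for cycles in (1e9, 1e12):
    print(f"{cycles:.0e} cycles -> {years_to_wear_out(cycles, 1.0):,.1f} years")
```

Even at 10^9 cycles, a cell rewritten every second lasts roughly three decades, longer than most missions.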
In terms of radiation tolerance, we at Numem have done some work, and we still have more to do. Essentially, we've looked at TID following the MIL standards, testing multiple devices in each group for statistical demonstration. We read irradiated devices at gradually lower voltages so we could force and determine the failure points. Then we ran a series of erase, read, write, read sequences on the irradiated chips to see whether any failures appeared.
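The test flow described above (reading at progressively lower supply voltages to locate the failure point, with erase, read, write, read passes) can be sketched against a simulated device. `SimulatedMemory`, its `min_read_voltage` threshold, the test pattern, and the voltage steps are all hypothetical stand-ins, not Numem's test setup or any real tester API.

```python
from dataclasses import dataclass, field

ERASED = 0x00
PATTERN = 0xA5  # hypothetical test pattern


@dataclass
class SimulatedMemory:
    """Hypothetical device model: reads fail once supply drops below a threshold."""
    size: int
    min_read_voltage: float  # hypothetical failure point, in volts
    cells: list = field(init=False, repr=False)

    def __post_init__(self):
        self.cells = [ERASED] * self.size

    def erase(self):
        self.cells = [ERASED] * self.size

    def write(self, value: int):
        self.cells = [value] * self.size

    def read(self, voltage: float) -> list:
        # Below the threshold, model every read as corrupted (all bits flipped).
        if voltage < self.min_read_voltage:
            return [c ^ 0xFF for c in self.cells]
        return list(self.cells)


def erase_read_write_read(mem: SimulatedMemory, voltage: float) -> int:
    """One erase, read, write, read pass; returns the number of mismatched cells."""
    mem.erase()
    errors = sum(c != ERASED for c in mem.read(voltage))
    mem.write(PATTERN)
    errors += sum(c != PATTERN for c in mem.read(voltage))
    return errors


def lowest_passing_voltage(mem: SimulatedMemory, voltages: list):
    """Step the supply down; return the last voltage with zero errors (None if none pass)."""
    last_pass = None
    for v in sorted(voltages, reverse=True):
        if erase_read_write_read(mem, v) > 0:
            break
        last_pass = v
    return last_pass


mem = SimulatedMemory(size=1024, min_read_voltage=0.72)
print(lowest_passing_voltage(mem, [0.90, 0.85, 0.80, 0.75, 0.70, 0.65]))
```

On real irradiated parts the same loop would compare the failure voltage against a non-irradiated control group rather than a fixed threshold.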
10:32 JG: And we're happy to report that we have seen no failures in those preliminary tests. As I said, there's a lot more we need to do over time. More generally, multiple studies have been done on MRAM, and MRAM, being a non-transistor-based memory, behaves a lot better overall than SRAM or eFlash. It varies by type of effect.
For TID, total ionizing dose, the worst case is actually embedded flash, whereas for single-event effects, SRAM is the worst case; in both cases, MRAM usually does quite well. This inherently higher radiation tolerance is one of the reasons space application programs are interested in MRAM.
So, let's look at the scalable MRAM solution for a DNN accelerator that Numem is proposing. It is based on a scalable DNN accelerator developed by a Numem partner using what they call RPP, a reprogrammable processor. The solution scales from 1 to 32 processing elements, each with 32 ALUs, for a total of 1,024 ALUs per chip, enabling very efficient processing of operations like matrix multiplication and convolution. For higher performance, the chips can be daisy-chained; for example, four chips can achieve 128 TOPS.
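The scaling figures can be tallied directly. The 32 TOPS per chip value is inferred from the quoted 128 TOPS for four chips and assumes daisy-chained chips scale linearly; real multi-chip systems usually lose some efficiency to inter-chip communication.

```python
ALUS_PER_PE = 32
MAX_PES_PER_CHIP = 32
TOPS_PER_CHIP = 128 / 4  # inferred from "four chips, 128 TOPS"; assumes linear scaling


def alus_per_chip(pes: int) -> int:
    """ALU count for a chip configured with a given number of processing elements."""
    if not 1 <= pes <= MAX_PES_PER_CHIP:
        raise ValueError("a chip has between 1 and 32 processing elements")
    return pes * ALUS_PER_PE


def chained_tops(chips: int) -> float:
    """Peak throughput of a daisy chain under the ideal linear-scaling assumption."""
    return chips * TOPS_PER_CHIP


print(alus_per_chip(32), chained_tops(4))
```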
12:45 JG: The memory architecture is based on MRAM, as we'll see in the diagram on the next slide. To take advantage of MRAM where it fits best, the memory architecture is split between high-speed video streaming memory based on SRAM and lower-power, smaller-footprint memory for the weight coefficients based on NuRAM. Between the two, because the MRAM runs at a different speed, especially on writes, there is flow control to adjust the flow of data between the rest of the system and the MRAM memory.
If we look at that on the diagram here, you can see that the memory is split between the high-speed data stream SRAM and the coefficient memory. Here we're showing the coefficient memory shrinking by a factor of, say, 2.5 to 3.5 times.
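The density advantage can be spent in two ways: keep the capacity and shrink the footprint, or keep the footprint and add capacity. This sketch uses the 2.5 to 3.5 times range quoted for the coefficient memory; the 4 MB SRAM baseline is a hypothetical example.

```python
def mram_options(sram_capacity_mb: float, density_gain: float) -> dict:
    """Two ways to spend MRAM's density advantage over SRAM for coefficient memory."""
    return {
        # Same capacity: the macro shrinks to 1/density_gain of the SRAM area.
        "same_capacity_area_fraction": 1 / density_gain,
        # Same area: capacity grows by density_gain.
        "same_area_capacity_mb": sram_capacity_mb * density_gain,
    }


# Hypothetical 4 MB SRAM coefficient memory as the baseline.
for gain in (2.5, 3.5):
    print(gain, mram_options(4.0, gain))
```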
14:14 JG: But actually, what most of our customers do in this kind of application is use the same area to put more memory on chip, so that DDR access, shown on the left side, is reduced. In one application, we've seen DDR traffic drop from 32 gigabytes per second to 0.7 gigabytes per second. That saves a tremendous amount of power since, as I mentioned earlier, a DDR access costs about 57 times more power than an internal memory access.
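The DDR example works out as follows; the sketch only restates the talk's two bandwidth figures as a reduction factor and a percentage.

```python
def traffic_reduction(before_gb_s: float, after_gb_s: float) -> tuple:
    """Return (reduction factor, percent of DDR traffic eliminated)."""
    factor = before_gb_s / after_gb_s
    percent_removed = 100 * (1 - after_gb_s / before_gb_s)
    return factor, percent_removed


# Bandwidth figures quoted in the talk.
factor, removed = traffic_reduction(32.0, 0.7)
print(f"{factor:.1f}x less DDR traffic ({removed:.1f}% eliminated)")
```

Roughly 98% of the external traffic disappears, which is where most of the power saving comes from.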
The flow control regulates the traffic of data going into the MRAM. MRAM is usually pretty fast on reads, especially with the customization we provide: we can use large word sizes, as well as banking or pipelining. So on the read side we can usually feed the processing elements at full speed. This is one thing we've noticed: even though these are coefficients, which, conveniently for MRAM, are written relatively slowly, the reads need to be fairly high performance to get all the coefficients into the network very rapidly.
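The flow-control block in front of the MRAM write port behaves like a bounded buffer: words are accepted at the system rate until the buffer fills, then the producer is stalled while the MRAM drains it at its slower write rate. A minimal cycle-level simulation, with hypothetical rates and buffer depth since the talk gives no numbers:

```python
from collections import deque


def simulate_write_flow(num_words: int, producer_rate: int, mram_write_rate: int,
                        fifo_depth: int) -> tuple:
    """Simulate a bounded FIFO in front of a slower MRAM write port.

    Each cycle the MRAM retires up to `mram_write_rate` words, then the producer
    pushes up to `producer_rate` words into whatever space is free. Returns
    (cycles, stall_cycles); a stall cycle is one where the full FIFO blocked
    the producer before it could send everything it wanted.
    """
    fifo = deque()
    sent = written = cycles = stalls = 0
    while written < num_words:
        cycles += 1
        # MRAM side: drain the FIFO at the write rate.
        for _ in range(min(mram_write_rate, len(fifo))):
            fifo.popleft()
            written += 1
        # Producer side: push as much as the FIFO accepts.
        pushed = 0
        while pushed < producer_rate and sent < num_words and len(fifo) < fifo_depth:
            fifo.append(sent)
            sent += 1
            pushed += 1
        if pushed < producer_rate and sent < num_words:
            stalls += 1
    return cycles, stalls


# Hypothetical: producer offers 4 words/cycle, MRAM writes 1 word/cycle, 8-deep FIFO.
print(simulate_write_flow(num_words=32, producer_rate=4, mram_write_rate=1, fifo_depth=8))
```

Total time is set by the MRAM write rate, and the FIFO only hides short bursts; that is acceptable here because coefficients are written far less often than they are read.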
15:54 JG: All of that is processed by the processing engines, which, as I said, scale from 1 to 32. The benefit is not only the power reduction from cutting DRAM access, but also that the memory itself is much lower power. A large share of the time that memory is actually idle, and on standby we're talking about more than 20 times lower power. If you compare SRAM in retention mode against the equivalent mode for MRAM, you could be talking about a 50 times reduction in power. And, as discussed earlier, MRAM also has the benefit of radiation hardness, which is obviously critical for NASA and space applications.
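Because coefficient memory is idle much of the time, standby power weighs heavily in its average power. A rough duty-cycle model using the talk's 20 times standby ratio; the active and standby power values are hypothetical normalized units, and active power is assumed equal for both memories so that only the standby effect is compared.

```python
def average_power(active_power: float, standby_power: float, active_fraction: float) -> float:
    """Time-weighted average power for a memory that is active part of the time."""
    if not 0.0 <= active_fraction <= 1.0:
        raise ValueError("active_fraction must be between 0 and 1")
    return active_fraction * active_power + (1 - active_fraction) * standby_power


# Hypothetical normalized numbers: active = 10.0 for both, SRAM standby = 1.0,
# MRAM standby 20 times lower (ratio quoted in the talk).
sram = average_power(active_power=10.0, standby_power=1.0, active_fraction=0.05)
mram = average_power(active_power=10.0, standby_power=1.0 / 20, active_fraction=0.05)
print(f"average power ratio SRAM/MRAM: {sram / mram:.1f}x")
```

With these numbers the average gap is much smaller than 20 times because active power dominates; the ratio approaches the standby ratio only as the active fraction approaches zero.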
16:57 JG: Now, let's take a look at the software for this DNN accelerator. For any AI chip, the software is obviously paramount to tapping into the performance the accelerator provides. There is no good AI chip solution without a strong software framework.
In this case, the solution leverages CUDA-compatible software, that is, the CUDA compiler as well as the CUDA libraries, making it easy for users to program the device. It also has a graph compiler, so it can automatically optimize performance and take advantage of the hardware and memory on the chip. In addition, it supports multiple AI frameworks such as TensorFlow, provides DNN libraries and OpenCV, and can simulate at the ISA level for faster simulation. Representative neural networks for space applications include functions for sensor fusion, terrain mapping, and navigation and guidance, with algorithms like MSG-Net, MS-Net, VGG16, YOLO and ORB-SLAM.
18:51 JG: Another representative network is ResNet. Here we look at how it performs on an RPP8, that is, using eight processing engines, versus the Nvidia TX2 and Xavier, both in a high-performance mode and in a low-power, high-efficiency mode. In the high-performance mode, the RPP8 solution runs more than twice as many images per second, and the advantage is roughly the same, actually a little bigger, in the low-power mode. On the upper right, we can see it running at over 2,000 images per second versus around 800 images per second for Xavier. Power efficiency tracks much the same way, with about 50 images per second per watt for Xavier and close to 100 images per second per watt for RPP8 in the high-performance mode, and part of the graph shows an even larger advantage at lower power, as we saw above.
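Dividing throughput by efficiency gives the implied power draw behind the ResNet comparison. The inputs are the approximate chart readings quoted in the talk, and the sketch assumes both efficiency figures belong to the high-performance operating points, so the results are rough.

```python
def implied_power_watts(images_per_second: float, images_per_second_per_watt: float) -> float:
    """Power = throughput / efficiency."""
    return images_per_second / images_per_second_per_watt


# Approximate values quoted in the talk (high-performance mode assumed for both).
results = {
    "RPP8": (2000, 100),
    "Xavier": (800, 50),
}
for name, (throughput, efficiency) in results.items():
    print(f"{name}: ~{implied_power_watts(throughput, efficiency):.0f} W")

print(f"throughput ratio: {2000 / 800:.1f}x")
```

The two platforms land in a similar power envelope, so the throughput gap translates almost directly into the efficiency gap.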
So overall, the MRAM DNN accelerator proposed by Numem, in conjunction with the Numem partner, provides a scalable solution that addresses a wide range of spacecraft applications, from low power to high performance. It comes with a comprehensive software suite for ease of programming and for tapping into the capabilities of the hardware.
21:28 JG: It provides a path toward shrinking size, weight and power under space constraints, which is a target and focus of NASA. It enables data retention, which is highly important for intermittent communications or potential power failures, as well as high endurance and a wide temperature range, so that both spacecraft and sensors can have more and more autonomy and make decisions on the fly, as opposed to waiting for the back and forth between Earth and space.
The radiation tolerance is promising: the testing done by Numem, as well as by other companies, provides a very positive outlook, owing to the fact that MRAM is a non-transistor-based storage element. More work needs to be done in that area and will be done over time. Finally, MRAM is significantly smaller and lower power than SRAM when used in the appropriate applications, as demonstrated here by partitioning the memory, with SRAM for high-speed streaming data and MRAM for weight coefficients.
23:17 JG: All of this is now supported by multiple foundries, all of which have put significant effort over multiple years into bringing the MRAM process to production. It is available in production now at different process nodes, and we can see the number of process nodes expanding over time, with more already in development and expected to be in production in the near future. With this, we'd love to take your questions, either in real time or via chat, and feel free to reach out with any questions you might have in the future, or if we can help you develop solutions that provide similar capabilities for space applications as well as commercial applications. Thank you very much.