Melting Clock Speeds – Why Persistent Memory is the Next Big Revolution in Compute

Posted on Wednesday, July 19, 2017
By Craig Lodzinski
Chief Technologist


Ever since the dawn of modern computing, resources have been split into three main pools: compute, memory, and storage. While there have been innumerable advances in all three areas, one has taken precedence over the other two – the CPU. Taking Intel as the de facto standard, from the very first 4004 chip forward, the CPU has been at the centre of compute, with buses and bridges moving bits in and out for processing.

Thanks to some very clever people, Moore's Law has enabled us to cram more power and speed into boxes of tin every couple of years, with iterative changes to memory and storage to allow the other subsystems to keep up. But with Moore's Law beginning to slow down, we have to turn to the other subsystems, memory and storage, to satisfy our Clarksonian demands for POWER.

Because CPUs have so much raw power, one of the biggest factors limiting performance is now latency: how quickly we can move data in and out of the CPU via memory. We've recently seen a big leap forward here thanks to the advent of flash technologies, with latency dropping from milliseconds on disk to microseconds on SSD. But reading from SSD is still hundreds of times slower than reading from DRAM.
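To put rough numbers on that gap, here is a minimal sketch in Python. The latency figures are round, illustrative assumptions rather than measurements of any particular device, and the function name `slowdown_vs` is made up for this example.

```python
# Illustrative, round-number access latencies in seconds.
# These are assumptions for the sake of the example, not benchmarks.
LATENCY_SECONDS = {
    "spinning disk": 5e-3,   # assumed: about 5 milliseconds
    "flash SSD": 50e-6,      # assumed: about 50 microseconds
    "DRAM": 100e-9,          # assumed: about 100 nanoseconds
}


def slowdown_vs(baseline, latencies=LATENCY_SECONDS):
    """Return how many times slower each medium is than the baseline medium."""
    base = latencies[baseline]
    return {name: value / base for name, value in latencies.items()}


if __name__ == "__main__":
    for name, factor in slowdown_vs("DRAM").items():
        print(f"{name:>14}: {factor:>10,.0f}x the latency of DRAM")
```

Swap in figures from your own kit to see where each tier sits; the point is the orders of magnitude, not the exact values.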

The answer to this is to bring memory and storage closer to minimise the latency within the system. There are a number of technologies in this field, some of which are available now, and some a bit further off.

Today – NVMe and Hybrid DIMMs

NVMe (Non-Volatile Memory Express) is a new standard for accessing non-volatile memory, utilising the PCIe bus rather than legacy SAS/SATA interfaces. This allows for much higher throughput at lower latencies. As the NVMe protocol can also be carried over network fabrics outside of the server, NVMe-oF (NVMe over Fabrics) is starting to emerge as a potential solution for high performance, low latency external storage arrays.

While NVMe provides incrementally lower latency at massive scale, Hybrid DIMMs take a different approach, placing flash storage directly on the memory bus in order to dramatically lower latency compared to traditional storage. The downside is that, because they are packaged as traditional DIMMs, they compete with DRAM for slots on the memory bus and are only available in limited capacities. Despite these trade-offs, Hybrid DIMMs are seeing increased use, especially for databases. With this year's new generation of servers offering much greater capacity for memory bus storage, new applications will be able to take advantage of this super low latency flash storage.

Tomorrow – 3D XPoint

However, while removing the SAS/SATA bottleneck improves the performance of flash, there is only so far one can go with conventional NAND flash. Step forward Storage Class Memory (SCM). There are a number of technologies being developed as part of the 'post-NAND' generation, but first horse out of the gate is set to be 3D XPoint, a technology jointly developed by Intel and Micron. Early indications are that this will sit in between SSD and DRAM in terms of both performance and price.

Meeting the right price and performance metrics will be critical to the success of 3D XPoint. We will have to wait for production samples to determine the advantages of using it compared to DRAM, conventional SSD, or a combination of the two, which is becoming increasingly popular.

3D XPoint is set to launch later this year as a PCIe card, with memory bus storage set to follow. Numerous OEMs are working on integrating 3D XPoint, with HPE having already re-architected their 3PAR arrays to take advantage of this technology as an accelerator.

Future – Universal Memory

While placing faster storage on the PCIe bus is one thing, true SCM requires changes to file systems, operating systems, and CPUs themselves. Compute simply isn't accustomed to main memory being persistent, so development efforts are ongoing across the industry to take advantage of this. Even so, this is only part of the way to the true nirvana of 'Universal Memory': a huge pool of SCM that entirely replaces the traditional storage and memory subsystems, rather than supplementing them.
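To give a flavour of what "memory that persists" looks like to software, here is a minimal sketch using Python's standard `mmap` module. It maps a small file into memory and updates a counter with ordinary in-memory reads and writes. On a normal disk or SSD this still goes through the operating system's page cache; the assumption is that on persistent memory exposed through a direct-access (DAX) style file system, the same pattern would touch the media directly. The file layout and the `bump_counter` name are made up for illustration.

```python
import mmap
import os
import struct
import tempfile

# Hypothetical on-media layout: a single little-endian 64-bit counter.
RECORD = struct.Struct("<Q")


def bump_counter(path):
    """Increment a counter held in a memory-mapped file and return the new value."""
    # Create and size the backing file on first use.
    if not os.path.exists(path) or os.path.getsize(path) < RECORD.size:
        with open(path, "wb") as f:
            f.write(b"\x00" * RECORD.size)

    with open(path, "r+b") as f:
        with mmap.mmap(f.fileno(), RECORD.size) as region:
            (value,) = RECORD.unpack_from(region, 0)  # a plain memory read
            value += 1
            RECORD.pack_into(region, 0, value)        # a plain memory write
            region.flush()  # ask the OS to push the change to the backing store
            return value


if __name__ == "__main__":
    demo_path = os.path.join(tempfile.mkdtemp(), "counter.bin")
    bump_counter(demo_path)
    print("counter after two runs:", bump_counter(demo_path))
```

Notice there is no explicit read or write call against the file once it is mapped. That is the programming model persistent memory encourages, and it is why file systems and operating systems need to change to support it properly.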

While 3D XPoint is fast, it remains significantly slower than DRAM and still requires a 2-tier architecture. The long term goal for universal memory is to create persistent memory at DRAM speeds, with a number of competing technologies such as Phase Change Memory (PCM), Resistive RAM (ReRAM) and Memristor all under development.
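A back-of-the-envelope way to see why the second tier matters is a simple hit-ratio model: some fraction of accesses are served from the fast tier, and the rest fall through to the slower one. The sketch below is a simplified model, not a description of how any real product behaves, and the latency values in the usage example are hypothetical.

```python
def average_latency_ns(hit_ratio, fast_ns, slow_ns):
    """Average access latency for a two-tier memory system.

    hit_ratio: fraction of accesses served by the fast tier (0.0 to 1.0)
    fast_ns:   latency of the fast tier in nanoseconds
    slow_ns:   latency of the slow tier in nanoseconds
    """
    if not 0.0 <= hit_ratio <= 1.0:
        raise ValueError("hit_ratio must be between 0.0 and 1.0")
    return hit_ratio * fast_ns + (1.0 - hit_ratio) * slow_ns


if __name__ == "__main__":
    # Hypothetical figures: 100 ns for the fast tier, 1,000 ns for the slow tier.
    for ratio in (0.5, 0.9, 0.99):
        avg = average_latency_ns(ratio, fast_ns=100, slow_ns=1000)
        print(f"hit ratio {ratio:.2f}: about {avg:.0f} ns on average")
```

In this model the average only approaches the fast tier's speed when almost every access hits it, which is the appeal of a single pool of persistent memory running at DRAM speeds.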

But there is another factor in creating universal, storage class memory pools – the CPU. As mentioned before, the CPU has remained the heart of compute systems for decades and the current design is to have discrete memory, PCIe, SATA and inter-CPU link connectivity. In a world where memory and storage are converged, this orthodoxy doesn't stand up.

Fortunately, there’s an answer to that being developed, too. Intel’s ‘Omni-Path’ and the consortium-driven ‘Gen-Z’ are both being designed to be universal fabrics for connecting various types of media into the CPU. For example, the diagram below shows a variety of storage and computational devices connected into CPU SoCs via the Gen-Z fabric.

The end goal for these fabrics is to create a more egalitarian approach between compute subsystems, offering much greater bandwidth at dramatically lower latency. This will allow for the next generation of SCM, DRAM, GPGPU, FPGA, Tensor and more to unleash their full potential.

But what does all this technical nonsense mean for me?

Put simply, these technologies allow us to solve two problems:

1. The eventual slowing down of Moore's Law.

2. The explosion in data volume and speed.

For the first, expanding and changing how we work across compute reduces the industry's reliance on processor manufacturers pulling a rabbit out of the hat and enables greater performance across the board.

In the second case, this is fundamental to solving the big developing questions in technology. Autonomous vehicles, AI and machine learning, cognitive technologies, IoT, Big Data and more can all benefit greatly from reduced latency, greater bandwidth, and new fabric topologies. With statements such as 'data is the new oil' becoming commonplace in the industry, it goes without saying that any technology that can improve the speed at which data is processed and stored improves time to value in a huge variety of activities.

Finally, as with any advance in raw hardware power, new and innovative use cases will be found for it, often operating within a lower power envelope. This will not only expand the horizons of research and the applications of data, but do so with less impact on the environment (and the electricity bill), which can only be a good thing.

Find out more

To learn how Softcat can help you find the right solution for your power and speed needs, or to speak to a specialist, contact your Softcat account manager or get in touch using the button below.

Get in touch
