Jim Handy of Objective Analysis and I recently finished a white paper on The Future of Low-Latency Memory and presented a few slides related to that white paper at the SNIA Persistent Memory and Computational Storage Summit this week. Here are some interesting observations from that white paper that may have some bearing on the future of near and far memory. The figure below, which compares the memory capacity and memory bandwidth available for common near memory solutions (DDR4 and DDR5 as well as HBM) with those of the OMI interface, can help in understanding the following discussion.
For Near Memory, the memory that attaches directly to a processor’s pins, the current DDR parallel memory bus has been tweaked and adjusted over the years.
Although its performance has improved impressively for more than two decades, DDR is failing to keep pace with the increasing bandwidth requirements of processor chips. Processor core counts are rising quickly, and clock speeds continue to creep higher, driving a thirst for bandwidth and capacity that runs in direct opposition to the way the DDR bus operates.
To achieve the highest DDR speeds, the bus’s capacitive loading must decrease as the bus speed increases. Because of this, the memory channels that previously managed four DIMM slots have shrunk to three, then two, and now the highest-speed channels can only support a single slot. As a result, the amount of memory per channel is declining.
Some processors, notably GPUs, use HBM (High Bandwidth Memory) to get past this issue. HBMs are stacks of DRAM that present 1,000-2,000 parallel signal paths to the processor. This can improve performance, but the processor and the HBM must be intimately connected.
Although HBM is a help, it’s considerably more expensive than standard DRAM and is limited to stacks of no more than twelve chips, limiting its use to lower-capacity memory arrays. HBM is also complex and inflexible. There’s no way to upgrade an HBM-based memory in the field. As a consequence, HBM is only adopted where no other solution will work.
Today’s and tomorrow’s computing systems need a growing amount of both Near and Far Memory that provides as much bandwidth as possible, and the processor needs to use the smallest possible die area to communicate with these memories. Various approaches have been proposed for far memory (such as CXL) that enable memory disaggregation, pooling and composability, but near memory has remained the province of parallel DDR and HBM.
The Open Memory Interface (OMI) is supported by the OpenCAPI Consortium and uses existing high-speed serial signaling PHYs, with a custom protocol, to connect standard low-cost DDR DRAMs to the processor. OMI is a latency-optimized subset of OpenCAPI. This approach allows large arrays of inexpensive DRAM to be connected at high speeds to a processor without burdening the processor with a lot of additional I/O pins. As shown in the figure above, OMI provides near-HBM bandwidth at larger capacities than are supported by DDR.
Various characteristics of these three near memory solutions are summarized in the table below. OMI would be attractive for big data near memory applications that require significant memory density, low latency and high performance as a percentage of chip area.
Cryptocurrency mining is a process of receiving cryptocurrency “coins” as a reward for completing blocks of verified transactions that are added to the cryptocurrency blockchain. This mining generally requires extensive calculations to acquire the cryptocurrency (proof of work). Cryptocurrency miners require sophisticated processors, such as GPUs, for mining the currency. There are large data center facilities, especially in Asia, that specialize in cryptocurrency mining. These facilities use a lot of power, and their growth has spurred shortages of computing components, such as NVIDIA graphics cards with GPUs for Bitcoin mining, in the recent past.
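To give a feel for why proof of work is so compute-hungry, here is a minimal toy sketch in Python. It is not the algorithm of any particular cryptocurrency: the function names, the 8-byte nonce encoding and the 12-bit difficulty are all assumptions chosen for illustration. The idea it shows is simply that a miner must try nonce after nonce until a hash meets a difficulty target, while anyone else can check the result with a single hash.

```python
import hashlib


def leading_zero_bits(digest: bytes) -> int:
    """Count how many zero bits the digest starts with."""
    value = int.from_bytes(digest, "big")
    return len(digest) * 8 - value.bit_length()


def block_hash(block_data: bytes, nonce: int) -> bytes:
    """Hash the block contents together with a candidate nonce.

    The 8-byte big-endian nonce encoding is an illustrative assumption.
    """
    return hashlib.sha256(block_data + nonce.to_bytes(8, "big")).digest()


def mine(block_data: bytes, difficulty_bits: int) -> int:
    """Brute-force search: try nonces until the hash has enough leading zero bits."""
    nonce = 0
    while leading_zero_bits(block_hash(block_data, nonce)) < difficulty_bits:
        nonce += 1
    return nonce


def verify(block_data: bytes, nonce: int, difficulty_bits: int) -> bool:
    """Checking a claimed solution takes one hash, however long the search took."""
    return leading_zero_bits(block_hash(block_data, nonce)) >= difficulty_bits


if __name__ == "__main__":
    data = b"example block of transactions"  # hypothetical block contents
    difficulty = 12  # toy value; on average about 2**12 hashes are needed
    nonce = mine(data, difficulty)
    print("nonce found:", nonce)
    print("valid:", verify(data, nonce, difficulty))
```

Each extra bit of difficulty doubles the expected number of hashes, which is why real mining operations lean on large numbers of power-hungry processors.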
There are cryptocurrencies that use free digital storage space rather than processing to generate cryptocurrency. These cryptocurrencies may be more eco-friendly than the processor-based type as they don’t require large amounts of energy to be consumed to generate the cryptocurrency. A new cryptocurrency named Chia, created by Bram Cohen’s Chia Network, uses a proof of space and time model (as reported in Tom’s Hardware on April 15). Bram Cohen, the inventor of BitTorrent, uses the free space on storage devices, such as HDDs and SSDs, to generate his cryptocurrency.
Chia cryptocurrency won’t be officially available until early May. There are reports that constant read and write operations are being used on the storage devices and, if so, they will consume power to generate the cryptocurrency. This may also make consumer-grade SSDs with low endurance a poorer choice for generating this cryptocurrency. The Tom’s Hardware article says that HDD and SSD prices are increasing in Hong Kong as miners are purchasing large amounts of storage devices to generate this new cryptocurrency. There are reports that some Chinese SSD makers are developing specialized SSDs for mining operations.
Could free storage capacity-based cryptocurrencies cause a shortage of HDDs and SSDs, like cryptocurrency mining did for GPUs in graphics cards in the past? Let’s look at this more closely. The figure below from Coughlin Associates shows total HDD and SSD capacity shipped from 2012 and projected out to 2026.
In 2021 we estimate that about 1.48 Zettabytes (ZB) of HDD and SSD storage will be shipped (about 1.2 ZB of HDDs and about 280 Exabytes of SSDs). This translates into about 221 M HDDs and likely over 350 M SSDs shipped this year.
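As a quick sanity check on these estimates, the short Python sketch below divides the shipped capacity by the unit counts to get an implied average capacity per drive. It assumes decimal storage units (1 ZB = 10^21 bytes) and treats the "over 350 M" SSD figure as exactly 350 M, so the SSD average it produces is an upper bound. The variable and function names are only illustrative.

```python
# Decimal storage units (an assumption: 1 TB = 10**12 bytes, and so on).
TB = 10**12
EB = 10**18
ZB = 10**21

# 2021 shipment estimates quoted in the text above.
hdd_capacity_bytes = 1.2 * ZB
ssd_capacity_bytes = 280 * EB
hdd_units = 221e6
ssd_units = 350e6  # text says "likely over 350 M", so this is a lower bound


def average_capacity_tb(total_bytes: float, units: float) -> float:
    """Implied average capacity per shipped drive, in terabytes."""
    return total_bytes / units / TB


total_zb = (hdd_capacity_bytes + ssd_capacity_bytes) / ZB
avg_hdd_tb = average_capacity_tb(hdd_capacity_bytes, hdd_units)
avg_ssd_tb = average_capacity_tb(ssd_capacity_bytes, ssd_units)

print(f"Total shipped capacity: {total_zb:.2f} ZB")
print(f"Implied average HDD capacity: {avg_hdd_tb:.2f} TB")
print(f"Implied average SSD capacity: at most {avg_ssd_tb:.2f} TB")
```

Under these assumptions the average HDD works out to roughly 5.4 TB and the average SSD to no more than about 0.8 TB, which suggests that multi-terabyte SSDs are well above the average capacity shipped.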
The Chia proof of space and time cryptocurrency model favors access to high performance storage devices, which favors the use of high-capacity SSDs. It is reported that enterprise-grade 4TB-18TB SSDs are seeing the biggest run among Asian cryptocurrency miners in anticipation of Chia’s availability. Consumer SSDs are a larger percentage of the total SSD market than enterprise SSDs, so there could be a shortage of enterprise SSDs on the open market (versus those tied up with contracted OEM or large data center sales). It is not clear that enterprise HDDs (at least without dual stage actuators) will be as impacted, as it is storage capacity and performance that are required for this mining.
OMI is an attractive option for near memory that needs high density, low latency and high performance. New cryptocurrency mining using a space and time model, such as Chia, may lead to a shortage of enterprise SSDs on the open market.