In recent years, data center applications have come to rely increasingly on DRAM. In-memory key-value stores are being used to cache or replace disk-based databases, and memory-based big data frameworks are supplanting earlier disk-based frameworks. However, DRAM process technology has not kept pace with this demand: improvements in cost and density have stagnated. With next-generation memory technologies such as STT-MRAM, PCRAM, and RRAM still far from commercial viability, improving memory utilization is the most promising near-term path to reducing the cost of data center memory. A common way to improve the utilization of a data center resource is to disaggregate it, allowing it to be shared across multiple nodes. Disaggregating memory has generally been difficult because a round trip across a typical data center network takes much longer than a DRAM access. However, recent work on photonic interconnects promises data center networks with much lower latencies, making remote memory in the data center more feasible.

In this work, we present the design of a remote memory system based on DRAM caching, which divides a data center rack into compute-specialized and memory-specialized nodes (blades). Each memory blade contains a fixed-function hardware controller that serves data from its large pool of DRAM to the compute blades over the rack network. Each compute blade contains a small local DRAM that acts as a cache for remote memory. A hardware controller manages this local DRAM and automatically refills the cache on misses by sending requests to remote memory. The system thus provides a global pool of memory that can be dynamically allocated among the compute blades and is transparent to software. We evaluated the system using microbenchmarks and realistic data center applications in RTL simulations running on cloud FPGAs. These evaluations show that our DRAM caching system serves data at lower latency than earlier remote memory systems based on virtual memory and, with the aid of prefetching, can achieve performance comparable to local DRAM.
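To make the compute-blade side of this design concrete, the Python sketch below models a controller that uses local DRAM as a cache of remote memory and refills it from a memory blade on misses. It is a simplification, and its details are our assumptions rather than the design described here: a fully associative cache with LRU replacement, a write-allocate/write-back policy, 64-byte blocks, and a next-line prefetcher triggered on each demand miss. The classes `RemoteMemoryBlade` and `DramCacheController` are hypothetical stand-ins for the memory blade and the hardware controller, and network requests are modeled as counted method calls with no timing.

```python
from collections import OrderedDict

BLOCK_SIZE = 64  # assumed cache-block size in bytes (illustrative only)


class RemoteMemoryBlade:
    """Hypothetical memory blade: a large DRAM pool served over the rack network."""

    def __init__(self, size_bytes):
        self.data = bytearray(size_bytes)
        self.num_blocks = size_bytes // BLOCK_SIZE
        self.requests = 0  # counts simulated network round trips

    def read_block(self, block_addr):
        self.requests += 1
        start = block_addr * BLOCK_SIZE
        return bytearray(self.data[start:start + BLOCK_SIZE])

    def write_block(self, block_addr, block):
        self.requests += 1
        start = block_addr * BLOCK_SIZE
        self.data[start:start + BLOCK_SIZE] = block


class DramCacheController:
    """Hypothetical compute-blade controller using local DRAM as a remote-memory cache.

    Assumed policy: fully associative, LRU replacement, write-allocate,
    write-back, and optional next-line prefetch issued on each demand miss.
    """

    def __init__(self, remote, capacity_blocks, prefetch=False):
        # Prefetching needs room for the demand block plus the prefetched block.
        if capacity_blocks < (2 if prefetch else 1):
            raise ValueError("capacity too small for the chosen policy")
        self.remote = remote
        self.capacity = capacity_blocks
        self.prefetch = prefetch
        self.lines = OrderedDict()  # block_addr -> [data, dirty]; ordered LRU to MRU
        self.hits = 0
        self.misses = 0

    def _fill(self, block_addr):
        # Evict the least recently used block first, writing it back if dirty.
        if len(self.lines) >= self.capacity:
            victim, (data, dirty) = self.lines.popitem(last=False)
            if dirty:
                self.remote.write_block(victim, data)
        self.lines[block_addr] = [self.remote.read_block(block_addr), False]

    def _lookup(self, block_addr):
        if block_addr in self.lines:
            self.hits += 1
            self.lines.move_to_end(block_addr)  # mark as most recently used
        else:
            self.misses += 1
            self._fill(block_addr)  # refill from the memory blade
            nxt = block_addr + 1
            if self.prefetch and nxt < self.remote.num_blocks and nxt not in self.lines:
                self._fill(nxt)  # next-line prefetch
        return self.lines[block_addr]

    def read(self, addr):
        data, _ = self._lookup(addr // BLOCK_SIZE)
        return data[addr % BLOCK_SIZE]

    def write(self, addr, value):
        line = self._lookup(addr // BLOCK_SIZE)
        line[0][addr % BLOCK_SIZE] = value
        line[1] = True  # dirty: written back to the blade only on eviction
```

In this toy model, a sequential scan over 16 blocks with a 4-block cache incurs 16 misses without prefetching and 8 with next-line prefetching, because each miss also brings in the following block. A real controller would issue prefetches asynchronously to overlap them with network latency rather than serializing them after the demand fill; this sketch only counts requests and does not model timing.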