Optimized Memory Performance

Posted by Richard Jones, xByte Sales Engineer on Mar 29, 2022

When configuring memory in Dell PowerEdge 14th Generation servers, end users expect each server to function at peak performance.

Dell EMC 14th Generation servers use the latest Intel Skylake and Cascade Lake processors (Intel Xeon Scalable processor family). When configuring servers for optimal memory performance, end users do not always take the Intel Xeon memory channel architecture into account.

The Intel Xeon based processor has a built-in memory controller, like previous generation Xeons, but now supports six memory channels per socket (see Figure 1). This is an increase from the four memory channels found in previous generation Xeon E5-2600 v3 and E5-2600 v4 processors. Different Dell EMC server models offer a different number of memory slots based on server density.

Figure 1: PowerEdge R740 CPU-to-memory subsystem connectivity for Intel® Cascade Lake™

The memory subsystem performance depends on the CPU model since the memory controller is integrated with the processor, and the speed of the processor and number of cores also influence memory performance.

In previous generation systems, the processor supported four memory channels per socket. This led to balanced configurations with eight or sixteen memory modules per dual-socket server. Configurations of 8x16GB (128 GB), 8x32GB (256 GB), 16x32GB (512 GB), and 16x64GB (1024 GB) were popular and recommended. That is not the case for 14G servers: on these systems they are unbalanced configurations and can lead to poor memory performance.

Balanced Memory Considerations

  • For optimal memory performance, all six memory channels of a CPU should be populated with memory modules (DIMMs) and populated identically. (This is called a balanced memory configuration.)
  • In a balanced configuration all DIMMs are accessed uniformly, and the full complement of memory channels is available to the application.
  • Memory performance will vary depending on whether the DIMMs used are single ranked or dual ranked, and whether they are RDIMMs or LRDIMMs.
  • 8GB 2666 MT/s memory is single ranked and has lower memory bandwidth than the dual ranked 16GB and 32GB memory modules.
  • 16GB and 32GB are both dual ranked and have similar memory bandwidth, with 16GB DIMMs demonstrating higher memory bandwidth.
  • 64GB memory modules have slightly lower memory bandwidth than the dual ranked RDIMMs.
  • An unbalanced memory configuration will lead to lower memory performance, as some channels will be unused or used unequally. Even worse, an unbalanced memory configuration can lead to unpredictable memory performance, depending on how the system fractures the memory space into multiple regions and how the operating system (Linux or ESXi, for example) maps out these memory domains.
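
The balance rule in the bullets above can be sketched as a small check. This is only an illustration, assuming a hypothetical data model in which each of a CPU's six channels is a tuple of the DIMM capacities (in GB) installed on it. The function name `is_balanced` and the example layouts are made up for this sketch and do not reflect the actual Dell slot population order.

```python
CHANNELS_PER_SOCKET = 6  # six memory channels per socket, as described above


def is_balanced(channels):
    """Return True when all six channels are populated identically.

    `channels` holds one tuple per memory channel; each tuple lists the
    capacities (GB) of the DIMMs on that channel (hypothetical model).
    """
    if len(channels) != CHANNELS_PER_SOCKET:
        raise ValueError("expected one entry per memory channel")
    dimm_counts = {len(ch) for ch in channels}
    capacities = {gb for ch in channels for gb in ch}
    # Balanced here means: every channel has the same number of DIMMs
    # (one or two) and every DIMM has the same capacity.
    return dimm_counts in ({1}, {2}) and len(capacities) == 1


# Six identical DIMMs, one per channel.
six_dimms = [(32,)] * 6
# Four DIMMs, as in a four-channel design: two channels stay empty.
four_dimms = [(32,), (32,), (32,), (32,), (), ()]
# Eight DIMMs: two channels carry more memory than the other four.
eight_dimms = [(32, 32), (32, 32), (32,), (32,), (32,), (32,)]

for name, layout in (("6 DIMMs", six_dimms),
                     ("4 DIMMs", four_dimms),
                     ("8 DIMMs", eight_dimms)):
    print(name, "balanced" if is_balanced(layout) else "unbalanced")
```

The check deliberately treats mixed capacities as not balanced, since the article discusses those separately as near-balanced configurations.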

Memory controller logic was designed around having all memory slots populated to return the highest memory bandwidth, so it should come as no surprise Figure 2 is the recommendation for a balanced configuration populated with twelve DIMMs. If a balanced and fully populated configuration with twelve DIMMs cannot be executed, then a balanced configuration with six DIMMs as in Figure 3 is the alternative recommendation, as memory bandwidth is reduced by only 3% when compared to twelve DIMMs.

Figure 2: Populating all twelve slots with identical capacity DIMMs for a balanced configuration is recommended for highest memory bandwidth and lowest memory access latency

Figure 3: Populating the first six slots with identical capacity DIMMs is recommended for the second highest memory bandwidth and lowest memory access latency

Unbalanced memory configurations

In previous generation systems, the processor supported four memory channels per socket. This led to balanced configurations with eight or sixteen memory modules per dual-socket server. Configurations of 8x16GB (128 GB), 16x16GB or 8x32GB (256 GB), and 16x32GB (512 GB) were popular and recommended. With 14G, these absolute memory capacities lead to unbalanced configurations because they do not distribute evenly across the 12 memory channels of a dual-socket server. A configuration of 512 GB on 14G Skylake is possible but suboptimal, as shown in Figure 4. Across CPU models (Platinum 8176 down to Bronze 3106), memory bandwidth drops by 35% to 65% with an unbalanced memory configuration compared to a balanced one. The figure compares 512 GB to 384 GB, but the same conclusion holds for 512 GB vs 768 GB, as Figure 2 has shown us that a balanced 384 GB configuration performs similarly to a balanced 768 GB configuration.

Figure 4
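
The arithmetic behind this can be sketched in a few lines. The snippet assumes an idealized, most-even spread of DIMMs over channels (the real population order follows the owner's manual), and the function names are hypothetical. It shows why eight or sixteen DIMMs fit eight channels but not twelve, and which capacities 32GB DIMMs give in a balanced dual-socket 14G layout.

```python
TOTAL_CHANNELS = 12  # 2 sockets x 6 memory channels on a dual-socket 14G server


def dimms_per_channel(dimm_count, channels=TOTAL_CHANNELS):
    """Return (fewest, most) DIMMs on any channel for the most even spread."""
    base, extra = divmod(dimm_count, channels)
    return base, base + (1 if extra else 0)


def spreads_evenly(dimm_count, channels=TOTAL_CHANNELS):
    """True when every channel gets the same, non-zero number of DIMMs."""
    fewest, most = dimms_per_channel(dimm_count, channels)
    return fewest == most and fewest >= 1


def balanced_capacities(dimm_gb, channels=TOTAL_CHANNELS):
    """Total GB with one or two identical DIMMs on every channel."""
    return [dimm_gb * channels * per_channel for per_channel in (1, 2)]


for count in (8, 12, 16, 24):
    print(count, "DIMMs:",
          "12 channels even" if spreads_evenly(count) else "12 channels uneven",
          "|",
          "8 channels even" if spreads_evenly(count, channels=8) else "8 channels uneven")

print(balanced_capacities(32))  # capacities reachable with 32GB DIMMs
```

With 32GB DIMMs the balanced totals come out at 384 GB and 768 GB, which is why 512 GB falls between two balanced options.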

Near-balanced memory configurations

The question that arises is whether there is a reasonable configuration for capacities close to 256GB without going all the way to 384GB, and close to 512GB without going all the way to 768GB. Dell EMC systems do allow mixing different memory modules, and this is described in more detail in the server owner's manual. For example, the Dell EMC PowerEdge R640 has 24 memory slots with 12 slots per processor. Each processor's set of 12 slots is organized across 6 channels with 2 slots per channel. In each channel, the first slot is identified by the white release tab while the second slot tab is black. Here is an extract of the memory population guidelines that permit mixing DIMM capacities.

General memory module installation guidelines

NOTE: If your system's memory configurations fail to observe these guidelines, your system might not boot, might stop responding during memory configuration, or might operate with reduced memory.

The system supports Flexible Memory Configuration, enabling the system to be configured and run in any valid chipset architectural configuration. The following are the recommended guidelines for installing memory modules:

  • RDIMMs and LRDIMMs must not be mixed.
  • x4 and x8 DRAM based memory modules can be mixed. For more information, see the Mode-specific guidelines section.
  • Up to two RDIMMs can be populated per channel regardless of rank count.
  • Up to two LRDIMMs can be populated per channel regardless of rank count.
  • If memory modules with different speeds are installed, they will operate at the speed of the slowest installed memory module(s) or slower depending on the system DIMM configuration.
  • Populate memory module sockets only if a processor is installed. For single-processor systems, sockets A1 to A12 are available. For dual-processor systems, sockets A1 to A12 and sockets B1 to B12 are available.
  • Populate all the sockets with white release tabs first, followed by the black release tabs.
  • When mixing memory modules with different capacities, populate the sockets with memory modules with the highest capacity first. For example, if you want to mix 8 GB and 16 GB memory modules, populate 16 GB memory modules in the sockets with white release tabs and 8 GB memory modules in the sockets with black release tabs.
  • In a dual-processor configuration, the memory configuration for each processor should be identical. For example, if you populate socket A1 for processor 1, then populate socket B1 for processor 2, and so on.
  • Memory modules of different capacities can be mixed provided other memory population rules are followed (for example, 8 GB and 16 GB memory modules can be mixed).
  • Mixing of more than two memory module capacities in a system is not supported (for example, 8 GB, 16 GB, and 32 GB).
  • Populate six memory modules per processor (one DIMM per channel) at a time to maximize performance.
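
A few of these guidelines are mechanical enough to check in code. The sketch below is a hedged illustration only: the `Dimm` class and `check_population` function are hypothetical names, the check covers a single processor, and it models just four of the rules (no RDIMM/LRDIMM mixing, at most two capacities, white tabs before black tabs, highest capacity in white-tab sockets). It does not replace the owner's manual.

```python
from dataclasses import dataclass

# Six channels per processor, each with one white-tab and one black-tab socket.
SOCKETS_PER_COLOR = 6


@dataclass(frozen=True)
class Dimm:
    capacity_gb: int
    kind: str  # "RDIMM" or "LRDIMM"


def check_population(white, black):
    """Return a list of guideline violations for one processor.

    `white` and `black` are lists of Dimm objects installed in the
    white-tab and black-tab sockets (hypothetical data model).
    """
    problems = []
    dimms = list(white) + list(black)
    if len({d.kind for d in dimms}) > 1:
        problems.append("RDIMMs and LRDIMMs must not be mixed")
    if len({d.capacity_gb for d in dimms}) > 2:
        problems.append("more than two memory module capacities")
    if black and len(white) < SOCKETS_PER_COLOR:
        problems.append("populate all white-tab sockets before black-tab sockets")
    if white and black and (min(d.capacity_gb for d in white)
                            < max(d.capacity_gb for d in black)):
        problems.append("highest-capacity modules belong in white-tab sockets")
    return problems


# 16 GB modules in white-tab sockets, 8 GB modules in black-tab sockets.
good = check_population([Dimm(16, "RDIMM")] * 6, [Dimm(8, "RDIMM")] * 6)
# The same modules with the capacities swapped between tab colors.
swapped = check_population([Dimm(8, "RDIMM")] * 6, [Dimm(16, "RDIMM")] * 6)
print(good)
print(swapped)
```

For a dual-processor system, the same check would be run on each processor, and the two populations should also be identical to each other per the guidelines.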

So, the question is, how bad are mixed memory configurations? Below are tested, valid "near-balanced configurations" as described in Table 1, with the results displayed in Figure 5.

Table 1

Figure 5: Impact of near-balanced configurations. PowerEdge R640. Processor and DIMM configuration as noted. All 2666 MT/s memory.

Figure 5 illustrates that near-balanced configurations are a reasonable alternative when the memory capacity requirements demand a compromise. All memory channels are populated, and this helps with the memory bandwidth. The 288 GB configuration uses single ranked 8GB DIMMs, and we see the penalty that single ranked DIMMs impose on the memory bandwidth.
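
To get a feel for which totals such mixes can reach, the sketch below enumerates two-capacity combinations on a dual-socket, 24-slot server, with the larger module in every white-tab socket and the smaller one in every black-tab socket. This layout is an assumption made for illustration and is not necessarily what Table 1 tested. The function name and the list of DIMM sizes (8, 16, and 32 GB) are likewise chosen for the example.

```python
from itertools import combinations

SOCKETS = 2
CHANNELS_PER_SOCKET = 6
DIMM_SIZES_GB = (8, 16, 32)  # illustrative subset of sizes mentioned in the article


def mixed_capacities(sizes=DIMM_SIZES_GB):
    """List (total_gb, white_gb, black_gb) for two-size mixes filling all slots.

    Assumes one white-tab and one black-tab socket per channel, the larger
    module in every white-tab socket and the smaller in every black-tab one.
    """
    sockets_per_color = SOCKETS * CHANNELS_PER_SOCKET  # 12 white + 12 black
    results = []
    for small, large in combinations(sorted(sizes), 2):
        total = sockets_per_color * (large + small)
        results.append((total, large, small))
    return sorted(results)


for total, white_gb, black_gb in mixed_capacities():
    print(f"{total} GB = 12 x {white_gb}GB (white) + 12 x {black_gb}GB (black)")
```

One of the totals this produces is 288 GB, from 16GB and 8GB modules, which is consistent with the 288 GB configuration above using 8GB DIMMs, though other layouts could reach the same total.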

Conclusion

Balancing memory with Intel® Xeon™ Scalable processors increases memory bandwidth and reduces memory access latency. If memory is populated into a near-balanced or unbalanced configuration, memory bandwidth can be reduced by up to 33% from its maximum potential. Applying the balanced memory guidelines will ensure that both memory bandwidth and memory access latency are optimized, therefore ensuring peak memory performance for Dell EMC PowerEdge servers.


Contact xByte today at [email protected] to receive a custom quote on Dell EMC PowerEdge 14th Generation memory.