Make 2022 a more stable year: use ECC memory

Are you annoyed at those random game crashes, Blue Screens and weird behavior from your applications? Do you create documents or media you don’t want to become corrupted? Then your next upgrade should include ECC Memory.

What is ECC memory?

Error correction code (ECC) memory detects and corrects bit flips (data corruption) in the data stored or transported as part of the computer’s memory. It stores a few extra check bits alongside each larger block of data, and can use them to repair the block if only a single bit has gone bad.
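To make the "detect and correct" part concrete, here is a minimal sketch using a toy Hamming(7,4) code: four data bits protected by three check bits. This is an illustration I picked for its size, not what your DIMMs run; side-band ECC modules typically protect each 64-bit word with 8 check bits using a wider code from the same family. The function names are made up for this example.

```python
def hamming74_encode(data_bits):
    """Encode 4 data bits into a 7-bit codeword (positions 1..7)."""
    d1, d2, d3, d4 = data_bits
    p1 = d1 ^ d2 ^ d4  # parity over positions 3, 5, 7
    p2 = d1 ^ d3 ^ d4  # parity over positions 3, 6, 7
    p4 = d2 ^ d3 ^ d4  # parity over positions 5, 6, 7
    # Layout: p1 p2 d1 p4 d2 d3 d4 (positions 1 through 7)
    return [p1, p2, d1, p4, d2, d3, d4]


def hamming74_correct(codeword):
    """Return (data_bits, error_position); position 0 means no error seen."""
    c = list(codeword)
    s1 = c[0] ^ c[2] ^ c[4] ^ c[6]
    s2 = c[1] ^ c[2] ^ c[5] ^ c[6]
    s4 = c[3] ^ c[4] ^ c[5] ^ c[6]
    # The three parity checks, read as a binary number, point at the bad bit.
    position = s1 + 2 * s2 + 4 * s4
    if position:
        c[position - 1] ^= 1  # flip the bad bit back
    return [c[2], c[4], c[5], c[6]], position


word = [1, 0, 1, 1]
stored = hamming74_encode(word)
stored[4] ^= 1  # simulate a single bit flip at position 5
recovered, position = hamming74_correct(stored)
print(recovered == word, position)
```

Note that this toy code only handles one flipped bit per codeword. Two flips in the same codeword would be "corrected" into the wrong value, which is why real ECC memory adds an extra parity bit so that double flips are at least detected.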

The best analogy I can think of is a simple Lego structure. Say you have a set of eight differently sized Lego blocks and you know the weight of the entire set. If you weigh it again and it comes up short by exactly one block’s weight, you know which block is missing. Traditional memory, by contrast, never weighs the set at all, so it never knows when something is wrong with it.

There are actually several different types of ECC. DDR5 will come with “on-chip” ECC, but when talking casually about ECC or buying memory with ECC support, we mean “side-band” ECC. The two can also work together to make an even more robust system.

Does it really matter?

Yes. Your bits are being flipped and you don’t even realize it. If you are using 16GB of memory, you’ll be experiencing around three bit flips per hour. Keep in mind that figure assumes the RAM is being hammered at full load, not sitting idle. I did some testing myself by running stress-ng on 160GB of ECC RAM for an hour, and the memory corrected 24 bit flips in that time. That works out to one flip per 6.7GB of RAM per hour.
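If you want to check the arithmetic or plug in your own numbers, here is a small sketch. It assumes bit flips scale linearly with the amount of RAM under load, which is a simplification on my part, and the function name is just something I made up for the example.

```python
def flips_per_gb_hour(flips, gigabytes, hours):
    """Corrected bit flips per GB of RAM per hour (assumes linear scaling)."""
    return flips / (gigabytes * hours)


# Figures from the stress-ng run: 24 corrected flips on 160GB in one hour.
stress_rate = flips_per_gb_hour(flips=24, gigabytes=160, hours=1)
gb_per_flip = 1 / stress_rate
flips_on_16gb = stress_rate * 16

print(f"{stress_rate:.2f} flips per GB per hour")
print(f"one flip per {gb_per_flip:.1f}GB per hour")
print(f"{flips_on_16gb:.1f} flips per hour on 16GB under the same load")
```

Scaled down this way, my own run lands at roughly 2.4 flips per hour for 16GB, the same ballpark as the "around three" figure.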

Now those numbers seem really scary, but let’s take it down to a more normal use level. Checking on my NAS server, which uses about 15GB of its RAM overall, it averaged one reported ECC bit flip fix every 41.4 hours this past year. That may be a significant underestimate, though, because background ECC correction (memory scrubbing) is not reported to the OS.
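If you already have ECC RAM and want to look at your own counts, here is a rough sketch for Linux. It assumes the kernel’s EDAC driver for your memory controller is loaded, in which case corrected and uncorrected error counters usually show up as ce_count and ue_count files under /sys/devices/system/edac/mc/. The helper name is made up, and on systems without EDAC it simply finds nothing.

```python
from pathlib import Path

# Usual location of the EDAC memory controller entries on Linux.
EDAC_ROOT = Path("/sys/devices/system/edac/mc")


def read_edac_counts(root=EDAC_ROOT):
    """Return {controller: {"corrected": n, "uncorrected": n}} per controller."""
    counts = {}
    for mc in sorted(Path(root).glob("mc[0-9]*")):
        try:
            corrected = int((mc / "ce_count").read_text().strip())
            uncorrected = int((mc / "ue_count").read_text().strip())
        except (OSError, ValueError):
            continue  # skip controllers whose counters are missing or unreadable
        counts[mc.name] = {"corrected": corrected, "uncorrected": uncorrected}
    return counts


for name, c in read_edac_counts().items():
    print(f"{name}: {c['corrected']} corrected, {c['uncorrected']} uncorrected")
```

The counters normally reset when the machine reboots or the driver reloads, so a long-term average like the one above means logging them over time rather than reading them once.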

Obviously a bit flip doesn’t make a huge difference most of the time, or everyone’s systems would be constantly crashing. However, I guarantee someone reading this has had a crash due to an unsuspected bit flip, and never knew the real culprit.

What systems support it?

Sadly, not all of them. Intel has recently gone the route of not including it in consumer chips at all, and AMD lets motherboard manufacturers decide whether they support it.

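| Chip Series      | Audience | Platform   | Support                                                  |
|------------------|----------|------------|----------------------------------------------------------|
| AMD Zen          | Consumer | AM4        | Varies by motherboard (most ASRock boards support ECC)   |
| AMD Threadripper | Prosumer | TR4/sTRX4  | Varies by motherboard                                    |
| AMD EPYC         | Server   | SP3        | Yes                                                      |
| Intel Core       | Consumer | LGA 1700   | No                                                       |
| Intel Xeon       | Server   | FCBGA1787  | Yes                                                      |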

If you’re building a NAS or home server, be extra careful to select a NAS box that supports ECC RAM or already comes with it. As the TrueNAS community guide (PDF) states about ECC: “If you’re going to do it, do it right.”