[Cryptography] Ex-CIA Joshua Schulte Describes His Data/Crypto Hiding Prowess

Philipp Gühring pg at futureware.at
Thu Mar 5 08:45:07 EST 2020


Hi,

> That is indeed one of the considerations -- spare blocks for remapping
> -- and even rotating media does that these days.

Remapping bad blocks has been done for a long time on rotating media.

> Drive makers have huge
> incentives to push what's possible, and large numbers of spare blocks
> cover many sins.

Well, I wouldn't call them sins. The physical processes involved have their limits, and there are many trade-offs to make during design and production (on the other hand, those trade-offs are an opportunity to optimize for a specific target application). At the moment it seems to me that it is impossible to produce 100% perfect flash memory economically, and just throwing away 99% of the produced flash because it is only 99% perfect would be a huge waste of resources from my point of view. So the manufacturers have to deal with imperfections.

> It also gives a mechanism to signal to the OS or
> diagnostics that the drive is failing but not yet failed. 
 
> Remapped blocks are almost generally being fussy about being written,
> but still can be read. Thus they have potentially dangerous data that
> cannot be erased.

There are two technologies: remapping and wear-leveling. Remapping only remaps individual bad blocks, where the risk of leaking data is low from my point of view. Wear-leveling, on the other hand, never overwrites pages in place: it keeps the old data where it is until all other space has been used, then it erases the whole block, and then it copies the new data and the remaining valid old pages into the empty block.
HDDs usually use remapping and flash usually uses wear-leveling, but I know of exceptions in both categories. Economically, wear-leveling needs more RAM inside the drive, because the controller has to know for every block or every page (or some other addressing granularity) where it currently is.
For further information, search for Flash Translation Layer (FTL), e.g. pages 25-29 of http://www2.futureware.at/~philipp/ssd/TheMissingManual.pdf
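
To make the difference concrete, here is a small illustrative Python sketch of a page-mapped FTL (toy names and geometry, not the behaviour of any real firmware): logical overwrites always go to a fresh page, the old copy is only marked stale, and the stale data stays physically readable until its block is garbage-collected and erased.

# Toy page-mapped FTL sketch, purely illustrative.
PAGES_PER_BLOCK = 4   # made-up geometry
NUM_BLOCKS = 8        # includes the spare blocks needed for garbage collection

class ToyFTL:
    def __init__(self):
        # physical medium: blocks of pages, None = erased
        self.flash = [[None] * PAGES_PER_BLOCK for _ in range(NUM_BLOCKS)]
        self.l2p = {}          # logical page number -> (block, page)
        self.stale = set()     # physical pages holding obsolete data
        self.write_ptr = (0, 0)

    def _next_free(self):
        # assumes at least one erased page exists; that is what spare blocks guarantee
        blk, pg = self.write_ptr
        while self.flash[blk][pg] is not None:
            pg += 1
            if pg == PAGES_PER_BLOCK:
                blk, pg = (blk + 1) % NUM_BLOCKS, 0
        self.write_ptr = (blk, pg)
        return blk, pg

    def write(self, lpn, data):
        blk, pg = self._next_free()
        self.flash[blk][pg] = data
        if lpn in self.l2p:
            self.stale.add(self.l2p[lpn])   # the old copy is NOT erased here
        self.l2p[lpn] = (blk, pg)

    def read(self, lpn):
        blk, pg = self.l2p[lpn]
        return self.flash[blk][pg]

    def garbage_collect(self, victim_blk):
        # save still-valid pages, erase the whole block, then rewrite them elsewhere
        survivors = []
        for pg in range(PAGES_PER_BLOCK):
            phys = (victim_blk, pg)
            data = self.flash[victim_blk][pg]
            if data is not None and phys not in self.stale:
                lpn = next(l for l, p in self.l2p.items() if p == phys)
                survivors.append((lpn, data))
                del self.l2p[lpn]           # mapping is recreated by write()
        self.flash[victim_blk] = [None] * PAGES_PER_BLOCK
        self.stale = {p for p in self.stale if p[0] != victim_blk}
        for lpn, data in survivors:
            self.write(lpn, data)

ftl = ToyFTL()
ftl.write(0, "secret v1")
ftl.write(0, "secret v2")    # logical overwrite
print(ftl.read(0))           # -> secret v2
print(ftl.flash[0][0])       # -> secret v1 (stale, but still physically present)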
 
> On all bleeding-edge technologies, it's hard to say anything definitive
> because they're actively changing. My understanding is that SSDs in
> particular are still so bleeding-edge (especially in high-capacity,
> high-speed cases) that extrapolation from any isolated fact is hard.
> From rumors I have heard, your 7.3% seems on the low side. 

The numbers I heard are 1-2% for low-cost products, 7% for consumer products, and 50% or more for enterprise products. But this likely depends on the manufacturer, and in some areas it also depends on the reseller, because resellers have "Mass Production Tools" with which they can set an arbitrary percentage for their specific application. So in practice, anything is possible.
On the other hand, the spare area can also be increased by the user, either with the "Host Protected Area" or with a partitioning scheme that does not use the whole SSD.
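A rough back-of-the-envelope calculation of how much that helps (the figures are purely illustrative, not vendor data):

# Estimate how reserving space (HPA or unused partitions) grows the spare pool.
# Only helps if the reserved area has never been written or has been trimmed,
# so the controller can actually treat it as free space.
def effective_spare_pct(advertised_gb, factory_spare_pct, user_reserved_gb):
    factory_spare_gb = advertised_gb * factory_spare_pct / 100.0
    usable_gb = advertised_gb - user_reserved_gb
    return 100.0 * (factory_spare_gb + user_reserved_gb) / usable_gb

# Hypothetical 1000 GB consumer drive with 7% factory spare:
print(round(effective_spare_pct(1000, 7, 0), 1))    # 7.0 %
# Leaving 100 GB unpartitioned roughly doubles the relative spare area:
print(round(effective_spare_pct(1000, 7, 100), 1))  # 18.9 %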

> I've heard
> loose talk that some high-reliability drives might be much, much
> higher, particularly when there are manufacturing changes.

Yes, 50% or more is what I heard.

Identifying the percentage should be rather easy: open the disk (or search the internet for a photo of an opened disk), identify the flash chips, look up the capacity of those chips, and compare the total with how much capacity the SSD claims to provide.
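The arithmetic looks like this (hypothetical chip count and sizes, taken from no particular drive); keep in mind that NAND dies are sized in binary GiB while drive labels use decimal GB, so a few percent of the apparent spare comes from the unit difference alone:

# Estimate factory over-provisioning from the chips on the PCB (illustrative).
def factory_spare_pct(chip_count, chip_gib, advertised_gb):
    # NAND is specified in GiB (powers of two), labels in decimal GB,
    # so ~7.4% "spare" already falls out of the unit conversion.
    raw_gb = chip_count * chip_gib * (2**30 / 10**9)
    return 100.0 * (raw_gb - advertised_gb) / advertised_gb

# Hypothetical drive: 8 chips of 64 GiB each, sold as "480 GB":
print(round(factory_spare_pct(8, 64, 480), 1))   # ~14.5 %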

> For example, let's suppose we have a manufacturing line that makes 2TB
> "datacenter" drives and another one that makes 6TB "desktop" drives.
> Let's also suppose that each drive costs $50 to manufacture and sells
> for $150. Now let's suppose we want to start to build 10TB drives with
> a new process. If we put new firmware on the 6TB drives so that they
> have 2TB active storage and 4TB of spare space, we can almost certainly
> hit the reliability metrics of the datacenter drives using the same
> guts as the desktop drives.

You don't need new firmware. All the firmware I have seen or heard of is configurable, during manufacturing and often also afterwards.

> That's the sort of crazy talk I've heard
> that sounds plausible and while I have a raised eyebrow, I've seen
> wackier things in my day.

The deeper you dig, the more layers of "spare" space you encounter. There is the Host Protected Area (HPA), which is visible to the operating system if it knows about the HPA, but most operating systems likely don't care about it. The HPA is often used by RAID controllers and other system tools. And I have heard recommendations to increase the HPA, or to shrink the partitions in use, in order to voluntarily reduce the used size of the SSD, increase the number of spare blocks, and extend the drive's lifetime.
Then inside the drive there are spare blocks, and they need to be there because the wear-leveling always merges pages from old blocks into a new block, and it needs room to do that even when the disk is full. The blocks usually have spare pages, and the pages themselves contain additional space used for tracking usage and for ECC, and I would not be surprised if there were even some space left over that might be reserved for firmware updates.
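To give a feeling for the per-page spare area, a classic large-page NAND layout (textbook figures, not tied to any specific chip) reserves a few dozen out-of-band bytes per page for metadata and ECC:

# Illustrative large-page NAND geometry: 2048 data bytes plus a 64-byte
# out-of-band (OOB/spare) area per page for metadata and ECC.
PAGE_DATA_BYTES = 2048
PAGE_OOB_BYTES = 64
PAGES_PER_BLOCK = 64

data = PAGE_DATA_BYTES * PAGES_PER_BLOCK
oob = PAGE_OOB_BYTES * PAGES_PER_BLOCK
pct = 100.0 * oob / (data + oob)
print(f"{data} data bytes, {oob} OOB bytes per block ({pct:.1f} % overhead)")
# -> 131072 data bytes, 4096 OOB bytes per block (3.0 % overhead)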
And by the way, there are also complex ECCs (error correction codes) in place, some of them even doing something like a RAID across the individual chips in an SSD...
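A minimal sketch of that RAID-like idea, striping XOR parity across the dies (not any vendor's actual scheme): one chip's worth of data per stripe holds the parity, so the contents of a single failed die can be rebuilt from the others.

# RAID-5-style XOR parity across NAND chips, purely illustrative.
from functools import reduce

def xor_bytes(chunks):
    return reduce(lambda a, b: bytes(x ^ y for x, y in zip(a, b)), chunks)

data_chips = [b"\x11" * 4, b"\x22" * 4, b"\x33" * 4]   # payload on chips 0-2
parity_chip = xor_bytes(data_chips)                    # stored on chip 3

# Chip 1 dies; rebuild its contents from the survivors plus parity:
rebuilt = xor_bytes([data_chips[0], data_chips[2], parity_chip])
assert rebuilt == data_chips[1]
print(rebuilt.hex())   # -> 22222222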


Best regards,
Philipp Gühring


