[Cryptography] [Crypto-practicum] Justify the sequence of operations in CTR mode.

Theodore Ts'o tytso at mit.edu
Tue Feb 23 19:50:21 EST 2016


On Tue, Feb 23, 2016 at 04:17:58PM -0500, Jerry Leichter wrote:
> 
> 1.  There's always *some* metadata that describes the current state
> of the file/object/whatever.  Sure, the actual data is in the block
> - but you have to have a pointer to it somewhere so you can find it.
> Writing the data without the pointer makes it impossible to find;
> writing the pointer without the data means that anyone following the
> pointer will get garbage.

For update-in-place file systems, you can update the data block
without updating the metadata.  For copy-on-write file systems, yes,
you can only update the data block by allocating a new data block,
writing the new data block, and then updating the extent map.  But
then you have to allocate a new block for the extent map, etc., all
the way up to the root.  Examples of such file systems include ZFS,
btrfs, log-structured file systems, etc.
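
To make the cascade concrete, here's a toy sketch in C (my own
illustration -- the two-level tree and block numbers are made up, not
any real file system's on-disk format): rewriting one data block means
recording its new location in the extent map, which means the extent
map block gets rewritten to a new location too, and so on up to the
root.

#include <stdio.h>

static unsigned long next_free = 100;          /* toy block allocator  */
static unsigned long alloc_block(void) { return next_free++; }

struct node {
    unsigned long blocknr;                     /* where this lives now */
    struct node *parent;                       /* NULL for the root    */
};

/* COW-update one block: allocate a new location, "write" it there,
 * then update the parent -- which is itself a COW update, so the same
 * thing happens recursively, all the way up to the root. */
static void cow_update(struct node *n)
{
    unsigned long newblk = alloc_block();
    printf("rewrite block %lu at new location %lu\n", n->blocknr, newblk);
    n->blocknr = newblk;
    if (n->parent)
        cow_update(n->parent);   /* our pointer changed, so the parent changes too */
    else
        printf("commit new root at block %lu\n", n->blocknr);
}

int main(void)
{
    struct node root   = { 1, NULL };
    struct node extent = { 2, &root };         /* extent map block      */
    struct node data   = { 3, &extent };       /* the data block itself */

    cow_update(&data);                         /* one 4k write becomes three */
    return 0;
}

Every random 4k overwrite therefore turns into several fresh
allocations and writes, which is where the random-write pain in the
next paragraph comes from.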

However, the performance of such file systems on random write
workloads, such as a typical enterprise database, is disastrous.  So
in practice, these file systems generally have a mode where you can
disable COW and update the database's tablespace files using an
update-in-place write, without updating the metadata.  (ZFS is owned
by Oracle, and has this feature.  Coincidence?  I think not.  :-)

> 2.  We generally assume that a write to a disk block is atomic: It
> either completes successfully (replacing all the old data with the
> new) or fails completely (leaving the old data unchanged).
> Unfortunately, this isn't true.  I can't give a reference right now,
> but detailed studies of disk failure modes show that all kinds of
> bizarre failures can and do occur.

This is called a torn write, and it's a very well known problem.  It
happens when the disk sector size is 512 bytes and the file system
block size is 4k.  In this case, yes, you can have torn writes on a
crash.  To protect against this, enterprise databases have per-block
checksums built into each page (so a 4k page might only have 4088
bytes of real data, with 4 bytes of CRC and 4 bytes of block number --
to protect against blocks written to the wrong place on disk).
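
As a rough illustration (the struct layout here is a simplified
stand-in, not any particular database's page format), the check a
database does when it reads a page back looks something like this:

#include <stdio.h>
#include <stdint.h>
#include <stddef.h>
#include <string.h>

#define PAGE_SIZE 4096

/* A 4k database page: 4088 bytes of real data, plus a 4-byte block
 * number and a 4-byte CRC covering everything before it. */
struct db_page {
    unsigned char data[PAGE_SIZE - 8];
    uint32_t      blocknr;
    uint32_t      crc;
};
_Static_assert(sizeof(struct db_page) == PAGE_SIZE, "page must be exactly 4k");

/* Plain bitwise CRC-32 (polynomial 0xEDB88320). */
static uint32_t crc32_le(const unsigned char *buf, size_t len)
{
    uint32_t crc = ~0u;
    while (len--) {
        crc ^= *buf++;
        for (int i = 0; i < 8; i++)
            crc = (crc >> 1) ^ ((crc & 1) ? 0 : 0xedb88320u) ^ ((crc & 1) ? 0xedb88320u : 0);
    }
    return ~crc;
}

/* Returns 1 if the page survived intact *and* is the page we asked
 * for; 0 on a torn/corrupted write or a write to the wrong location. */
static int page_ok(const struct db_page *p, uint32_t expected_blocknr)
{
    uint32_t crc = crc32_le((const unsigned char *)p,
                            offsetof(struct db_page, crc));
    if (crc != p->crc)
        return 0;                    /* torn or corrupted write          */
    if (p->blocknr != expected_blocknr)
        return 0;                    /* block written to the wrong place */
    return 1;
}

int main(void)
{
    static struct db_page p;

    memset(p.data, 0x5a, sizeof(p.data));
    p.blocknr = 42;
    p.crc = crc32_le((const unsigned char *)&p, offsetof(struct db_page, crc));

    printf("clean read:       %s\n", page_ok(&p, 42) ? "ok" : "rejected");
    p.data[100] ^= 0x01;             /* simulate a torn write */
    printf("after torn write: %s\n", page_ok(&p, 42) ? "ok" : "rejected");
    return 0;
}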

Journalling file systems will often use checksums in the commit block,
and will discard the last commit as being incomplete if the checksum
doesn't check out.
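
A toy replay loop shows why a torn commit is harmless: it simply
fails the checksum in its commit record and gets discarded.  (This is
simplified -- it is not jbd2's or any other journal's actual on-disk
format, and a cheap additive checksum stands in for the real CRC.)

#include <stdio.h>
#include <stdint.h>
#include <string.h>

#define BLOCKS_PER_TXN 4
#define BLOCK_SIZE     16           /* tiny blocks, just for the demo */

struct txn {
    unsigned char blocks[BLOCKS_PER_TXN][BLOCK_SIZE];
    uint32_t      commit_csum;      /* checksum written in the commit block */
};

static uint32_t csum(const struct txn *t)
{
    uint32_t sum = 0;
    for (int i = 0; i < BLOCKS_PER_TXN; i++)
        for (int j = 0; j < BLOCK_SIZE; j++)
            sum = sum * 31 + t->blocks[i][j];
    return sum;
}

/* Replay transactions in order; stop at the first one whose commit
 * checksum does not verify -- it was interrupted by the crash. */
static void replay(struct txn *log, int n)
{
    for (int i = 0; i < n; i++) {
        if (csum(&log[i]) != log[i].commit_csum) {
            printf("txn %d: bad commit checksum, discarding and stopping replay\n", i);
            return;
        }
        printf("txn %d: checksum ok, replaying\n", i);
    }
}

int main(void)
{
    struct txn log[2];
    memset(log, 0, sizeof(log));

    memset(log[0].blocks, 0xaa, sizeof(log[0].blocks));
    log[0].commit_csum = csum(&log[0]);          /* fully written        */

    memset(log[1].blocks, 0xbb, sizeof(log[1].blocks));
    log[1].commit_csum = csum(&log[1]);
    log[1].blocks[2][0] ^= 0xff;                 /* simulate a torn write */

    replay(log, 2);
    return 0;
}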

With advanced format disks where the physical sector size is 4k, and
the unit of ECC is 4k, the disk drive *does* guarantee that each 4k
write will be atomic.  The bigger problem is that, in order to have
more efficient writes, HDD vendors are proposing going to 32k or 64k
physical block sizes, where the unit of atomic write is 32k or 64k.
This means
that 4k writes will now require a read-modify-write cycle, or the
operating systems will have to go through significant changes so that
they can support VM page sizes smaller than the file system block
size.
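
Here is a toy model of that read-modify-write cycle (my own sketch;
only the 4k and 32k sizes come from the discussion above, everything
else is made up):

#include <stdio.h>
#include <string.h>

#define LOGICAL_SIZE   4096
#define PHYSICAL_SIZE  32768        /* proposed larger physical block */

static unsigned char disk[PHYSICAL_SIZE];     /* one physical block of "media" */

static void rmw_write(unsigned long logical_blocknr, const unsigned char *buf)
{
    unsigned char phys[PHYSICAL_SIZE];
    size_t offset = (logical_blocknr * LOGICAL_SIZE) % PHYSICAL_SIZE;

    memcpy(phys, disk, PHYSICAL_SIZE);        /* 1. read the whole physical block */
    memcpy(phys + offset, buf, LOGICAL_SIZE); /* 2. modify just the 4k we own     */
    memcpy(disk, phys, PHYSICAL_SIZE);        /* 3. write the whole 32k back      */

    printf("4k write at logical block %lu cost a %d-byte read and a %d-byte write\n",
           logical_blocknr, PHYSICAL_SIZE, PHYSICAL_SIZE);
}

int main(void)
{
    unsigned char buf[LOGICAL_SIZE];
    memset(buf, 0x42, sizeof(buf));
    rmw_write(3, buf);
    return 0;
}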

One observation I made just today (I'm at the FAST conference) is
that one of the problems with trying to do file systems with
encryption or cryptographic data integrity is that many storage
engineers don't understand cryptography, and many crypto experts
don't understand file systems and storage issues.  Fortunately,
with ext4 encryption I've been collaborating with Michael Halcrow
(who designed eCryptfs), who really does understand the crypto stuff,
and we have access to some really high-powered cryptographic experts
who happen to work at Google....

	      	      	  	 - Ted

