Truncating SHA2 hashes vs shortening a MAC for ZFS Crypto

Tue Nov 3 01:31:19 EST 2009

Zooko Wilcox-O'Hearn wrote:
> Dear Darren J Moffat:
> 
> I don't understand why you need a MAC when you already have the hash of
> the ciphertext.  Does it have something to do with the fact that the
> checksum is non-cryptographic by default
> (http://docs.sun.com/app/docs/doc/819-5461/ftyue?a=view ), and is that
> still true?  Your original design document [1] said you needed a way to
> force the checksum to be SHA-256 if encryption was turned on.  But back
> then you were planning to support non-authenticating modes like CBC.  I
> guess once you dropped non-authenticating modes then you could relax
> that requirement to force the checksum to be secure.
> 
> Too bad, though!  Not only are you now tight on space in part because
> you have two integrity values where one ought to do, but also a secure
> hash of the ciphertext is actually stronger than a MAC!  A secure hash
> of the ciphertext tells whether the ciphertext is right (assuming the
> hash function is secure and implemented correctly).  Given that the
> ciphertext is right, then the plaintext is right (given that the
> encryption is implemented correctly and you use the right decryption
> key).

Hmm. That may be too many "given"s.

Tahoe (see www.allmydata.org) has an open bug to add a plaintext hash,
precisely because the encryption might not be implemented correctly or
the encryption key might not be correct:
<http://allmydata.org/trac/tahoe/ticket/453>

It seems that ZFS, like many other protocols, is in the same position
as Tahoe: it wants some way to validate that the ciphertext is correct
without needing the decryption key, but it also wants to minimize the
risk that an implementation error, or use of the wrong decryption key,
results in undetected errors in the plaintext.

I had something similar to the following in mind for the next update to
my proposal for Tahoe's new crypto protocol (simplified here to avoid
Tahoe-specific details and terminology):

 - a "plaintext verifier" is Hash1(index, salt, plaintext).

 - a "ciphertext verifier" is Hash2(index, ciphertext).

 - at a location determined by 'index', store:
   ciphertext = Encrypt[K](salt, plaintext)
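A minimal Python sketch of the three definitions above. Several details are assumptions that are not part of the proposal: SHA-256 with a domain-separation tag and length-prefixed fields stands in for Hash1 and Hash2, the index is an 8-byte integer, the salt is 16 random bytes, and `toy_encrypt` is a hypothetical XOR-with-SHAKE-256 keystream standing in for a real cipher mode. The toy cipher is not secure; it only lets the sketch run with the standard library.

```python
import hashlib
import os

SALT_LEN = 16  # assumed salt length in bytes


def _tagged_hash(tag: bytes, *parts: bytes) -> bytes:
    # SHA-256 over a domain-separation tag and length-prefixed fields (assumed encoding)
    h = hashlib.sha256(tag)
    for part in parts:
        h.update(len(part).to_bytes(8, "big"))
        h.update(part)
    return h.digest()


def hash1(index: int, salt: bytes, plaintext: bytes) -> bytes:
    """Plaintext verifier: Hash1(index, salt, plaintext)."""
    return _tagged_hash(b"plaintext-verifier", index.to_bytes(8, "big"), salt, plaintext)


def hash2(index: int, ciphertext: bytes) -> bytes:
    """Ciphertext verifier: Hash2(index, ciphertext)."""
    return _tagged_hash(b"ciphertext-verifier", index.to_bytes(8, "big"), ciphertext)


def toy_encrypt(key: bytes, index: int, data: bytes) -> bytes:
    # Hypothetical stand-in for Encrypt[K]: XOR with a SHAKE-256 keystream.
    # NOT secure (the keystream repeats for every write to the same index).
    # XOR is its own inverse, so this function also decrypts.
    stream = hashlib.shake_256(key + index.to_bytes(8, "big")).digest(len(data))
    return bytes(a ^ b for a, b in zip(data, stream))


def store(key: bytes, index: int, plaintext: bytes):
    """Return (ciphertext, plaintext_verifier, ciphertext_verifier) for one block."""
    salt = os.urandom(SALT_LEN)
    ciphertext = toy_encrypt(key, index, salt + plaintext)  # salt is encrypted with the data
    return ciphertext, hash1(index, salt, plaintext), hash2(index, ciphertext)
```

Note that the ciphertext is `SALT_LEN` bytes longer than the plaintext; this is the expansion that matters for the space constraints discussed below.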

This has the following advantages:

 - For integrity of the plaintext, you only need to assume that the
   implementation of the hash is correct. Moreover, if the hash
   implementation is not correct, that is very likely to cause it to
   fail to verify good data, which is noticeable as an error in normal
   operation. To get bad data to pass verification, the attacker would
   need to have some control over the output value of the incorrect
   hash; an error that effectively randomizes the value does not help
   them.

 - The verification also ensures integrity of the index. So, if a
   ciphertext ends up being stored in the wrong place, that will be
   detected.

 - Verification of the plaintext does not require the decryption key;
   it can be done using just the known plaintext verifier, and the
   purported values of 'salt' and 'plaintext' obtained from decryption.

   This is very important "if it must be possible to have all
   cryptographic key material stored and/or created entirely in a
   hardware device", as [1] states as a requirement for ZFS. If the
   verification can be done safely in software and if the encryption
   uses a standard mode, then it is more likely that existing crypto
   hardware, or at least hardware that has no specific dependency on
   ZFS, can be used.

 - Knowledge of the plaintext verifier by itself leaks no information
   about the plaintext, under the assumptions that the hash is one-way,
   and that no (index, salt, plaintext) triple is ever repeated.

 - A non-malicious corruption of any of the plaintext verifier, the
   ciphertext, or the decryption key will cause the plaintext to fail
   to verify.

 - A malicious change to the ciphertext or any induced error in the
   decryption will cause the plaintext to fail to verify as long as
   the correct plaintext verifier is used.
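Several of the properties listed above can be checked with a small sketch, under the same assumptions as before (SHA-256 with length-prefixed fields for Hash1, a 16-byte salt, and a hypothetical, insecure XOR-with-SHAKE-256 keystream `toy_crypt` in place of a real cipher). `verify_plaintext` takes no key: only the index, the purported salt and plaintext, and the known verifier. A wrong key, a ciphertext read back from a different index, or a flipped ciphertext bit all cause verification to fail.

```python
import hashlib
import hmac
import os

SALT_LEN = 16  # assumed salt length in bytes


def hash1(index: int, salt: bytes, plaintext: bytes) -> bytes:
    # Plaintext verifier: SHA-256 over length-prefixed (index, salt, plaintext); encoding assumed
    h = hashlib.sha256(b"plaintext-verifier")
    for part in (index.to_bytes(8, "big"), salt, plaintext):
        h.update(len(part).to_bytes(8, "big") + part)
    return h.digest()


def toy_crypt(key: bytes, index: int, data: bytes) -> bytes:
    # Hypothetical XOR-with-SHAKE-256 stand-in for Encrypt/Decrypt (not a real mode)
    stream = hashlib.shake_256(key + index.to_bytes(8, "big")).digest(len(data))
    return bytes(a ^ b for a, b in zip(data, stream))


def verify_plaintext(index: int, salt: bytes, plaintext: bytes, verifier: bytes) -> bool:
    # Needs no key: only the purported (salt, plaintext) and the known verifier
    return hmac.compare_digest(hash1(index, salt, plaintext), verifier)


def decrypt_and_verify(key: bytes, index: int, ciphertext: bytes, verifier: bytes):
    # Decrypt, split off the salt, and release the plaintext only if it verifies
    data = toy_crypt(key, index, ciphertext)
    salt, plaintext = data[:SALT_LEN], data[SALT_LEN:]
    return plaintext if verify_plaintext(index, salt, plaintext, verifier) else None


key, index, msg = os.urandom(32), 7, b"block contents"
salt = os.urandom(SALT_LEN)
ct = toy_crypt(key, index, salt + msg)
pv = hash1(index, salt, msg)

recovered = decrypt_and_verify(key, index, ct, pv)              # msg
wrong_key = decrypt_and_verify(os.urandom(32), index, ct, pv)   # None
wrong_place = decrypt_and_verify(key, index + 1, ct, pv)        # None
```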

Contrast this with using only a ciphertext checksum: there, either an
error in the decryption or corruption of the decryption key will result
in an undetected error in the plaintext.
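The contrast can be made concrete with a sketch that reads one block two ways. As before, `toy_crypt` is a hypothetical, insecure XOR-with-SHAKE-256 stand-in for a real cipher, the salt is an assumed 16 bytes, and the index is omitted from the hashes for brevity. With a wrong key, the ciphertext checksum still passes and garbage is returned silently, while the plaintext verifier rejects the result.

```python
import hashlib
import os

SALT_LEN = 16  # assumed salt length in bytes


def toy_crypt(key: bytes, data: bytes) -> bytes:
    # Hypothetical XOR-with-SHAKE-256 stand-in for a real cipher (not secure; self-inverse)
    stream = hashlib.shake_256(key).digest(len(data))
    return bytes(a ^ b for a, b in zip(data, stream))


def read_with_ciphertext_checksum(key: bytes, ciphertext: bytes, checksum: bytes):
    # Only the ciphertext is checked, so the key never influences the result of the check
    if hashlib.sha256(ciphertext).digest() != checksum:
        return None
    return toy_crypt(key, ciphertext)[SALT_LEN:]


def read_with_plaintext_verifier(key: bytes, ciphertext: bytes, verifier: bytes):
    # The check runs on the decrypted salt || plaintext, so a wrong key is caught
    data = toy_crypt(key, ciphertext)
    if hashlib.sha256(data).digest() != verifier:
        return None
    return data[SALT_LEN:]


good_key, bad_key = os.urandom(32), os.urandom(32)
msg = b"important data"
salt = os.urandom(SALT_LEN)
verifier = hashlib.sha256(salt + msg).digest()  # index omitted for brevity
ct = toy_crypt(good_key, salt + msg)
checksum = hashlib.sha256(ct).digest()

silent = read_with_ciphertext_checksum(bad_key, ct, checksum)  # garbage, accepted
caught = read_with_plaintext_verifier(bad_key, ct, verifier)   # None
```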

Of course we also need to consider the space constraints. The available
384 bits would fit two 192-bit hashes for the plaintext and ciphertext
verifiers; but then we would have no space to accommodate the
ciphertext expansion that results from encrypting the salt together
with the plaintext.
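The arithmetic can be sketched as follows, assuming each verifier is a SHA-256 digest truncated to its leading 192 bits and that the salt is 16 bytes (both assumptions for illustration). Keeping a prefix of SHA-256 is one common way to truncate; the standardized truncated variants SHA-224 and SHA-512/256 instead use different initial values, so their outputs are not simply prefixes of the full-length hash.

```python
import hashlib

FIELD_BITS = 384     # space available for integrity values, per the text
VERIFIER_BITS = 192  # each verifier truncated to 192 bits
SALT_LEN = 16        # assumed salt length in bytes


def truncated_sha256(data: bytes, bits: int = VERIFIER_BITS) -> bytes:
    # Keep the leading bits // 8 bytes of a SHA-256 digest (byte-aligned truncation only)
    if bits % 8 or not 0 < bits <= 256:
        raise ValueError("bits must be a multiple of 8 in (0, 256]")
    return hashlib.sha256(data).digest()[: bits // 8]


def pack_verifiers(plaintext_verifier: bytes, ciphertext_verifier: bytes) -> bytes:
    # Concatenate both verifiers into the fixed-size integrity field
    field = plaintext_verifier + ciphertext_verifier
    if len(field) * 8 > FIELD_BITS:
        raise ValueError("verifiers do not fit in the integrity field")
    return field


def ciphertext_len(plaintext_len: int) -> int:
    # Encrypting salt || plaintext with a length-preserving mode adds SALT_LEN bytes
    return plaintext_len + SALT_LEN


field = pack_verifiers(truncated_sha256(b"plaintext side"), truncated_sha256(b"ciphertext side"))
spare_bits = FIELD_BITS - len(field) * 8  # nothing left over for the salt
```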

I'm not familiar enough with ZFS's on-disk format to tell whether there
is a way around this. Note that the encrypted salt does not need to
be stored in the same place as either the verifiers or the rest of
the ciphertext.

> A MAC on the plaintext tells you only that the plaintext was
> chosen by someone who knew the key.  See what I mean?  A MAC can't be
> used to give someone the ability to read some data while withholding
> from them the ability to alter that data.  A secure hash can.

Right. If hashes are used instead of MACs, then the integrity of the
system does not depend on keeping secrets. It only depends on preventing
the attacker from modifying the root of the Merkle tree. One consequence
of this is that if there are side-channel attacks against the
implementations of crypto algorithms, there is no information that they
can leak to an attacker that would allow compromising integrity.
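To illustrate why integrity then depends only on protecting the root, here is a generic binary Merkle tree sketch: anyone holding the root hash can check a block against its authentication path without any secret. This is not ZFS's on-disk layout (ZFS keeps child checksums in block pointers); the 0x00/0x01 leaf/interior prefixes follow the convention of RFC 6962, while duplicating the last node on odd-sized levels is an assumption of this sketch.

```python
import hashlib


def h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def _next_level(level):
    # Pair nodes left to right; an odd last node is paired with itself (assumption)
    if len(level) % 2:
        level = level + [level[-1]]
    return [h(b"\x01" + level[i] + level[i + 1]) for i in range(0, len(level), 2)]


def merkle_root(leaves):
    level = [h(b"\x00" + leaf) for leaf in leaves]
    while len(level) > 1:
        level = _next_level(level)
    return level[0]


def auth_path(leaves, idx):
    # Collect (sibling_hash, sibling_is_left) from the leaf level up to the root
    level = [h(b"\x00" + leaf) for leaf in leaves]
    path = []
    while len(level) > 1:
        padded = level + [level[-1]] if len(level) % 2 else level
        sibling = idx ^ 1
        path.append((padded[sibling], sibling < idx))
        level = _next_level(level)
        idx //= 2
    return path


def verify(leaf, path, root):
    # Recompute the root from one leaf and its path; no key is involved
    node = h(b"\x00" + leaf)
    for sibling, sibling_is_left in path:
        node = h(b"\x01" + sibling + node) if sibling_is_left else h(b"\x01" + node + sibling)
    return node == root


blocks = [b"a", b"b", b"c"]
root = merkle_root(blocks)
```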

(Of course, the integrity of the OS also needs to be protected. One way
of doing that would be to have a TPM, or the same hardware that is used
for crypto, store the root hash of the Merkle tree and also the hash
of a boot loader that supports ZFS. Then the boot loader would load an
OS from the ZFS filesystem, and only that OS would be permitted to update
the ZFS root hash.)

> One of the founding ideas of the whole design of ZFS was end-to-end
> integrity checking.  It does that successfully now, for the case of
> accidents, using large checksums.  If the checksum is secure then it
> also does it for the case of malice.  In contrast a MAC doesn't do
> "end-to-end" integrity checking.

A cryptographic checksum on the ciphertext alone doesn't do end-to-end
integrity checking either. Even if everything is implemented correctly
and there are no hardware errors, it doesn't verify the integrity of the
decryption key.

> For example, if you've previously
> allowed someone to read a filesystem (i.e., you've given them access to
> the key), but you never gave them permission to write to it, but they
> are able to exploit the issues that you mention at the beginning of [1]
> such as "Untrusted path to SAN", then the MAC can't stop them from
> altering the file, nor can the non-secure checksum, but a secure hash
> can (provided that they can't overwrite all the way up the Merkle Tree
> of the whole pool and any copies of the Merkle Tree root hash).

The scheme I suggested above also has that advantage: if you have a
plaintext verifier, then you can check the integrity of the plaintext
even if an attacker knows the decryption key (and no separate MAC key is
needed).

> Likewise, a secure hash can be relied on as a dedupe tag *even* if
> someone with malicious intent may have slipped data into the pool.  An
> insecure hash or a MAC tag can't -- a malicious actor could submit data
> which would cause a collision in an insecure hash or a MAC tag, causing
> tag-based dedupe to mistakenly unify two different blocks.

I agree. I don't think that Darren Moffat was suggesting using the MAC tag
for dedupe. I also agree that a hash used for dedupe needs to be quite long
(256 bits would be nice, but 192 is probably OK).
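To put rough numbers on "quite long": the malicious case relevant to dedupe (an attacker crafting two different blocks with the same tag) is governed by the generic birthday bound of about 2^(b/2) hash evaluations for a b-bit tag, i.e. 2^96 for 192 bits and 2^128 for 256 bits. Accidental collisions among n honestly written blocks are far rarer, roughly n^2 / 2^(b+1). The 2^48 block count below is an illustrative assumption.

```python
def birthday_attack_bits(tag_bits: int) -> int:
    # A generic birthday search finds some colliding pair after about 2**(tag_bits / 2) hashes
    return tag_bits // 2


def accidental_collision_prob(n_blocks: int, tag_bits: int) -> float:
    # Birthday approximation p ~ n(n - 1) / 2**(tag_bits + 1), capped at 1
    # because the approximation overshoots once collisions become likely
    return min(1.0, n_blocks * (n_blocks - 1) / 2 ** (tag_bits + 1))


for bits in (64, 192, 256):
    print("%d-bit tag: collision attack ~2**%d work, accidental p for 2**48 blocks ~ %.3g"
          % (bits, birthday_attack_bits(bits), accidental_collision_prob(2**48, bits)))
```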

> [1]
> http://hub.opensolaris.org/bin/download/Project+zfs%2Dcrypto/files/zfs%2Dcrypto%2Ddesign.pdf

--
David-Sarah Hopwood     http://davidsarah.livejournal.com
