Truncating SHA2 hashes vs shortening a MAC for ZFS Crypto

David-Sarah Hopwood david-sarah at jacaranda.org
Tue Nov 3 23:11:07 EST 2009


Darren J Moffat wrote:
> David-Sarah Hopwood wrote:
>> Of course we also need to consider the space constraints. 384 bits
>> would fit two 192-bit hashes for the plaintext and ciphertext
>> verifiers; but then we would have no space to accommodate the
>> ciphertext expansion that results from encrypting the salt together
>> with the plaintext.

I notice from the "ZFS On-Disk Specification" linked from
<http://hub.opensolaris.org/bin/view/Community+Group+zfs/docs>
(illustration 8 on page 15) that the block pointer structure includes
192 bits labelled as "padding", in addition to the 256 bits labelled as
"checksum". The padding is described in section 2.12 as "space reserved
for future use". I'm not sure how up-to-date that spec is; has this
space already been used?

> Which is where I was originally; the IV wasn't stored because there
> was a way to derive a secure (for CCM/GCM use) IV from other things in
> the block pointer.

Ah. There is another important design constraint that I hadn't considered:
when dedupe is enabled, you must use convergent encryption. That is, the
encryption must be a deterministic function of (dataset_key, plaintext),
so that duplicated plaintext blocks in a given dataset will encrypt to
duplicated ciphertext blocks.

(Or is it also desired to converge duplicated blocks across datasets?
That would be more complicated.)

Given this constraint, there is no *additional* loss of security from
deriving the IV from a hash of the plaintext, or a MAC of the plaintext [*].
This does leak some information (essentially, whether two blocks encrypted
under the same dataset key have identical plaintexts), but only information
that is necessarily leaked by any convergent encryption scheme.

(See the thread starting at
<http://allmydata.org/pipermail/tahoe-dev/2008-March/000449.html> for
potential attacks on convergent encryption -- although some of them do not
apply if we are only trying to converge for a given encryption key. You
might want to use a different protocol when dedupe is not enabled, but
let's consider the dedupe case first: stronger design constraints can
often make a problem simpler.)


Straw-man suggestion:

  mac = MAC[dataset_mac_key](plaintext)
  iv = Hash1(mac)
  ciphertext = Encrypt[dataset_enc_key](iv, plaintext)

  Store (mac, Hash2(ciphertext)) in the block pointer.
  Use Hash2(ciphertext) as a dedupe tag.

The MAC and hash comfortably fit into 384 bits, e.g. a 128-bit MAC and
256-bit ciphertext hash. Any encryption mode can be used (it does not
have to be an authenticated mode because the MAC is computed separately).
Of course, some modes will have ciphertext expansion, and the mode
affects what security properties are needed for the IV. In some cases
it might be possible to simplify by using the MAC tag directly as the
IV, depending on its length.
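
To make the straw-man concrete, here is a rough Python sketch (using the
third-party 'cryptography' package). It assumes HMAC-SHA256 truncated to
128 bits as the MAC, SHA-256 for Hash1 and Hash2, and AES-128 in CTR mode
for the encryption; those particular choices, and the function and key
names, are illustrative assumptions rather than part of the proposal:

  import hashlib
  import hmac
  from cryptography.hazmat.backends import default_backend
  from cryptography.hazmat.primitives.ciphers import Cipher, algorithms, modes

  def encrypt_block(dataset_mac_key, dataset_enc_key, plaintext):
      # mac = MAC[dataset_mac_key](plaintext), truncated to 128 bits
      mac = hmac.new(dataset_mac_key, plaintext, hashlib.sha256).digest()[:16]

      # iv = Hash1(mac); CTR needs a 128-bit counter block, so truncate
      iv = hashlib.sha256(mac).digest()[:16]

      # ciphertext = Encrypt[dataset_enc_key](iv, plaintext);
      # CTR mode, so there is no ciphertext expansion
      cipher = Cipher(algorithms.AES(dataset_enc_key), modes.CTR(iv),
                      backend=default_backend())
      encryptor = cipher.encryptor()
      ciphertext = encryptor.update(plaintext) + encryptor.finalize()

      # Store (mac, Hash2(ciphertext)) in the block pointer:
      # 128 + 256 = 384 bits. Hash2(ciphertext) doubles as the dedupe tag.
      ciphertext_hash = hashlib.sha256(ciphertext).digest()
      return ciphertext, (mac, ciphertext_hash)

On read, one would check Hash2(ciphertext), decrypt using the IV re-derived
from the stored MAC, and then verify the MAC over the recovered plaintext.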

A disadvantage of this scheme is that you have to perform the whole
MAC computation and then the whole encryption (or decryption and then
MAC); you cannot do them in parallel. I can't immediately see any way
to get around that without potentially introducing weaknesses.
The encryption and MAC can both be individually parallelisable (although
I don't know of any parallelisable MACs that are not patent-encumbered).
For example, it is possible to use CTR mode for the encryption [*],
which would have no ciphertext expansion.


[*] Caveat: for CTR-based modes (including CTR itself, CCM, GCM, and EAX
    modes), repetition of an IV for different plaintexts is catastrophic,
    because the security of the encryption effectively reduces to that of
    XOR with a repeated keystream. Compare with CBC mode, where a repeated
    IV only results in leakage of information about common initial prefixes
    of the plaintexts. Of course CTR-based modes have the advantage relative
    to feedback modes (e.g. CBC, CFB, and OFB) of being blockwise
    parallelisable.
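
    As a tiny illustration of the CTR failure mode (a Python sketch using
    the third-party 'cryptography' package; the all-zero key and IV are
    purely for demonstration), two ciphertexts produced with the same key
    and IV XOR to the XOR of the plaintexts, because the identical
    keystream cancels out:

      from cryptography.hazmat.backends import default_backend
      from cryptography.hazmat.primitives.ciphers import Cipher, algorithms, modes

      def ctr_encrypt(key, iv, plaintext):
          cipher = Cipher(algorithms.AES(key), modes.CTR(iv),
                          backend=default_backend())
          enc = cipher.encryptor()
          return enc.update(plaintext) + enc.finalize()

      key = bytes(16)   # all-zero key, for illustration only
      iv = bytes(16)    # the *same* IV reused for two plaintexts
      p1 = b"attack at dawn!!"
      p2 = b"retreat at dusk!"

      c1 = ctr_encrypt(key, iv, p1)
      c2 = ctr_encrypt(key, iv, p2)

      # The keystream cancels: c1 XOR c2 == p1 XOR p2, so knowing either
      # plaintext immediately reveals the other.
      xored_ct = bytes(a ^ b for a, b in zip(c1, c2))
      xored_pt = bytes(a ^ b for a, b in zip(p1, p2))
      assert xored_ct == xored_pt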

    There would be room to fit a longer IV if using a 256-bit block cipher,
    but then you could not use AES itself: the 256-bit-block variants of
    Rijndael are much less thoroughly analysed than the standardised
    128-bit-block version.

>> I'm not familiar enough with ZFS's on-disk format to tell whether there
>> is a way around this. Note that the encrypted salt does not need to
>> be stored in the same place as either the verifiers or the rest of
>> the ciphertext.
> 
> The IV needs to be stored in ZFS because of some future ZFS features
> where I can't use a derived IV

The suggestion above doesn't need the IV to be stored separately, because
it is computed from the MAC.

> (the first of which will be the project
> that brings device eviction, this involves "moving" encrypted blocks to
> a new place and must work without the keys present).

How would disk block pointers be updated to point to the new location(s) --
via forwarding pointers at the old location(s), or by updating all existing
pointers? (I think there is a close analogy to in-memory garbage collection
algorithms here.) By the sound of "device eviction", I would guess that the
original device is going away sometime soon, and so long-term forwarding
pointers will not work; is that right?

If it must be done by updating all existing pointers and must work
without the keys present, then there is an information leak that I cannot
see how to avoid: the process that is moving disk blocks must be able to
see that two pointers point to disk blocks with the same contents. Also,
that process must be able to update the data-virtual-addresses.

So, an attacker can necessarily see this information and can update the
data-virtual-addresses as well. That seems bad.

A possible solution is to relax the design constraints so that a key is
needed, but not the same key that is used for encryption. Is that feasible?
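
As one hypothetical way to make that concrete (the key names and derivation
labels below are my own illustration, not a proposal for the actual key
hierarchy): derive a separate "maintenance key" from the dataset's master
key, and compute the equality/dedupe tag as a MAC of the ciphertext under
that key rather than as a plain hash. A process holding only the maintenance
key could then recompute and compare tags while moving blocks, without being
able to decrypt anything:

  import hashlib
  import hmac

  def derive_maintenance_key(dataset_master_key):
      # Hypothetical domain-separated derivation; any sound KDF would do.
      return hmac.new(dataset_master_key, b"maintenance",
                      hashlib.sha256).digest()

  def dedupe_tag(maintenance_key, ciphertext):
      # Keyed tag: block equality is only visible to holders of the
      # maintenance key, not to an arbitrary attacker reading the pool.
      return hmac.new(maintenance_key, ciphertext, hashlib.sha256).digest()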

(Again there is a remarkable correlation here between ZFS and Tahoe.
For the latter, one of the desired features for the next version is
"repair capabilities", which allow moving encrypted shares or reapplying
error-correction encoding without being able to read the data. Perhaps
I shouldn't be so surprised at this similarity, since ZFS and Tahoe are
both filesystems, but I would have expected the cases of distributed
and local filesystems to be more different.)

[...]
>> (Of course, the integrity of the OS also needs to be protected. One way
>> of doing that would be to have a TPM, or the same hardware that is used
>> for crypto, store the root hash of the Merkle tree and also the hash
>> of a boot loader that supports ZFS. Then the boot loader would load an
>> OS from the ZFS filesystem, and only that OS would be permitted to update
>> the ZFS root hash.)
> 
> That is my plan longer term, it will come partly from the ZFS crypto
> project and partly from another OpenSolaris project called Validated
> Execution.

Hmm. Based on the descriptions at
<http://hub.opensolaris.org/bin/view/Project+valex/>, the current design
for Validated Execution seems to be similar to other so-called "trusted
computing" proposals I have seen that concentrate on hashing and signing
individual executables. The point of my suggestion above was that you don't
need to do that: instead you always hash the entire filesystem.

That would not be all you need to do, but in any case, focussing on checking
hashes or signatures just of *executables* (even if you include scripts)
seems to me to be going down the wrong track; loss of integrity for data
files is often just as fatal to security as for executables. In many cases,
there is no clear distinction between data and code, anyway.

--
David-Sarah Hopwood     http://davidsarah.livejournal.com