[Cryptography] How does the size of a set of target results influence the complexity of a preimage attack?

Sat Oct 17 05:22:14 EDT 2015

Den 17 okt 2015 09:36 skrev "Rob Meijer" <pibara at gmail.com>:
> No,  there are large sets of hashes that are distributed by
organisations spending time on categorizing files and distributing the
hashes with categorization to parties wanting to use these sets of hashes
in a forensic investigations. In the forensic investigation, these hash
sets are then used with the hashes calculated over files found on media in
a set-theory way in order to focus both the automated and manual part of
the digital forensic investigation.  Some of the hash sets are used as a
"we can safely ignore these files" type of set. These include files from OS
distributions, large collections of software packages, clipart collections,
etc,etc. Basically the stuff that you don't want to waste man-hours ,
CPU-cycles or IO activity on.  Some sets are "A person needs to look at
these" type of set. These sets are things like collections of the hashes of
child pornographic images.
>
> So there are three scenario's where a preimage attack could be a problem:
>
> 1) As an anti forensic technique, making an evidence containing file have
a hash from one of the  "we can safely ignore these files" set.
> 2) As denial of service attack on the human component of the
investigation by making a relatively large set of non-legally-problematic
files match with a hash from a "A person needs to look at these" set.
> 3) As part of a targeted anti-forensic attack. Making a file that can
exploit a flaw in a viewer for its file type match with a hash from a  "A
person needs to look at these" set.
>
> So basically if an attacker is able to successfully do a first preimage
attack where the hash of the modified file matches "ANY ONE OF" of the
hashes from the large set of hashes (for what we mostly don't have the
original files).
>
> Now my question is the following:
>
> If the complexity of a first preimage attack for the used hasing algoritm
is defined as 2^{N}, and the size of the hash-set that our "ANY ONE OF"
implies is 2^{M}, what would be the worst case complexity of an  "ANY ONE
OF" geared first preimage attack?
>
> I gather it would be smaller than 2^{N} yet probably not as small as the
naive assumption of 2^{N-M}.
>
> I hope my question is making more sense now.

These are the viable attacks as of today:

MD5: constructed collisions. A legitimate file and a shady file are first
taken. You then process both files and calculate two suffixes, one per
file, that cause then both to get to get a new shared MD5 hash value. A
laptop can do this. (There's possibly also other viable attacks on MD5. )

By trying to attack supply chains of other projects to modify files they're
about to release to match, you can trigger this human DoS.

SHA1: Right now there's the freestart collision that was just proven
possible (freestart = chosen IV / prefix).

128 bit hashes can be attacked the same way with standard birthday
collisions at 2^64 operations. Also note that a larger target set reduces
the work of takes almost linearly IIRC, as 2^64 applies for one single
target hash to match. 2^16 hashes in the set to match thus reduces it to
2^40, I think.

Just stick with strong 196 bit hashes or longer. SHA256 is good enough, but
SHA3 ought to be a good choice too if you're starting from scratch.
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://www.metzdowd.com/pipermail/cryptography/attachments/20151017/48b5273f/attachment.html>