[Cryptography] SHA-256 decrypted (8 rounds)

Thu Mar 28 06:38:13 EDT 2024

> On 7 Jan 2024 12:06 +0000, from cryptography at metzdowd.com (McDair via cryptography):
> 
> > By decrypting, I mean finding a preimage.
> 
> 
> Thank you. That should help clarify a few things.
> 
> > In code, I found 'Encrypt' and 'Decrypt' clearer counterparts than
> > 'Hash' and 'FindPreImage' or something like that. Especially for
> > subroutines.
> > Also 'Encryption' is a general term wrt cryptography/cryptology.
> 
> 
> No, it is not.
> 
> Allow me to one more time go over some basic terminology for a moment.
> 
> Encryption is a one-to-one mapping selected by a key from a
> plaintext to a ciphertext.
> 
> Similarly, decryption is a one-to-one mapping selected by a key from
> a ciphertext to a plaintext.
> 
> That is, for any single key k, D_k(E_k(P)) == P for all values of P.
> Also ideally, given E_k(P) but not k it should be computationally hard
> to find k' to produce D_k'(E_k(P)) == P, or to find P by any other
> means. (Asymmetric encryption somewhat increases the complexity of
> what the "key" is, but the principle remains exactly the same.)
> 
> That there must exist a one-to-one mapping from plaintext to
> ciphertext to plaintext implies that the ciphertext must be at least
> equal in size to the plaintext; otherwise more than one plaintext must
> correspond to the same ciphertext, which in turn means that the
> one-to-one mapping requirement during decryption cannot hold.
> (Compression of plaintext before encryption is a different matter
> entirely.)
> 
> Correspondingly, hashing is taking an arbitrary-length input (the
> preimage) and deterministically producing a fixed-length output (the
> hash) based on the input.
> 
> That is, for hashing, for all values of P, there exists some value h =
> H(P); in this case, P is the preimage. Ideally, given h but not P, it
> should be computationally hard to determine any value P' such that h =
> H(P'); and given any value for P, it should be computationally hard to
> determine any value P' != P such that H(P) = H(P'). (Given P,
> computing h = H(P) is trivial.)
> 
> As you should be able to see, hashing and encryption/decryption
> mathematically have very different properties.
> 
> Even if we're willing to fudge the usage of "plaintext" and
> "ciphertext" (with respect to encryption and decryption) to mean
> "preimage" and "hash" respectively, the definitions of encryption and
> decryption are still meaningless in the case of hash functions
> because:
> 
> - hash functions don't produce one-to-one mappings (such mappings in
> the case of hash functions is a mathematical impossibility for the
> general case, as the input space is larger than the output space)
> 
> - hash functions don't have a key (although the input can include
> key-like material, in which case that key-like material is simply a
> part of the preimage)
> 
> Attacks on reduced-round versions of algorithms are valid, and often
> published in the literature.
> 
> > Additionally, when the message is supposed to stay hidden (password
> > hashes for instance), the message is arguably the key, or the key is
> > the message. In that case 'decryption' also makes sense imo.
> 
> 
> Sorry, but you can't just go around redefining and inventing terms of
> your own just because it "makes sense in [your] opinion". Terms have
> established meanings to facilitate communication. I can't go around
> calling the yellow highrise building across the street "banana" and
> expect that everyone will readily know what I'm talking about just
> because bananas are also yellow.
> 
> Correct usage of long-established terminology not only means that
> people are more likely to take you seriously; it also means that
> people are more likely to know what you are even talking about.
> 
> > The 'reconstruction' method finds the exact original input message
> > in case of maximum 8 rounds.
> > 8 rounds of SHA-256 covers eight 32-bit integers, or 256 bits. Which
> > matches the output (hash) size.
> 
> 
> So if a hash algorithm were to use 16-bit integers but produce a
> 256-bit hash, your method would work for 16 rounds because 16 x 16 =
> 256?
> 
> Sorry, but this claim makes no sense, as you stated it above.
> 
> SHA-256 uses eight 32-bit values internally and another eight output
> variables which are concatenated to form the output hash; that much is
> true. But the use of eight values has nothing to do with "eight
> rounds", which should be obvious if you look at the pseudocode for
> SHA-256 on Wikipedia: https://en.wikipedia.org/wiki/SHA-2#Pseudocode
> 
> > As you can see later on, when expanded to the full 64 rounds, the
> > method finds 'a' valid input message (so not necessarily the
> > original message). Valid at least from the perspective of the main
> > compression function.
> 
> > Finding a preimage (again, not taking into account additional
> > validation), even for 64 rounds happens therefore in negligible
> > time.
> 
> 
> So please show that. If you actually do have something which will,
> in anywhere near reasonable time and memory (let alone "negligible
> time"), find any preimage matching a given hash for full 64-round
> SHA-256, that is enormous. Even if only for certain applications,
> it's a total break of a hash function widely believed to be secure,
> and might well call into serious question the security of
> Merkle-Damgård constructions for hashing in general. To reiterate: if
> true, then that is ENORMOUS; likely well on par with an actual working
> quantum computer that can run Shor's algorithm to factor
> thousands-bits semiprimes in reasonable time.
> 
> $ dd if=/dev/random of=preimage bs=32 count=1
> 1+0 records in
> 1+0 records out
> 32 bytes copied, XXXX s, XXX kB/s
> $ sha256sum -b preimage
> deb360ae3c1ff7a29f83731b33dcd4bf354a5e80de2dc50370ebf55a14216b85 *preimage
> $ stat -c %s preimage
> 32
> $ sha1sum -b preimage
> 32f1304dd130c651ef3946988b5115ffe5d9ae4b *preimage
> $ sha512sum -b preimage
> efa85cea812e973de7262b52c72f325f2e76d47b67fb49462d12adc51ae214c35acb8fa90169b6f27ac57e00d99aff2c2d76fab7a9f906a54e1f59362bb2c901
> *preimage
> $ md5sum -b preimage
> 5af938f53ebeddd73ce7f2563d0177d7 *preimage
> $ b2sum -b preimage
> 80db70e650d08fa7fdca6b2826e67b3d581efcb7754c6d445de3b1b41dd7ec32b16c10de7dad8bc19a839a0233ca5fe6757934cdf1275b3e527adc7bb09d6f50
> *preimage
> $ gpg --encrypt --sign --armor --no-default-recipient --recipient 33E227B509A455F7843E652CDC750207C2AF4F35 -u 33E227B509A455F7843E652CDC750207C2AF4F35 --quiet --output - preimage > encrypted-preimage.asc
> 
> $ gpg --export --armor 33E227B509A455F7843E652CDC750207C2AF4F35 > 33E227B509A455F7843E652CDC750207C2AF4F35.pub.asc
> 
> $
> 
> If your claim is that your attack can be expanded to full SHA-256,
> then let's please have a matching preimage (any matching preimage
> will do) for that exact SHA-256 hash. (I'm making it easier by
> restricting the preimage to 32 bytes, so a single one-to-one mapping
> should in principle be at least possible, but I'm not restricting
> other candidate preimages to that length; and I'm including several
> other representations of the preimage to make it easier to verify
> whether you have found a preimage or the preimage. b2sum is
> BLAKE2; the others should be obvious.)
> 
> If you post any matching preimage for that SHA-256 hash, whether or
> not it matches any of the other hashes of the same preimage included
> in this post and whether or not the preimage you post is of the same
> length (32 bytes), I hereby commit to posting the secret PGP key
> corresponding to the attached public key such that anyone can decrypt
> the attached file containing the encrypted preimage and thus verify
> whether it is the exact preimage those hashes were generated from, a
> single-algorithm full-SHA-256 collision, or a multi-algorithm
> collision. Do note that the key pair for the attached public key was
> generated specifically for the purposes of this one post and is SINGLE
> USE; ONLY the file attached to this one post is signed using this key
> pair by me, and I WILL NOT use this particular key pair for any other
> purpose.
> 
> --
> Michael Kjörling 🔗 https://michael.kjorling.se
> “Remember when, on the Internet, nobody cared that you were a dog?”
> 
> _______________________________________________
> The cryptography mailing list
> cryptography at metzdowd.com
> https://www.metzdowd.com/mailman/listinfo/cryptography

> > In code, I found 'Encrypt' and 'Decrypt' clearer counterparts than
> > 'Hash' and 'FindPreImage' or something like that. Especially for
> > subroutines.
> > Also 'Encryption' is a general term wrt cryptography/cryptology.
> 
> 
> No, it is not.
> 
> Allow me to one more time go over some basic terminology for a moment.
> 
> Encryption is a one-to-one mapping selected by a key from a
> plaintext to a ciphertext.
> 
> Similarly, decryption is a one-to-one mapping selected by a key from
> a ciphertext to a plaintext.
> ...

I'm sorry but your basic terminology lesson, however educational, just won't cut it here.

Encryption in its most general meaning is about protecting secrets. Period. In this context it is not function-type specific.

When you are using the SHA-256 hash function to protect your secret (what you have done here yourself, or in case of password hashing, bitcoin, ...), you are now using the hash function as an encryption tool.
It is also common terminology in such cases to refer to the hash function input message as 'key'.

Moreover, 8 rounds of SHA-256, which was the subject, is *in fact* decryptable in the sense you have specified, meaning a one-to-one mapping between output and original input exists, as shown before (code was provided for anyone to test for themselves). 
Additionally, the inner workings of the SHA-2 compression functions in particular bear similarities with key-based encryption, in the sense that it can be thought of using an internal key (based on the message) to convert from plaintext (based on the IV) to ciphertext. The SHACAL-2 block cipher is based on this.

> > The 'reconstruction' method finds the exact original input message
> > in case of maximum 8 rounds.
> > 8 rounds of SHA-256 covers eight 32-bit integers, or 256 bits. Which
> > matches the output (hash) size.
> 
> 
> So if a hash algorithm were to use 16-bit integers but produce a
> 256-bit hash, your method would work for 16 rounds because 16 x 16 =
> 256?
> 
> Sorry, but this claim makes no sense, as you stated it above.
> 
> SHA-256 uses eight 32-bit values internally and another eight output
> variables which are concatenated to form the output hash; that much is
> true. But the use of eight values has nothing to do with "eight
> rounds", which should be obvious if you look at the pseudocode for
> SHA-256 on Wikipedia: https://en.wikipedia.org/wiki/SHA-2#Pseudocode

I recommend that you take another look then, because that is exactly how (and why) it works. 
Each round processes a single 32-bit value (word) derived from the original input message. 
For this specific function it is possible to find (and be certain of) the exact original message when the input space matches (or is smaller than) the output space (in case of 8 rounds or less).

> Attacks on reduced-round versions of algorithms are valid, and often
> published in the literature.

I’m not sure how much weight your opinion has here, as it seems the concept of rounds is not entirely clear to you, but yes, they are valid, I never claimed otherwise.

> If your claim is that your attack can be expanded to full SHA-256,
> then let's please have a matching preimage (any matching preimage
> will do) for that exact SHA-256 hash.

Firstly, there are no ‘claims’ here. I put in the effort of providing code so you could verify for yourself.
The inability or unwillingness to examine and/or run the code doesn’t turn this into a claim. 
Being dismissive without taking a proper look (‘at first glance it doesn’t work’) is not productive.

Secondly, to summarize previous findings:
-	Deterministically retrieve the original input message in case of 8 rounds of SHA-256.
In case you are still not convinced of this, I suggest you create and provide a single block 8-round hash and I will send you the original input message.

-	Deterministically find a 64-round preimage and collision with respect to the SHA-256 main compression function excluding padding scheme and extended words validation. This doesn’t necessarily mean a full-fledged 64-round preimage of the overall hash function; I was clear about this.

I’ll create a separate thread for your challenge specifically and share additional progress there.

McDair