[Cryptography] Cisco FNR; block cipher without padding

Hugo Landau hlandau at devever.net
Thu Jun 26 09:08:38 EDT 2014


On 23/06/14 09:10, TJ wrote:
> Opinions on this: "Open Sourcing FNR an Experimental Block Cipher"[1] ?
>
> [1]
> http://blogs.cisco.com/security/open-sourcing-fnr-an-experimental-block-cipher/
> _______________________________________________
> The cryptography mailing list
> cryptography at metzdowd.com
> http://www.metzdowd.com/mailman/listinfo/cryptography
One obvious use for this is ID number encryption, a fairly obscure but
interesting use for cryptography.

If ID numbers are sequentially allocated they can expose commercially
sensitive information. For example, you suppose Amazin.com is an online
retailer. You place an order at noon, and again at noon the next day.
The difference between the ID numbers tells you the approximate number
of orders occuring per day. If you assume nobody places more than one
order a day, you might take it to be a rough estimate of the number of
people who buy from Amazin.com every day.

Wikipedia has some information on this under "German tank problem":

  https://en.wikipedia.org/wiki/German_tank_problem

Obviously the use of stream ciphers to encipher ID numbers is out. On
the other hand, block ciphers are seemingly perfect for the application,
since they produce an invertible mapping to a random permutation of the
input domain given a single key. Most modern block ciphers use block
sizes of at least 128 bits (if you have 128-bit IDs, you can just
generate random UUIDs which expose no information, so this isn't a very
useful use case), but for example Blowfish is 64-bit. If you wanted
something with even smaller block sizes I suppose you could construct a
block cipher out of a Feistel network using a hash function, I'm not
sure how secure this would be.

Since most block ciphers have outputs in base 2, this might be a problem
if you specify the format of your ID numbers as "five decimal digits" or
so on... base 10 ciphering is probably one of the most common cases of
"format preserving encryption". The case I have most commonly heard of
is the encipherment of credit card numbers in legacy systems which can
only handle ~16-digit decimal credit card numbers.

YouTube video IDs are obviously base64'd and appear to be random, so I
would guess that they use some sort of block cipher. I wonder if anyone
has performed a cryptanalysis?

But a greater problem of these approaches is that they tend to make
identifiers the maximum length even when the number of identifiers
allocated is low. YouTube video IDs are (as far as I am aware) as long
now as they were when YouTube began. One solution to this might be to
produce, e.g. an unciphered 64-bit sequence number, and if the high 32
bits are zero, encode it with a 32-bit block cipher instead. If the
results are encoded by some fixed length scheme (base32, say), the
length of the input/output identifier can be used to determine the
correct block cipher to be used.

So this is an obvious use case for a block cipher which can support
arbitrary block sizes.

Of course the step up from encoding N to longer encoding N+1 reveals
information, but this is presumably a case where security is traded off
for usability.

If it is desirable to stop people guessing ID numbers, and the number
space used is full enough that this becomes a possibility, this can be
mitigated simply by reserving bits in the unciphered input. For example,
in a 64-bit identifier, you could allocate the upper 16 bits as always
zero and reject any deciphered identifer which doesn't match this.

Take tinyurl.com as an example: I just generated an identifier at
tinyurl.com, then another one immediately after, and they don't appear
sequential, but the identifiers tinyurl.com used to generate certainly
used to be a lot shorter and have grown gradually. Suppose that for an
N-digit identifier to be generated, all N-1 digit identifiers must have
been generated. But if I strip a letter off the identifier tinyurl.com
gave me, I get a 404, so the identifier space is probably padded in some
manner similar to the above. (On the other hand, guessing four-character
identifiers seems to have a high success rate.)

Obviously there are limits to the concealment that such a scheme can
provide. It can't conceal the rate of ID generation if the attacker has
some means of getting a list of IDs generated, or a fixed or high
proportion thereof.

I close with an interesting case of successful ID number analysis:
Amazon AWS is a cloud infrastructure service providing hosted virtual
machines and so on. Identifiers are allocated in the format
TYPE-xxxxxxxx, where x is a hex digit. So for example, a virtual machine
(an "instance") might get allocated the identifier i-deadbeef. These
identifiers appear randomly allocated, but it turns out they are really
just obfuscated with xor:

 
http://www.jackofallclouds.com/2009/09/anatomy-of-an-amazon-ec2-resource-id/

I'm no cryptanalyst, so there may be issues I missed here. On one hand
you might argue the whole idea is snake oil and driven by pointless
vanity (there are probably a lot of companies which hate the idea of
their first customer using a new ordering system seeing that their order
is "order 1"), but on the other hand the German tank issue shows that
there can be a genuine need to conceal this sort of information. It is a
little bizarre that YouTube, which is happy to announce how many videos
people upload to it, uses such opaque identifiers; whereas Amazon, which
doesn't wish to reveal this information, uses such flimsy protection.

Hugo Landau


More information about the cryptography mailing list