Compression theory reference?

Thu Sep 2 04:36:04 EDT 2004

It's a sad situation when you have to get a non-technical
judge to resolve academic conflicts like this,
but it's your head that you're banging against the wall, not mine.
If you want to appeal to authority, there's the FAQ,
which of course requires explaining the Usenet FAQ traditions;
perhaps you can find Lempel, Ziv, or Welch?

In reality, you could show an algorithm for which any input
gets at most _one_ bit longer, rather than arbitrarily longer.
And of course most of the compression algorithms work because
real data almost always has structure which reduces the entropy.
My information theory books from grad school have
long vanished into some closet, and were written
just about the time LZ came out so they mainly discuss
Huffman coding in the discrete-message sections,
but you should be able to find a modern book on the topic.

Matt Crawford's inductive argument is very strong -
it gives you a constructive way to say that
"for any integer N, I can give a proof for that N",
starting at 1 and working your way up,
showing that if there's a lossless coding that doesn't make
any messages of length N any longer, then it doesn't make any any shorter,
so it's not a compression method, just a permutation.

The "You could compress any message down to 1 bit"
argument is a great throwaway line, but it isn't rigorously valid.
(And if it were, you might as well compress down to zero bits while you're 
at it.)
The problem is that for most lossless compression algorithms,
some strings will get shorter (maybe even much shorter),
but some will stay the same length,
so even if you had a hypothetical "never gets longer"
compression algorithm, you can't guarantee that your
starting message would be one that got shorter as opposed to staying the same,
so you can't say that all messages would compress to zero.

If your judge doesn't like inductions that count up,
or your academic opponents insist on examining methods that count down,
Bear's argument gets you most of the way there,
by emphasizing the 1-1 mapping aspect, but you could easily get tangled.
(To do this properly, you need to do n and 2**n, but I'll use 10 for 
concreteness.)

There are 1024 10-bit messages, and only 512 9-bit messages,
so something obviously happened to the >=512 that didn't compress to 9 bits.
Maybe 512 of them didn't compress further and stayed as 10-bit;
almost certainly some of them became 8 bits or shorter.
At least one message didn't get shorter, because
         (2**10 - 1) = 2**9 + 2**8 + 2**7 ... + 2**1
So if you want to recurse downwards through repeated compression,
you need to be sure your mapping keeps track of the ones that
didn't compress the first time (maybe they'll compress the second time?),
the ones that compressed by one bit,
and the ones that compressed by more than one bit,
and avoid wandering around in a maze of twisty little passages.

So working your way up is probably cleaner.

At 11:30 AM 9/1/2004, bear wrote:
>Actually you don't need to take it all the way to the
>reductio ad absurdum here.  There are 1024 10-bit
>messages.  There are 512 9-bit messages.  You need
>to point out that since a one-to-one mapping between
>sets of different ordinality cannot exist, it is not
>possible to create something that will compress every
>ten-bit message by one bit.
>
>Or, slightly more formally, assume that a function C
>can reversibly compress any ten-bit message by one bit.
>Since there are 1024 distinct ten-bit messages, there
>must be at least 1024 distinct nine-bit messages which,
>when the reversal is applied, result in these 1024
>messages.  There are exactly 512 distinct nine-bit
>messages.  Therefore 512 >= 1024.
>
>                         Bear
>
>---------------------------------------------------------------------
>The Cryptography Mailing List
>Unsubscribe by sending "unsubscribe cryptography" to majordomo at metzdowd.com

----
Bill Stewart  bill.stewart at pobox.com 

---------------------------------------------------------------------
The Cryptography Mailing List
Unsubscribe by sending "unsubscribe cryptography" to majordomo at metzdowd.com