Compression theory reference?

Hadmut Danisch hadmut at danisch.de
Tue Aug 31 08:48:00 EDT 2004


Hi,

I need a literature reference for a simple problem of 
encoding/compression theory:

It can easily be shown that there is no lossless compression
method which effectively compresses every possible input. The proof
is simple: As a first step, consider all possible messages of length
n bits, n > 0. There are 2^n different ones, but there are only
(2^n)-1 shorter messages (including the empty one), so there is no
injective encoding that maps every n-bit message to a shorter one.
And even if all codewords of length < n were used up this way, it
would then be impossible to compress the messages shorter than n
bits: a message of length m, m < n, would have to be encoded into a
codeword of at least n bits, thus longer than m and not effectively
compressed. (And you would even have to account for the entropy of
the end-of-message marker or the length counter.)
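
For illustration, here is a small Python sketch of the counting step
(the script and the choice of n are just an example, not part of the
argument itself): it enumerates all bit strings of length n and all
strictly shorter ones, including the empty string, and confirms that
there are 2^n of the former but only 2^n - 1 of the latter.

    from itertools import product

    n = 8

    # all messages of length exactly n
    messages = ["".join(b) for b in product("01", repeat=n)]
    # all strictly shorter messages, lengths 0 .. n-1 (incl. the empty one)
    shorter  = ["".join(b) for k in range(n) for b in product("01", repeat=k)]

    print(len(messages))   # 2**n     = 256
    print(len(shorter))    # 2**n - 1 = 255
    assert len(shorter) == 2**n - 1 < len(messages)

So by the pigeonhole principle there is no injective map from the
n-bit messages into the set of shorter strings.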

Or to put it differently: For every input word that is compressed
into a shorter codeword, there must be another, shorter input word
which is not effectively compressed but becomes longer - if the
algorithm/function is supposed to accept any input and to be
lossless, i.e. decompress(compress(a)) = a for every input a.

Thus, for every lossless compression method which accepts any input
message and is not completely useless (i.e. there is at least one
message whose codeword is shorter than the message), there is at
least one input whose codeword is longer than the message.
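
To make this concrete, here is a hypothetical toy compressor (my own
construction, purely for illustration) together with an exhaustive
check over all inputs of up to 10 bits. It encodes an all-zero string
of length k as '1' followed by k in binary and prefixes every other
string with '0'; it is lossless and does shorten some inputs, and the
brute-force count shows that it therefore lengthens others.

    from itertools import product

    def toy_compress(s: str) -> str:
        if s and set(s) == {"0"}:
            return "1" + format(len(s), "b")   # all-zero run: store only its length
        return "0" + s                         # everything else grows by one bit

    def toy_decompress(c: str) -> str:
        if c.startswith("1"):
            return "0" * int(c[1:], 2)
        return c[1:]

    n = 10
    inputs = ["".join(b) for k in range(1, n + 1) for b in product("01", repeat=k)]

    # lossless: decompress(compress(a)) == a for every input a
    assert all(toy_decompress(toy_compress(s)) == s for s in inputs)

    shrunk = sum(len(toy_compress(s)) < len(s) for s in inputs)
    grew   = sum(len(toy_compress(s)) > len(s) for s in inputs)
    print(shrunk, grew)   # a handful shrink, the overwhelming majority grow
    assert shrunk > 0 and grew > 0

However the toy scheme is varied, as long as it stays total, lossless
and shortens at least one input, the last assertion cannot fail.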

As far as I know, this is taught in the early semesters of a
computer science degree: in theory there is no universal lossless
method that compresses everything. The idea of lossless compression
is to encode the messages with higher probability into shorter
codewords and those with lower probability into longer codewords,
thus reducing the average length of the messages, not the length of
every single message.
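
A tiny worked example of that trade-off (the four symbols and their
probabilities are made up for illustration): a fixed-length code
spends 2 bits on every symbol, while a prefix code spends fewer bits
on the likely symbols and more on the rare ones, so the average drops
even though two codewords became longer.

    probs  = {"a": 0.5, "b": 0.25, "c": 0.125, "d": 0.125}
    fixed  = {"a": "00", "b": "01", "c": "10", "d": "11"}     # 2 bits each
    prefix = {"a": "0",  "b": "10", "c": "110", "d": "111"}   # variable length

    avg_fixed  = sum(probs[s] * len(fixed[s])  for s in probs)  # 2.0 bits/symbol
    avg_prefix = sum(probs[s] * len(prefix[s]) for s in probs)  # 1.75 bits/symbol
    print(avg_fixed, avg_prefix)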


As I mentioned earlier, I am having some trouble with a computer
science department. They do not want to believe that no such
algorithm exists, and they claim that there are algorithms which can
compress everything without loss and without any input resulting in
a longer codeword. They claimed that Lempel-Ziv and MTF (Move To
Front) would do the job. I have given counterexamples for LZ, shown
that gzip on a file filled with random numbers results in a bigger
file, and shown that MTF is not a compression at all, since it does
not change the length of a message. They don't understand.
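
The gzip observation is easy to reproduce; in Python, for instance,
with zlib (which uses the same DEFLATE algorithm as gzip), random
bytes come out a little longer than they went in, because the format
has to add headers and literal-block overhead:

    import os, zlib

    data = os.urandom(100_000)           # 100 kB of random bytes
    compressed = zlib.compress(data, 9)  # maximum compression level
    print(len(data), len(compressed))    # the output is longer than the input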

Therefore, I need a book about computer science or coding theory
which explicitly says that this is impossible, in a way that a
person inexperienced in computer science (judges at a court) can
read and at least see that this is not just my personal fantasy but
at least somewhat reasonable.

I have checked several books about information and coding theory,
but didn't find any suitable for readers without a computer science
background. The statement is always hidden in formulas about
entropy, average code lengths, code efficiency inequalities etc.
If you had such an encoding scheme, you could easily show that the
average code length would be below the entropy, or that the
efficiency would be above 100%. But a non-computer-science person
does not understand that.
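
For what it is worth, the formula-level version of the argument fits
in a few lines as well (n = 16 is an arbitrary choice for this
sketch): if each of the 2^n messages of length n had a uniquely
decodable codeword of at most n-1 bits, the Kraft/McMillan
inequality sum(2^-l_i) <= 1 would be violated, since the sum would
already be at least 2.

    n = 16
    num_messages  = 2 ** n
    best_case_len = n - 1                        # most generous assumption
    kraft_sum = num_messages * 2 ** -best_case_len
    print(kraft_sum)                             # 2.0 > 1, so no such code exists

But of course this is exactly the kind of formula-based reasoning
that a reader without a computer science background will not follow.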

Does anybody know a book about coding theory which explicitly states
the impossibility of such a compression method in plain language?

regards
Hadmut






