Compression theory reference?

Hadmut Danisch hadmut at
Tue Aug 31 08:48:00 EDT 2004


I need a literature reference for a simple problem of 
encoding/compression theory:

It can easily be shown that there is no lossless 
compression method which effectively compresses every possible
input. The proof is simple: in a first step, consider all 
possible messages of length n bits, n>0. There are 2^n different
ones, but there are only (2^n)-1 shorter messages, so 
there is no injective encoding that maps every message to
a shorter one. And once all codewords of length <n are occupied, 
it becomes impossible to compress messages shorter than n bits:
when trying to compress a message of length m, m<n, it 
must be encoded into a codeword of at least n bits, thus 
longer than m and not effectively compressed. (And you would even 
have to account for the entropy of the end-of-message sign or the bit counter.)
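The counting step of the argument can be checked mechanically. A minimal Python sketch (the function names are mine, chosen for the example):

```python
# Among bit strings there are 2**n of length exactly n, but only
# 2**n - 1 of length strictly less than n (including the empty
# string) -- always one codeword too few for an injective encoding.
def count_length_n(n):
    return 2 ** n

def count_shorter_than(n):
    return sum(2 ** k for k in range(n))  # 2**0 + ... + 2**(n-1)

for n in range(1, 17):
    assert count_shorter_than(n) == count_length_n(n) - 1

print(count_length_n(8), count_shorter_than(8))  # 256 255
```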

Or, in other words: for every input word which is compressed into 
a shorter codeword, there must be another, shorter input word which 
cannot be effectively compressed but instead gets longer, if the
algorithm/function is to accept any input and be lossless, i.e.
decompress(compress(a)) = a for every input a.

Thus, for every lossless compression method which accepts any 
input message and is not completely useless (i.e. there is at least 
one message whose codeword is shorter than the message), there is 
at least one input whose codeword is longer than the message. 
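For a toy universe, this conclusion can even be verified exhaustively. The sketch below (my own construction, not a standard algorithm) restricts codewords to the same seven bit strings of length <= 2, so every injective lossless code on them is a permutation, and checks all 5040 of them:

```python
# Every injective recoding of the 7 bit strings of length <= 2 onto
# themselves that shortens at least one string must also lengthen
# at least one other string. Checked by brute force over all
# permutations of the set.
from itertools import permutations

strings = ["", "0", "1", "00", "01", "10", "11"]
violations = 0
for perm in permutations(strings):
    f = dict(zip(strings, perm))  # one injective, lossless code
    shrinks = any(len(f[s]) < len(s) for s in strings)
    grows = any(len(f[s]) > len(s) for s in strings)
    if shrinks and not grows:
        violations += 1  # would be a counterexample to the claim

print(violations)  # 0: no code shrinks an input without growing another
```

The same counting reason applies: a permutation preserves the total length of all outputs, so if one output is shorter than its input, another must be longer.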

As far as I know, this is taught in the early semesters of computer
science: in theory there is no universal lossless method 
compressing everything. Lossless compression is the idea of encoding
those messages with higher probability into shorter codewords, and 
those with lower probability into longer codewords, thus reducing
the average length of the messages, not the length of every single
message.
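The average-length idea can be illustrated with a tiny hand-built prefix code (the symbols and probabilities here are made up for the example, not taken from any real source):

```python
# A skewed 4-symbol source under a prefix-free variable-length code:
# the frequent symbol gets a 1-bit codeword, the rare ones get 3 bits.
probs = {"a": 0.5, "b": 0.25, "c": 0.125, "d": 0.125}
code = {"a": "0", "b": "10", "c": "110", "d": "111"}  # prefix-free

avg_len = sum(p * len(code[s]) for s, p in probs.items())
print(avg_len)  # 1.75 bits, vs. 2 bits for a fixed-length code
# Note: 'c' and 'd' got LONGER codewords (3 bits instead of 2).
# Some inputs must expand so that the frequent ones can shrink;
# only the average improves.
```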
As I mentioned earlier, I have some trouble with a computer science 
department. They do not want to believe that there is no such
algorithm, and they claim that there are algorithms which can 
compress everything without loss and without any input resulting in 
a longer codeword. They claimed that Lempel-Ziv and MTF (Move to
Front) would do the job. I have given counterexamples for LZ, showed 
that gzip on a file filled with random numbers results in a bigger
file, and showed that MTF is not a compression at all, since it does
not change the length of a message. They still don't understand.
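The gzip experiment described above is easy to reproduce with Python's zlib module, which uses the same DEFLATE algorithm as gzip:

```python
# Random bytes are incompressible, so the compressed output is
# almost certainly longer than the input, due to the pigeonhole
# argument plus the format's own overhead.
import os
import zlib

data = os.urandom(65536)         # 64 KiB of high-entropy input
packed = zlib.compress(data, 9)  # maximum compression effort

print(len(data), len(packed))    # packed is (almost surely) longer
```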

Therefore, I need a book about computer science or coding theory
which explicitly says that this is impossible, in a way that a person
inexperienced in computer science (judges at a court) can read and 
at least see that this is not just my personal fantasy, but at least
somehow reasonable. 

I have checked several books about information and coding theory, 
but didn't find any suitable for readers without a computer science 
background. The statement is always hidden in formulas about
entropy, average code lengths, code efficiency inequalities etc.
If you had such an encoding scheme, you could easily show that 
the average length is below the entropy, or that the efficiency is >100%.  
But a non-computer-science person does not understand that. 

Does anybody know a book about coding theory which explicitly states
the impossibility of such a compression method in plain language?


The Cryptography Mailing List
Unsubscribe by sending "unsubscribe cryptography" to majordomo at