Traffic Analysis References

Travis H. solinym at gmail.com
Fri Oct 20 19:23:45 EDT 2006


On 10/19/06, Leandro Meiners <lmeiners at gmail.com> wrote:
> Can anybody point me to any good references regarding traffic analysis?

This is the only interesting page I found on it:
http://guh.nu/projects/ta/safeweb/safeweb.html

There are some historical incidents that are sufficiently old to be
unclassified:

For example, the Japanese fleet left its usual Morse operators behind
when it sailed for Pearl Harbor; those operators kept transmitting as
though the fleet were still in home waters.  Morse operators are
fairly identifiable by their rhythm and idiosyncrasies,
known collectively as their "fist".  It's just like any other behavior
performed subconsciously, like typing or signing your name; at first
there's a lot of variation, and later it becomes fairly fixed and
potentially identifying.

Also during WWII, about a year before D-Day, the Allies created a
radio net in Scotland that purported to be a [nonexistent] 4th Army,
ostensibly preparing a feint towards southern Norway.  The purpose was
to further dilute Axis forces, keeping them far enough away that they
could not participate around Normandy (there were, obviously, numerous
deception operations around D-Day).  This last bit is well documented
in "The Codebreakers", which also has numerous entries in its appendix
for Traffic Analysis.

I suspect that in many instances where traffic analysis was useful, it
was necessary to make (or learn) certain assumptions about typical
traffic patterns: that orders come from the top and are disseminated
down the military hierarchy; that requests for supplies, battle damage
assessments, and other feedback flow up from the front-line troops to
the logistics units or field commanders; that traffic volume increases
as one approaches a major military operation; and so on.  In other
words, it's context-specific, and may resist generalization into
easily-remembered axioms.
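
As a toy illustration of that last heuristic, here is a sketch in
Python that flags time windows with anomalously high message counts.
The log format, window size, and threshold are all assumptions of
mine, not anything from the historical literature:

    # Toy sketch: flag time windows whose message count is anomalously
    # high -- the "traffic swells before an operation" heuristic.
    from collections import Counter

    def busy_windows(timestamps, window=3600, factor=3.0):
        """timestamps: epoch seconds of observed messages.  Returns
        the start times of windows whose count exceeds factor * the
        mean count over occupied windows."""
        if not timestamps:
            return []
        counts = Counter(int(t) // window for t in timestamps)
        mean = sum(counts.values()) / len(counts)
        return sorted(w * window for w, c in counts.items()
                      if c > factor * mean)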

Also, the Mixmaster and Cypherpunk remailers, AT&T's Crowds, and the
onion-routing groups probably have papers considering various
traffic-analysis and correlation attacks against those systems, since
the payloads are encrypted inside the mixes.

One thing I have been interested in is the security of typical
plaintext Internet protocols when "secured" with SSL/TLS/IPsec.  If
they don't do any padding, then the length of each step of the
protocol is effectively given away: just count how much data passes to
the recipient before data starts flowing in the opposite direction.
There is also timing information, which is fairly well preserved even
across the Internet (see the timing side-channel attacks against SSL).
Even with padding, which is basically wasted bandwidth, it may still
be possible to discern information.
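
To make the counting attack concrete, here is a sketch.  The input
format is an assumption of mine; it presumes the encrypted records of
one connection have already been extracted from a capture:

    # Sketch: recover per-step message sizes of a protocol run from
    # the observed (direction, ciphertext_length) pairs of its records.

    def turns(records):
        """records: iterable of ('c2s' or 's2c', length_in_bytes).
        Collapses consecutive records in one direction into a single
        "turn" -- roughly the size of each protocol step."""
        out = []
        for direction, length in records:
            if out and out[-1][0] == direction:
                out[-1] = (direction, out[-1][1] + length)
            else:
                out.append((direction, length))
        return out

An unpadded POP3-over-TLS login, say, would yield turn sizes that
closely track the plaintext banner, USER, and PASS exchanges.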

I've been thinking about this, and I am not sure how to avoid it
entirely without running into other problems.  For example, Unix's
configuration files and application-level TCP/IP protocols are very
easy to interpret and troubleshoot thanks to their human-readable
strings.  The typical encrypted protocol, by contrast, uses
non-textual, constant-length messages, which can make it difficult to
extend without introducing incompatibilities (or without making
different responses different lengths again, the worst of both
worlds).  Plaintext protocols also don't need very extensive decoding
algorithms to be human-readable, which is good: decoding libraries
process data from remote (untrusted) entities, form part of the
attackable surface, and have proven to be security holes on more than
one occasion.
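
For what it's worth, constant-length framing can be approximated by
padding every message up to the next size in a small bucket schedule.
A sketch; the 2-byte length prefix and the particular schedule are
made up for illustration:

    # Sketch: pad each message to the next fixed "bucket" size, so an
    # eavesdropper sees only a handful of distinct lengths on the wire.
    import struct

    BUCKETS = (64, 256, 1024, 4096)

    def pad_to_bucket(msg: bytes) -> bytes:
        framed = struct.pack(">H", len(msg)) + msg   # length prefix
        for b in BUCKETS:
            if len(framed) <= b:
                return framed + b"\x00" * (b - len(framed))
        raise ValueError("message too large for any bucket")

    def unpad(blob: bytes) -> bytes:
        (n,) = struct.unpack(">H", blob[:2])
        return blob[2:2 + n]

The overhead the text mentions is visible directly: a 5-byte reply
costs 64 bytes on the wire.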

One alternative I came up with is to send the entire catalog of
possible responses at the beginning of the transmission, then refer to
them by a fixed-length index.  This would be a lot of overhead in many
cases.  Another alternative is to have a standard catalog, something
like an SNMP MIB, that may be cached between invocations.
Nevertheless, there are many points during a protocol where you want
to construct a response dynamically, without knowing it a priori; it
would seem difficult to deal with those cases in any other way.  These
approaches could be implemented simultaneously, and perhaps one only
needs to pad when sending variable-length messages, so that "normal"
common messages don't incur any overhead (at the cost of fixed-length
and variable-length messages being distinguishable as sets, though not
individually).  In this way it is similar to what cryptologists were
doing with telegraph codebooks, which encoded standard phrases in
relatively similarly sized units, but had to spell out anything not in
the codebook using many codes (each signifying one letter or part of a
word).
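
A sketch of that combined scheme follows.  The catalog contents, the
escape index, and the framing are all hypothetical:

    # Sketch of the codebook idea: catalog entries go out as a
    # one-byte fixed-length index; anything else is escaped and padded
    # to a fixed bucket (a single bucket size here, for brevity).
    CATALOG = [b"250 OK", b"354 Start mail input", b"221 Bye"]
    ESCAPE = 0xFF    # reserved index meaning "literal message follows"
    BUCKET = 256

    def encode(msg: bytes) -> bytes:
        if msg in CATALOG:
            return bytes([CATALOG.index(msg)])       # constant length
        body = len(msg).to_bytes(2, "big") + msg
        assert len(body) <= BUCKET, "needs a larger bucket"
        return bytes([ESCAPE]) + body + b"\x00" * (BUCKET - len(body))

    def decode(blob: bytes) -> bytes:
        if blob[0] != ESCAPE:
            return CATALOG[blob[0]]
        n = int.from_bytes(blob[1:3], "big")
        return blob[3:3 + n]

As in the paragraph above, the two classes stay distinguishable by
length (one byte versus a full bucket), but individual catalog entries
and individual literal messages do not.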

If you come across any other links, please let me know as I'd like to
add them to my page on side-channel attacks:
http://www.subspacefield.org/~travis/side_channel_attacks.html
-- 
"It's not like I'm encrypting... it's just that my communications
developed a massive entropy deficiency." -><-
<URL:http://www.subspacefield.org/~travis/>
GPG fingerprint: 9D3F 395A DAC5 5CCC 9066  151D 0A6B 4098 0C55 1484
