DMCA Crypto Software

Ian Clelland ian at veryfresh.com
Fri Apr 18 07:39:04 EDT 2003


On Fri, Apr 18, 2003 at 01:50:04AM +0200, Nomen Nescio wrote:
> CR advocates "forensic watermarking".  In the longer report (available by
> email request) they describe this as a system where there are two versions
> of selected portions of the content - for example, two alternate versions
> of a particular movie frame.  There would be multiple such "polymorphs"
> throughout the content, and each device would have keys such that for each
> polymorph it would see only one version.  By randomizing and encrypting
> the frames it can be arranged that the devices can't even tell which
> frames are polymorphic.  The set of keys assigned to a playback device
> implicitly identifies the device itself, so that if an unprotected version
> of the movie is released, the specific versions of the polymorphs that
> are present will reveal which device did the decryption.
> 
> The obvious attack is to combine the output from multiple devices
> from which keys have been scraped, but this does not work (up to a
> point) because even when multiple devices are used, there is still
> enough information in the output to identify which specific devices
> were involved.  CR gives an example of a 90 minute movie, 30 frames per
> second, with 1% of the frames being polymorphic - 1620 frames.  Even if
> an adversary breaks into 4 playback devices and gets their keys in order
> to identify the polymorph frames, the manufacturer can identify those four
> devices with an error probability, according to the formula derived by the
> CR report, of less than 4 x 10^(-10), an extremely good detection rate.
> 
> But what happens if you use the CR formula with the assumption that
> the attacker cracks one more device for a total of 5?  Suddenly the
> system doesn't work so well, and there are over 10^20 possible sets
> of 5 devices that could produce the combined output!  We go from
> 4 x 10^(-10) to 10^20 with just one more device.  This kind of exponential
> explosion is common to many traitor tracing schemes.  The attackers
> have an inherent mathematical advantage which is very hard to address.
> All this is glossed over in the CR analysis.

Can you post at least an overview of this formula, or describe how it is
constructed?

My take on the math is this: Suppose that a 90 minute movie contains
1620 'polymorphic frames,' each version of which movie contains one of
two alternate versions of each such frame. Then every version of the
movie has effectively been watermarked with a 1620-bit id. All an
attacker needs to do to release a watermark-stripped version is to find
both versions of each of the 1620 key frames, and put together a new
version of the movie, with a random selection for each frame. This new
version cannot be traced back to any particular source, so the attacker
cannot be identified.

If an attacker can only obtain a small number of 'official' versions
with which to work, then his chances of success depend to a large extent
on the total number of official versions in existence. (Someone, whether
the company releasing the movie, or the company doing the watermarking,
is going to have to keep track of all of the official versions, and whom
they are licensed to).

If the attacker can only obtain four versions, then he will have access
to both versions of 15/16 of the polymorphic frames (assuming that
watermark bits appear random). Knowing their positions within the movie
file, he can randomise them, effectively obscuring 1508 bits of the
watermark. However, the remaining 102 bits will identify the four source
files with very high probability.

Assuming that there are 1 billion legitimate copies of the movie
(roughly 2^30), then the odds are only 2^30/2^102 = 1 in 2^72, or 1 in
4*10^21, that any other copy shares those 102 bits with the four source
files.

With five source files, the situation changes a bit - there are now only
51 identifying bits, and a 1 in 2^21 (1 in 2*10^6) chance that another
source file might match. Still not good, though.

With a sixth source, the attacker has cut the number of identifying bits
to 25. In this scenario, you can expect there to be about 32 copies out
of 1 billion which share those bits. That's 26 others on top of his 6.
Much better odds, but still not a lot to hide behind.

Of course, the attack only gets better with each additional source, and
after a while (after about lg 1620 = 11 sources,) it is very likely that
an attacker can put together all versions of every polymorphic frames,
and there is no way to track it at all.


Anyway, I'm very curious to know where the original numbers (4*10^-10
and 10^20) came from. Or perhaps my reasoning is just way off on this.


Ian Clelland
<ian at veryfresh.com>

---------------------------------------------------------------------
The Cryptography Mailing List
Unsubscribe by sending "unsubscribe cryptography" to majordomo at metzdowd.com



More information about the cryptography mailing list