[Cryptography] Steganography and bringing encryption to a piece of paper

Fri Jul 18 19:45:05 EDT 2014

On 2014-07-18, Bear wrote:

>> Steganography has been around for a long time but the problem with 
>> these techniques is that they are easily defeated.

No. The trouble is that most people who build these stego applications 
don't seem to read the literature at *all*. For some reason, unlike in 
the rest of the crypto circuit, those who actually code stego work at 
the script kiddie level instead of the PhD one -- which really does 
exist even for stego, as part of the information-theoretical viewpoint 
of things.

>> The objective with Steno.io is to bring the robustness of an 
>> electronic encryption algorithm to paper.

Seriously, that is just stupid. It has absolutely nothing to do with 
hardcore, statistical data hiding.

I mean, I've been thinking about how to do that for a couple of years 
now, so as to hide low rate text messaging within telephone audio calls. 
The best I've come up with are a couple of DFT and DHT compatible sync 
waveforms, with baseband direct sequence spread spectrum, resync 
algorithms keyed on GSM's line protocol, stochastic waveshaping to throw 
any cheap, network-wide statistical recognizer off, and whatnot.

And yet I can be nowhere near sure they couldn't detect the transmission 
amongst the utility signal, en masse, in any case. So I shouldn't 
even start *coding* my solution as of now.
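
A toy illustration of the baseband direct-sequence idea mentioned above, 
and only that: it has none of the synchronisation, GSM framing or 
waveshaping, and makes no claim to be undetectable. The names 
(pn_sequence, embed, extract, gain, chips_per_bit) are made up for this 
sketch, the cover is plain Gaussian noise rather than speech, and 
Python's random module stands in for a proper keyed generator.

```python
import random

def pn_sequence(key, length):
    """Pseudo-noise chips in {-1, +1} derived from a shared key.
    Assumption: random.Random is a stand-in, NOT a secure keyed generator."""
    rng = random.Random(key)
    return [rng.choice((-1, 1)) for _ in range(length)]

def embed(cover, bits, key, chips_per_bit, gain):
    """Spread each bit over chips_per_bit samples and add it at low amplitude.
    The cover must hold at least len(bits) * chips_per_bit samples."""
    chips = pn_sequence(key, len(bits) * chips_per_bit)
    stego = list(cover)
    for i, bit in enumerate(bits):
        sign = 1 if bit else -1
        for j in range(chips_per_bit):
            k = i * chips_per_bit + j
            stego[k] += gain * sign * chips[k]
    return stego

def extract(stego, n_bits, key, chips_per_bit):
    """Despread by correlating each segment against the same chip sequence."""
    chips = pn_sequence(key, n_bits * chips_per_bit)
    bits = []
    for i in range(n_bits):
        start = i * chips_per_bit
        corr = sum(stego[k] * chips[k] for k in range(start, start + chips_per_bit))
        bits.append(1 if corr > 0 else 0)
    return bits

if __name__ == "__main__":
    noise = random.Random(1)
    message = [1, 0, 1, 1, 0, 0, 1, 0]
    cover = [noise.gauss(0.0, 1.0) for _ in range(len(message) * 2000)]
    stego = embed(cover, message, key=42, chips_per_bit=2000, gain=0.2)
    print(extract(stego, len(message), key=42, chips_per_bit=2000))
```

The longer the chip run per bit, the lower the gain can be for the same 
decoding reliability, which is exactly the rate-versus-detectability 
trade the paragraph above is worrying about.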

So what *is* it with you people? Can't you see that steganography really 
starts and ends with information and coding theory, unlike cryptography? 
Its bounds necessarily, and from the start, have to do with noise 
and uncertainty, whereas crypto protocols only deal with clean data and 
computational complexity (eventually, preferably, proven-to-be-hard 
one-way-functions). Steganography really is its own, separate field, 
even though it shares most of the randomness, signal processing, 
complexity and whatnot framework with current crypto proper. 
(Especially the symmetrical and streaming kind, BTW, which might be a 
problem and a subject for further study.)

> Okay, I have a hypothetical. Let's call it the "Voynich alternative." 
> Redirecting intellectual effort from cryptography as such to 
> linguistics could plausibly result in an arguably practical system of 
> storing handwritten information privately. It would be a system of 
> limited utility at best because you'd have to actually spend up to a 
> year or two internalizing the system in your own squishy brain before 
> it would be usable to you, or your correspondent.

On the other hand, let's squish the brain and still do proper 
steganography. Also proper linguistics. With the silicon brain. How many 
covert bits can you really fit into a Twitter post, before NSA's silicon 
brain flags it as being terrorist? That's the steganographic competition 
for real! And we'd win it simply by numbers, if we just built a proper 
protocol...with the numbers actually utilizing it. So then, how do we 
build the protocol, and especially incentivize the numbers to adopt it?
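
To put a rough number on "how many covert bits", here is a deliberately 
naive sketch that hides bits in the choice between interchangeable 
words. The word sets and function names are hypothetical, each set must 
have a power-of-two size, and a detector with a decent language model 
could well notice the unnatural choices. It only shows how the raw 
capacity of a short post is counted, not how to stay unflagged.

```python
import math

# Hypothetical interchangeable-word sets; each must have 2**n members.
SYNONYMS = [
    ("big", "large"),
    ("quick", "fast", "rapid", "speedy"),
    ("maybe", "perhaps"),
]

def _lookup(synonym_sets):
    return {word: group for group in synonym_sets for word in group}

def capacity_bits(text, synonym_sets):
    """Upper bound on embeddable bits: log2(#alternatives) per replaceable word."""
    table = _lookup(synonym_sets)
    return sum(math.log2(len(table[w])) for w in text.lower().split() if w in table)

def embed(text, bits, synonym_sets):
    """Pick the alternative whose index spells out the next few message bits."""
    table, out, pos = _lookup(synonym_sets), [], 0
    for word in text.split():
        group = table.get(word.lower())
        if group is None:
            out.append(word)
            continue
        width = (len(group) - 1).bit_length()
        chunk = bits[pos:pos + width]
        chunk = chunk + [0] * (width - len(chunk))  # pad if the message ran out
        pos += width
        out.append(group[int("".join(map(str, chunk)), 2)])
    return " ".join(out)

def extract(text, synonym_sets):
    """Read the bits back from which alternative appears in each slot."""
    table, bits = _lookup(synonym_sets), []
    for word in text.split():
        group = table.get(word.lower())
        if group is not None:
            width = (len(group) - 1).bit_length()
            index = group.index(word.lower())
            bits.extend(int(b) for b in format(index, "0{}b".format(width)))
    return bits

if __name__ == "__main__":
    cover = "maybe a quick fix for the big bug"
    print(capacity_bits(cover, SYNONYMS))
    stego = embed(cover, [1, 0, 1, 1], SYNONYMS)
    print(stego, extract(stego, SYNONYMS))
```

A handful of bits per post is the order of magnitude to expect from this 
kind of scheme, which is why the number of participants matters so much.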

> Let's imagine that there is a person who is a conlang hobbyist and has 
> a diary which he keeps in an entirely made-up language.

Never use a conlang for this sort of thing. They're too easy to parse 
and much too dense to embed information in. Use something like my mother 
tongue, Finnish: rather regular even in orthography, but still a natural 
language ripe with opportunities for embedding. Especially in its 
various dialects.

> [...] while there may be only one 'image' in the constructed language 
> for a given proper noun, the constructed word could be the result of 
> applying the process to any of billions of possible preimage strings - 
> of which possibly only one or possibly as many as a few dozen are 
> genuinely proper nouns from which it might have been derived.

That really only works with polysynthetic languages, like some of the 
Native American and Inuit ones. Even Finnish doesn't really carry that 
far. Klingon would work pretty well, as would Navajo and Taa, but pretty 
much anything else would be too easy to decrypt. And yet the latter 
actually leave quite a bit of free redundancy to be exploited.

> And, to make matters worse than that, almost every *other* word in the 
> language could also result from the same set of substitution rules, 
> each with billions of possible preimages which might include zero, 
> one, or as many as a few dozen completely unrelated proper nouns.

At the same time, do you actually know of a grammar framework which 
could actually encapsulate either your English or my Finnish 
language, fully? Generatively? Fully describing both of their natural 
statistics, starting from a computational model, and one which actually 
leads to a computationally efficient recognition framework? So that the 
NSA, or the GCHQ, or what was it now in Sweden and Russia, can actually 
find your stego in real time?

> It represents a monumental amount of very much enjoyed but arguably 
> wasted intellectual effort on his part, in much the same way that 
> Tolkien's middle-earth languages did prior to the publication of his 
> books.

Wasn't there supposed to be someone, somewhere, who was actually raised 
speaking Lojban as her (?) first language? Because while that'd be 
rather difficult to parse at first, over the long run the whole language 
has been designed to be machine parsable.

Not to mention the fact that might se'd might be a she. Oh my dear.

> While not "encrypted" as such, I doubt that anyone who got their hands 
> on his journal could, in any reasonable timeframe or possibly ever, 
> read it.

Undoubtedly so. But then, that's not what real steganography is about. 
It's not about willy-nilly pushing a bit or two here or there into text, 
or using funny words. It's about taking a statistically well 
characterized carrier, in language/text, image, video, sound, 
whatever, and imprinting a surreptitious message upon it, without 
disturbing any of its visible/audible/computationally-unearthable 
qualities, to some chosen degree. Preferably one that you could prove to 
be fully undetectable, but as it goes with even symmetric cryptosystems, 
we don't have such unconditional proofs as of yet...
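
One common way to state "to some chosen degree" is Cachin's 
information-theoretic model: a scheme is called epsilon-secure against a 
passive observer when the relative entropy D(P_cover || P_stego) between 
the cover and stego distributions is at most epsilon, and perfectly 
secure when it is zero. The sketch below is only a plug-in estimate of 
that quantity from two samples of discrete symbols. The function name is 
made up, and a histogram over single symbols is an assumption that 
ignores all the higher-order structure a real detector would use.

```python
import math
from collections import Counter

def kl_divergence_bits(cover_samples, stego_samples):
    """Plug-in estimate of D(P_cover || P_stego) in bits.
    Returns infinity if a cover symbol never occurs in the stego sample."""
    cover = Counter(cover_samples)
    stego = Counter(stego_samples)
    n_cover, n_stego = len(cover_samples), len(stego_samples)
    total = 0.0
    for symbol, count in cover.items():
        p = count / n_cover
        q = stego.get(symbol, 0) / n_stego
        if q == 0:
            return math.inf
        total += p * math.log2(p / q)
    return total

if __name__ == "__main__":
    # Hypothetical data: least-significant bits before and after embedding.
    cover_lsbs = [0, 0, 1, 1]
    stego_lsbs = [0, 0, 0, 1]
    print(kl_divergence_bits(cover_lsbs, stego_lsbs))
```

A value near zero on first-order statistics says nothing about what a 
smarter model of the carrier would see, so it is a necessary check at 
best, never a proof.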

> [...] it should be as impenetrable as the Voynich manuscript.

Quite. But the way you make it so is different. Nowadays you base it on 
information and coding theory, and also cryptography proper. Certainly 
not on esoteric glyphs, because they might arouse interest; no, you 
really ought to aim at kitteh-pic-meme-embedding or something akin to 
that. For carrier bandwidth, you see, and the endless variability.
-- 
Sampo Syreeni, aka decoy - decoy at iki.fi, http://decoy.iki.fi/front
+358-40-3255353, 025E D175 ABE5 027C 9494 EEB0 E090 8BA9 0509 85C2