[Cryptography] about subtractive dither, for audio and other use (also scientific)

Fri Nov 19 18:22:57 EST 2021

Hi. As many of you may know, there are two different means of dithering a 
signal: subtractive and additive. The first requires you to know the 
precise dither signal put in the A/D stage when doing D/A, which makes 
it difficult to apply. As such additive dithering is nowadays used, 
because it works single-ended, and is good enough for most work.

I would like to combine the advantages of each approach, but hit a snag 
some years back. I'd like to return to the subject, and solicit some 
help.

Additive dither works for audio applications because it can be shown 
that adding a rectangularly distributed, 1 LSB peak-to-peak, zero-centered 
noise signal before a quantizer decouples the mean of the quantization 
error from the utility signal, and adding each successive one decouples 
an extra statistical moment. Crucially, two such noise signals added 
lead to decoupling not only of the long term average, but also variance, 
meaning power. As such, at the lowest levels, audible noise modulation 
(coupling of the error signal to average power) goes away at precisely 
two such signals added.

(Adding independent white noises of given statistical distributions to 
each other leads to the resulting noise having a distribution which is 
the convolution of theirs. As such, adding two rectangularly distributed 
noises leads to a triangularly distributed outcome: the TPDF dither we 
so often use when rounding down from higher word widths. "That 
Good-Enough.")
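
A minimal simulation sketch of the additive case, assuming a mid-tread 
rounding quantizer with 1 LSB = 1.0 and a constant input. The names 
`quantize`, `rpdf`, `tpdf` and `error_stats` are made up for this 
illustration. It estimates the mean and variance of the total error for 
a few input levels:

```python
import math
import random

def quantize(v):
    """Round to the nearest integer; 1 LSB == 1.0 (assumed quantizer)."""
    return math.floor(v + 0.5)

def rpdf(rng):
    """Rectangular dither, 1 LSB peak-to-peak, zero-centered."""
    return rng.random() - 0.5

def tpdf(rng):
    """Triangular dither: the sum of two independent rectangular ones."""
    return rng.random() - rng.random()

def error_stats(x, dither, n=50_000, seed=1):
    """Mean and variance of quantize(x + d) - x for a constant input x."""
    rng = random.Random(seed)
    total = 0.0
    total_sq = 0.0
    for _ in range(n):
        e = quantize(x + dither(rng)) - x
        total += e
        total_sq += e * e
    mean = total / n
    return mean, total_sq / n - mean * mean

for name, dither in (("none", lambda rng: 0.0), ("RPDF", rpdf), ("TPDF", tpdf)):
    for x in (0.0, 0.25, 0.5):
        mean, var = error_stats(x, dither)
        print(f"{name:4s} x={x:4.2f}  mean={mean:+.3f}  var={var:.3f}")
```

With no dither the error is a fixed function of the input. With RPDF 
the mean should come out near zero while the variance still moves with 
the input; with TPDF the variance should stay near 1/4 LSB^2 at every 
level.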

Things are much better on the subtractive side: if you know the dither 
signal from the start, adding and then subtracting a single independent 
noise channel, rectangularly distributed (RPDF) will decouple *all* of 
the statistical moments from the error signal at the same time. It's as 
good as it gets, from the get-go. It's also easy to simulate, so go 
about it: 6dB extra on the voltage, 3dB extra on power, and whatever the 
units are on the higher moments; it's *very* much audible when you get 
there.
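
One way to simulate it, as a rough sketch under the same assumptions as 
before (rounding quantizer, 1 LSB = 1.0, constant input; the function 
names are hypothetical). The error after subtraction should look 
uniform over one LSB whatever the input is, so its moments should not 
move with the input:

```python
import math
import random

def quantize(v):
    """Round to the nearest integer; 1 LSB == 1.0 (assumed quantizer)."""
    return math.floor(v + 0.5)

def subtractive_error_moments(x, n=50_000, seed=2):
    """First, second and fourth raw moments of the error of
    quantize(x + d) - d for RPDF dither d and a constant input x."""
    rng = random.Random(seed)
    m1 = m2 = m4 = 0.0
    for _ in range(n):
        d = rng.random() - 0.5           # RPDF, 1 LSB peak-to-peak
        e = (quantize(x + d) - d) - x    # add, quantize, subtract
        m1 += e
        m2 += e * e
        m4 += e ** 4
    return m1 / n, m2 / n, m4 / n

for x in (0.0, 0.25, 0.5, 3.7):
    m1, m2, m4 = subtractive_error_moments(x)
    print(f"x={x:4.2f}  m1={m1:+.4f}  m2={m2:.4f}  m4={m4:.4f}")
```

For a uniform error the expected values are 0, 1/12 and 1/80 
respectively, independent of x.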

The trouble is, then, how to transmit the subtractive dithering signal 
to the receiver side. Obviously, you can't use a full side channel. But 
in the digital domain you *can* use pseudo-random number generators, 
those which are known to have nice statistical and computational 
properties. The side channel to indicate a particular one is lean, and 
with the cryptographically secure ones, only needs something like 128 
bits in order to be indistinguishable from white noise. So, when that 
side channel is available, why not tag an (audio?) signal as having been 
dithered with the optimum subtractive dither?

Which in fact isn't in this case the RPDF kind, but the TPDF one. 
Because absent the metadata -- often the case with running digital from 
disparate sources -- you'd be left with just an additive RPDF dither, 
which is not enough to make the power-coupling of the noise signal go 
away. So instead, you should do subtractive TPDF dithering, so that the 
overall signal stays well-enough dithered to second degree (and the ear) 
even if you don't know enough to do the subtraction. That's easy 
fail-over. (If you get to do it, it's free from statistical error to all 
degrees, again.) And still, if you can do subtractive, it makes *very* 
little difference if the subtractee is RPDF or TPDF; everything still 
cancels out. (The only problem being that sqrt(2) difference at the 
edges of the A/D/A range, which ought to have some margin anyways. 
Hitting the rails, and all that, is not much of a problem in multibit 
converters, which we are talking about here.)
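
A rough sketch of that fail-over arrangement, assuming the seed is a 
128-bit key shared as metadata. The generator here is SHA-256 run in 
counter mode purely as a stdlib stand-in for a properly chosen 
cryptographic PRNG; `uniform_stream`, `tpdf_stream`, `encode` and 
`decode` are hypothetical names, not an existing API:

```python
import hashlib
import math
import struct

def uniform_stream(key: bytes):
    """Yield floats in [0, 1) derived from SHA-256(key || counter).
    Illustrative stand-in only, not a vetted CSPRNG construction."""
    counter = 0
    while True:
        block = hashlib.sha256(key + counter.to_bytes(8, "big")).digest()
        for (word,) in struct.iter_unpack(">Q", block):
            yield (word >> 11) / float(1 << 53)
        counter += 1

def tpdf_stream(key: bytes):
    """Triangular dither in (-1, 1) LSB from two uniform draws."""
    u = uniform_stream(key)
    while True:
        yield next(u) - next(u)

def encode(samples, key: bytes):
    """Add keyed TPDF dither, then round to integer codes (1 LSB == 1)."""
    d = tpdf_stream(key)
    return [math.floor(x + next(d) + 0.5) for x in samples]

def decode(codes, key=None):
    """With the key: regenerate and subtract the dither.
    Without it: pass the codes through, i.e. plain additive TPDF."""
    if key is None:
        return [float(c) for c in codes]
    d = tpdf_stream(key)
    return [c - next(d) for c in codes]

key = bytes(16)                          # 128-bit example key (all zeros)
samples = [0.1 * i for i in range(1000)]
codes = encode(samples, key)
err_sub = max(abs(y - x) for x, y in zip(samples, decode(codes, key)))
err_add = max(abs(y - x) for x, y in zip(samples, decode(codes)))
print(f"max |error| with key: {err_sub:.3f}, without key: {err_add:.3f}")
```

With the key the error stays within half an LSB; without it the 
receiver just sees an ordinary TPDF-dithered signal, with error within 
one and a half LSB.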

So in NASA kind of work, this suggests a simple configuration: an agreed 
upon configuration of a pseudo random number generator which feeds a 
dither D/A, or maybe some side data to give its seed. This is 
particularly easy to implement if rounding down in the digital domain -- 
such as in the audio work we do and which inspired this post -- but of 
course, it's the same from say 22 bit sensor values down to 8 bit 
telemetry, in a space probe. Also, it *can* be done so-so even for 
analog signals before the first round of digitization, and has been 
done.

Next in my line of thought is, how about if we *can't* convey that 
metadata reliably? And herein is the crux I'd like some help with...

How do you reliably detect from streaming audio (signal) content, that 
this sort of coding (which subtractive dither very much is, changing the 
signal as it is) has been applied? Without out-of-band metadata? Well, 
the metadata has to be in-band by definition, so it will have to 
somewhat disturb the signal. It also has to be able to be detected from 
the encoded signal with reasonable effort in both software and hardware, 
yet not skew the signal statistically, even if it has to leave *some* 
recognizable mark.

To date my best architecture works as follows:

1) I XOR all of the bits of a sample together, on the theory that there 
is going to be some randomizing noise in there, and so some entropy and 
independence in the resulting bits per sample.

   1a) This is useful because you can do the operation over bit stuffed
   words, the optimum way of embedding stuff into words of wider width.
   If the less significant bits are set to zero, all of the entropy of
   the higher order ones will work all the same. That is especially
   significant in audio work, because it's always 8-18-24 bits in
   exchange, and often passed over channels which do not declare the
   accuracy. In the digital domain, those extra zeros cease to have any
   meaning, by design.

2) This is because I want to synchronize against what my subtractive
    coder is doing if bits are lost, and in the longer run, be able to
    correlate against running data even at the analog level. This is
    after all about synchronous decoding, and its usual gain.

   2a) If I'm given a certain running stream of bits, with bit errors and
       then synchronization errors in it, I want to be able to
       resynchronize. The typical way to do this would be to run a
       continuous convolution, detecting a signature. But that's *very*
       inefficient. Instead I do it stochastically, flagging a small stream
       of bits from my above XOR circuit, in a running shift register.
       Then I compare it (after some processing) with a fixed, hashed
       value. That is very efficient in hardware, and I already know
       much of it is doable by precomputation even in 8-bit software (cf.
       how CRC table calculation goes; only using linear polynomial stuff
       here, so it can be accelerated in hardware via the crypto
       primitives as well).

   2b) I run the bit stream over an inverse comma-code decoder as well.
       This is a poor man's version of doing a matched filtering operation,
       because the decoder for such a code essentially stops there and only
       there, where there isn't an overlapping match. I use a short code
       much like that as my first latch, so that the second, heavier
       comparison doesn't have to be engaged as often.

3) So the idea is that every now and then, if my machinery has been told
    it's to do subtractive dithering, it'll a) do what it does by
    metadata, 1) going along as it has from the beginning, or 2)
    sometimes falling out of sync, so that it searches for the next
    pseudo-stochastic sync point, and resyncs then, digitally.

   3a) If it loses synch, it'd be able to tell within which interval,
       even if kind of unpredictable by design.

   3b) The machinery would always leave behind in its subtractive
       dithering a well-defined, well-spread-out trace, which would be
       detectable in more intensive analysis. And so, the signature could
       once again be deterministically subtracted out, to arrive at
       almost pristine signal.

   3c) The coding theory principle here is that if you only reserve one
       signal out of the quadrillions available within even a second of
       speech band communication, or a 2400 baud modem thingy, it's one in
       quadrillions. Reserving that sort of minimum bandwidth, with proper
       coding, is unnoticeable. Which is what I'm doing here, with the
       continuous, deterministic noise introduced for dithering
       purposes. What I'm introducing here is a couple of bits of
       information, in the form of on-and-off low level noise.
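
The per-sample XOR of step 1 and the two-stage latch of steps 2a and 2b 
could be sketched roughly like this. Everything named here is an 
assumption made for illustration: `parity`, `SyncDetector`, the 32-bit 
marker, and CRC-32 from zlib as the "fixed, hashed value". In 
particular the cheap comparison of the most recent bits only stands in 
for the comma-code pre-latch; it is not a comma-code decoder.

```python
import zlib

def parity(sample: int, width: int = 24) -> int:
    """XOR of all bits of a two's-complement sample of the given width."""
    return bin(sample & ((1 << width) - 1)).count("1") & 1

class SyncDetector:
    """Shift register over parity bits with a cheap first latch and a
    heavier hashed comparison behind it (hypothetical design)."""

    def __init__(self, marker_bits, tail_len=8):
        self.n = len(marker_bits)
        self.mask = (1 << self.n) - 1
        self.tail_mask = (1 << tail_len) - 1
        word = 0
        for b in marker_bits:
            word = (word << 1) | (b & 1)
        self.tail = word & self.tail_mask    # last tail_len marker bits
        self.target = self._hash(word)
        self.reg = 0
        self.seen = 0

    def _hash(self, word: int) -> int:
        return zlib.crc32(word.to_bytes((self.n + 7) // 8, "big"))

    def push(self, bit: int) -> bool:
        """Shift in one bit; return True when the marker is recognized."""
        self.reg = ((self.reg << 1) | (bit & 1)) & self.mask
        self.seen += 1
        if self.seen < self.n:
            return False                     # register not yet full
        if (self.reg & self.tail_mask) != self.tail:
            return False                     # cheap first latch rejects
        return self._hash(self.reg) == self.target

marker = [1, 0, 1, 1, 0, 0, 1, 1] * 4        # made-up 32-bit marker
stream = [0] * 50 + marker + [0] * 50
stuffed = [b << 8 for b in stream]           # toy samples, zero-padded LSBs
det = SyncDetector(marker)
hits = [i for i, s in enumerate(stuffed) if det.push(parity(s))]
print(hits)
```

Because the zero-stuffed low bits contribute nothing to the XOR, the 
same detector works on the padded and unpadded words alike.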

So finally... Help me! I haven't been able to make this work quite like 
the theory above went. When I use the most performant algorithms and 
circuits I know for the above, it always comes back to a linear feedback 
shift register network, and even nonlinearity doesn't naively help. This 
project of mine is basically stalled because of that. So, please help me 
1) latch on right for occasional acquisition without synchroneity or 
metadata, 2) do low computational complexity and high 
statistical homogeneity serial random number generation, and 3) 
extend the approach towards the analog domain, e.g. in space 
communication.
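
For reference, a minimal sketch of the kind of linear feedback shift 
register this keeps coming back to, assuming the textbook 16-bit 
Fibonacci form with taps at bits 16, 14, 13 and 11. The function names 
are made up for the example. It also shows why linearity is a concern: 
the register's next state is an XOR-linear function of its seed.

```python
def lfsr16(state=0xACE1):
    """16-bit Fibonacci LFSR, polynomial x^16 + x^14 + x^13 + x^11 + 1.
    Yields successive register states; the state must be nonzero."""
    if not 0 < state < (1 << 16):
        raise ValueError("state must be a nonzero 16-bit value")
    while True:
        bit = (state ^ (state >> 2) ^ (state >> 3) ^ (state >> 5)) & 1
        state = (state >> 1) | (bit << 15)
        yield state

def period(seed=0xACE1):
    """Count steps until the register returns to its seed."""
    for n, s in enumerate(lfsr16(seed), start=1):
        if s == seed:
            return n

def step(seed):
    """One register update, for the linearity check below."""
    return next(lfsr16(seed))

print(period())                                   # maximal length: 2**16 - 1
a, b = 0x1234, 0x0F0F
print(step(a) ^ step(b) == step(a ^ b))           # update is linear over XOR
```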
-- 
Sampo Syreeni, aka decoy - decoy at iki.fi, http://decoy.iki.fi/front
+358-40-3751464, 025E D175 ABE5 027C 9494 EEB0 E090 8BA9 0509 85C2