# [Cryptography] about subtractive dither, for audio and other use (also scientific)

Sampo Syreeni decoy at iki.fi
Fri Nov 19 18:22:57 EST 2021

```Hi. As many of you may know, there are two different means dithering a
signal: subtractive and additive. The first requires you to know the
precise dither signal put in the A/D stage when doing D/A, which makes
because it works single-ended, and is good enough for most work.

I would like to combine the advantages of each approach, but hit a snag
some years back. I'd like to return to the subject, and solicit some
help.

Additive dither works for audio applications because it can be shown
that adding a squarely distributed 1 LSB peak-to-peak zero centered
noise signal before a quantizer decouples the mean of the quantization
error from the utility signal, and adding each successive one decouples
an extra statistical moment. Crucially, two such noise signals added
lead to decoupling not only of the long term average, but also variance,
meaning power. As such, at the lowest levels, audible noise modulation
(coupling of the error signal to average power) goes away at precisely

(Adding independent white noises of given statistical distribution to
each other, leads to the resulting noise having a distribution the
convolution of the formers'. As such, adding two squarely distributed
noises leads to a triangularly distributed outcome; the TPDF dither we
so often use when rounding down from higher word widths. "That
Good-Enough.")

Things are much better on the subtractive side: if you know the dither
signal from the start, adding and then subtracting a single independent
noise channel, rectangularly distributed (RDPF) will decouple *all* of
the statistical moments from the error signal at the same time. It's as
good as it gets, from the get-go. It's also easy to simulate, so go
about it: 6dB extra on the voltage, 3dB extra on power, and whatever the
units are on the higher moments; it's *very* much audible when you get
there.

The trouble is, then, how to transmit the subtractive dithering signal
to the receiver side. Obviously, you can't use a full side channel. But
in the digital domain you *can* use pseudo-random number generators.
Those whom are known to have nice statistical and computational
properties. The side channel to indicate a particular one is lean, and
with the cryptographically secure ones, only needs something like 128
bits in order to be indistinguishable from white noise. So, when that
side channel is available, why not tag an (audio?) signal as having been
dithered with the optimum subtractive dither?

Which in fact isn't in this case the RPDF kind, but the TPDF one.
Because absent the metadata -- often the case with running digital from
disparate sources -- you'd be left with just an additive RPDF dither,
which is not enough to make the power-coupling of the noise signal to go
away. So instead, you should do subtractive TPDF dithering, so that the
overall signal stays well-enough dithered to second degree (and the ear)
even if you don't know enough to do the subtraction. That's easy
fail-over. (If you get to do it, it's free from statistical error to all
degrees, again.) And still, if you can do subtractive, it makes *very*
little difference if the subtractee is RPDF or TPDF; everything still
cancels out. (The only problem being that sqrt(2) difference at the
edges of the A/D/A range, which ought to have some margin anyways.
Hitting the rails, and all that, is not much of a problem in multibit
converters, which we are talking about here.)

So in NASA kind of work, this suggests a simple configuration: an agreed
upon configuration of a pseudo random number generator which feeds a
dither D/A, or maybe some side data to give its seed. This is
particularly easy to implement if rounding down in the digital domain --
such as in the audio work we do and which inspired this post -- but of
course, it's the same from say 22 bit sensor values down to 8 bit
telemetry, in a space probe. Also, it *can* be done so-so even for
analog signals before the first round of digitization, and has been
done.

Next in my line of thought is, how about if we *can't* confer that
metadata reliably. And herein is the cinch I'd like some help with...

How do you reliably detect from streaming audio (signal) content, that
this sort of coding (which subtractive dither very much is, changing the
signal as it is) has been applied? Without out-of-band metadata? Well
the metadata has to be in-band by definition, so that it will have to
somewhat disturb the signal. It also has to be able to be detected from
the encoded signal with reasonable effort in both software and hardware,
yet not skew the signal statistically, even if it has to leave *some*
recognizable mark.

To date my best architecture works as follows:

1) I XOR all of the bits of a sample together, on the theory that there
is going to be some randomizing noise in there, and so some entropy and
independence in the resulting bits per sample.

1a) This is useful because you can do the operation over bit stuffed
words, the optimum way of embedding stuff into words of wider width.
If the less significant bits are set to zero, all of the entropy of
the higher order ones will work all the same. That is especially
significant in audio work, because it's always 8-18-24 bits in
exchange, and often passed over channels which do not declare the
accuracy. In the digital domain, those extra zero cease to have any
meaning, by design.

2) This is because I 1) want to synchronize against what my subtractive
coder is doing if bits are lost, and in the longer run, be able to
correlate against running data even at the analog level. This is
after all about synchronous decoding, and its usual gain.

2a) If I'm given a certain running stream of bits, with bit errors and
then synchronization errors in it, I want to be able to
resynchronize. The typical way to do this would be to run a
continuous convolution, detecting a signature. But that's *very*
ineffient. Instead I do it stochastically, flagging a small stream
of bits from my above XOR circuit, in a running shift register.
Then I compare it (after some processing) with a fixed, hashed
value. That is very efficient in hardware, and I already know
much doable by precomputation even in 8-bit software (cf. how CRC
table calculation goes; only using linear polynomial stuff here,
so it can be accelerated in hardware via the crypto-krimitives as
well).

2b) I run the bit stream over an inverse comma-code decoder as well.
This is a poor-mans's version of doing a matched filtering operation,
because such a decode of such a code essentially stops there and only
there, where there isn't an overlapping match. I use a short much
like that as my first latch, so that the second, heavier
comparison doesn't have to be engaged as often.

3) So the idea is that every now and then, if my machinery has been told
it's to do subtractive dithering, it'll a) do what it does by
metadata, 1) going along as it has from the beginning, or 2)
sometimes falling out of sync, so that it searches for the next
pseudo-stochastic syncho point, and resynchs then, digitally.

3a) If it loses synch, it'd be able to tell within which interval,
even if kind of unpredictable by design.

3b) The machinery would always leave behind in its subtractive
dithering a well defined, well-spread out tract, which would be
detectable in more intensive analysis. And sO, the signature could
once again be deterministically subtracted out, to arrive at
almost pristine signal.

3c) The coding theory principle here is that if you only reserve one
signal out of the billiards available within even a second of
speech band communication, or a 2400 Baud modem thingy, it's one over
billiard. Reserving that sort of minimum bandwidth, with proper
coding, is unnoticeable. Which is what I'm doing here, with the
continuous, deterministic noise introduced for dithering
purposes. What I'm introducing here is a couple of bits of
information, in the form of on-and-off low level noise.

So finally... Help me! I haven't been able to make this work quite like
the theory above went. When I use the most performant algorithms and
circuits I know for the above, it always comes back to a linear feedback
shift register network, and even nonlinearity doesn't naively help. This