[Cryptography] [FORGED] Re: Speculation considered harmful?

Jerry Leichter leichter at lrw.com
Tue Jan 9 14:59:32 EST 2018


>> VLIW isn't just wide microcode.  It's a combination of wide instructions
>> with compilers that no-one can ever quite manage to write that take
>> advantage of them.
> 
> The compilers actually worked pretty well, give or take the memory scheduling stuff.
> 
>> There's a good reason why vendors went deep rather than wide to make things go
>> fast: it was, and still is, the easiest way to get performance.
> 
> Agreed.  I think the reason is that it turned out that a lot of the
> hazard avoidance stuff that trace scheduling was intended to avoid
> turns out to be doable on the fly in hardware, which makes it easier
> to keep those deep pipelines busy.

There's more to it than that.

Multiflow knew from the beginning that its compiler would be very slow, and would need feedback from multiple runs with real data to help it make the correct decisions.  But the theory was that this was fine for scientific code that would be run hundreds, maybe thousands, of times, unchanged, once fully optimized.

But it turned out that there isn't quite enough code like that - or at least not enough to support machines that required such a radically different development cycle to really show their strength.

There have been other efforts to use feedback to drive better levels of optimization.  HP's C++ compiler (for their Precision RISC machines) had a mode that did that - perhaps driven by the technology they acquired from Multiflow.  I remember trying it.  Compilations took *forever* - sometimes literally, showing every sign of never terminating - and the performance gain was nothing particularly notable.

The payoff from feedback-driven compilation was never worth the disruption to development and build methodologies, and the technique faded away - to re-appear, years later, in JIT compilers, which finally made the process invisible and non-disruptive.  (I haven't seen any solid evidence on how much feedback actually helps in producing faster/smaller code.  The only specific example I know of in Java has to do with optimizing virtual dispatch when it appears that the run-time type is fixed, or close to fixed.  Probably significant, but a rather limited case - and given branch prediction, it may actually gain almost nothing.)
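To make the virtual-dispatch case concrete, here's a minimal Java sketch (the class and method names are hypothetical, purely for illustration).  The call through the interface is a virtual call in the bytecode, but when run-time profiling shows only one concrete type ever reaching the call site, a JIT such as HotSpot can speculatively inline the target behind a cheap type-check guard, deoptimizing if the assumption later breaks:

```java
// Hypothetical illustration of a monomorphic virtual call site.
interface Shape {
    double area();
}

final class Circle implements Shape {
    private final double r;
    Circle(double r) { this.r = r; }
    public double area() { return Math.PI * r * r; }
}

class Devirt {
    // s.area() is a virtual (invokeinterface) call in the bytecode,
    // but if profiling shows every Shape here is a Circle, a JIT can
    // inline Circle.area() behind a type-check guard.
    static double totalArea(Shape[] shapes) {
        double sum = 0;
        for (Shape s : shapes) {
            sum += s.area();  // monomorphic in practice
        }
        return sum;
    }

    public static void main(String[] args) {
        Shape[] shapes = new Shape[1000];
        for (int i = 0; i < shapes.length; i++) {
            shapes[i] = new Circle(1.0);  // only one concrete type
        }
        System.out.println(totalArea(shapes));  // roughly 1000 * Math.PI
    }
}
```

Whether this wins much over the hardware's own indirect-branch prediction is exactly the question left open above.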

Meanwhile, hardware that discovers what's happening fully on the fly usually can't do as well as a *theoretical* feedback-driven compiler *with good feedback*, but (a) it comes pretty close; (b) it can actually do better in some cases; (c) the biggest plus:  It doesn't require changes to languages, libraries, development methodologies, what have you.

                                                        -- Jerry


