[Cryptography] I'll give the right answers to the right questions

Nico Williams nico at cryptonector.com
Mon Jan 29 12:26:57 EST 2018


On Fri, Jan 26, 2018 at 08:50:06PM -0500, Jerry Leichter wrote:
> > You could look at Sun/Oracle Niagara which was pretty much 8-way
> > hyperthreading per core. Ultimately it didn't have good enough
> > single-thread performance, regardless of how embarrassingly parallel
> > your workloads.
> I worked on intelligent network management software back in those
> days.  Solaris SPARC was our favorite platform.  (We supported many.)
> But Niagara - well, they were then the T series processors I believe -
> were a disaster.  Customers kept asking us to run our code on them, as
> they were much cheaper than traditional high-end SPARC boxes - and if
> you just looked at the total specs the T series were way more
> powerful.  Our software had tons of threads - a basic server started
> up with 80 or more of them - but they were used for structuring, not
> for parallelism.  It was hard to get the effective parallelism above
> maybe 4.  So on a T series performance was disastrous.  (It didn't
> help that early T series boxes used FP units that were shared -
> something like one FP unit for every four processors.  We didn't do a
> huge amount of FP computation - but we did enough that this bottleneck
> made already bad things much worse.)

OK, sure, FP killed you on early Niagara.  And your workloads weren't
as embarrassingly parallel as the number of threads in your application
might have implied.
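
A back-of-the-envelope sketch of that point, in Python.  Everything in
it is an assumption made up for illustration: the throughput() helper,
the thread counts, and the relative per-thread speeds are not
measurements of any real SPARC or x86 part.  It only models the idea
that hardware threads beyond the application's effective parallelism
sit idle.

```python
# Illustrative model only: all numbers below are hypothetical.

def throughput(hw_threads, per_thread_speed, effective_parallelism):
    """Relative throughput when at most `effective_parallelism`
    software threads can make progress at the same time."""
    busy = min(hw_threads, effective_parallelism)
    return busy * per_thread_speed

# Hypothetical CMT box: many hardware threads, each one slow.
cmt = throughput(hw_threads=32, per_thread_speed=0.25,
                 effective_parallelism=4)

# Hypothetical conventional box: few hardware threads, each one fast.
conventional = throughput(hw_threads=4, per_thread_speed=1.0,
                          effective_parallelism=4)

print("CMT:", cmt, "conventional:", conventional)
```

With an effective parallelism of about 4, the extra hardware threads
in this toy model buy nothing, and the slower per-thread speed is all
that is left.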

Another thing that is terrible on SPARC in general is register window
spills.  Niagara has a single window... so every function call takes a
spill, which is almost certainly less optimal than caller- or callee-
saved register protocols.  Turns out that aspect of SPARC sucked.

But the question is: can we build a better world on massive hardware
threading with less speculation?

I don't think Niagara provides a final answer to that question.

Ultimately it's all about whether we can more easily turn serial code
into parallel code, and that so far has been very difficult.

A hybrid CPU with a few fast threads along the lines of current x86_64
CPUs, and a bunch of slower CMT-style threads, might actually help the
market move towards CMT.

Nico
-- 

