*BSD News Article 86483



Path: euryale.cc.adfa.oz.au!newshost.carno.net.au!harbinger.cc.monash.edu.au!news.mira.net.au!news.vbc.net!vbcnet-west!samba.rahul.net!rahul.net!a2i!bug.rahul.net!rahul.net!a2i!ns2.mainstreet.net!news.pbi.net!news.mathworks.com!enews.sgi.com!chronicle.mti.sgi.com!news
From: Dror Maydan <maydan@mti.sgi.com>
Newsgroups: comp.unix.bsd.freebsd.misc,comp.arch,comp.benchmarks,comp.sys.super
Subject: Re: benchmarking discussion at Usenix?
Date: Wed, 15 Jan 1997 15:25:21 -0800
Organization: Silicon Graphics
Lines: 35
Distribution: inet
Message-ID: <32DD6761.167E@mti.sgi.com>
References: <5am7vo$gvk@fido.asd.sgi.com> <32D3EE7E.794B@nas.nasa.gov> <32D53CB1.41C6@mti.sgi.com> <32DAD735.59E2@nas.nasa.gov>
NNTP-Posting-Host: three.mti.sgi.com
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
X-Mailer: Mozilla 2.0S (X11; I; IRIX 6.2 IP20)
Xref: euryale.cc.adfa.oz.au comp.unix.bsd.freebsd.misc:33823 comp.arch:62300 comp.benchmarks:18753 comp.sys.super:6841

Hugh LaMaster wrote:
> 
> Dror Maydan wrote:
> 
> > One more interesting category is the latency accessing objects bigger
> > than 4 bytes.  On many cache based machines accessing everything in a
> > cache line is just as fast as accessing one element.  I've never seen
> > measurements, but my guess is that many data elements in compilers are
> > bigger than 4 bytes; i.e., spatial locality works for compilers.
> 
> Well, optimum cache line sizes have been studied extensively.
> I'm sure there must be tables in H&P et al. showing hit rate
> as a function of line size and total cache size.  For reasonably
> large caches, I think the optimum used to be near 16 Bytes for
> 32-bit byte-addressed machines.  I don't know that I have seen more
> recent tables for 64-bit code on, say, Alpha, but my guess is that
> 32 bytes is probably superior to 16 bytes given the larger address
> sizes, not to mention alignment considerations.  Just a guess.
> Also, we often (but not always) have two levels of cache now,
> and sometimes three, and the optimum isn't necessarily the
> same on all three.  Numbers, anyone?

My point was that different machines do have different line sizes, and
the differences are quite large.  On the SGI R10000, the secondary cache
line size is 128 bytes.  On some IBM POWER2 machines, the line size is
256 bytes.  I'm pretty sure that some other vendors use 32-byte lines.
Why different vendors choose different line sizes is probably related
both to system issues and to the types of applications they try to
optimize for.  But that is irrelevant to the benchmarking issue.  The
issue is that lmbench measures the latency of fetching a single pointer.
On such a benchmark, a large-line machine looks worse relative to the
competition than it would on a benchmark that measured the latency of
fetching an entire cache line.
Now, which benchmark is "better"?  I think both are interesting.  Which
is more relevant to a typical integer application?  I don't know.