*BSD News Article 32343


Path: sserve!newshost.anu.edu.au!harbinger.cc.monash.edu.au!msuinfo!agate!ihnp4.ucsd.edu!swrinde!news.uh.edu!moocow.cs.uh.edu!wjin
From: wjin@moocow.cs.uh.edu (Woody Jin)
Newsgroups: comp.os.386bsd.misc
Subject: Re: FreeBSD platform
Date: 4 Jul 1994 05:40:16 GMT
Organization: University of Houston
Lines: 54
Message-ID: <2v87c0$6vo@masala.cc.uh.edu>
References: <2uacoc$gfi@reuter.cse.ogi.edu> <2uo9ju$12b@reuter.cse.ogi.edu> <2v79ig$s8u@masala.cc.uh.edu> <2v7tf4$ctj@mojo.eng.umd.edu>
NNTP-Posting-Host: moocow.cs.uh.edu

In article <2v7tf4$ctj@mojo.eng.umd.edu>,
Charles B. Robey <chuckr@glue.umd.edu> wrote:
>Woody Jin (wjin@moocow.cs.uh.edu) wrote:
>: bj@staff.cc.purdue.edu (Ben Jackson) wrote:
>: > The 256K cache is *highly* recommended.  I've heard that a DX2/66 is
>: > about 40% faster (wall clock time) with a 256K external cache.  The
>: > Pentium supports 512k, and the extra 256K might be worth the $60-100
>
>: What I read from an article some time ago was that the cache does not
>: affect any performance on multi-user platforms such as Unix, since
>: most PC boards use *direct mapped* cache.
>: Is there a motherboard which uses 4(or 8) way associative cache ?
>
>The thing with cache is, the more you have, the less each additional
>increment helps.  I've seen some studies done, that make me wonder if
>anything over 128K is really of use.  The key statistic is the hit rate
>of the cache, which means the percent of time that a memory read can be
>satisfied by a high speed cache read, instead of a slower main memory read.

That is the theory, when the cache is a good match for the system's access
pattern.  Under MS-DOS or Windows, a program resides in consecutive pages of
memory, and for that kind of access pattern a direct-mapped cache is smart
enough.  The advertised claim that a DX2/66 is about 40% faster with a 256K
cache is for the MS-DOS case.

Under Unix, however, a program does not reside in consecutive pages; its
pages are scattered all over physical memory.  One worst-case scenario is
easy to imagine: my trn program happens to occupy every (k * n)'th page of
physical memory, where k is the number of pages that fit in the cache.  In
that case the direct-mapped cache will miss on every single access, even
though there is plenty of vacant space (512K - 1 page).
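
To make that concrete, here is a minimal sketch (my own, not taken from any
real kernel or chipset; the 256K cache and 4K page sizes are just assumed
for illustration) of why those evenly spaced pages all collide: a
direct-mapped cache picks its line purely from the address, so pages spaced
exactly k apart land on the same line no matter how empty the rest of the
cache is.

/* Sketch: simulate a direct-mapped cache at page granularity.
 * The line index is page_number % CACHE_PAGES, so pages spaced
 * exactly CACHE_PAGES apart all map to line 0 and evict each other.
 * Result: 0 hits, 16 misses, with 63 of 64 lines never used.
 */
#include <stdio.h>

#define CACHE_PAGES (256 * 1024 / 4096)   /* k = 64 page-sized lines */

int main(void)
{
    long tag[CACHE_PAGES];
    long hits = 0, misses = 0;
    long page;
    int i, pass;

    for (i = 0; i < CACHE_PAGES; i++)
        tag[i] = -1;                      /* cache starts empty */

    /* touch pages 0, k, 2k, ..., 7k, twice each: the second pass
     * still misses every time, because they all share line 0 */
    for (pass = 0; pass < 2; pass++) {
        for (page = 0; page < 8L * CACHE_PAGES; page += CACHE_PAGES) {
            int line = page % CACHE_PAGES;   /* always 0 here */
            if (tag[line] == page)
                hits++;
            else {
                misses++;
                tag[line] = page;
            }
        }
    }
    printf("hits %ld  misses %ld\n", hits, misses);
    return 0;
}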

The above story is contrived, but something very like it happens all the
time, because a direct-mapped cache (DMC) cannot do LRU replacement.  For
example, when my trn needs its next page, the DMC will typically overwrite
a line holding, say, a tcsh page that was used only moments ago.  When I
then type "!tcsh", the DMC has to miss again and overwrite the trn page
that displaced the tcsh page, and when I exit from tcsh, the DMC misses yet
again to bring the trn page back in.  The atrocious part is that all of
this happens even though there is plenty of free space left in the cache.
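
Here is another sketch along the same lines (again my own, with made-up
sizes), comparing a direct-mapped cache against a 2-way set-associative
cache with LRU on exactly this kind of alternating trn/tcsh pattern: the
direct-mapped cache misses on every access, while the 2-way cache misses
only twice.

/* Two pages that map to the same line, accessed alternately the way
 * trn and tcsh pages might be.  Direct-mapped: 100 misses out of 100.
 * 2-way set-associative with LRU: 2 misses, then hits forever, because
 * both pages can live in the set at once.
 */
#include <stdio.h>

#define SETS 64

int main(void)
{
    long dm[SETS];                  /* direct mapped: one tag per line */
    long sa[SETS][2];               /* 2-way: [0] is most recently used */
    long a = 0, b = SETS;           /* two pages, same index, different tags */
    long dm_miss = 0, sa_miss = 0;
    int i, t;

    for (i = 0; i < SETS; i++) {
        dm[i] = -1;
        sa[i][0] = sa[i][1] = -1;
    }

    for (t = 0; t < 100; t++) {
        long page = (t & 1) ? b : a;    /* alternate: trn, tcsh, trn, ... */
        int set = page % SETS;          /* both pages hash to set 0 */

        if (dm[set] != page) {          /* direct mapped: no choice of line */
            dm_miss++;
            dm[set] = page;
        }

        if (sa[set][0] == page) {       /* 2-way with LRU replacement */
            ;                           /* hit, already most recent */
        } else if (sa[set][1] == page) {
            sa[set][1] = sa[set][0];    /* hit, promote to most recent */
            sa[set][0] = page;
        } else {
            sa_miss++;
            sa[set][1] = sa[set][0];    /* evict the least recently used way */
            sa[set][0] = page;
        }
    }
    printf("direct-mapped misses %ld, 2-way LRU misses %ld\n",
           dm_miss, sa_miss);
    return 0;
}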


>If you get a hit rate of over 90 percent at just over 128K (as the study
>I saw indicated) then the speedup you get from the additional cache is,
>at best, only speeding up a *max* of 10 percent of your reads; in reality,
>it's less, I think.  I haven't read any Pentium studies, but I'd wonder
>if the expense really were worth it, to go to 512K.

One problem is that it is very difficult to predict the cache hit ratio
under a multi-tasking OS.
The only real way to judge is to compare performance by running a variety
of applications.

--
Woody Jin