*BSD News Article 7040


Newsgroups: comp.unix.bsd
Path: sserve!manuel.anu.edu.au!munnari.oz.au!uunet!zaphod.mps.ohio-state.edu!cs.utexas.edu!sun-barr!ames!agate!dog.ee.lbl.gov!hellgate.utah.edu!fcom.cc.utah.edu!cs.weber.edu!terry
From: terry@cs.weber.edu (A Wizard of Earth C)
Subject: Re: cache terms (was Adding Swapspace ??)
Message-ID: <1992Oct25.224950.3098@fcom.cc.utah.edu>
Sender: news@fcom.cc.utah.edu
Organization: University of Utah Computer Center
References: <Bw7H4L.LLB@cosy.sbg.ac.at> <1992Oct16.162729.3701@ninja.zso.dec.com> <1992Oct16.201806.21519@fcom.cc.utah.edu> <Bw8Mw5.IFC@pix.com> <1992Oct18.082017.22382@fcom.cc.utah.edu> <BwLLxp.7Bt@flatlin.ka.sub.org> <1992Oct25.111525.25782@fcom.cc.utah.edu> <26965@dog.ee.lbl.gov>
Date: Sun, 25 Oct 92 22:49:50 GMT
Lines: 152

In article <26965@dog.ee.lbl.gov>, torek@horse.ee.lbl.gov (Chris Torek) writes:
|> In <1992Oct18.082017.22382@fcom.cc.utah.edu> terry@cs.weber.edu
|> (A Wizard of Earth C) claimed:
|> >>>the write to the disk is done through a write-through cache.
|> 
|> In article <BwLLxp.7Bt@flatlin.ka.sub.org> bad@flatlin.ka.sub.org
|> (Christoph Badura) pointed out:
|> >>The UNIX FS buffer cache has since its invention been write-behind
|> >>and not write-through.
|> 
|> In article <1992Oct25.111525.25782@fcom.cc.utah.edu> terry@cs.weber.edu
|> (A Wizard of Earth C) writes:
|> >I tend to use these terms synonymously.  When can a cache be write through
|> >but not write behind?
|> 
|> The various terms for describing caches are pretty standard.  In hardware,
|> a `write through' cache is one where each write updates both the cache
|> and main memory `simultaneously'.  In contrast, in a `write back' cache,
|> writes update only the cache line; main memory is updated only when the
|> line is kicked out, either by an explicit cache flush or by replacement
|> with new contents.

Of course, this still leaves the issue of `write behind' unresolved; is it
synonymous with `write through' or `write back'?  The distinction I would
make, were I to draw one, is that in a `write behind' cache there is an
enforced latency, covering seek and rotational delays, between the time the
data is written to the cache and the time it gets written to disk.  The
difference between this and a `write through' cache is then just seek
policy, since there is always rotational latency to consider (and possibly
queue around) in both cases.  As someone else pointed out, a faster head
positioning mechanism makes rotational delay relatively more important.
Since I think we can all agree that a block of data has to go to memory
before that memory is written to disk, the operation is either `write
through' or `write behind' depending on your definition of simultaneity
(something can't be going into a bounce buffer and being written to disk
at the same time).
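
To keep the hardware terminology straight, here's a throwaway user-space
sketch of the two policies Chris describes; the single "cache line" over a
one-word "memory", and all of the names in it, are invented for the
illustration and aren't from any kernel source:

/*
 * Toy single-line "cache" over a one-word "memory"; everything here is
 * made up for the illustration.
 */
#include <stdio.h>

static int memory;              /* backing store                    */
static int cache;               /* the single cache line            */
static int dirty;               /* set when cache and memory differ */

static void write_through(int v)
{
        cache = v;
        memory = v;             /* both updated "simultaneously" */
}

static void write_back(int v)
{
        cache = v;
        dirty = 1;              /* memory updated only on flush/eviction */
}

static void flush(void)
{
        if (dirty) {
                memory = cache;
                dirty = 0;
        }
}

int main(void)
{
        write_through(1);
        printf("write-through:   cache=%d memory=%d\n", cache, memory);

        write_back(2);
        printf("back, pre-flush: cache=%d memory=%d\n", cache, memory);
        flush();
        printf("back, flushed:   cache=%d memory=%d\n", cache, memory);
        return 0;
}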

|> In the Unix kernel, the buffer cache code simulates a hardware `write
|> back' cache.  All else being equal, write-back caches are usually more
|> efficient than write-through.  (All else is rarely equal.)  In this
|> case, cache `flush' occurs only on sync() or fsync() calls, or in some
|> systems, through timers.  Replacement occurs when a buffer is reused.

So it's neither `write through' nor `write behind'; the issue is that
writing a block of virtual memory through a traditional swap mechanism and
writing a block of virtual memory through the file system page mechanism
differ in (1) a copy taking place, and (2) the fact that you are trading
cache memory for virtual memory.  The overhead of the copy is obvious, but
the overhead of the "cache buffers dedicated to swap data rather than real
file data" is questionable; certainly it would be faster to "swap in" from
a cache buffer than from real disk, but it would be faster to swap in
directly from real disk than to read into a cache buffer and then copy.  I
think this is probably acceptable overhead for the benefits derived, and
that the cost of the copy in kernel space is negligible.
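
For the sake of argument, the extra copy looks something like the sketch
below; disk_read(), PAGE and the two swap-in routines are made-up names
for the example, not anything out of the 386BSD sources:

/*
 * The two swap-in paths; disk_read() just stands in for the driver and
 * everything here is invented for the example.
 */
#include <stdio.h>
#include <string.h>

#define PAGE 4096

static void disk_read(char *dst)        /* fake "read a page from disk" */
{
        memset(dst, 0xAA, PAGE);
}

/* Through the buffer cache: disk -> cache buffer -> user page (a copy). */
static void swapin_cached(char *user_page, char *cache_buf)
{
        disk_read(cache_buf);
        memcpy(user_page, cache_buf, PAGE);     /* the extra copy */
}

/* Straight from disk: disk -> user page, no copy. */
static void swapin_direct(char *user_page)
{
        disk_read(user_page);
}

int main(void)
{
        static char page[PAGE], buf[PAGE];

        swapin_cached(page, buf);
        swapin_direct(page);
        printf("first byte after swap-in: 0x%02x\n", (unsigned char)page[0]);
        return 0;
}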

|> The BSD kernel does not, however, use a strict write-back policy.
|> Instead, whenever it seems important for consistency (directory
|> operations and indirect blocks), and/or whenever it seems likely that a
|> block will not be rewritten soon, the kernel uses a synchronous
|> bwrite() call or an asynchronous but immediate bawrite() call.  More
|> detail can be found in the Bach and BSD books.

So it can, at times, act as `write through' for critical data.  I had
actually put swap in this category, although in retrospect it matters
little whether swap data is reliably on the disk before a system crash,
since by definition the data is invalid anyway (unless you attempt to
recover the system state as it was at the time of the crash).
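
A crude caricature of that mixed policy (everything in it, the struct and
the function names included, is invented for the example rather than
lifted from the kernel):

/*
 * Metadata-ish blocks get pushed out at once; ordinary data just sits
 * dirty until a later flush.
 */
#include <stdio.h>

enum blktype { DATA, METADATA };

struct buf {
        enum blktype type;
        int dirty;
};

static void push_to_disk(struct buf *bp)        /* stand-in for real I/O */
{
        bp->dirty = 0;
}

static void buffer_write(struct buf *bp)
{
        bp->dirty = 1;
        if (bp->type == METADATA)
                push_to_disk(bp);       /* push critical data at once */
        /* else leave it dirty -- flushed later, write-back style */
}

int main(void)
{
        struct buf meta = { METADATA, 0 };
        struct buf data = { DATA, 0 };

        buffer_write(&meta);
        buffer_write(&data);
        printf("metadata dirty=%d, data dirty=%d\n", meta.dirty, data.dirty);
        return 0;
}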

|> >Just curious as to why you draw such a sharp distinction, the point being
|> >that there is negligible overhead in a cached writes for swap no matter
|> >how you slice the pie.
|> 
|> This is not really true, since swapping/paging occurs mainly when the
|> machine is low on memory.  This tends to coincide with the machine
|> being `active', which implies that every bit of overhead counts.  With
|> unified VM/buffer caches, the effect is even worse: `heavy paging' and
|> `overloaded buffers' can become completely synonymous without some sort
|> of policy to prevent the buffer cache from taking over all of physical
|> memory.  (Current BSD systems have an enforced limit on buffer cache
|> size, namely `bufpages' in machdep.c.)

I wasn't really thinking of permitting this; rather, I was thinking of
going the other way (virtual memory steals buffer cache).  I can
definitely see the drawbacks of the first approach; going this way instead
potentially blocks processes on a resource other than VM, which is good,
and also has the effect of limiting the number of sector-to-cache-buffer
mappings during times of heavy swapping.  Basically, if there are fewer
cache buffers for users, there are fewer seek offsets represented by user
cache buffers, and thus disk throughput (assuming good placement of the
swap file) should increase under heavy load.  Of course, implementing this
doesn't require that the file system be used for swapping.

If 10% of the cache buffers were reserved for swapping as a low watermark,
with some higher value (30%?) as the high watermark on the cache buffers
used for swapping, this would increase system availability in a "memory
bound" kernel (one in which swapping is required to occur).
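
In back-of-the-envelope terms, the reservation might look like the sketch
below; NBUF, the 10%/30% split and the two predicates are hypothetical
numbers and names made up for the example:

/*
 * Hypothetical watermark bookkeeping for a pool of NBUF cache buffers.
 */
#include <stdio.h>

#define NBUF    256
#define SWAP_LO (NBUF / 10)       /* always held back for swap data     */
#define SWAP_HI (NBUF * 3 / 10)   /* ceiling on swap's share of buffers */

static int swap_bufs;             /* buffers currently holding swap data */

static int can_take_for_swap(void)
{
        return swap_bufs < SWAP_HI;     /* don't exceed the high mark */
}

static int can_take_for_files(int file_bufs)
{
        /* Leave at least SWAP_LO buffers available for swap at all times. */
        return file_bufs + SWAP_LO < NBUF;
}

int main(void)
{
        printf("low water %d, high water %d of %d buffers\n",
            SWAP_LO, SWAP_HI, NBUF);
        printf("swap may take another buffer: %s\n",
            can_take_for_swap() ? "yes" : "no");
        printf("files may take a 250th buffer: %s\n",
            can_take_for_files(250) ? "yes" : "no");
        return 0;
}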

|> This is the core of the idea behind `dribble' buffer write policies
|> (the timers mentioned above):  the machine can best afford the writes
|> when it is not busy doing other stuff.  At the time the write occurs,
|> it is busy (obviously so: someone is busy writing).  If the write is
|> merely cached, a huge queue can build up, and then when demand
|> increases *everyone* will have to wait.  A `dribble-back' cache avoids
|> all of this, but requires extra mechanism and trades off total
|> throughput for decreased latency.  Systems with big queues tend to have
|> greater overall throughput.

Right; more memory == better performance.  A `dribble-back' cache is
certainly a potential loss of granularity for swapping implemented on top
of it (in the sense of some form of unified but managed implementation of
VM/buffer cache).  After all, one doesn't want to delay swapping until the
machine is less busy ....8-).  This assumes the buffer caching is
`dribble-back' and the swap-to-disk mechanism isn't.
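
For what it's worth, the `dribble' policy amounts to something like the
toy below, where each timer tick writes out a few dirty buffers instead of
letting the whole queue go in one burst; the numbers are arbitrary:

/*
 * Crude "dribble-back" loop: a few writes per timer tick.
 */
#include <stdio.h>

#define NDIRTY         40
#define FLUSH_PER_TICK  4

int main(void)
{
        int dirty = NDIRTY, tick = 0;

        while (dirty > 0) {
                int n = dirty < FLUSH_PER_TICK ? dirty : FLUSH_PER_TICK;

                dirty -= n;     /* a few writes per tick, not a burst */
                tick++;
                printf("tick %2d: wrote %d, %2d still dirty\n", tick, n, dirty);
        }
        return 0;
}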

I still think it's desirable to swap to a file.  The best arguments
against this are still Christoph's, which basically boil down to the
penalties of trading cache memory for virtual memory, the cost of the
additional copies coming and going, and the allocation policy for files
being [potentially] bad for swap space.  Arguments based on VM/buffer-cache
unification, and on the actual I/O (which has to be done anyway) going
through the file system rather than the current swapping mechanisms, are
much less important to my mind, as they represent negligible overhead
compared to the work that has to be done anyway (given promiscuous
preallocation of the swap file to get a "good" geometry on disk).


None of the arguments so far have convinced me that I shouldn't swap to a
file, and in particular to a file on an NFS-mounted partition, since not
doing so means there are 36 machines doomed to sit here and run DOS that
could be providing CPU cycles for 386BSD.


I think it would be nice if a student could put in a boot disk, get a
login prompt first thing, run X on 386BSD, and have the machine reboot on
logout.  It would be even better if they could run a "DOS program" off the
Novell server to boot 386BSD without needing a local disk.  Swapping over
the net is the only way to achieve this (short of "please insert swap
floppy in A: and press any key to continue" 8-)), and swapping to a file
is the easiest way to swap over the net.  Regardless of how much overhead
going through the file system causes (I put it at about 6-8%), anything is
better than DOS.


					Terry Lambert
					terry@icarus.weber.edu
					terry_lambert@novell.com
---
Any opinions in this posting are my own and not those of my present
or previous employers.

-- 
-------------------------------------------------------------------------------
                                        "I have an 8 user poetic license" - me
 Get the 386bsd FAQ from agate.berkeley.edu:/pub/386BSD/386bsd-0.1/unofficial
-------------------------------------------------------------------------------