*BSD News Article 45532

Path: sserve!newshost.anu.edu.au!harbinger.cc.monash.edu.au!simtel!zombie.ncsc.mil!news.mathworks.com!news.kei.com!nntp.et.byu.edu!news.byu.edu!hamblin.math.byu.edu!park.uvsc.edu!usenet
From: Terry Lambert <terry@cs.weber.edu>
Newsgroups: comp.unix.bsd.freebsd.misc
Subject: Re: Eliminating kernel panics
Date: 14 Jun 1995 20:14:46 GMT
Organization: Utah Valley State College, Orem, Utah
Lines: 114
Message-ID: <3rnfvm$1qa@park.uvsc.edu>
References: <3rlecq$i2r@felix.junction.net>
NNTP-Posting-Host: hecate.artisoft.com

Michael Dillon <michael@junction.net> wrote:
] I was just browsing a WWW page at Amdahl, the mainframe
] manufacturer when I came across the following:
] 
]     http://www.amdahl.com/doc/products/oes/cb.uts/utshist.html
] 
]     UTS 4.2 was engineered to eliminate all kernel panics (other UNIX 
]     operating systems based on a simple port of the base SVR4 source 
]     contain "panic" code that will stop the machine in unexpected 
]     situations). In the development of UTS 4.2, the base SVR4 code
]     was methodically "scrubbed" to create a run-time environment as
]     reliable as the S/390 hardware platform it serves. 
] 
] If they can do it, why can't FreeBSD do the same? I'm thinking that this
] problem is similar to the problems with TCP/IP congestion and that
] solutions could be found similarly.

Is that "marketing eliminated" or "engineering eliminated"?

I guaran-damn-tee you if there is a hardware fault, the machine
is going to suck mud, no matter what they do to the software.

Just like a short in the ethernet will take out a NetWare SFT
(Software Fault Tolerance) server.


Now there *are* two classes of panic.  One is the result of an
unrecoverable failure mode.  UTS has unrecoverable failure modes,
too -- don't let them kid you.  You handle these by panicking.

The second type of panic is one where the kernel agrees to do
something, then reneges on the agreement.  There are a lot of
cases, mostly based on probability, where the kernel will commit
to doing something that it thinks it can most likely do, but is
not 100% certain it can.  For instance, allowing a process to
start at all without knowing beforehand the maximum number of
dirty data pages it will use during its lifetime.

It's possible to get around most of these problems by not allowing
the overcommitting of resources; the problem with that is that on
average, it's OK to overcommit resources, and doing so
will result in fewer overall resources being required for the
average case.
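
To put rough numbers on that trade-off, here is a toy model of admission
control.  It is only a sketch, not kernel code: the function names, the
page counts, and the idea that each process declares a worst case up
front are all assumptions made for illustration.

```python
# Toy model of admission control; not kernel code.  All names and
# numbers here are hypothetical.

def admit_strict(total_pages, declared_max):
    """Admit processes only while the sum of worst-case demands fits."""
    admitted, reserved = [], 0
    for i, worst in enumerate(declared_max):
        if reserved + worst <= total_pages:
            admitted.append(i)
            reserved += worst
    return admitted


def admit_overcommit(declared_max):
    """Admit everything and bet that actual use stays under the total."""
    return list(range(len(declared_max)))


def shortfall(total_pages, actual_use, admitted):
    """Pages promised but not deliverable; above 0 the kernel reneges."""
    used = sum(actual_use[i] for i in admitted)
    return max(0, used - total_pages)


TOTAL = 1000
DECLARED = [400, 400, 400, 400]   # worst case per process
TYPICAL = [150, 200, 100, 250]    # average-case actual use
BAD_DAY = [400, 400, 300, 250]    # everyone near worst case at once

strict = admit_strict(TOTAL, DECLARED)   # processes 0 and 1 only
loose = admit_overcommit(DECLARED)       # all four processes
```

With these made-up numbers, strict reservation runs two processes where
overcommit runs four.  `shortfall(TOTAL, TYPICAL, loose)` is 0, so the
bet pays off in the average case; `shortfall(TOTAL, BAD_DAY, loose)` is
350 pages, which is the situation where the kernel has agreed to
something it cannot deliver.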

One of my favorite hobby-horses is memory overcommit.  The good
things about memory overcommit are:

o	Your total available memory is swap size + RAM size

o	You don't require real swap for clean text (and data, if
	correctly implemented) pages, since they can be reloaded
	from the file (this is called using the file as backing
	store).

o	Precommitting resources takes time, so not doing it means
	you can start executing code before you have it all in core.

o	The copy costs for the pages can be amortized over the
	runtime of the program.  The plus to this is that it
	grants the appearance of speed; the downside is that it
	actually detracts from overall speed during runtime binding
	(a problem most shared library implementations also have).
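
The first two points are simple arithmetic, sketched below.  The
function names and the example sizes (16 MB RAM, 32 MB swap, 4 KB
pages) are assumptions for illustration and do not come from any real
kernel interface.

```python
# Back-of-the-envelope accounting; hypothetical names and sizes.

def commit_limit(ram_pages, swap_pages, overcommit):
    """Pages the system may promise to processes."""
    # With overcommit, RAM and swap add up.  Without it, every RAM page
    # has swap reserved behind it, so swap alone sets the limit.
    return ram_pages + swap_pages if overcommit else swap_pages


def swap_needed(text_pages, dirty_pages, file_backed_text):
    """Swap one process image needs if all of it is paged out."""
    # Clean text can be re-read from the executable, so it needs no
    # swap when the file serves as backing store.
    return dirty_pages if file_backed_text else text_pages + dirty_pages


RAM_PAGES = 16 * 1024 // 4    # 16 MB of RAM in 4 KB pages
SWAP_PAGES = 32 * 1024 // 4   # 32 MB of swap in 4 KB pages
```

Here `commit_limit(RAM_PAGES, SWAP_PAGES, True)` is 12288 pages against
8192 without overcommit, and a process with 300 text pages and 50 dirty
pages needs 50 pages of swap when its file is the backing store,
instead of 350.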


The bad things are:

o	Unless your total available memory is limited to swap size
	(meaning that you have real swap space reserved as backing
	store for RAM), you can't guarantee hot shutdown/restart,
	and you can't guarantee enough space to support kernel
	dumps (in case of unrecoverable errors).

o	Using a program file as backing store causes problems: if
	the program was loaded over NFS, the NFS server must stay
	up to swap in pages; therefore the image is fragile to
	network outages (anyone who has used a diskless Sun would
	agree).  The "fix" for this would be to special case remote
	file systems to load remote images entirely into local swap.
	That only works in "dataless" configurations, not "diskless",
	since swap is also remote in the second case.  FreeBSD,
	NetBSD, SVR4, etc. typically don't implement this "fix".

o	Using the program image as a swap store makes the program
	fragile to modification.  This is the purpose of the VTEXT
	flag on an in core vnode on such systems, and attempts to
	modify the image result in an error return of ETXTBSY (a
	non-POSIX error return "extension").  The "fix" for this
	one is to fault the image to swap (and make the VM system
	"prefer" swap pages to disk pages -- something you want
	anyway, since a page reference from swap is much faster
	than one through the file system) and allow the modification
	to proceed.  Again, this is not typically implemented, and
	there are problems if the modification is not local to the
	machine doing the running, since the non-standard VTEXT
	flag is not propagated to a remote host (NFS/RFS).  In
	combination with forcing remotely executed code to local
	swap, this window is (mostly) closed.

o	Delayed startup (obviously: related to the size of the image
	being copied to swap).
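
The VTEXT behavior in the third point can be modeled in a few lines.
This is a user-space toy, not the actual vnode code: the `Vnode` class
and the `exec_image` and `open_for_write` helpers are hypothetical, and
only the errno value is real.

```python
import errno


class Vnode:
    """Hypothetical stand-in for an in-core vnode."""

    def __init__(self, name):
        self.name = name
        self.vtext = False  # set while the file backs a running image


def exec_image(vp):
    """Mark the vnode as backing store for an executing program."""
    vp.vtext = True


def open_for_write(vp):
    """Refuse to modify a file that is in use as program text."""
    if vp.vtext:
        raise OSError(errno.ETXTBSY, "Text file busy", vp.name)
    return vp
```

The flag lives only in the local in-core structure, so a model like
this says nothing about a writer on another host; that is the NFS/RFS
window described above.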


And this is just *one* of the overcommitted resources on the machine.


Obviously, it's a set of trade-offs between what the user is willing
to spend on hardware vs. what they get for their money.


                                        Terry Lambert
                                        terry@cs.weber.edu
---
Any opinions in this posting are my own and not those of my present
or previous employers.