*BSD News Article 17910

Path: sserve!newshost.anu.edu.au!munnari.oz.au!uunet!news.univie.ac.at!fstgds15.tu-graz.ac.at!fstgds01.tu-graz.ac.at!not-for-mail
From: chmr@edvz.tu-graz.ac.at (Christoph Robitschko)
Newsgroups: comp.os.386bsd.bugs
Subject: Re: Nethack
Date: 3 Jul 1993 18:03:50 +0200
Organization: Technical University of Graz, Austria
Lines: 145
Message-ID: <214al6INNsmp@fstgds01.tu-graz.ac.at>
References: <1993Jul3.055522.4000@fcom.cc.utah.edu>
NNTP-Posting-Host: fstgds01.tu-graz.ac.at
X-Newsreader: TIN [version 1.1 PL7]

In article <1993Jul3.055522.4000@fcom.cc.utah.edu> A Wizard of Earth C (terry@cs.weber.edu) wrote:
-> In article <C9J9H8.Ltu@sneaky.lonestar.org> gordon@sneaky.lonestar.org (Gordon Burditt) writes:
-> >Ok, I managed to duplicate the "nethack problem" in a much simpler program.
-> >System:  386bsd, patchkit 0.2.3.
-> 
-> [ ... ]
-> >
-> >Now, the question I have is, with this bug in the system, why does
-> >it stay up for more than 10 minutes?  Why can I run the compiler
-> >without it crashing?  
-> >
-> >Is there a 486-specific fix for this (set the WP bit in the cr0 register?  
-> >anything else needed or is that alone enough?)
-> 
-> This would probably be enough if the process creation code didn't depend
-> on it being unenforced during create.
-> 
It is not enough with the current copyout: it simply uses the kernel's
permissions to write to user space. (copyin has the same flaw: you can read
kernel memory with write().) Also, if WP is on, you can no longer map memory
read-only for the user and read-write for the kernel, and that is needed.
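
To make the WP point concrete, here is a minimal user-space model of the
page-level write check. It is only a sketch: write_allowed() is a name I made
up, it looks at a single page-table entry and ignores the page-directory
level and the present bit. The bit values are the usual i386 ones (R/W is
bit 1, U/S is bit 2).

```c
#include <stdbool.h>

#define PG_RW 0x002u /* page-table entry bit 1: page is writable */
#define PG_U  0x004u /* page-table entry bit 2: page is user-accessible */

/*
 * Hypothetical helper: would a write to a present page with these
 * page-table bits go through without a fault?
 *   user_mode - the access comes from user mode (ring 3)
 *   cr0_wp    - state of the 486 WP bit in cr0 (always false on a 386)
 */
bool write_allowed(unsigned pte, bool user_mode, bool cr0_wp)
{
    if (user_mode)
        return (pte & PG_U) != 0 && (pte & PG_RW) != 0;

    /* Supervisor writes ignore the R/W bit unless WP is set. */
    return !cr0_wp || (pte & PG_RW) != 0;
}
```

With WP clear, write_allowed(PG_U, false, false) is true: the kernel writes
straight through a page that is read-only for the user, which is what copyout
does today. With WP set the same write faults, but then a single R/W bit has
to serve both the user and the kernel.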

-> Given exactly the behaviour you have described (note: it is *still*
-> possible for a write-back to occur to the swap store file given the
-> particular conditions in place during a system shutdown), the problem is
-> obvious.  The question is which of the three things that aren't being
-> done are the root cause:
-> 
-> 1)	Data pages for the process are being written, but aren't being
-> 	marked dirty like they should be.
-> 
-> 2)	When the reference count drops to 0 on an in core vnode, the
-> 	FS cache buffers associated with it aren't marked invalid and
-> 	returned to the pool.  This is a result of the way vnodes are
-> 	shared between all processes (and is incidentally a pain in the
-> 	ass, since it limits the maximum size of on disk inode data
-> 	to 188 bytes, a nice binary number -- NOT!).  The only fix would
-> 	be to make the vnode a field in the inode instead of the other
-> 	way around and set up the hash function to have a list of per
-> 	file system inodes containing the vnodes that get passed around.
-> 	The vnode code itself sucks out (in vgoneall() in vfs_subr.c,
-> 	the stupid thing is removed from the hash chain without the
-> 	inode hash being cleaned out until later -- vgoneall is called
-> 	on the swap store vnode in exit() which is called from rexit()
-> 	which is called from the exit system call).
-> 
-> 3)	When the hash list is checked on open in iget() in ufs_inode.c,
-> 	if the thing isn't found, the initialization doesn't make sure
-> 	that all associated cache buffers are freed (this could be argued
-> 	to be a problem in getnewvnode(), since the problem is probably
-> 	common to all file systems).  Since you haven't made any other
-> 	references to files, you get the same vnode as before and ...
-> 	the cache chain serendipitously points to the pages at the
-> 	addresses you expect.  You can probably amuse yourself for hours
-> 	by writing two programs and watching them contaminate each other.

The changes are always written back to the executable on shutdown (or earlier).

What I think happens is this:
In execve, the text portion of the executable is mapped read-write and 
copy-on-write. This is immediately changed (with vm_protect) to read-only,
so the copy-on-write bit is cleared. The process then modifies its text
region because the copyout function does not check the page protections.
When the process is terminated, the pager detects that the page has been
modified, and because it is not copy-on-write, it is written back to the
executable file (in practice not until a shutdown, or until the buffer is
needed again).
The 'bug' of the VM system is that it assumes read-only memory can't be modified.
The *bug* of the copyout family is that it happily writes to read-only memory.
It is obvious what should be fixed. (I'm working on a clean+fast patch).
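
To illustrate the kind of check that is missing (this is not the patch, just
a toy model in plain C), the sketch below refuses a copy when any destination
page lacks write permission. The toy_* names, the two-page address space and
the protection flag are all invented for the example. A real copyout would
take a page fault at this point so the VM system can do the copy-on-write or
reject the access.

```c
#include <errno.h>
#include <stddef.h>
#include <string.h>

#define TOY_PAGE_SIZE  4096u
#define TOY_NPAGES     2u
#define TOY_PROT_WRITE 0x2u /* hypothetical "page is writable" flag */

/* Toy user address space: flat memory plus one protection word per page. */
struct toy_space {
    unsigned char mem[TOY_NPAGES * TOY_PAGE_SIZE];
    unsigned prot[TOY_NPAGES];
};

/* Copy len bytes to offset uoff, but only if every touched page is writable. */
int toy_copyout(struct toy_space *as, const void *kaddr, size_t uoff, size_t len)
{
    size_t size = sizeof as->mem;
    size_t page, last;

    if (len == 0)
        return 0;
    if (uoff >= size || len > size - uoff)
        return EFAULT; /* destination range lies outside the address space */

    last = (uoff + len - 1) / TOY_PAGE_SIZE;
    for (page = uoff / TOY_PAGE_SIZE; page <= last; page++)
        if ((as->prot[page] & TOY_PROT_WRITE) == 0)
            return EFAULT; /* the check the current copyout does not make */

    memcpy(as->mem + uoff, kaddr, len);
    return 0;
}

/* Page 0 is writable, page 1 is read-only (think: text segment). */
int toy_demo_writable(void)
{
    static struct toy_space as = { .prot = { TOY_PROT_WRITE, 0 } };
    return toy_copyout(&as, "abc", 16, 3); /* stays inside page 0 */
}

int toy_demo_readonly(void)
{
    static struct toy_space as = { .prot = { TOY_PROT_WRITE, 0 } };
    return toy_copyout(&as, "abc", TOY_PAGE_SIZE - 1, 3); /* spills into page 1 */
}
```

toy_demo_writable() returns 0, while toy_demo_readonly() returns EFAULT and
leaves the read-only page untouched, because the check runs before any byte
is copied.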
-> 
-> As to why the compiler doesn't have this behaviour, it's probably because
-> it does all of the following:
-> 
-> 1)	Explicitly calls exit(); the "exit by running off the end of the
-> 	program" cleanup seems to have a bug as well.

There is no difference whether exit is called explicitly or implicitly
by crt0.o.
-> 
-> 2)	It opens a bunch of vnodes after the one it's using (you do this
-> 	too, but you put them back on the freelist in the same order so
-> 	that when your program exits, its vnode is always the first one).
-> 
-> 3)	It jumps around enough in its memory that the modified data is
-> 	forced to swap (otherwise you'd run out of memory fast) instead of
-> 	just hanging around off the vnode.

I think the actual reason why the compiler is not crashing is that it does
not modify its *constants*.
-> 
-> 
-> So:
-> 
-> 1)	Put used vnodes on the end of the hash list instead of the start
-> 	(this is a false fix, since the vnode you are using may be the
-> 	last free one -- oh c'mon!  it *could* happen).
-> 
-> 2)	Change the vnode allocation/deallocation interface  (this needs to
-> 	be done anyway to allow real file systems to be written).
-> 
-> 3)	Fix the copyin/copyout programs to write dirty and check and fake
-> 	page faults (there's a fix for this, but it generates false faults).
-> 
-> 4)	Set the flag on the 486 and fix the process start code.
-> 
-> 5)	Fix the process exit code so the non-exit() call process exit will
-> 	work correctly.
-> 
-> 6)	Fix getnewvnode() so that it invalidates the buffers (if not actually
-> 	deallocating them).
-> 
-> 7)	Fix the process swapping so that it comes from swap rather than the
-> 	program image (this will also get rid of the bizarre crashes that
-> 	can occur when you are out of memory, allow real system dumps without
-> 	needing more memory than swap has, and speed up swapping at the cost
-> 	of slowing down startup slightly, since copy to swap can be done on
-> 	an as-needed basis and the current trade off is "optimizing the boot
-> 	code at the expense of the program").
-> 
So: fix locore.s (I'm trying to do this)!
-> 
-> 
-> Suffice it to say, it's a rather involved set of problems, not just a single
-> problem, and there are people working on it, but no, there's not a fix yet
-> (but actually explicitly calling exit() might help for right now).

No. It does not help.
What might help is compiling nethack with -fwritable-strings (note that the
foo variable in the posted test program is a constant and thus in the text
portion!).
When the problem with locore.s is fixed, nethack will die immediately with
a segmentation fault instead of silently trashing its executable, so
compiling with -fwritable-strings (or fixing nethack) will be required.
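
"Fixing nethack" would mean something along these lines. This is a sketch
only; player_name and the two helpers are invented names and not taken from
the nethack source. The buggy pattern is a char pointer to a string literal
that later gets written to. An array initialized from the literal lives in
the data segment, so writing to it is fine without any compiler flag.

```c
#include <stddef.h>
#include <string.h>

/*
 * Buggy pattern (do not do this):
 *     char *player_name = "nethack";    literal may sit in the text segment
 *     read(fd, player_name, ...);       writes into read-only memory
 *
 * Fixed pattern: the array below is ordinary writable data that merely
 * starts out with the same contents as the literal.
 */
static char player_name[32] = "nethack";

/* Hypothetical setter: copy src into the buffer, truncating if needed. */
size_t set_player_name(const char *src)
{
    size_t n = strlen(src);

    if (n >= sizeof player_name)
        n = sizeof player_name - 1;
    memcpy(player_name, src, n);
    player_name[n] = '\0';
    return n; /* number of characters actually stored */
}

const char *get_player_name(void)
{
    return player_name;
}
```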

The system can run reasonably stably despite this bug because it is only
triggered by a program bug: modifying memory (using read, ioctl, etc.) that
is mapped read-only (constant strings in this case).
-> 
-> 
-> 					Terry Lambert
-> 					terry@icarus.weber.edu


								Christoph