*BSD News Article 8713

Xref: sserve comp.os.linux:18357 comp.unix.bsd:8769
Path: sserve!manuel.anu.edu.au!munnari.oz.au!news.hawaii.edu!ames!olivea!uunet!haven.umd.edu!decuac!pa.dec.com!vixie
From: vixie@pa.dec.com (Paul A Vixie)
Newsgroups: comp.os.linux,comp.unix.bsd
Subject: Re: [386bsd]  cp something to /bin/cp and cp core dumps; bug or feature?
Message-ID: <VIXIE.92Dec5145556@cognition.pa.dec.com>
Date: 5 Dec 92 22:55:56 GMT
References: <Byn6uL.2oM@ra.nrl.navy.mil> <1992Dec2.185331.57@unislc.uucp>
	<1fjcsuINN2vf@hrd769.brooks.af.mil>
Followup-To: comp.unix.bsd
Organization: DEC Network Software Lab
Lines: 107
NNTP-Posting-Host: cognition.pa.dec.com
In-reply-to: burgess@hrd769.brooks.af.mil's message of 2 Dec 1992 16:21:18 -0600

[Dave Burgess]
>  I have also noticed something that bothers me a little bit.  I was ftping
>a bunch of files from point A to point B and managed to ftp /usr/bin/ftp.
>The execed ftp core dumped.
>
>  Why?
>
>I have seen the same thing with several other programs (cp, for example).
>What umbilical link is there between a running program and its image on
>disk?

Paged "Virtual Memory", as BSD implements it, means that programs are
brought into memory in itty bitty pieces called "pages", and various
lies are told that make the program believe that its text and its data
and its stack are all contiguous in memory even though most of it could
be missing and what's there could be in random order in the real RAM.

Each page of "virtual memory" -- meaning, memory as viewed by a user
program -- has several possible states.  It can be "invalid", meaning
that the program did not specify anything for that page and so using
it results in "segmentation violation - core dumped".  It can be
"read/write data or stack", which means that the program has specified
that something exists there.  If the program touches such a page after
the kernel has taken its real memory for something else, the kernel has to
catch the exception ("page fault"), allocate a page of real memory,
change the page tables to make that real memory look like it's in the
place the program expects it to be, and then fill the memory with the
contents the program gave it (usually this means reading from the swap
area, since the contents were put there when the page was "stolen" by
the kernel in the first place).  Finally, a page can be "read-only text",
as in your particular case (overwriting /usr/bin/ftp while running it).
"Read-only text" is a page that cannot change while the program is running.
If the kernel has to steal this page -- or if the program has specified
it but not tried to use it yet -- then it is NOT written out to "swap",
since the kernel assumes that, as read-only text, the original file
from which it was loaded will still be there if the page has to come
back.  This is why the file system won't let you open a file for writing
while someone else is running it, as shown by...

	% cp /bin/cat mycat
	% ./mycat &
	[7] 10382
	[7]  + Suspended (tty input)  ./mycat
	% cp /bin/cat mycat
	cp: mycat: Text file busy
	% 
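
To make the three page states concrete, here is a toy model in Python.
It is only a sketch under stated assumptions, not how BSD actually does
it: the names ToyKernel, Page, exec_program, steal and touch are made up
for illustration, and pages are just strings.  It shows that a stolen
data page goes to swap, that a stolen text page is simply dropped and
later re-read from the file, and why the file therefore has to be "busy".

```python
from dataclasses import dataclass


class SegmentationViolation(Exception):
    """The process touched a page it never specified ("invalid")."""


class TextFileBusy(Exception):
    """Someone tried to overwrite a file that backs running text pages."""


@dataclass
class Page:
    kind: str               # "text" (read-only) or "data" (read/write)
    contents: object = None
    resident: bool = False  # False means the next touch is a page fault


class ToyKernel:
    """Hypothetical model: every name here is illustrative only."""

    def __init__(self):
        self.files = {}  # file name -> list of text-page contents
        self.swap = {}   # (pid, page number) -> stolen data-page contents
        self.procs = {}  # pid -> (executable name, {page number: Page})

    def write_file(self, name, pages):
        # Text pages are re-read from the file on demand, so the file
        # must not change underneath a process that is running it.
        if any(exe == name for exe, _ in self.procs.values()):
            raise TextFileBusy(name)
        self.files[name] = list(pages)

    def exec_program(self, pid, name, n_data):
        n_text = len(self.files[name])
        pages = {i: Page("text") for i in range(n_text)}  # not loaded yet
        for i in range(n_text, n_text + n_data):
            pages[i] = Page("data", contents="", resident=True)
        self.procs[pid] = (name, pages)

    def touch(self, pid, page_no):
        exe, pages = self.procs[pid]
        if page_no not in pages:
            raise SegmentationViolation(page_no)
        page = pages[page_no]
        if not page.resident:  # this is the "page fault"
            if page.kind == "text":
                page.contents = self.files[exe][page_no]  # back to the file
            else:
                page.contents = self.swap.pop((pid, page_no))
            page.resident = True
        return page.contents

    def store(self, pid, page_no, value):
        self.touch(pid, page_no)  # faults the page in, or raises if invalid
        page = self.procs[pid][1][page_no]
        if page.kind != "data":
            raise SegmentationViolation(page_no)  # text is read-only
        page.contents = value

    def steal(self, pid, page_no):
        page = self.procs[pid][1][page_no]
        if page.resident and page.kind == "data":
            self.swap[(pid, page_no)] = page.contents  # only data is saved
        page.contents, page.resident = None, False


kernel = ToyKernel()
kernel.write_file("mycat", ["text 0", "text 1"])
kernel.exec_program(pid=7, name="mycat", n_data=1)
kernel.store(7, 2, "hello")   # page 2 is the single data page
kernel.steal(7, 2)            # the data page is written to swap
kernel.steal(7, 0)            # the text page is simply dropped
print(sorted(kernel.swap))    # shows which pages ended up in swap
print(kernel.touch(7, 0), kernel.touch(7, 2))
```

In this toy, if write_file skipped the busy check, a later fault on a
text page would pull in whatever the file holds now, not what the
program was started from.  That is presumably the kind of failure behind
a core dump after overwriting a running executable.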

However, the file system is less stringent about allowing you to remove
the file entirely.  That is, you can remove any link ("name") of the file,
but the blocks are not supposed to be deallocated (returned to the free
list where other files can grow into them) until the last program that has
it open closes it.  So, continuing my example:

	% rm mycat
	rm: override protection 755 for mycat? yes
	% cp /bin/cat mycat
	% 

"rm" saw that the file seemed to be open, so it asked me if I was sure I
wanted to remove it.  After I did that, I was able to create another file
with the same name (and, as it happens, with the same contents).  I can
run this new "mycat", but it will NOT share pages with the one that I'm
running as job %7.  You can see the blocks being held away from the free
list by the following continuation of my example:

	% ls -s mycat
	  28 mycat*
	% df .
	Filesystem   Total    kbytes   kbytes   %     
	node         kbytes   used     free     used  Mounted on
	/dev/rz0f     521885  289461  180236    62%   /a1
	% kill %7
	[7]    Terminated             ./mycat
	% df .
	Filesystem   Total    kbytes   kbytes   %     
	node         kbytes   used     free     used  Mounted on
	/dev/rz0f     521885  289433  180264    62%   /a1

The file is (and was before) 28K.  When I killed the job that was running
the old version of the file, "kbytes used" dropped from 289461 to 289433
and "kbytes free" rose from 180236 to 180264: 28K magically appeared in my
free list, even though the "rm" command had been executed several minutes
earlier.  Those blocks were still needed by the kernel, in case %7 had had
any of its read-only text pages stolen by the kernel (or in case it needed
one it hadn't used yet).
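
The same "name is gone, blocks are kept" behaviour can be seen from user
code with an ordinary open file instead of a running executable.  The
following is a minimal sketch, assuming a POSIX-style local file system;
the helper name unlink_while_open is made up for illustration.  It
removes the only name of a temporary file and then reads the data back
through the still-open descriptor.

```python
import os
import tempfile


def unlink_while_open(payload: bytes):
    """Remove a file's only name while it is open, then read it back."""
    fd, path = tempfile.mkstemp()
    try:
        os.write(fd, payload)
        os.unlink(path)                    # remove the last link ("name")
        name_gone = not os.path.exists(path)
        links = os.fstat(fd).st_nlink      # link count seen via the open fd
        os.lseek(fd, 0, os.SEEK_SET)
        data = os.read(fd, len(payload))   # the blocks are still allocated
    finally:
        os.close(fd)                       # last close: blocks can be freed
    return name_gone, links, data


name_gone, links, data = unlink_while_open(b"still here\n")
print(name_gone, links, data)
```

On a local file system the link count usually reads as 0 after the
unlink, yet the data is still readable until the descriptor is closed.
Over NFS the result may differ, for the reasons given in the next
paragraph.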

Note that NFS makes this harder on everyone, since the server won't keep
track of who has files open (this is because NFS is "stateless").  When
someone on the server removes a file that is being executed by some client,
the blocks GO AWAY IMMEDIATELY.  In recent years, NFS was fixed so that the
client's old block-numbers are invalid to the server after the file is
removed (but not when it's written to!  hahahahaha but I digress).  The
symptom you saw (your own /usr/bin/ftp process getting a segfault because
you overwrote its executable) still happens when NFS is involved, but for
purely local files (as I expect your /usr/bin/ftp was) it is not supposed
to happen.

I don't have a 386BSD machine to try this on.  I tried ref.tfs.com but
there's some kind of network problem between me and it right now.  I would
love to see someone else run the above examples to see if 386BSD knows how
to keep you from writing on running executables, and whether it hangs onto
"busy blocks" until the last close.  I know that Bill had to rewrite the
buffer cache, which is what handles all of this stuff, and it's possible
that this somewhat-obscure boundary condition didn't get tested in this way.
--
Paul Vixie, DEC Network Systems Lab	
Palo Alto, California, USA         	"Don't be a rebel, or a conformist;
<vixie@pa.dec.com> decwrl!vixie		they're the same thing, anyway.  Find
<paul@vix.com>     vixie!paul		your own path, and stay on it."  -me