*BSD News Article 14431

Newsgroups: comp.os.386bsd.development
Path: sserve!newshost.anu.edu.au!munnari.oz.au!news.Hawaii.Edu!ames!agate!dog.ee.lbl.gov!hellgate.utah.edu!fcom.cc.utah.edu!cs.weber.edu!terry
From: terry@cs.weber.edu (A Wizard of Earth C)
Subject: Re: File Truncation Philosophy
Message-ID: <1993Apr13.203234.16408@fcom.cc.utah.edu>
Sender: news@fcom.cc.utah.edu
Organization: Weber State University  (Ogden, UT)
References: <1993Apr8.025858.22137@uvm.edu> <1993Apr11.035322.19610@fcom.cc.utah.edu> <C5FJx6.o5w@ns1.nodak.edu>
Date: Tue, 13 Apr 93 20:32:34 GMT
Lines: 111

In article <C5FJx6.o5w@ns1.nodak.edu> tinguely@plains.NoDak.edu (Mark Tinguely) writes:
>the dumb approach.
>	Once the file starts executing, fail writes to the file. Though this is
>	extremely simple, it is dumb because once opened successfully, writes
>	should not fail. Also Terry points out EBUSY is not POSIX compliant.

EBUSY *is* in Posix... it's ETXTBSY that's missing.  But a return value of
EBUSY would be incorrectly overloaded (as if it were a lock) on return from
open().  There is also the issue of an EBUSY resulting from a read/write/
other operation on a file; this seems (from my reading) to be an illegal
return.  Definitely not acceptable.
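
To make the errno distinction concrete, here is a minimal sketch of what a
caller has to do with a failed open() for write.  The names
classify_open_errno() and try_open_for_write() are made up for this
example.  ETXTBSY is wrapped in #ifdef because, as said above, it is not
something you can count on from the standard alone, although BSD systems
do define it.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Illustrative result codes; these are not part of any real interface. */
enum open_failure { OPEN_OK, OPEN_TEXT_BUSY, OPEN_BUSY, OPEN_OTHER };

/* Map an errno value from a failed open() onto the cases discussed above. */
enum open_failure classify_open_errno(int err)
{
    assert(err > 0);            /* errno values are positive */
    switch (err) {
#ifdef ETXTBSY
    case ETXTBSY:               /* "text busy": the file is being executed */
        return OPEN_TEXT_BUSY;
#endif
    case EBUSY:                 /* generic "busy"; says nothing about exec */
        return OPEN_BUSY;
    default:
        return OPEN_OTHER;
    }
}

/* Try to open path for writing and report which case we landed in. */
enum open_failure try_open_for_write(const char *path)
{
    int fd = open(path, O_WRONLY);

    if (fd < 0)
        return classify_open_errno(errno);
    close(fd);
    return OPEN_OK;
}
```

The point of the sketch is only that a caller seeing OPEN_BUSY cannot tell
"someone is executing this" apart from any other kind of busy, which is
the overloading problem.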

>better (than the dumb) approach.
>	When a program wants to execute an already open file (again as Terry
>	said preferably a writable open file) or open an executable file, copy
>	the file as a temporary in the filesystem. By adding a new vnode
>	reference to the vnode structure, we can allow other programs that
>	also start executing the now open file, to use this copy of the program
>	(so we do not fill the filesystem with these temporaries).
>
>	We have two choices of the life time of the temporary (assume the
>	original write lasts longer than the running of the program), we can
>	keep the lifetime of the temporary until we close the write. In this
>	way we keep the file closer to the original and cut down in the copying
>	overhead (in a sense you could think of this like the old days when
>	programs were copied to swap and used the sticky bit). On the other
>	hand we could make the temporary disappear with the last use and if it
>	gets executed several times, a new copy is made (just like normal
>	execute). Obviously if the temporary executes longer than the write,
>	the temporary will stay around until the program finishes.
>
>	I think the appropriate approach is closing the file after all the copies of
>	executing programs have ended and creating a new one if needed.
>
>	When this thread was started, I was thinking we would have to implement
>	this temporary file approach. Do we lose anything by this temporary in
>	the filesystem versus in the VM (swap/memory)?

Uh, um, erg... >*ahem*<... uh...

Well, there seem to be two issues to consider.  One is the dnlc (directory
name lookup cache) and the other is subsequent opens.

The dnlc (which isn't called the dnlc, and in the 386BSD implementation
seems to be less general purpose than you'd want) caches three things:
file names shorter than some watermark length, the vnode pointer for the
directory that the file name lives in, and the vnode for the open instance
of that file.
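
As a rough picture of what such a cache holds, here is a toy version.  It
is a sketch under my own assumptions, not the 386BSD code: the names
(nc_entry, nc_enter, nc_lookup), the 31-character watermark, and the
64-slot table are all invented for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define NC_NAMEMAX 31   /* assumed watermark: longer names are not cached */
#define NC_SLOTS   64   /* assumed table size */

struct vnode { int v_id; };         /* stand-in for the real vnode */

struct nc_entry {
    const struct vnode *dvp;        /* directory the name lives in */
    struct vnode *vp;               /* vnode the name resolved to */
    char name[NC_NAMEMAX + 1];
    int used;
};

static struct nc_entry nc_table[NC_SLOTS];

/* Hash on the only two things the cache knows: directory and name. */
static unsigned nc_hash(const struct vnode *dvp, const char *name)
{
    unsigned h = (unsigned)((uintptr_t)dvp >> 4);

    while (*name)
        h = h * 31u + (unsigned char)*name++;
    return h % NC_SLOTS;
}

/* Returns 1 if the entry was cached, 0 if the name is over the watermark. */
int nc_enter(const struct vnode *dvp, const char *name, struct vnode *vp)
{
    struct nc_entry *e;

    assert(dvp != NULL && name != NULL && vp != NULL);
    if (strlen(name) > NC_NAMEMAX)
        return 0;
    e = &nc_table[nc_hash(dvp, name)];  /* a collision simply overwrites */
    e->dvp = dvp;
    e->vp = vp;
    strcpy(e->name, name);
    e->used = 1;
    return 1;
}

/* Returns the cached vnode, or NULL on a miss. */
struct vnode *nc_lookup(const struct vnode *dvp, const char *name)
{
    struct nc_entry *e;

    assert(dvp != NULL && name != NULL);
    if (strlen(name) > NC_NAMEMAX)
        return NULL;
    e = &nc_table[nc_hash(dvp, name)];
    if (e->used && e->dvp == dvp && strcmp(e->name, name) == 0)
        return e->vp;
    return NULL;
}
```

Note that the key is (directory vnode, name) and nothing else, so one name
maps to exactly one vnode no matter who is asking or why.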

What this boils down to is that opens and/or lookups don't necessarily
fetch an inode reference to do the open if there is a cache hit.  Since
the exec doesn't have its own lookup hooks at the VFS interface, there is
no way to distinguish what kind of open is occurring: an open for exec,
which should return the "temporary" inode, or an ordinary open, which
should return the "real" inode.

I don't see a way the distinction could be made at the VFS layer between
two in core vnodes pointing to the same file... the lookup will pick one or
the other (the first one in cache) to return.  This leaves us with two
possible approaches to implement the temporary inode mechanism, both of
them unpleasant.  The first is to coerce cache flushes of the UFS dnlc
usages in the case of a "shadowed" inode, and the second is to push the
knowledge of the type of lookup/open below the VFS interface (by adding an
extension to it).  Both of these require changes to each file system type
instance below the VFS interface, and the second requires changes to the
NFS protocol to support an additional op.
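
The second approach might look something like the following.  This is
only a sketch: the lookup_intent enum, the v_shadow field (standing in for
Mark's "new vnode reference"), and both function names are hypothetical,
and a real lookup takes far more arguments than this.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical: the reason the lookup is being done. */
enum lookup_intent { LOOKUP_OPEN, LOOKUP_EXEC };

struct vnode {
    int v_id;
    struct vnode *v_shadow;     /* hypothetical: temporary copy, or NULL */
};

/* The current shape: a cache hit hands back whatever vnode was cached. */
struct vnode *lookup_plain(struct vnode *cached)
{
    return cached;
}

/*
 * The extended shape: the caller says why it is looking, so an exec can
 * be steered to the temporary while an ordinary open gets the real file.
 */
struct vnode *lookup_with_intent(struct vnode *cached, enum lookup_intent why)
{
    assert(cached != NULL);
    if (why == LOOKUP_EXEC && cached->v_shadow != NULL)
        return cached->v_shadow;
    return cached;
}
```

Every file system below the VFS interface would have to learn about the
extra argument, which is where the unpleasantness comes from.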

The interface I envisioned would involve a copy-on-write-to-file of the
text pages to swap, and a re-marking of the process pages as page-from-swap
instead of page-from-file.  This is expensive because there is no
back reference from a vnode pointer to the processes which are executing
the image (although we can either provide a hash-link or a cache of the
in-core structure to allow this, with an additional flag).  The main issue
to address here is copy-on-write text pages once they are in the swap,
since the default is to assume that if I have a page in swap, it got there
because I already modified it (and thus I am free to modify it again)...
a hellacious security hole in a naive implementation.
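
A toy model of the hole, and of the extra flag that closes it.  Everything
here is invented for illustration (the struct, the pg_shared_text flag,
the function names); it is not how the 386BSD VM represents pages.

```c
#include <assert.h>

enum backing { PAGE_FROM_FILE, PAGE_FROM_SWAP };

struct page {
    enum backing pg_backing;
    int pg_shared_text;     /* hypothetical: swap copy is shared text */
};

/* Naive rule: anything in swap must be my own already-modified page. */
int naive_may_write_in_place(const struct page *pg)
{
    return pg->pg_backing == PAGE_FROM_SWAP;
}

/* Safer rule: a swapped text page still needs copy-on-write. */
int may_write_in_place(const struct page *pg)
{
    return pg->pg_backing == PAGE_FROM_SWAP && !pg->pg_shared_text;
}

/*
 * Re-mark a text page once its contents have been copied to swap because
 * the underlying file is about to change.
 */
void remark_text_page(struct page *pg)
{
    assert(pg->pg_backing == PAGE_FROM_FILE);
    pg->pg_backing = PAGE_FROM_SWAP;
    pg->pg_shared_text = 1;
}
```

With the naive rule, a re-marked text page looks writable in place, so one
process could scribble on text shared with another; the flag is what keeps
the page copy-on-write.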

New processes executed from a modified file would use the modified file
as a swap store until they terminate, write a text page, or the file
is modified again (making it unusable as a swap store).  I think the
gymnastics can be hidden in the kernel interface presented to the VFS by
providing a modification advisory call that a file system makes before
opening a file for write or truncating (calling it with the vnode).
Trying to copy the original image pre-modification seems to be a mistake
(this was the suggestion I made in my last post) because it would by
definition *prevent* direct unification of the VM and buffer caches (a
suggestion I also made in the last post as a speedup).  A non-anonymous
unification, where control of a page is explicitly handed off between the
two, would still be a possible solution, but it's more complicated than
it has to be... besides, it makes more sense to run the code in the file
instead of the code that used to be in the file (the only exception is
the case of a fork of the original process -- it would keep the swap
image of the original file as shared text).
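
Here is roughly what I mean by the advisory call, as a sketch only.  The
names vn_exec_attach() and vn_modify_advise(), the fixed-size back
reference array, and the p_text_in_swap flag are all assumptions made up
for the example; the real thing would use the hash-link or cache mentioned
above and would actually move pages.

```c
#include <assert.h>
#include <stddef.h>

#define MAXEXEC 8       /* assumed limit, just to keep the sketch small */

struct proc {
    int p_pid;
    int p_text_in_swap;     /* hypothetical: text now pages from swap */
};

struct vnode {
    struct proc *v_execs[MAXEXEC];  /* hypothetical back references */
    int v_nexecs;
};

/* Record that p is executing the image in vp.  Returns 0 if full. */
int vn_exec_attach(struct vnode *vp, struct proc *p)
{
    assert(vp != NULL && p != NULL);
    if (vp->v_nexecs >= MAXEXEC)
        return 0;
    p->p_text_in_swap = 0;          /* a new exec pages from the file */
    vp->v_execs[vp->v_nexecs++] = p;
    return 1;
}

/*
 * Advisory a file system would call (with the vnode) before opening the
 * file for write or truncating it.  Every process executing the image is
 * switched to swap-backed text and detached from the file.  Returns the
 * number of processes that were switched.
 */
int vn_modify_advise(struct vnode *vp)
{
    int i, moved = 0;

    assert(vp != NULL);
    for (i = 0; i < vp->v_nexecs; i++) {
        vp->v_execs[i]->p_text_in_swap = 1;
        moved++;
    }
    vp->v_nexecs = 0;   /* the file is no longer anyone's swap store */
    return moved;
}
```

A process exec'd after the modification attaches to the modified file as
usual, and gets moved to swap in turn if the file is modified again.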

I think I (and others) need to hit the 1003.1 book and see what can be
slipped in through the cracks in the standard to arrive at the best
approach... this assumes Posix compliance is a goal: it's one of mine,
but potentially not one of Bill and Lynne's... any comments?


					Terry Lambert
					terry@icarus.weber.edu
					terry_lambert@novell.com
---
Any opinions in this posting are my own and not those of my present
or previous employers.
-- 
-------------------------------------------------------------------------------
                                        "I have an 8 user poetic license" - me
 Get the 386bsd FAQ from agate.berkeley.edu:/pub/386BSD/386bsd-0.1/unofficial
-------------------------------------------------------------------------------