*BSD News Article 12946


Newsgroups: comp.os.386bsd.bugs
Path: sserve!newshost.anu.edu.au!munnari.oz.au!news.Hawaii.Edu!ames!elroy.jpl.nasa.gov!swrinde!zaphod.mps.ohio-state.edu!uwm.edu!caen!nic.umass.edu!news.mtholyoke.edu!news.byu.edu!ns.novell.com!gateway.univel.com!fcom.cc.utah.edu!cs.weber.edu!terry
From: terry@cs.weber.edu (A Wizard of Earth C)
Subject: Re: VM problems w/unlimited memory?
Message-ID: <1993Mar18.183443.6397@fcom.cc.utah.edu>
Sender: news@fcom.cc.utah.edu
Organization: Weber State University  (Ogden, UT)
References: <1o4spvINNl1v@usenet.INS.CWRU.Edu> <1993Mar16.221837.10302@fcom.cc.utah.edu> <1o81joINNieh@usenet.INS.CWRU.Edu>
Date: Thu, 18 Mar 93 18:34:43 GMT
Lines: 143

In article <1o81joINNieh@usenet.INS.CWRU.Edu> chet@odin.ins.cwru.edu (Chet Ramey) writes:
>In article <1993Mar16.221837.10302@fcom.cc.utah.edu> terry@cs.weber.edu (A Wizard of Earth C) writes:
>
>>Sorry, but the fcntl fails not because it found allocated memory for the
>>closed fd, but because the fd exceeds fd_lastfile in the per process
>>open file table.  The location, since it doesn't exist, isn't addressable.
>
>This requires that the programmer be aware of the implementation
>technique used by the kernel to manage the process `open file table'.
>It's a kernel problem -- we all agree on that.

The failure mode is a kernel one, yes; however, selecting an fd at the
extreme high end of the range is a programming bogosity that should not
be occurring anyway.  Yes, it will cause problems on 386bsd -- but it
will also cause problems on SVR4 and AIX, both of which dynamically
allocate their per-process file tables.

Certainly this problem *must* have been solved for other environments
in the bash code, right?  Allocating all memory on an AIX 3.2 box
results not in a crash, but in the process with the largest image size
being killed (rather than the most recently run process or the one doing
the excessive allocation).  This is arguably a *bad thing* for bash to
cause on AIX, right?

I think any time the programmer makes assumptions about the kernel
architecture (as one is doing when one picks a very high fd number and
assumes that fd's are allocated in ascending order by the "open()" call,
such that a very high choice is guaranteed to be a safe one), one has to
be aware of it.  Using fd 19 when only single-digit redirections are
allowed is keying off an architectural assumption.
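
As a concrete illustration, the only honest way to know whether a given
fd is in use is to ask the kernel; a minimal sketch (the fcntl() probe
and the helper name are mine):

	#include <fcntl.h>

	/*
	 * Probe whether an fd is currently open.  F_GETFD fails with
	 * EBADF for any fd the kernel has no entry for -- including,
	 * on 386bsd, an fd beyond fd_lastfile.
	 */
	int
	fd_is_open(int fd)
	{
		return fcntl(fd, F_GETFD) != -1;
	}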

>>The safest would be the call I suggested to return the largest open fd
>>and add one to it for your use of the fd; I don't understand from the
>>code why it's necessary to get anything other than the next available
>>descriptor; the ones the shell cares about, 0, 1, and 2, are already
>>taken at that point; if nothing else a search starting a 3 would be
>>reasonable.
>
>Nope.  Not reasonable.  If a user were to use fd 3 in a redirection
>spec, perhaps to save one of the open file descriptors to, or to open
>an auxiliary input, the file descriptor we so carefully opened to
>/dev/tty and count on to be connected to the terminal would suddenly
>be invalid.  Bash attempts to choose an `unlikely' file descriptor --
>nothing is totally safe, but this minimizes problems. 

Of course, this calls into question whether it is reasonable to have a
fixed open fd at all for this particular purpose.
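
For concreteness, the call I suggested amounts to something like the
following sketch (the name and the fcntl() probing are illustrative,
not a proposal for an actual interface):

	#include <fcntl.h>
	#include <unistd.h>

	/*
	 * Return one past the highest open fd.  Scans down from the
	 * table size; the first fd that answers an F_GETFD probe is
	 * the highest one open.
	 */
	int
	next_free_fd(void)
	{
		int	fd;

		for (fd = getdtablesize() - 1; fd > 2; fd--)
			if (fcntl(fd, F_GETFD) != -1)
				break;
		return fd + 1;	/* 0, 1, and 2 assumed open */
	}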

Not only is it simply a probabilistic exercise (one can't guarantee the
shell isn't exec'ed from a fork of some program not protecting the
"unlikely" nature of its fd choice; ie: not another shell, and not
login, but another program that makes the same assumptions without the
same protections), the fact is, it's unnecessary.  Being able to get to
the controlling tty device at a later date is what "/dev/tty" is about;
as long as the controlling tty isn't blown away, opening /dev/tty at
some future time as a transient fd (ie: close it when done) is just as
effective.  If you can come up with an example of a shell that can open
/dev/tty early in its life, but not later in its life (ie: its
controlling tty changes), I'd like to see it.  Otherwise, what we are
talking about is an access-time optimization by bash based on
assumptions about the architecture which are no longer valid.
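
A minimal sketch of the transient approach (hypothetical helper; error
handling kept to the bare minimum):

	#include <fcntl.h>
	#include <unistd.h>

	/*
	 * Transient access to the controlling tty: open it, use it,
	 * close it.  No long-lived "unlikely" fd need be reserved.
	 */
	int
	tty_write(const char *buf, int len)
	{
		int	fd, rv;

		if ((fd = open("/dev/tty", O_WRONLY)) == -1)
			return -1;
		rv = write(fd, buf, len);
		close(fd);
		return rv;
	}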

>Bash uses the same technique in shell.c to protect the file descriptor
>it's using to read a script -- all shells do pretty much the same
>thing.  (Well, at least it does now.  The distributed version of 1.12
>does not do this.)

There doesn't seem to be a good reason for this if the shell script is
held in core by the shell; the descriptor should be closed after
reading.  Since reading takes place entirely before execution, there is
no conflict with the shell script itself.  In any event, a shell script
is run by a sub-shell, not the active shell, unless one is playing games
with a disk-based interpreter and context frames within the shell; in
that case, the traditional recursion games played within shell scripts
by many install packages would fail anyway.
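
A sketch of the "read it all, then close it" approach (ANSI C stdio
assumed; the helper is hypothetical):

	#include <stdio.h>
	#include <stdlib.h>

	/*
	 * Read an entire script into core and release the descriptor
	 * before execution begins.
	 */
	char *
	slurp_script(const char *path, long *lenp)
	{
		FILE	*fp;
		char	*buf;
		long	len;

		if ((fp = fopen(path, "r")) == NULL)
			return NULL;
		fseek(fp, 0L, SEEK_END);
		len = ftell(fp);
		rewind(fp);
		if ((buf = malloc(len + 1)) != NULL) {
			fread(buf, 1, len, fp);
			buf[len] = '\0';
			*lenp = len;
		}
		fclose(fp);	/* no fd held during execution */
		return buf;
	}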

>Before Posix.2 it was `safe' to assume that fd 19
>could be used for this -- a process was guaranteed to have at least 20
>file descriptors available to it, and there were no multi-digit file
>descriptors in redirections.  This is no longer the case.  Since bash
>currently uses stdio for reading scripts, and stdio provides no
>portable way to change the file descriptor associated with a given
>FILE *, bash attempts to avoid having to do so.  We used to just use
>the file descriptor returned from opening the script, but ran into a
>number of mysterious problems as a result. 

If one is going to make fundamental assumptions about the OS, such as
"there are a finite number of fd's on which collisions can occur" or
"the number of fd's reported by getdtablesize can all be opened, or
opened out of sequence, without repercussions", one might as well either
directly manipulate the FILE * contents, including the fd, or prepare
for failure.

There *is* a documented, *portable* way of replacing the fd associated
with a stream:

	#include <stdio.h>
	#include <string.h>

	FILE	*newfp;
	FILE	savfp_str;

	/* save the old stream contents, then overwrite them with a
	   stream opened on the new fd */
	memcpy( &savfp_str, oldfp, sizeof( FILE));
	newfp = fdopen( fd, "r+");	/* actual mode derivable from oldfp */
	memcpy( oldfp, newfp, sizeof( FILE));


Of course &savfp_str can be treated as if it were a FILE *; but if the
contents of the file struct were what you wanted to modify, this will
do it.

Traditionally, the solution has been to use a "non-portable method" to
directly manipulate the fd in the fp; this is what most shells do, and
it is no more non-portable than making assumptions about "safe fd's" or
about range limits equalling operational limits (by using the highest
known fd possible).
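
For illustration only, the direct manipulation meant here looks
something like this under the 4.3BSD/386bsd stdio layout, where the fd
lives in a FILE member named "_file" (other stdio implementations use
other names, so treat this strictly as a sketch):

	#include <stdio.h>

	/*
	 * Non-portable: poke a new fd directly into the stdio
	 * structure.  "_file" is the BSD stdio member name; this
	 * will not compile as-is against other stdio layouts.
	 */
	void
	set_stream_fd(FILE *fp, int fd)
	{
		fp->_file = fd;
	}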
	
>>I do *NOT* suggest 19, as this would tend to be bad for
>>most SVR4 systems, as it would have a tendency to push the number of
>>fd's over the bucket limit if NFPCHUNK were 20 and allocation were done
>>from index 20 on up.
>
>I don't really see how a loop going down starting at 19 will cause the
>fd `table' to grow beyond 20.  (That is the code we were talking about,
>right?)

Sorry; I assumed that the traversal would be "up" from 19, to ensure
that the lower numbers (deemed "most important") weren't tromped on.
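
For reference, the downward loop being described would look roughly
like this (a sketch; the function name is mine):

	#include <fcntl.h>

	/*
	 * Scan down from 19 for a free slot; staying at or below 19
	 * keeps a dynamically allocated fd table from growing past
	 * the historical 20-entry minimum.
	 */
	int
	find_high_fd(void)
	{
		int	fd;

		for (fd = 19; fd > 3; fd--)
			if (fcntl(fd, F_GETFD) == -1)
				return fd;
		return -1;	/* everything below 20 is in use */
	}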

>I'll probably put something in the code to clamp the max file descriptor
>to something reasonable, like 256, and press for its inclusion in the
>next release of bash, whenever that is.

This will certainly prevent outright failure on AIX and SVR4 (as well
as on 386bsd), but it is a non-optimal solution from the perspective of
unnecessary resource utilization.
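
A clamp of that sort is simple enough; a sketch (the 256 figure is the
one proposed above):

	#include <unistd.h>

	#define FD_CLAMP	256	/* "something reasonable" */

	/*
	 * Highest fd worth considering: the top of the table, clamped
	 * so a huge or dynamically grown table isn't dragged out.
	 */
	int
	clamped_high_fd(void)
	{
		int	max = getdtablesize();

		return (max > FD_CLAMP ? FD_CLAMP : max) - 1;
	}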


					Terry Lambert
					terry@icarus.weber.edu
					terry_lambert@novell.com
---
Any opinions in this posting are my own and not those of my present
or previous employers.
-- 
-------------------------------------------------------------------------------
                                        "I have an 8 user poetic license" - me
 Get the 386bsd FAQ from agate.berkeley.edu:/pub/386BSD/386bsd-0.1/unofficial
-------------------------------------------------------------------------------