*BSD News Article 13446

Newsgroups: comp.os.386bsd.development
Path: sserve!newshost.anu.edu.au!munnari.oz.au!constellation!osuunx.ucc.okstate.edu!moe.ksu.ksu.edu!zaphod.mps.ohio-state.edu!uwm.edu!caen!hellgate.utah.edu!fcom.cc.utah.edu!cs.weber.edu!terry
From: terry@cs.weber.edu (A Wizard of Earth C)
Subject: Re: A challenge to all true hackers: objects and types
Message-ID: <1993Mar27.093602.3486@fcom.cc.utah.edu>
Sender: news@fcom.cc.utah.edu
Organization: Weber State University  (Ogden, UT)
References: <ARNEJ.93Mar24113744@chanur.imf.unit.no> <C4FEo2.8no@sugar.neosoft.com> <1osl3b$8vl@umd5.umd.edu>
Date: Sat, 27 Mar 93 09:36:02 GMT
Lines: 141

In article <1osl3b$8vl@umd5.umd.edu> mark@roissy.umd.edu (Mark Sienkiewicz) writes:
>In article <C4FEo2.8no@sugar.neosoft.com> peter@NeoSoft.com (Peter da Silva) writes:
>>In article <ARNEJ.93Mar24113744@chanur.imf.unit.no> arnej@imf.unit.no (Arne Henrik Juul) writes:
>>> I also think that variant links using environment variables is a BAD
>>> idea.
>>
>>I think that's a reasonable conclusion. How about variant links using
>>some other set of per-process/per-uid symbolic name space?
>
>Using environment variables is a BAD idea for this reason:
>
>	Environment variables do not exist.
>
>They are a fiction that is maintained by the user level programs.  The
>kernel does not maintain them.  They are an array of strings that are
>stored _somewhere_ in the user process, but they are entirely under the
>control of the user process.  The kernel can't even be sure it can find
>them.

This is not true... in kern_execve.c, the offset is calculated in such a way
as to allow it to be recalculated again if required (this is probably not
intentional, but it does work):

	argbuf = (char **) (newframe + MAXSSIZ - 3*ARG_MAX);
	stringbuf = stringbufp = ((char *)argbuf) + 2*ARG_MAX;
	argbufp = argbuf;

The newframe is stored in vm_maxsaddr, which means the "envp" can be
recalculated later in vfs_lookup.c.  Here's a piece of my implementation
of variant links that I've been using in combination with a mod to the
environment in init, some minor mods to rc.local, and Martin Renters'
"NFS boot" disk:

<               /*
<                * Here's where the environment variable is pulled
<                * from within the process address space:
<                * this is derived from the calculation of "stringbuf"
<                * in "kern_execve.c"; if that calculation changes,
<                * so must this calculation.
<                */
<               envp = p->p_vmspace->vm_maxsaddr + MAXSSIZ - ARG_MAX;
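
To spell out why the two calculations agree, here is a minimal user-space
sketch of the arithmetic.  The constants are placeholder values, not the
real 386BSD ones, and the names stringbuf_from_frame() and
envp_from_maxsaddr() are hypothetical, made up for this illustration:

```c
#include <assert.h>
#include <stdint.h>

/* Placeholder sizes for illustration only; the real values come from
 * the kernel headers and are not assumed here. */
#define SK_ARG_MAX  ((uintptr_t)0x5000)
#define SK_MAXSSIZ  ((uintptr_t)0x800000)

/* Mirrors the kern_execve.c fragment: argbuf sits 3*ARG_MAX below the
 * top of the stack region, and stringbuf sits 2*ARG_MAX above argbuf. */
uintptr_t stringbuf_from_frame(uintptr_t newframe)
{
	uintptr_t argbuf;

	/* The region must be big enough to hold the argument area. */
	assert(SK_MAXSSIZ >= 3 * SK_ARG_MAX);
	argbuf = newframe + SK_MAXSSIZ - 3 * SK_ARG_MAX;
	return argbuf + 2 * SK_ARG_MAX;
}

/* Mirrors the vfs_lookup.c fragment, given that newframe was saved
 * in vm_maxsaddr. */
uintptr_t envp_from_maxsaddr(uintptr_t vm_maxsaddr)
{
	return vm_maxsaddr + SK_MAXSSIZ - SK_ARG_MAX;
}
```

Since -3*ARG_MAX + 2*ARG_MAX is -ARG_MAX, both functions return the same
address for the same base, which is all the recalculation relies on.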


>Hewlett Packard has an interesting feature that was mentioned at the
>beginning of this discussion.  It is "Context Dependent Files".  A CDF is
>essentially invisible.  When you access the directory, you don't get the
>directory, but you get a file _within_ the directory.  Which file is based
>on your "context".
>
>Your context is a set of strings that the kernel carries around for you.
>When you access a CDF, the kernel searches for a file that matches a string
>in your context.
>
>A weakness of the HP implementation is that you can't add new strings to your
>context.  (Or maybe the weakness is the documentation sucks... :)  We would
>not have to follow that, of course.

The way this is done is by tagging the inode as a CDF; when a lookup is
done on the inode, you iterate on a field of the inode to arrive at the
actual non-CDF inode to use.

This can be implemented in three ways:

1)	Use of spare fields in the icommon to implement n int fields,
	giving a maximum of n variations (file system states).

2)	Modification of the actual directory structure to allow each
	entry to reference n inodes (modify struct direct in dir.h),
	yielding a maximum of n variations.

3)	Use the contents of the CDF to provide an arbitrary number of
	attribute/inode pairs.
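
As a rough sketch of what option 3 might look like, assuming the CDF
contents are simply a table of attribute/inode pairs and the context is
an ordered list of strings.  The struct layout, the name cdf_resolve(),
and the use of inode 0 as "no match" are all assumptions for the sake of
the example, not HP's format:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical record: one attribute string mapped to the inode
 * number of the file to use for that context. */
struct cdf_pair {
	const char   *attr;  /* e.g. an architecture or host name */
	unsigned long ino;   /* inode selected when attr matches */
};

/*
 * Walk the caller's context strings in priority order and return the
 * inode of the first pair whose attribute matches; 0 means no match.
 */
unsigned long cdf_resolve(const struct cdf_pair *pairs, size_t npairs,
                          const char *const *context, size_t ncontext)
{
	size_t c, p;

	assert(pairs != NULL || npairs == 0);
	assert(context != NULL || ncontext == 0);
	for (c = 0; c < ncontext; c++)
		for (p = 0; p < npairs; p++)
			if (strcmp(context[c], pairs[p].attr) == 0)
				return pairs[p].ino;
	return 0;
}
```

Note that each lookup through the CDF costs a read of its contents plus
the search, on top of the normal directory lookup.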

In any implementation, one must consider the context a name space ID, in
which the lookup is to take place, and modify the directory manipulation
routines appropriately to return the correct context information.  One
error here is that the 386BSD UFS implementation uses "vget" rather than
the traditional Net/2 UFS "iget".  This makes iteration difficult without
an in-core vnode, and also makes naming more difficult to separate from
inode store in future revisions.

The fact that the name and inode stores aren't separate dictates that
fsck must change (this would not be true if a real separation existed,
since inode checking would then be guaranteed invariant of whether an
inode was referenced by a directory entry or by a CDF).

Obviously, the HP implementation used a fixed area for its implementation,
most likely the area which would be the block list in a normal inode.

One must also drastically change the directory-name cache, since a
lookup using one index can not be permitted to add an entry to the cache
which would cause a cache hit when another index is used.  Thus the index
itself must be considered part of the naming information when cache
entries are made, or the caching of names in a CDF must be disallowed
(a significant performance penalty).  Cache invalidation on object deletion
occurs per index (since objects are considered per index).  This means
that objects, such as shell scripts, will have a difficult time being shared
between CDF partitions without resorting to additional special-case code,
to duplication of information, or to reserving CDFs for executables only.
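
A toy sketch of a name cache with the index folded into the key, to show
the effect.  This is not the real directory-name cache; the table size,
the round-robin replacement, and every name here (nc_enter, nc_lookup,
nc_purge_index) are invented for the example:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define NC_SLOTS   8
#define NC_NAMELEN 32

struct nc_entry {
	int           used;
	unsigned long dir_ino;           /* directory that was searched */
	int           ctx_index;         /* context index used for the lookup */
	char          name[NC_NAMELEN];  /* component name */
	unsigned long ino;               /* cached result */
};

static struct nc_entry nc_table[NC_SLOTS];
static size_t nc_next;  /* round-robin replacement cursor */

/* Record a successful lookup; the index is part of the key. */
void nc_enter(unsigned long dir_ino, int ctx_index, const char *name,
              unsigned long ino)
{
	struct nc_entry *e = &nc_table[nc_next];

	assert(name != NULL);
	if (strlen(name) >= NC_NAMELEN)
		return;  /* too long to cache */
	nc_next = (nc_next + 1) % NC_SLOTS;
	e->used = 1;
	e->dir_ino = dir_ino;
	e->ctx_index = ctx_index;
	strcpy(e->name, name);
	e->ino = ino;
}

/* Return the cached inode, or 0 on a miss. */
unsigned long nc_lookup(unsigned long dir_ino, int ctx_index, const char *name)
{
	size_t i;

	assert(name != NULL);
	for (i = 0; i < NC_SLOTS; i++)
		if (nc_table[i].used && nc_table[i].dir_ino == dir_ino &&
		    nc_table[i].ctx_index == ctx_index &&
		    strcmp(nc_table[i].name, name) == 0)
			return nc_table[i].ino;
	return 0;
}

/* Invalidation happens per index: other indices keep their entries. */
void nc_purge_index(int ctx_index)
{
	size_t i;

	for (i = 0; i < NC_SLOTS; i++)
		if (nc_table[i].ctx_index == ctx_index)
			nc_table[i].used = 0;
}
```

Entering "sh" under index 0 does not produce a hit for "sh" under index 1,
so a file that is really the same for every index ends up cached (and
invalidated) once per index.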

>If you could add your own strings to the context, you could do anything
>that you could do with environment variables in symbolic links.

Agreed, but unlikely in practice, since this virtually guarantees a need
for kernel translation of user space values when doing the name-to-index
resolution.  A CDF-style implementation is bound to remain a curiosity
of less use than real variant links, if only because its ability to
redirect is so limited.  User definition of CDF meanings means access to
user definitions for translation.

Note that a variant link may be conditionally expanded to produce additional
depth on a path, as well as simply providing substitution for a singular
path component, something only possible with a CDF if the target of the
substitution is a symbolic link; even then, the link target may still be
required to exist in a CDF space, no matter what its expansion.
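
For illustration, a user-space sketch of expanding a variant link.  The
${NAME} syntax, the struct var table standing in for the environment,
and the function names are all assumptions made for the example; this is
not the vfs_lookup.c code:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for one NAME=value environment entry. */
struct var {
	const char *name;
	const char *value;
};

/* Return the value for name[0..len), or NULL if it is not defined. */
static const char *var_find(const struct var *vars, size_t nvars,
                            const char *name, size_t len)
{
	size_t i;

	for (i = 0; i < nvars; i++)
		if (strlen(vars[i].name) == len &&
		    memcmp(vars[i].name, name, len) == 0)
			return vars[i].value;
	return NULL;
}

/*
 * Expand every ${NAME} in link into out.  Returns 0 on success, or -1
 * if a variable is undefined, a "${" is unterminated, or out is too
 * small to hold the result.
 */
int variant_expand(const char *link, const struct var *vars, size_t nvars,
                   char *out, size_t outlen)
{
	size_t used = 0;

	assert(link != NULL && out != NULL);
	if (outlen == 0)
		return -1;
	while (*link != '\0') {
		const char *piece = link;  /* default: copy one literal byte */
		size_t n = 1;

		if (link[0] == '$' && link[1] == '{') {
			const char *end = strchr(link + 2, '}');

			if (end == NULL)
				return -1;
			piece = var_find(vars, nvars, link + 2,
			                 (size_t)(end - (link + 2)));
			if (piece == NULL)
				return -1;
			n = strlen(piece);
			link = end + 1;
		} else {
			link++;
		}
		if (n >= outlen - used)  /* leave room for the NUL */
			return -1;
		memcpy(out + used, piece, n);
		used += n;
	}
	out[used] = '\0';
	return 0;
}
```

With ARCH set to "i386", "/bin/${ARCH}/ls" becomes "/bin/i386/ls"; with
LOCAL set to "site/usr/local", "/${LOCAL}/bin" gains two extra levels of
depth, which is the case a plain CDF substitution can't cover.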

A simple hack of "immediate file" code to place the link itself in the block
list if it is below a minimum length means that an additional block fetch is
not required above and beyond the inode itself.  Thus we may save two
lookups (assuming a type "3" implementation) over a CDF implementation of
the same extensibility.
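
A sketch of the "immediate" idea, assuming an inode with 15 four-byte
block pointers (so 60 bytes of reusable space).  The struct, the flag,
and the function names are hypothetical and do not match the real
icommon layout:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SK_NBLOCKS 15  /* assumed number of block pointers in the inode */

struct sk_inode {
	int is_immediate;  /* nonzero: link text lives in the inode itself */
	union {
		uint32_t blocks[SK_NBLOCKS];                       /* normal block list */
		char     immediate[SK_NBLOCKS * sizeof(uint32_t)]; /* inline link text */
	} u;
};

/*
 * Try to store the link target in the block-list area.  Returns 1 if
 * it fit, or 0 if the caller must allocate a data block as usual.
 */
int sk_store_link(struct sk_inode *ip, const char *target)
{
	size_t len;

	assert(ip != NULL && target != NULL);
	len = strlen(target);
	if (len >= sizeof ip->u.immediate) {
		ip->is_immediate = 0;
		return 0;
	}
	memcpy(ip->u.immediate, target, len + 1);  /* include the NUL */
	ip->is_immediate = 1;
	return 1;
}

/* Return the inline link text, or NULL if it lives in a data block. */
const char *sk_read_link(const struct sk_inode *ip)
{
	assert(ip != NULL);
	return ip->is_immediate ? ip->u.immediate : NULL;
}
```

Under these assumptions any link target shorter than 60 bytes is read
straight out of the inode, with no further block fetch.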

The cost of a CDF architecture is too high to reasonably assume.  The use
of alternate name spaces should be relegated to providing a canonical
lookup space for localized file names for "well known" files, if they are
to be used at all.  Using them to provide OS dependent "file forks" is an
overindulgence unless one assumes no data will be shared between the
architectures for a given CDF (ie: the shell script example above).


					Terry Lambert
					terry@icarus.weber.edu
					terry_lambert@novell.com
---
Any opinions in this posting are my own and not those of my present
or previous employers.
-- 
-------------------------------------------------------------------------------
                                        "I have an 8 user poetic license" - me
 Get the 386bsd FAQ from agate.berkeley.edu:/pub/386BSD/386bsd-0.1/unofficial
-------------------------------------------------------------------------------