*BSD News Article 7027

Newsgroups: comp.unix.bsd
Path: sserve!manuel.anu.edu.au!munnari.oz.au!uunet!spool.mu.edu!agate!dog.ee.lbl.gov!hellgate.utah.edu!fcom.cc.utah.edu!cs.weber.edu!terry
From: terry@cs.weber.edu (A Wizard of Earth C)
Subject: Re: Repeat of the question about VFS and VOP_SEEK()
Message-ID: <1992Oct25.121136.26473@fcom.cc.utah.edu>
Sender: news@fcom.cc.utah.edu
Organization: University of Utah Computer Center
References: <b3co03lsb3LE00@amdahl.uts.amdahl.com> <1992Oct20.193544.2360@fcom.cc.utah.edu> <BwFu1E.759@pix.com> <1992Oct21.201738.22999@fcom.cc.utah.edu> <BwLp9z.8J2@flatlin.ka.sub.org>
Date: Sun, 25 Oct 92 12:11:36 GMT
Lines: 90

In article <BwLp9z.8J2@flatlin.ka.sub.org>, bad@flatlin.ka.sub.org (Christoph Badura) writes:
|> In <1992Oct21.201738.22999@fcom.cc.utah.edu> terry@cs.weber.edu (A Wizard of Earth C) writes:
|> >A lot of the differences are evolutionary rates differing between systems,
|> >and different choices being made (SVR4 separated vop_read and vop_write
|> >out of the BSD vop_rdwr for POSIX compliance and to avoid a recursion
|> >loop, for instance).
|> 
|> How could separating vop_rdwr into vop_read and vop_write help POSIX
|> compliance? I'd be very interested in an explanation that takes into
|> account that the SVR4 ufs-vop_read and ufs-vop_write almost
|> instantaneously call ufs_rwip.

In the SVR4.4 kernel sources, in /usr/src/uts/i386/fs/ufs/ufs_vnops.c, in the
function ufs_write(), it says (paraphrased for legal reasons):

	An ASSERT() is used to ensure the behaviour conforms to the 
	agreed upon [in POSIX 1003.1-1988] vnode interface regarding
	the preservation of atomicity in reads and writes.  This
	necessarily disallows calls to ufs_rdwr(), since the ufs_ilock()
	there would then become recursive.

Clearly, if we can agree that POSIX compliant behaviour is what mandates the
atomicity of reads and writes (the part I inserted and put in brackets), then
we can agree that POSIX behaviour mandated the split.
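To make the recursion argument concrete, here is a minimal C sketch, assuming a simplified non-recursive inode lock. The names (`struct inode`, `ilock()`, `rwip()`, `rdwr()`, `vop_write()`, `write_via_rdwr()`) are hypothetical stand-ins loosely modelled on the SVR4 routines discussed above, not the actual SVR4 source; the lock reports recursion by returning -1 where a real kernel sleep lock would simply deadlock.

```c
#include <assert.h>

/* Hypothetical stand-in for an inode guarded by a non-recursive lock. */
struct inode {
    int  locked;   /* 1 while a caller holds the inode lock */
    long size;     /* bytes in the file */
};

/* Non-recursive lock: returns -1 if the lock is already held.
 * A real kernel sleep lock would deadlock here instead. */
static int ilock(struct inode *ip) {
    if (ip->locked)
        return -1;
    ip->locked = 1;
    return 0;
}

static void iunlock(struct inode *ip) {
    assert(ip->locked);
    ip->locked = 0;
}

/* Shared worker (cf. ufs_rwip): the caller must already hold the lock. */
static void rwip(struct inode *ip, long nbytes) {
    assert(ip->locked);
    ip->size += nbytes;
}

/* Combined entry point (cf. vop_rdwr): acquires the lock itself. */
static int rdwr(struct inode *ip, long nbytes) {
    if (ilock(ip) != 0)
        return -1;
    rwip(ip, nbytes);
    iunlock(ip);
    return 0;
}

/* Split write entry point: one lock acquisition covers everything that
 * must appear atomic (here, recording the pre-write size), and the
 * worker is called directly rather than through rdwr(). */
static int vop_write(struct inode *ip, long nbytes, long *oldsize) {
    if (ilock(ip) != 0)
        return -1;
    *oldsize = ip->size;
    rwip(ip, nbytes);
    iunlock(ip);
    return 0;
}

/* The pattern the ASSERT forbids: take the lock, then call rdwr(). */
static int write_via_rdwr(struct inode *ip, long nbytes, long *oldsize) {
    if (ilock(ip) != 0)
        return -1;
    *oldsize = ip->size;
    int rc = rdwr(ip, nbytes);   /* recursive ilock() fails here */
    iunlock(ip);
    return rc;
}
```

The split `vop_write()` can hold the lock across the whole operation and still reach the shared worker; routing through a combined `rdwr()` that takes the lock itself is exactly the re-entry the ASSERT rules out.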

|> >Thus perhaps the best answer is that the interface is ill defined.  In
|> >the previous post referenced above, I referred to the illogicality of
|> >making the call, since a seek offset is an artifact of an open file
|> >descriptor, and is not an attribute of an inode or vnode in most of
|> >the current implementations.  I also pointed out a potentially valid use
|> >for passing the seek down:  predictive read ahead.  The problem here is
|> >that either the read, the seek, or the open would have to be attributed
|> >to flag the descriptor for predictive behaviour if this is to be a
|> >successful optimization.
|> 
|> Since all that is needed for predictive read ahead below the VFS layer
|> is a) a vnode and b) the new seek offset, I can't follow your
|> illogicality claims.

1)	I didn't say it was needed for predictive read ahead, I said this was
	a potential use.  I can think of at least 3 other ways a file server
	using a UNIX (or UNIX derived) file system could implement predictive
	read ahead.

2)	It is illogical to make a call to a lower layer when the abstraction
	(a seek offset) is limited in scope to an upper layer (making reads
	and writes relative to the previous read or write in the system call
	layer).

3)	In practice, my suggested use (predictive read ahead) is implemented
	by a modified system call layer eliminating dependence on the seek
	offset, thus obviating the need to notify the file system itself of
	such an animal.

4)	Predictive read ahead based on any mechanism *requires* some method
	of promiscuously informing the file system that the file descriptor
	in question will be used in such a way that predictive read ahead
	will benefit it.  This includes the suggested VOP_SEEK() method.  The
	fallacy here is that predictive read ahead requires a heuristic
	dependent on the user application [potentially a file server] having
	the ability to benefit from the read ahead.  This pretty much pushes
	the implementation of the heuristic into user space (or at least
	server space for a kernel server implementation).  Such an application
	has better mechanisms than seek-based predictions available to it,
	considering that it likely has a back door to the file system anyway.
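A minimal sketch of points 2 and 3, assuming a hypothetical system call layer: the seek offset lives in the per-descriptor `struct file`, `sys_lseek()` never calls into the file system, and `sys_read()` derives a read-ahead hint from the descriptor's own access history. All names and the `RA_MAX` cap are illustrative assumptions, not code from any BSD or SVR4 kernel.

```c
#include <assert.h>

#define RA_MAX 65536L   /* hypothetical cap on the read-ahead window */

struct vnode { long size; };   /* only what this sketch needs */

struct file {                  /* per-open-descriptor state */
    struct vnode *vp;
    long offset;    /* the seek offset: per descriptor, not per vnode */
    long last_end;  /* offset just past the previous read, -1 if none */
    int  seq_run;   /* consecutive reads that began where the last ended */
};

static void file_open(struct file *fp, struct vnode *vp) {
    fp->vp = vp;
    fp->offset = 0;
    fp->last_end = -1;
    fp->seq_run = 0;
}

/* lseek() only updates the descriptor; the file system is never told. */
static long sys_lseek(struct file *fp, long off) {
    assert(off >= 0);
    fp->offset = off;
    return off;
}

/* read(): returns bytes "read" and stores in *readahead how many extra
 * bytes the system call layer would ask the file system to prefetch. */
static long sys_read(struct file *fp, long len, long *readahead) {
    assert(len >= 0);
    long start = fp->offset;
    long n = len;
    if (start >= fp->vp->size)
        n = 0;
    else if (start + n > fp->vp->size)
        n = fp->vp->size - start;

    /* Sequential if this read starts where the previous one ended. */
    if (start == fp->last_end)
        fp->seq_run++;
    else
        fp->seq_run = 0;

    /* Grow the window with the length of the sequential run. */
    long ra = 0;
    if (fp->seq_run > 0) {
        ra = len * fp->seq_run;
        if (ra > RA_MAX)
            ra = RA_MAX;
    }
    *readahead = ra;

    fp->offset = start + n;
    fp->last_end = fp->offset;
    return n;
}
```

Two descriptors open on the same vnode keep independent offsets and independent sequential-run counts, which is one way of seeing why the prediction can be made above the vnode interface without a VOP_SEEK() call.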

If we take the specific example of a DOS server implemented in user space on
a UNIX system, it is obvious that zone-caching in user space out-performs
predictive caching in the kernel, simply because of the way DOS executables
(the files which benefit most from prediction) are read for program load.
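One way a user-space server might implement zone caching is sketched below, assuming POSIX `pread()` and a hypothetical fixed `ZONE_SIZE` of 4096 bytes: on a miss, the whole aligned zone containing the requested offset is read, and later small reads inside that zone are served from memory. This is an illustrative sketch, not the DOS server implementation referred to above.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdio.h>       /* for fileno() when the descriptor comes from a FILE * */
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

#define ZONE_SIZE 4096   /* hypothetical zone size */

struct zone_cache {
    int     fd;
    off_t   zone_start;  /* file offset of the cached zone, -1 if none */
    ssize_t zone_len;    /* valid bytes in buf (short near EOF) */
    int     misses;      /* number of pread() calls issued */
    char    buf[ZONE_SIZE];
};

static void zc_init(struct zone_cache *zc, int fd) {
    zc->fd = fd;
    zc->zone_start = -1;
    zc->zone_len = 0;
    zc->misses = 0;
}

/* Copy up to len bytes at off into dst. Returns bytes copied, 0 at EOF,
 * -1 on error. A read crossing a zone boundary is cut short there. */
static ssize_t zc_read(struct zone_cache *zc, char *dst, size_t len, off_t off) {
    assert(zc->fd >= 0 && off >= 0);
    off_t start = off - (off % ZONE_SIZE);
    if (zc->zone_start != start) {
        zc->zone_start = -1;   /* invalidate before refilling */
        ssize_t n = pread(zc->fd, zc->buf, ZONE_SIZE, start);
        if (n < 0)
            return -1;
        zc->zone_start = start;
        zc->zone_len = n;
        zc->misses++;
    }
    off_t rel = off - start;
    if (rel >= zc->zone_len)
        return 0;
    size_t avail = (size_t)(zc->zone_len - rel);
    if (len > avail)
        len = avail;
    memcpy(dst, zc->buf + rel, len);
    return (ssize_t)len;
}
```

A program loader issuing many small reads within one zone then costs one `pread()` per zone; reads that cross a zone boundary return short, as `read()` itself is allowed to, and the caller continues at the new offset.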

I think I can safely say the benefits of predictive read ahead are questionable
unless there is a cooperative mechanism which obviates the need to use lseek()
to communicate the read ahead.  I can see the designers leaving it in
there for some future "smarter NFS", but nothing in user space currently
either requires or could benefit from predictive read ahead implemented this way.
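For comparison, POSIX.1-2001 later standardized an explicit advisory call of the cooperative kind described above, `posix_fadvise()`, which lets an application state its access pattern for a descriptor without overloading lseek(). The wrapper name `advise_sequential()` is hypothetical; the call and the `POSIX_FADV_SEQUENTIAL` constant are the standard ones, and whether the kernel actually changes its read ahead in response is left to the implementation.

```c
#define _POSIX_C_SOURCE 200112L
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>   /* for fileno() when the descriptor comes from a FILE * */

/* Tell the system once, per descriptor, that the application will read
 * sequentially. Offset 0 with length 0 covers the whole file.
 * Returns 0 on success or an error number (posix_fadvise() does not
 * set errno). */
static int advise_sequential(int fd) {
    assert(fd >= 0);
    return posix_fadvise(fd, 0, 0, POSIX_FADV_SEQUENTIAL);
}
```

The hint is attached to the open descriptor by the application that knows its own access pattern, so neither the seek offset nor a VOP_SEEK() call has to carry the prediction.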


					Terry Lambert
					terry@icarus.weber.edu
					terry_lambert@novell.com
---
Any opinions in this posting are my own and not those of my present
or previous employers.

-- 
-------------------------------------------------------------------------------
                                        "I have an 8 user poetic license" - me
 Get the 386bsd FAQ from agate.berkeley.edu:/pub/386BSD/386bsd-0.1/unofficial
-------------------------------------------------------------------------------