*BSD News Article 7083

Path: sserve!manuel.anu.edu.au!munnari.oz.au!uunet!kithrup!hoptoad!decwrl!sdd.hp.com!elroy.jpl.nasa.gov!ames!data.nas.nasa.gov!taligent!apple!veritas!craig
From: craig@Veritas.COM (Craig Harmer)
Newsgroups: comp.unix.bsd
Subject: Re: Repeat of the question about VFS and VOP_SEEK()
Summary: VOP_RDWR() can be atomic; VOP_SEEK() is OK
Keywords: VOP_SEEK VOP_READ VOP_WRITE VOP_RDWR
Message-ID: <1992Oct26.213408.21184@Veritas.COM>
Date: 26 Oct 92 21:34:08 GMT
References: <1992Oct21.201738.22999@fcom.cc.utah.edu> <BwLp9z.8J2@flatlin.ka.sub.org> <1992Oct25.121136.26473@fcom.cc.utah.edu>
Organization: VERITAS Software
Lines: 110

In article <1992Oct25.121136.26473@fcom.cc.utah.edu> terry@cs.weber.edu (A Wizard of Earth C) writes:
}In article <BwLp9z.8J2@flatlin.ka.sub.org>, bad@flatlin.ka.sub.org (Christoph Badura) writes:
}|> In <1992Oct21.201738.22999@fcom.cc.utah.edu> terry@cs.weber.edu (A Wizard of Earth C) writes:
}|> How could separating vop_rdwr into vop_read and vop_write help POSIX
}|> compliance?  I'd be very interested in an explanation that takes into
}|> account that the SVR4 ufs-vop_read and ufs-vop_write almost
}|> instantaneously call ufs_rwip.
}
}In the SVR4.4 kernel sources, in /usr/src/uts/i386/fs/ufs/ufs_vnops.c, in the
}function ufs_write(), it says (paraphrased for legal reasons):
}
}	An ASSERT() is used to insure the behaviour conforms to the 
}	agreed upon [in POSIX 1003.1-1988] vnode interface regarding
}	the preservation of atomicity in reads and writes.  This
}	necessarily disallows calls to ufs_rdwr(), since the ufs_ilock()
}	there would then become recursive.
}
}Clearly, if we can agree that POSIX compliant behaviour is what mandates the
}atomicity of reads and writes (the part I inserted and put in brackets), then
}we can agree that POSIX behaviour mandated the split.

i don't see how atomicity guarantees demand separate read/write
interfaces.  imagine this code:

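int
ufs_rdwr(vp, uiop, type)
	struct vnode *vp;
	struct uio *uiop;
	int type;
{
	int error;	/* assumes ufs_read()/ufs_write() return an errno */

	/* hold the inode lock across the whole transfer */
	ufs_ilock(VTOI(vp));

	if (type == READ) {
		error = ufs_read(vp, uiop);
	} else {
		error = ufs_write(vp, uiop);
	}

	ufs_iunlock(VTOI(vp));
	return (error);
}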

assuming the inode lock is not released inside ufs_read() or ufs_write(),
how is this not atomic with respect to other read and write requests?

i don't see what POSIX has to do with the splitting of VOP_RDWR
at the vnode interface layer.

also, inode locks in SVR4.0 are recursive, at least for UFS and VxFS.
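
to make "recursive" concrete, here is a rough toy model of the
bookkeeping.  every toy_ name is made up for illustration; this is
not the SVR4 code, and it is single-threaded bookkeeping only (a real
kernel lock would sleep where this one returns 0).  it just shows an
outer routine holding the lock while an inner routine takes it again:

```c
#include <assert.h>

/* hypothetical toy inode; field names are illustrative only */
struct toy_inode {
	int owner;	/* id of the current holder, 0 when unlocked */
	int depth;	/* how many times the holder has taken the lock */
	long size;	/* stand-in for file data touched under the lock */
};

/* returns 1 if the lock was taken (or re-taken), 0 if it would block */
int
toy_ilock(struct toy_inode *ip, int who)
{
	if (ip->depth > 0 && ip->owner != who)
		return 0;	/* a real kernel would sleep here */
	ip->owner = who;
	ip->depth++;
	return 1;
}

void
toy_iunlock(struct toy_inode *ip, int who)
{
	assert(ip->depth > 0 && ip->owner == who);
	if (--ip->depth == 0)
		ip->owner = 0;
}

/* inner routine that locks on its own, as a separate write entry might */
int
toy_write(struct toy_inode *ip, int who, long nbytes)
{
	if (!toy_ilock(ip, who))
		return -1;
	ip->size += nbytes;
	toy_iunlock(ip, who);
	return 0;
}

/* outer routine: holds the lock across the inner call */
int
toy_rdwr(struct toy_inode *ip, int who, long nbytes)
{
	int error;

	if (!toy_ilock(ip, who))
		return -1;
	error = toy_write(ip, who, nbytes);	/* depth goes 1 -> 2 -> 1 */
	toy_iunlock(ip, who);
	return error;
}
```

with a lock like this, the nested toy_ilock() in toy_write() just bumps
the depth, while a second caller is kept out until the depth drops
back to zero.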

}|> >Thus perhaps the best answer is that the interface is ill defined.  In
}|> >the previous post referenced above, I referred to the illogicality of
}|> >making the call, since a seek offset is an artifact of an open file
}|> >descriptor, and is not an attribute of an inode or vnode in most of
}|> >the current implementations.  I also pointed out a potentially valid use
}|> >for passing the seek down:  predictive read ahead.  The problem here is
}|> >that either the read, the seek, or the open would have to be attributed
}|> >to flag the descriptor for predictive behaviour if this is to be a
}|> >successful optimization.

the seek offsets are passed down because the file-system-independent
layer doesn't presume to know the range of valid seek offsets for a
file system type.  this gives the file-system-specific code an
opportunity to complain when the seek *system call* is made.
lseek() can return an error if it needs to.
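
a minimal sketch of that division of labor, under stated assumptions:
the toy_, bigfs_ and smallfs_ names and the 65535 limit are invented
for the example, and the hook signature is only modeled loosely on a
seek vnode operation.  the generic layer asks the file system to vet
the new offset and commits it only if there is no complaint:

```c
#include <assert.h>
#include <errno.h>

typedef long toy_off_t;

/* per-file-system operations table; only the seek hook is modeled */
struct toy_fsops {
	int (*seek)(toy_off_t ooff, toy_off_t *noffp);
};

/* a file system that accepts any non-negative offset */
int
bigfs_seek(toy_off_t ooff, toy_off_t *noffp)
{
	(void)ooff;
	return (*noffp < 0) ? EINVAL : 0;
}

#define SMALLFS_MAXOFF	65535L	/* made-up limit for illustration */

/* a file system with a much smaller range of valid offsets */
int
smallfs_seek(toy_off_t ooff, toy_off_t *noffp)
{
	(void)ooff;
	if (*noffp < 0 || *noffp > SMALLFS_MAXOFF)
		return EINVAL;
	return 0;
}

/* generic layer: knows nothing about valid ranges, just asks the fs */
int
toy_lseek(const struct toy_fsops *ops, toy_off_t *fileoff, toy_off_t noff)
{
	int error;

	assert(ops && ops->seek && fileoff);
	error = ops->seek(*fileoff, &noff);
	if (error)
		return error;	/* offset in the open file is left alone */
	*fileoff = noff;
	return 0;
}
```

the same requested offset can be fine for one file system type and an
error for another, and the generic code never has to know why.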


}|> Since all that is needed for predictive read ahead below the VFS layer
}|> is a) a vnode and b) the new seek offset, I can't follow your
}|> illogicality claims.

VOP_SEEK() is pretty useless for predictive read-ahead since most
applications call lseek() and then read() or write() immediately
afterward.  the amount of real time that separates the calls is
insignificant compared to the time to perform the disk i/o.

you'd be much better off doing read-ahead and write-behind when
the read or write request comes along.

....

}I think I can safely say the benefits of predictive read ahead are questionable
}unless there is a cooperative mechanism which obviates the need to use lseek()
}to communicate the read ahead.  I can see the designers leaving it in
}there for some future "smarter NFS", but nothing in user space currently
}requires nor could benefit from predictive read ahead implemented this way.

if you're talking about using lseek() to "request" a read ahead, that's
silly.  lseek() already has a set of semantics associated with it, and
adding new ones would confuse the issue.  invent a new system call or
convince USL to add the asynchronous I/O system calls originally planned
for SVR4.0.

finally, read-ahead (and write-behind) are useful for applications
that don't perform any buffering of their own.  a common application
behavior in Unix is to read an entire file sequentially, or to
truncate a file, write it sequentially, and close it.  file systems
that detect this behavior and modify their behavior appropriately
can provide significant performance improvements.
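
one way such detection could look, as a hedged sketch rather than any
shipping file system's algorithm: remember where the last read ended,
and grow a read-ahead window while each new read starts exactly
there.  the toy_ra names, the doubling policy and the cap of 8 blocks
are all assumptions made up for the example:

```c
#include <assert.h>

#define TOY_RA_MAX	8	/* made-up cap on the window, in blocks */

/* hypothetical per-file read-ahead state */
struct toy_ra {
	long next_off;	/* offset where a sequential read would start */
	int window;	/* blocks to read ahead; 0 means none */
};

/*
 * called when a read request arrives.  returns the number of blocks
 * worth reading ahead after this request.
 */
int
toy_ra_update(struct toy_ra *ra, long off, long len)
{
	assert(ra && len >= 0);

	if (off == ra->next_off) {
		/* looks sequential: start at 1 block, then double */
		ra->window = ra->window ? ra->window * 2 : 1;
		if (ra->window > TOY_RA_MAX)
			ra->window = TOY_RA_MAX;
	} else {
		/* random access: stop reading ahead */
		ra->window = 0;
	}
	ra->next_off = off + len;
	return ra->window;
}
```

note that the decision is made when the read request itself arrives,
so no hint from lseek() is needed.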

}					Terry Lambert
}					terry@icarus.weber.edu
}					terry_lambert@novell.com
}---
}Any opinions in this posting are my own and not those of my present
}or previous employers.

-- 
{apple,amdahl}!veritas!craig				craig@veritas.com
(415) 668-3564 (h)					(408) 727-1222 x220 (w)
	[views expressed above aren't Veritas' views, nor should 
	they be mistaken for the views of any responsible person.]