*BSD News Article 14014


Newsgroups: comp.os.386bsd.bugs
Path: sserve!newshost.anu.edu.au!munnari.oz.au!news.Hawaii.Edu!ames!haven.umd.edu!uunet!mcsun!sun4nl!eur.nl!pk
From: pk@cs.few.eur.nl (Paul Kranenburg)
Subject: Conspiring bugs
Message-ID: <1993Apr5.132031.10000@cs.few.eur.nl>
Keywords: VMSTAT,VM,NFS
Sender: news@cs.few.eur.nl
Reply-To: pk@cs.few.eur.nl
Organization: Erasmus University Rotterdam
Date: Mon, 5 Apr 1993 13:20:31 GMT
Lines: 184

Recently, several bugs and omissions conspired against me to cause heavy
system crashes.

While working on the NET/2 version of vmstat(8) to make it cooperate with the
current VM statistics variables in the kernel, I noticed that the executable
produced reasonable output the first time it was run. However, subsequent
invocations dumped core with a bus error. The kernel would also panic somewhere
in the NFS code at a later time (now known to be related to vmstat's core
dump). [I should explain that I run in `dataless' mode: local root filesystem,
/usr NFS-mounted from a Sun IPC.]

This is what happened. First, a typo in vmstat.c tricked vmstat into
issuing a "read(fd, 0, 4)", which unfortunately did not lead to an immediate
SIGSEGV, because the hardware does not automatically detect a protection
violation while in kernel mode. However, it does mark the page (which is
mapped to the running executable's text segment) as modified. Upon process
termination the allocated VM object is entered in the object cache. When
the time comes to flush the modified page (say by the pageout daemon, or
induced by an `rm a.out'), the kernel wants to write the bogus page to its
backing store.
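
For contrast, on a kernel that validates user buffer addresses one would
expect the same call to fail cleanly (typically with EFAULT) instead of
dirtying a text page. The following is only a sketch under that assumption;
the helper name errno_of_read_into_null() is hypothetical and does not come
from vmstat.c, and a pipe is used merely to have four bytes ready to read.

```c
#include <errno.h>
#include <unistd.h>

/*
 * Hypothetical helper: issue read(fd, 0, 4) on a pipe holding four bytes.
 * Returns the errno set by the failing read, 0 if the read unexpectedly
 * succeeds, or -1 if the pipe could not be set up.
 */
int
errno_of_read_into_null(void)
{
	int fds[2];
	char * volatile bad = 0;	/* volatile: hide the NULL from compiler checks */
	ssize_t n;
	int err;

	if (pipe(fds) == -1)
		return -1;
	if (write(fds[1], "abcd", 4) != 4) {
		close(fds[0]);
		close(fds[1]);
		return -1;
	}

	errno = 0;
	n = read(fds[0], bad, 4);	/* same shape as the buggy call */
	err = (n == -1) ? errno : 0;

	close(fds[0]);
	close(fds[1]);
	return err;
}
```

A return value of EFAULT means the kernel rejected the buffer address before
copying anything out, which is the check that was effectively missing here.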

The vnode pager now takes control and prepares a call to the vnode layer
write routine (VOP_WRITE). This will always fail on an NFS filesystem:
the uio_procp field used by vnode_pager_io() is filled with a NULL pointer,
which the nfs_write() routine dereferences without checking. In addition,
the credentials passed to VOP_WRITE are those of the current process, which
may not suffice to make the NFS operation succeed. The following example
demonstrates this:


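#include <sys/types.h>
#include <sys/mman.h>
#include <sys/file.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

#define SIZE	4096

int
main(void)
{
	char *ad;
	int j;
	int fd;

	fd = open("xxx", O_RDWR|O_CREAT, 0666);
	if (fd == -1) {
		perror("open");
		exit(1);
	}
	/* Extend the file to one page so the mapping is backed by it. */
	if (ftruncate(fd, SIZE) == -1) {
		perror("ftruncate");
		exit(1);
	}

	ad = mmap(0, SIZE, PROT_READ|PROT_WRITE,
					MAP_FILE|MAP_SHARED, fd, 0);
	if (ad == (char *)-1) {		/* mmap returns -1 on failure */
		perror("mmap");
		exit(1);
	}
	/* Dirty the shared page; the pager must later write it back. */
	for (j = 0; j < SIZE; j++)
		ad[j] = 1;
/*
	munmap(ad, SIZE);
*/

	printf("Sleeping\n");
	sleep(100);
	printf("Done\n");
	return 0;
}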

Run this on an NFS filesystem. While the process is sleeping, cause its
modified page to get paged out by starting some other memory hog (say X11)
under another user ID. The page never makes it back to the file.
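
One way to check the outcome afterwards is to count how many bytes of the
file actually hold the value the test program stored. This is a small sketch,
not part of the original test; the function name count_one_bytes() is
hypothetical.

```c
#include <stdio.h>

/*
 * Hypothetical checker: count the bytes in `path` that equal 1.
 * Returns -1 if the file cannot be opened.
 */
long
count_one_bytes(const char *path)
{
	FILE *fp = fopen(path, "rb");
	long n = 0;
	int c;

	if (fp == NULL)
		return -1;
	while ((c = getc(fp)) != EOF)
		if (c == 1)
			n++;
	fclose(fp);
	return n;
}
```

If the dirty page was written back, count_one_bytes("xxx") should equal SIZE
(4096). If the pageout was lost, the file presumably still holds the zeros
left by ftruncate() and the count would be 0.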

While the process on whose behalf a pageout takes place may no longer be
available, we can hang on to its credential structure for I/O operations.
The patches attached below take care of this by adding a credentials
field to the vnode pager data. A similar change could be made to the
swap pager to allow swapping on an NFS-mounted file (if the rest of the
swapping code allowed for that).
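
The idea of holding a reference so the credentials outlive the process can be
modelled in user space. The sketch below is a simplified stand-in, not kernel
code: struct cred, struct pager and all function names are illustrative
assumptions that only mirror the hold/free pairing used in the patch.

```c
#include <stdlib.h>

/* Simplified stand-in for a reference-counted credential structure. */
struct cred {
	int cr_ref;		/* number of holders */
	int cr_uid;		/* illustrative payload */
};

/* Simplified stand-in for the per-pager data that keeps the credentials. */
struct pager {
	struct cred *vnp_cred;
};

/* Allocate credentials with one reference, owned by the "process". */
struct cred *
cred_new(int uid)
{
	struct cred *cr = malloc(sizeof(*cr));

	if (cr != NULL) {
		cr->cr_ref = 1;
		cr->cr_uid = uid;
	}
	return cr;
}

/* Take an additional reference. */
void
cred_hold(struct cred *cr)
{
	cr->cr_ref++;
}

/* Drop one reference; returns 1 if the structure was released. */
int
cred_free(struct cred *cr)
{
	if (--cr->cr_ref > 0)
		return 0;
	free(cr);
	return 1;
}

/* Pager setup: remember the credentials and hold them. */
void
pager_attach(struct pager *pg, struct cred *cr)
{
	pg->vnp_cred = cr;
	if (cr != NULL)
		cred_hold(cr);
}

/* Pager teardown: drop the pager's reference; returns 1 if cred was freed. */
int
pager_detach(struct pager *pg)
{
	int freed = 0;

	if (pg->vnp_cred != NULL)
		freed = cred_free(pg->vnp_cred);
	pg->vnp_cred = NULL;
	return freed;
}
```

In this model the process can drop its own reference at exit while the pager
still uses the credentials for a later pageout; the structure is released only
when the pager is torn down as well.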

-pk


------------------------------------------------------------------------------

------- vnode_pager.c -------
*** /tmp/da12915	Mon Apr  5 15:16:17 1993
--- vnode_pager.c	Sat Apr  3 12:19:40 1993
***************
*** 149,154 ****
--- 149,157 ----
  		vnp->vnp_flags = 0;
  		vnp->vnp_vp = vp;
  		vnp->vnp_size = vattr.va_size;
+ 		vnp->vnp_cred = p->p_ucred;
+ 		if (vnp->vnp_cred)
+ 			crhold(vnp->vnp_cred);
  		queue_enter(&vnode_pager_list, pager, vm_pager_t, pg_list);
  		pager->pg_handle = handle;
  		pager->pg_type = PG_VNODE;
***************
*** 195,200 ****
--- 198,205 ----
  		vrele(vp);
  	}
  	queue_remove(&vnode_pager_list, pager, vm_pager_t, pg_list);
+ 	if (vnp->vnp_cred)
+ 		crfree(vnp->vnp_cred);
  	free((caddr_t)vnp, M_VMPGDATA);
  	free((caddr_t)pager, M_VMPAGER);
  }
***************
*** 415,421 ****
  	struct iovec aiov;
  	vm_offset_t kva, foff;
  	int error, size;
! 	struct proc *p = curproc;		/* XXX */
  
  #ifdef DEBUG
  	if (vpagerdebug & VDB_FOLLOW)
--- 420,426 ----
  	struct iovec aiov;
  	vm_offset_t kva, foff;
  	int error, size;
! /*	struct proc *p = curproc;		/* XXX */
  
  #ifdef DEBUG
  	if (vpagerdebug & VDB_FOLLOW)
***************
*** 458,466 ****
  		       vnp->vnp_vp, kva, foff, size);
  #endif
  	if (rw == UIO_READ)
! 		error = VOP_READ(vnp->vnp_vp, &auio, 0, p->p_ucred);
  	else
! 		error = VOP_WRITE(vnp->vnp_vp, &auio, 0, p->p_ucred);
  #ifdef DEBUG
  	if (vpagerdebug & VDB_IO) {
  		if (error || auio.uio_resid)
--- 463,471 ----
  		       vnp->vnp_vp, kva, foff, size);
  #endif
  	if (rw == UIO_READ)
! 		error = VOP_READ(vnp->vnp_vp, &auio, 0, vnp->vnp_cred);
  	else
! 		error = VOP_WRITE(vnp->vnp_vp, &auio, 0, vnp->vnp_cred);
  #ifdef DEBUG
  	if (vpagerdebug & VDB_IO) {
  		if (error || auio.uio_resid)

------- vnode_pager.h -------
*** /tmp/da12918	Mon Apr  5 15:16:18 1993
--- vnode_pager.h	Sat Apr  3 12:19:41 1993
***************
*** 47,52 ****
--- 47,53 ----
  struct vnpager {
  	int		vnp_flags;	/* flags */
  	struct vnode	*vnp_vp;	/* vnode */
+ 	struct ucred	*vnp_cred;	/* user credentials */
  	vm_size_t	vnp_size;	/* vnode current size */
  };
  typedef struct vnpager	*vn_pager_t;

------- nfs_bio.c -------
*** /tmp/da12926	Mon Apr  5 15:17:19 1993
--- nfs_bio.c	Fri Apr  2 21:44:10 1993
***************
*** 235,241 ****
  	 * Maybe this should be above the vnode op call, but so long as
  	 * file servers have no limits, i don't think it matters
  	 */
! 	if (uio->uio_offset + uio->uio_resid >
  	      p->p_rlimit[RLIMIT_FSIZE].rlim_cur) {
  		psignal(p, SIGXFSZ);
  		return (EFBIG);
--- 235,242 ----
  	 * Maybe this should be above the vnode op call, but so long as
  	 * file servers have no limits, i don't think it matters
  	 */
! 	if (vp->v_type == VREG && p &&
! 	    uio->uio_offset + uio->uio_resid >
  	      p->p_rlimit[RLIMIT_FSIZE].rlim_cur) {
  		psignal(p, SIGXFSZ);
  		return (EFBIG);