*BSD News Article 3946


Return to BSD News archive

Xref: sserve comp.unix.bsd:3993 comp.bugs.4bsd:1893
Newsgroups: comp.unix.bsd,comp.bugs.4bsd
Path: sserve!manuel!munnari.oz.au!mips!mips!sdd.hp.com!wupost!uunet!iWarp.intel.com|ichips!intelhf!agora!davidg
From: davidg@agora.rain.com (David Greenman)
Subject: 386BSD kernel bugs w/fixes
Message-ID: <1992Aug20.105712.26856@agora.uucp>
Sender: davidg@agora.uucp (David Greenman)
Organization: Open Communications Forum
Date: Thu, 20 Aug 1992 10:57:12 GMT
Lines: 176


   This patch-kit contains a number of bugfixes for 386BSD.

1) The filesystem buffer cache has a major problem with the proper
	queueing of buffers to the correct queue. Specifically, in the
	files /sys/ufs/ufs_vnops.c and /sys/kern/spec_vnops.c the flag
	B_AGE is incorrectly getting set in bp->b_flags when a read
	finishes. This causes almost all buffers to be released to the
	AGE queue where they are then immediately reused for the next
	read - resulting in worst case cache performance. My under-
	standing of the original intent in the setting of B_AGE (in the
	case of a buffered read - a different intent in a write) was to
	cause the quick re-use of buffers that are fragments of a complete
	filesystem block with the assumption that the same request for
	a fragment wasn't likely. The assumption and the algorithm that
	determines a 'fragment' are bogus. The fix is to simply not set
	the flag in the case of a read. This substantially improves file-
	system performance in 386BSD.
2) The FS buffer cache has a bug in the hash index calculation. In the
	file /sys/sys/buf.h the index for bufhash[] that is used in the
	#define BUFHASH almost always yields a zero index. This completely
	defeats the purpose/advantages of the buffer hash table. The
	problem developed during the conversion to a vnode based (rather
	than device based) buffer cache. The fix is to change the equa-
	tion to one that provides a fairly even distribution in the
	hash table. My fix for this substantially improves the perform-
	ance of the incore() routine which finds in-cache buffers.
3) The FS buffer cache has another bug that causes memory for buffers to
	be allocated twice. The bug is in the file /sys/kern/vfs__bio.c,
	in the function getnewbuf(). The problem is that bp->bufsize is
	not initialized when a new buffer is used for the first time.
	Fortunately, allocbuf() frees the previously malloc'd space so no
	memory is lost. However, the variable freebufspace is left incor-
	rectly reduced. The fix for this is to fill in bp->bufsize just
	after mallocing the memory in getnewbuf().
4) The virtual memory system has a bug that will prevent a process from
	growing in its virtual size greater than 6MB data/stack. This is
	caused by the incorrect process limit being checked in /sys/vm/
	vm_unix.c. As they are defined in /sys/i386/include/vmparam.h,
	the constants DFLDSIZ and DFLSSIZ specify the initial limit (the
	limit when the process is created in execve()) for a process. The
	constants MAXTSIZ and MAXSSIZ specify the maximum that a process
	can grow. These constants are copied into process zero's proc->
	p_limit[] and are used as a prototype for all processes. rlim_cur
	contains DFLDSIZ/DFLSSIZ, and rlim_max contains MAXDSIZ/MAXSSIZ.
	rlim_max should therefore be used to limit the process's virtual
	size growth. However, the check is against rlim_cur. The fix is
	to change the check to rlim_max. One might also note that 6MB for
	DFLDSIZ/DFLSSIZ in vmparam.h is very low and should be increased
	to something more reasonable. Most vendors specify an initial
	limit of 32MB.
5) The virtual memory system in 386BSD 0.1 has not yet been completed.
	The system has a stub routine for swapout() which does nothing
	but unset SLOAD and remove the process from the run queue. It
	does not pageout or swapout a single page. The calling of it
	with this behavior causes unnecessary scheduling overhead, and 
	also causes the 'ps' command to omit some process information. The
	temporary fix is to comment out the call to swap_threads() in
	/sys/vm/vm_pageout.c. Of course this fix should be removed when
	the VM code has been completed. (Actually, with a good paging
	algorithm, swapping shouldn't be necessary...but then there's
	tradition... I'm working on writing some better paging code for
	386BSD.)


---
David Greenman
davidg%implode@percy.rain.com

Here are the patches:


*** /sys/ufs/ufs_vnops.c.01orig	Thu Aug 13 23:11:01 1992
--- /sys/ufs/ufs_vnops.c	Sun Aug 16 02:49:02 1992
***************
*** 497,504 ****
--- 497,506 ----
  			return (error);
  		}
  		error = uiomove(bp->b_un.b_addr + on, (int)n, uio);
+ #if 0
  		if (n + on == fs->fs_bsize || uio->uio_offset == ip->i_size)
  			bp->b_flags |= B_AGE;
+ #endif
  		brelse(bp);
  	} while (error == 0 && uio->uio_resid > 0 && n != 0);
  	return (error);
*** /sys/kern/spec_vnops.c.01orig	Sun Jun  7 16:42:13 1992
--- /sys/kern/spec_vnops.c	Sun Aug 16 02:54:32 1992
***************
*** 218,225 ****
--- 218,227 ----
  				return (error);
  			}
  			error = uiomove(bp->b_un.b_addr + on, n, uio);
+ #if 0
  			if (n + on == bsize)
  				bp->b_flags |= B_AGE;
+ #endif
  			brelse(bp);
  		} while (error == 0 && uio->uio_resid > 0 && n != 0);
  		return (error);
*** /sys/sys/buf.h.01orig	Sat Jun 20 20:02:53 1992
--- /sys/sys/buf.h	Sun Aug 16 03:34:06 1992
***************
*** 112,121 ****
  #define RND	(MAXBSIZE/DEV_BSIZE)
  #if	((BUFHSZ&(BUFHSZ-1)) == 0)
  #define	BUFHASH(dvp, dblkno)	\
! 	((struct buf *)&bufhash[((int)(dvp)+(((int)(dblkno))/RND))&(BUFHSZ-1)])
  #else
  #define	BUFHASH(dvp, dblkno)	\
! 	((struct buf *)&bufhash[((int)(dvp)+(((int)(dblkno))/RND)) % BUFHSZ])
  #endif
  
  struct	buf *buf;		/* the buffer pool itself */
--- 112,121 ----
  #define RND	(MAXBSIZE/DEV_BSIZE)
  #if	((BUFHSZ&(BUFHSZ-1)) == 0)
  #define	BUFHASH(dvp, dblkno)	\
! 	((struct buf *)&bufhash[((int)(dvp)/sizeof(struct vnode)+(int)(dblkno))&(BUFHSZ-1)])
  #else
  #define	BUFHASH(dvp, dblkno)	\
! 	((struct buf *)&bufhash[((int)(dvp)/sizeof(struct vnode)+(int)(dblkno)) % BUFHSZ])
  #endif
  
  struct	buf *buf;		/* the buffer pool itself */
*** /sys/kern/vfs__bio.c.01orig	Sun Jun 28 20:49:18 1992
--- /sys/kern/vfs__bio.c	Mon Aug 17 02:40:09 1992
***************
*** 344,349 ****
--- 344,350 ----
  		bp->b_flags = B_BUSY | B_INVAL;
  		bremfree(bp);
  		bp->b_un.b_addr = addr;
+ 		bp->b_bufsize = sz;
  		goto fillin;
  	}
  
*** /sys/vm/vm_unix.c.01orig	Sun Jul  5 11:53:46 1992
--- /sys/vm/vm_unix.c	Mon Aug 17 04:53:40 1992
***************
*** 65,71 ****
  
  	old = (vm_offset_t)vm->vm_daddr;
  	new = round_page(uap->nsiz);
! 	if ((int)(new - old) > p->p_rlimit[RLIMIT_DATA].rlim_cur)
  		return(ENOMEM);
  	old = round_page(old + ctob(vm->vm_dsize));
  	diff = new - old;
--- 65,71 ----
  
  	old = (vm_offset_t)vm->vm_daddr;
  	new = round_page(uap->nsiz);
! 	if ((int)(new - old) > p->p_rlimit[RLIMIT_DATA].rlim_max)
  		return(ENOMEM);
  	old = round_page(old + ctob(vm->vm_dsize));
  	diff = new - old;
***************
*** 113,119 ****
  	 * Really need to check vs limit and increment stack size if ok.
  	 */
  	si = clrnd(btoc(vm->vm_maxsaddr + MAXSSIZ - sp) - vm->vm_ssize);
! 	if (vm->vm_ssize + si > btoc(p->p_rlimit[RLIMIT_STACK].rlim_cur))
  		return (0);
  	vm->vm_ssize += si;
  	return (1);
--- 113,119 ----
  	 * Really need to check vs limit and increment stack size if ok.
  	 */
  	si = clrnd(btoc(vm->vm_maxsaddr + MAXSSIZ - sp) - vm->vm_ssize);
! 	if (vm->vm_ssize + si > btoc(p->p_rlimit[RLIMIT_STACK].rlim_max))
  		return (0);
  	vm->vm_ssize += si;
  	return (1);