*BSD News Article 25180


Return to BSD News archive

Path: sserve!newshost.anu.edu.au!munnari.oz.au!news.Hawaii.Edu!ames!agate!howland.reston.ans.net!gatech!udel!newsserv.cs.sunysb.edu!stark.UUCP!stark!stark!gene
From: stark!gene@newsserv.cs.sunysb.edu (Gene Stark)
Newsgroups: comp.os.386bsd.bugs
Subject: "Copyout" patch to trap.c (FIX)
Date: 20 Dec 93 09:09:53
Organization: Gene Stark's home system
Lines: 129
Distribution: world
Message-ID: <STARK!GENE.93Dec20090953@stark.uucp>
NNTP-Posting-Host: stark.uucp

Several weeks ago I posted a patch to sys/i386/i386/trap.c that would
fix a problem with VM allocation from within a system call (i.e. inside
the copyout routine).  Since then, I have discovered that this fix exposed
another bug which produces system instability.  Although I had already
sent a fix for this new bug to the FreeBSD folks, reading another post
just now reminded me that there might be other people who installed my
first patch who will need the second one, too.  So, here it is.  If you
installed my first patch, it is pretty important to install this one, too.

*** /usr/src/sys/i386/i386/trap.c	Sat Nov 20 14:47:44 1993
--- trap.c	Sat Dec 11 20:46:53 1993
***************
*** 347,352 ****
--- 347,359 ----
  			 */
  			if (nss > vm->vm_ssize)
  				vm->vm_ssize = nss;
+ 			/* 
+ 			 * va could be a page table address, if the fault
+ 			 * occurred from within copyout.  In that case,
+ 			 * we have to wire it. (EWS 12/11/93)
+ 			 */
+ 			if (ispt(va))
+ 			  vm_map_pageable(map, va, round_page(va+1), FALSE);
  			va = trunc_page(vtopte(va));
  			/* for page table, increment wiring
  			   as long as not a page table fault as well */


You should also be aware that the FreeBSD team is working on a revised
pmap module, and the above fixes may be supplanted by totally new pmap.c
and maybe trap.c code in the next release.

							- Gene Stark

(Explanation of bug follows for those interested.)
-----------------------------
The bug is exercised by doing the following in sequence:

(1)  Causing a new page of PTE's to be allocated from within a system
	call, e.g. reading a > 4MB file into emacs.
(2)  Causing pageout to be initiated, e.g. by stopping emacs and then
	running a process that allocates and accesses a large amount
	of memory.

The system panics or does other weird stuff during the paging out.
If instead of (1) above, you do:

(1')  Causing a new page of PTE's to be allocated by simply using
	brk() to increase the break and then accessing the memory.

then the panics don't happen when you do (2).  So there was something
different about the way the new PTE's are allocated during (1) as
opposed to (1').

After tracking through the system for awhile, I found that the PDE that
was correctly allocated during operation (1) was getting zeroed.  Later
on, during (2), a call to pmap_changebit() occurs, which in turn calls
pmap_pte() on a virtual address which requires the zeroed PDE.
Pmap_pte() simply returns 0 in case the following test fails:

	if (pmap && pmap_pde_v(pmap_pde(pmap, va))) {

This is bad, since 0 is for sure not a valid address of a PTE, and
nobody else seems to be checking the return value from pmap_pte().
I put a panic() at the end of pmap_pte() instead of the return(0).

I had a hard time figuring out what was going on, when it occurred to
me that what must be happening is that the page of PTE's is getting
paged out before the pages that those PTE's map.  That is, for some
reason the page of PTE's is not getting wired when it is allocated.

I think I figured out how this happens:  On a normal (non-copyout)
page fault, the trap goes through trap(), which first faults the
page of PTE's, if necessary, then faults the actual page.  The code
that faults the page of PTE's wires the page once it is allocated.
This is done by the following code in trap.c:

		/* check if page table is mapped, if not, fault it first */
#define pde_v(v) (PTD[((v)>>PD_SHIFT)&1023].pd_v)
		if (!pde_v(va)) {
			v = trunc_page(vtopte(va));
			rv = vm_fault(map, v, ftype, FALSE);
			if (rv != KERN_SUCCESS) goto nogo;
			/* check if page table fault, increment wiring */
			vm_map_pageable(map, v, round_page(v+1), FALSE);
		} else v=0;

On the other hand, when copyout() is running, there is first an actual
trap generated from within copyout() by the access to the missing PTE.
This trap is handled in trap().  The page itself is then faulted by
a simulated trap that goes through trapwrite().  Here is the problem:
the code in trap() does double duty for both page table faults and
user faults.  When the trap comes from copyout(), the "logic" (I use the
term loosely!) that checks what kind of fault it is fails to detect that
the virtual address that is getting faulted is actually a page table address,
rather than an ordinary user address.  So, the code shown above that is
supposed to fault and then wire the new page of PTE's does not run, and
instead control passes to the immediately following code:

		rv = vm_fault(map, va, ftype, FALSE);
		if (rv == KERN_SUCCESS) {
			/*
			 * XXX: continuation of rude stack hack
			 */
			if (nss > vm->vm_ssize)
				vm->vm_ssize = nss;
			va = trunc_page(vtopte(va));
			/* for page table, increment wiring
			   as long as not a page table fault as well */
			if (!v && type != T_PAGEFLT)
			  vm_map_pageable(map, va, round_page(va+1), FALSE);
			if (type == T_PAGEFLT)
				return;
			goto out;
		}

The only wiring that would happen here is to bump up the count on a page
of PTE's that was already mapped.  Now, v is zero because the page table
fault code didn't execute.  However, the trap came from kernel mode,
so we do have type == T_PAGEFLT (rather than T_PAGEFLT | T_USER), and this
code doesn't get the page of PTE's wired either.

The fix I am suggesting is above.  It might also be good defensive programming
to put a panic at the end of pmap_pte() in pmap.c as I mentioned above, so
that we know before somebody tries to use a NULL pointer as the address of
a PTE!

--