*BSD News Article 10358

Received: by minnie.vk1xwt.ampr.org with NNTP
	id AA7819 ; Tue, 26 Jan 93 15:00:12 EST
Newsgroups: comp.unix.bsd
Path: sserve!manuel.anu.edu.au!munnari.oz.au!spool.mu.edu!uwm.edu!cs.utexas.edu!news.uta.edu!utacfd.uta.edu!rwsys!sneaky!gordon
From: gordon@sneaky.lonestar.org (Gordon Burditt)
Subject: [386bsd] gdb and 386bsd vs. floating point problems (+gdb patches)
Message-ID: <C1EH04.45M@sneaky.lonestar.org>
Organization: Gordon Burditt
Date: Mon, 25 Jan 1993 08:02:23 GMT
Lines: 247

Is anyone else having terrible floating point problems on 386bsd?
I'm running on a 486DX/33, so the coprocessor is built in.
I seem to be getting npx stack-underflow faults in contexts where it
makes absolutely no sense to be getting them (e.g. fldl 16(%ebp);
faddl 44(%ebp); fstpl 24(%ebp), where I get a stack-empty fault on the
fstpl instruction).  Breaking npxprobe() so that it does not recognize
the coprocessor, and using the math emulator instead, makes the program
run VERY slowly, but it works.  The math emulator doesn't seem to
generate stack underflow faults.

A certain program I'm trying to port keeps dying on SIGFPEs.  It's a
client of a server program run on the same machine, and so far only the
client gets the SIGFPEs, although both use floating-point, and there's
a lot of context-switching going on also.  So I get out gdb, and the 
first thing I discover is that gdb doesn't do anything useful when you 
ask for a stack trace, and complains about "Operation not permitted".  
I discover that the user virtual address for the stack end is not a 
constant, so I fix gdb to properly access the stack in core dumps:

Index: /usr/src/usr.bin/gdb/config/i386bsd-dep.c
***************
*** 942,947 ****
--- 942,957 ----
  		 */
  		reg_offset = (int) u.u_ar0 - KERNEL_U_ADDR;
  #else
+ 		/*
+ 		 * 386bsd does not put the stack end in a fixed virtual
+ 		 * location, so we get the beginning and depend on the
+ 		 * MAXSSIZ constant for the full length of the stack to
+ 		 * find the end.
+ 		 * (See code & comments in kern_execve.c, search for USRSTACK)
+ 		 */
+ 		stack_end = (CORE_ADDR) u.u_kproc.kp_eproc.e_vm.vm_maxsaddr 
+ 			+ MAXSSIZ;
+ 
  		data_end = data_start +
  			NBPG * u.u_kproc.kp_eproc.e_vm.vm_dsize;
  		stack_start = stack_end -

Ok, now I can get a stack trace.  The SIGFPEs seem to be coming at random
places all over the code.  Further, getting an assembly file for the problem
code, adding "fwait" instructions before and after every floating-point
instruction, assembling it, and testing the new code doesn't change anything.

"info float" in gdb doesn't do anything.  The code is conditionally
compiled out.  So I fixed it.  Well, this is a bit kludgey, and I'd love
to have someone point out a mistake so that the problem really isn't as
weird as it seems,
but it seems to work.  I couldn't figure out where to get an exception
status value in addition to the stored one.  There are some fundamental
disagreements between the original code and my reading of Intel manuals 
regarding the order of floating-point registers saved by fsave/fnsave.
Nothing but code in gdb seems to care, though.
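
As a minimal sketch of the mapping the patch below relies on: assuming
the layout described in the patch comment (saved registers in logical
ST(i) order, tag bits indexed by physical register), the tag for ST(i)
can be looked up as follows.  The enum and function names are made up
for illustration; they are not taken from gdb or 386bsd.

```c
#include <assert.h>

/* x87 tag values: two bits per PHYSICAL register in the tag word. */
enum x87_tag { TAG_VALID = 0, TAG_ZERO = 1, TAG_SPECIAL = 2, TAG_EMPTY = 3 };

/* TOP (the physical register currently acting as ST(0)) is held in
 * bits 11-13 of the status word. */
static int x87_top(unsigned status)
{
    return (status >> 11) & 7;
}

/* Physical register number backing logical register ST(st). */
static int x87_phys(unsigned status, int st)
{
    assert(st >= 0 && st < 8);
    return (x87_top(status) + st) & 7;
}

/* Tag of logical register ST(st), looked up via its physical register. */
static enum x87_tag x87_tag_of_st(unsigned status, unsigned tagword, int st)
{
    return (enum x87_tag)((tagword >> (x87_phys(status, st) * 2)) & 3);
}
```

For example, after a single push onto an empty stack TOP is 7, so ST(0)
is physical register 7 and ST(1) wraps around to physical register 0.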

Index: /usr/src/usr.bin/gdb/config/i386bsd-dep.c
***************
*** 1758,1771 ****
    
    top = (ep->status >> 11) & 7;
    
!   printf ("regno  tag  msb              lsb  value\n");
!   for (fpreg = 7; fpreg >= 0; fpreg--) 
      {
        double val;
        
!       printf ("%s %d: ", fpreg == top ? "=>" : "  ", fpreg);
        
!       switch ((ep->tag >> (fpreg * 2)) & 3) 
  	{
  	case 0: printf ("valid "); break;
  	case 1: printf ("zero  "); break;
--- 1758,1773 ----
    
    top = (ep->status >> 11) & 7;
    
!   printf ("regno    tag  msb              lsb  value\n");
!   for (fpreg = 0; fpreg <= 7; fpreg++) 
      {
        double val;
        
!       printf ("%s ST%d: ", ((fpreg+top)&7) == 7 ? "=>" : "  ", fpreg);
        
!       /* according to Intel 486 documentation, the registers are stored */
!       /* in LOGICAL order but the tag bits correspond to PHYSICAL registers */
!       switch ((ep->tag >> (((top + fpreg)&7) * 2)) & 3) 
  	{
  	case 0: printf ("valid "); break;
  	case 1: printf ("zero  "); break;
***************
*** 1787,1792 ****
--- 1789,1804 ----
    if (ep->r3)
      printf ("warning: reserved3 is 0x%x\n", ep->r3);
  }
+ #ifdef __386BSD__
+ /* 
+  * 386BSD name for saved fpu state. This had better have the same
+  * layout as the env387 struct.  Note that the size of struct fpacc87
+  * in <machine/npx.h> is actually wrong, due to struct padding, but
+  * the data layout seems to be correct anyway.
+  */
+ #define U_FPSTATE(u) u.u_pcb.pcb_savefpu
+ #define fpstate save87
+ #endif
  
  #ifndef U_FPSTATE
  #define U_FPSTATE(u) u.u_fpstate
***************
*** 1798,1823 ****
--- 1810,1851 ----
    int i;
  #ifndef __386BSD__
    /* fpstate defined in <sys/user.h> */
+ #else
+   /* save87 defined in <machine/npx.h> */
+ #endif
    struct fpstate *fpstatep;
    char buf[sizeof (struct fpstate) + 2 * sizeof (int)];
    unsigned int uaddr;
+ #ifndef __386BSD__
    char fpvalid;
+ #else
+   int fpvalid;
+ #endif
    unsigned int rounded_addr;
    unsigned int rounded_size;
    extern int corechan;
    int skip;
    
+ #ifndef __386BSD__
    uaddr = (char *)&u.u_fpvalid - (char *)&u;
+ #else
+   uaddr = (char *)&u.u_pcb.pcb_flags - (char *)&u;
+ #endif
    if (have_inferior_p()) 
      {
        unsigned int data;
        unsigned int mask;
        
+ #ifndef __386BSD__
        rounded_addr = uaddr & -sizeof (int);
        data = ptrace (3, inferior_pid, rounded_addr, 0);
        mask = 0xff << ((uaddr - rounded_addr) * 8);
        
        fpvalid = ((data & mask) != 0);
+ #else
+       data = ptrace(3, inferior_pid, (caddr_t)uaddr, 0);
+       fpvalid = (data & FP_WASUSED) != 0;
+ #endif
      } 
    else 
      {
***************
*** 1825,1831 ****
  	perror ("seek on core file");
        if (myread (corechan, &fpvalid, 1) < 0) 
  	perror ("read on core file");
!       
      }
    
    if (fpvalid == 0) 
--- 1853,1861 ----
  	perror ("seek on core file");
        if (myread (corechan, &fpvalid, 1) < 0) 
  	perror ("read on core file");
! #ifdef __386BSD__
! 	fpvalid = (fpvalid & FP_WASUSED) != 0;      
! #endif
      }
    
    if (fpvalid == 0) 
***************
*** 1847,1853 ****
        ip = (int *)buf;
        for (i = 0; i < rounded_size; i++) 
  	{
! 	  *ip++ = ptrace (3, inferior_pid, rounded_addr, 0);
  	  rounded_addr += sizeof (int);
  	}
      } 
--- 1877,1883 ----
        ip = (int *)buf;
        for (i = 0; i < rounded_size; i++) 
  	{
! 	  *ip++ = ptrace (3, inferior_pid, (caddr_t)rounded_addr, 0);
  	  rounded_addr += sizeof (int);
  	}
      } 
***************
*** 1861,1866 ****
--- 1891,1900 ----
      }
    
    fpstatep = (struct fpstate *)(buf + skip);
+ # ifdef __386BSD__
+   /* not sure where to get exception status */
+   print_387_status (0, (struct env387 *)fpstatep);
+ #else
    print_387_status (fpstatep->status, (struct env387 *)fpstatep->state);
  #endif
  }

Ok, now I can see what happens when I get a SIGFPE.  Invariably the
exceptions shown are INVALID, LOS, and FSTACK, and the stack is shown
as empty.  The address of the last npx exception is often 0, but sometimes
it shows the address of an instruction that's actually in my code.
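
A minimal sketch of how those flags distinguish stack underflow from
stack overflow, assuming the standard x87 status word layout (invalid
operation in bit 0, precision in bit 5, stack fault in bit 6, C1 in
bit 9) and assuming gdb's "LOS" label is the precision flag.  The macro
and function names are hypothetical.

```c
#include <assert.h>

#define X87_IE 0x0001   /* invalid operation ("INVALID") */
#define X87_PE 0x0020   /* precision, assumed to be gdb's "LOS" */
#define X87_SF 0x0040   /* stack fault ("FSTACK") */
#define X87_C1 0x0200   /* with SF set: 1 = overflow, 0 = underflow */

enum stack_fault { SF_NONE, SF_UNDERFLOW, SF_OVERFLOW };

/* Classify a saved 16-bit status word.  A stack fault is reported as
 * invalid-operation plus stack-fault; C1 then gives the direction. */
static enum stack_fault x87_stack_fault(unsigned status)
{
    assert(status <= 0xFFFF);
    if ((status & (X87_IE | X87_SF)) != (X87_IE | X87_SF))
        return SF_NONE;
    return (status & X87_C1) ? SF_OVERFLOW : SF_UNDERFLOW;
}
```

A status word with INVALID, LOS, and FSTACK set and C1 clear classifies
as an underflow, which matches a store from an empty stack.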

Ok, so why am I getting these exceptions?  I can think of several
reasons:

- Flakey hardware.  It's fairly new hardware, but there might be problems
  with it.  It's a 486DX, though, so there's less a motherboard manufacturer
  can goof up than if they were wiring a 386 and 387 together.
- Flakey code generation by gcc.  Well, when I see code like:

	fldl 16(%ebp)
	faddl 44(%ebp)
	fstpl 24(%ebp)
	fwait
	...

if I'm going to get an exception from the fstp (why? - the stack should
have something on it), shouldn't I get the exception at the fwait
or before (486 eip, not npx exception address), NOT dozens to hundreds 
of instructions later?  And what's wrong with this code, anyway?  There 
should be something on the stack.

- Flakey library code.  I suspected for a while that library code was
  doing "fninit" instructions when it shouldn't, but I think I have
  isolated the problem to exclude library code.

- Flakey OS floating point save/restore code.  I suspect this the most,
  but I haven't been able to prove it.  I've tried putting fwait 
  instructions around just about every floating-point instruction
  in the kernel, and nothing makes much difference.  (If you do an
  frstor, which starts loading the context, and yank the address space
  out from under the npx by re-loading control register 3 or doing
  a task switch, don't you NEED an fwait first?  There isn't one.  But 
  fixing it doesn't change the problem.)  I tried setting the 486 NE 
  bit in control register 0.  No change.  I'm still wondering what 
  those outb calls for ports 0xb1 and 0xf0 do.

- Flakey debugger.  If my fixes to "info float" don't do what I think
  they do, the problem might be something entirely different.

						Gordon L. Burditt
						sneaky.lonestar.org!gordon