*BSD News Article 13275


Newsgroups: comp.os.386bsd.bugs
Path: sserve!newshost.anu.edu.au!munnari.oz.au!metro!ipso!runxtsa!bde
From: bde@runx.oz.au (Bruce Evans)
Subject: Re: Floating exceptions?
Message-ID: <1993Mar24.071239.17071@runx.oz.au>
Organization: RUNX Un*x Timeshare.  Sydney, Australia.
References: <f0XUP76@quack.kfu.com>
Date: Wed, 24 Mar 93 07:12:39 GMT
Lines: 193

In article <f0XUP76@quack.kfu.com> mrapple@quack.kfu.com (Nick Sayer) writes:
>0.2.2 running on a 486-50 with 16M RAM. I compiled xv_calctool
>(patchlevel 12). If I try and find ln(10000)/ln(10), it crashes with
>a floating point exception. The same code on a sun does not.

I think Sun (SPARC) systems use the IEEE default of all floating point
exceptions masked.  i386 Un*xes traditionally unmask the worst of the
exceptions, because too many programs don't bother to check for them.
386BSD does the same as the other i386 Un*xes here.  Someday I want to
use the IEEE default.  Maybe it's sufficient to check for exceptions in
exit().  But first the libraries have to be improved.

>If I try and get a stack-trace on the resulting core, it
>crashes in routines that haven't the slightest thing to do with
>floating point.
>
>Has anyone seen this behavior before? Might there be some delay between
>the occurrence of the problem and the exception or something?

There are several delays.  First, the i387 delays reporting an error
until the next FPU instruction (an ISA h/w bug sometimes causes it to
report an error immediately).  Second, a 386BSD kernel bug sometimes
delays the delivery of the signal about an error.  Third, looking at
things with a debugger tends to cause errors to be reported too early.

This is the README from my npx-0.4.tar.Z package where some of the bugs
are fixed.  npx-0.4 only works on 486's.

---
There are many bugs in floating point error handling in 386BSD-0.1.  Here
are my fixes for most of them.  I have tested them on a 486DX and (in a
slightly different form) on a 386 (no 387).  How well the fixes work
depends on the system:

486DX:
	Floating point error handling now uses exception 16 instead of
	IRQ13 to report errors (the method is reported at boot time).
	Exception 16 is designed correctly so it is possible for the
	kernel to get everything right.
486SX:
	?
386/387:
	IRQ13's at inappropriate times are now detoxified.  FP errors are
	still sometimes reported early at unpredictable times (after the
	kernel preempts the process) and at predictable times (after the
	usual program executes certain unusual FP instructions, and when
	it gives up control to a debugger).
386/287:
	?
All h/w:
	Context switching and exit() now never clobber the FP context.
	SIGFPE's are now delivered as soon as possible.
Emulator:
	Still lacks error handling.  It now needs to handle fwait but
	doesn't.

-----
Files
-----

README:
	o This file.
fpetest.c:
	o Test program.  Run as "fpetest -z" to see the options.  To stress
	  the system, run several copies concurrently.  This will crash
	  386BSD-0.1 eventually.  To demonstrate the exit() bug in
	  386BSD-0.1, run the program "double x; main() { x = x + 1; }" in
	  a shell loop concurrently.  This might crash the system.  After
	  applying the patches, run the tests overnight.  This should not
	  crash the system.
npx.diff:
	o Patches.  Apply using "cd /; patch -p <somewhere/npx.diff"
	  or by editing out the pieces that apply to the individual
	  directories (/sys/i386/i386, /sys/i386/include and /sys/i386/isa)
	  and working in each directory separately.
	o All patches are relative to the 386BSD-0.1 distribution except
	  the one for machdep.c.  The patch for machdep.c is small and
	  unimportant and should work anyway.
npx.c:
	o Complete replacement for /sys/i386/isa/npx.c.  Since the asm is
	  now written correctly, it should work with gcc-2.
test.486.ex16:
	o Output from running "fpetest" on a 486DX using exception 16 error
	  reporting (after these patches have been applied).  Bit 0x0008
	  (CR0_TS) in the machine status word may vary.
test.486.irq13:
	o Output from running "fpetest" on a 486DX using IRQ13 error
	  reporting (after these patches have been applied and the
	  exception 16 initialization in npx.c has been deleted).

-------
Changes
-------

/sys/i386/conf/Makefile.i386:
	o It now depends on machine/specialreg.h and it always depended
	  on $S/net/netisr.h.  The patch is not included here.  (mkdep
	  needs to be fixed to handle asm files and to handle the -p
	  option properly for genassym.c.)
/sys/i386/i386/locore.s:
	o Avoid any bogus IRQ13 from fnsave.
	o Update FP flags in pcb to reflect the fact that fnsave clobbers
	  the state.  I don't know how 0.1 worked without this.  In 0.0,
	  context switches sometimes clobbered the state.
	o Clear npxproc when it becomes invalid.  This is required at least
	  for the new checks in npx.c.
	o Fully support the FPU exception (#16).  Have to handle it like
	  IRQ13 except for IRQ stuff.
	o Finish traps and syscalls with doreti() instead of spl0() to handle
	  AST's.  This is required to handle asynchronous signals ASAP when
	  they occur in kernel mode.  Even (bogus) IRQ13's can occur while
	  in kernel mode.  npxdna() allows them because it is too much
	  trouble to stop them, and they can be nested in the trace trap
	  handler.  spl0() cannot do enough because the stack frame is
	  inconvenient.  
/usr/src/sys/i386/i386/machdep.c:
	o CR0_TS is now used for emulation, not CR0_EM.  Actually, the
	  changed line should be deleted.  If we have NPX, then npxinit()
	  will do the work.  We may as well have NPX if we have math
	  emulation since the h/w support is small compared with the
	  emulator.
/sys/i386/i386/vm_machdep.c:
	o _Completely_ free the coprocessor when we are done with it.
	  Without the fix, another process may inherit the exiting process's
	  FP state, and npxintr may use a NULL pointer (npxproc).
/sys/i386/include/npx.h:
	o The "standard" npx control words are all braindamaged.
/sys/i386/include/specialreg.h:
	o Fix comments (CR0_EM isn't for npx emulation!).
	o Add some defines for 486 (npx uses only CR0_NE).
/sys/i386/isa/icu.s:
	o Support aston() by checking astpending in doreti().  The changes
	  to locore.s cause doreti() to be called early enough for signals
	  to be delivered ASAP.  The change to icu.s alone is sufficient
	  to fix itimers (they used to have about 10 Hz precision instead
	  of 100 Hz).
/sys/i386/isa/isa.c:
	o Utility routine for probing isa interrupts.
/sys/i386/isa/npx.c:
	o Cleaned up inline asm.
	o Probe for exception 16 working and IRQ13 not working.  Use
	  exception 16 if possible.  It should always work on 486DX's.
	  I doubt it will work on 386's (ISA probably requires it to
	  be broken).  I don't know what happens on 486SX's.
	o Set CR0_EM and toggle CR0_TS for emulation.  CR0_EM is no good
	  for emulating an x87 because it doesn't trap fwait's.  The
	  emulator needs to be fixed to handle fwait's.  Now it botches
	  even the decoding of them.
	o Fixed order of initialization.
	o Fixed npxintr() and npxdna() to handle nested interrupts.

---
Etc
---

The library has a lot more floating point bugs.  I have fixed the following.
The fixes are not included here.

/usr/src/include/math.h:
	o Stop gcc from crashing when it tries to compile HUGE_VAL.  The
	  crash is due to bugs in the library atof and in the kernel's
	  floating point error handling.
/usr/src/lib/libc/i386/gen/fixdfsi.s:
	o Fix to round towards 0 as specified by ANSI.
/usr/src/lib/libm/common_source/pow.c:
	o Avoid overflow bug (the patch was botched for 0.1).
	o Use volatile variable to stop gcc from optimizing away calculations
	  that are being made for their side effects on the FPU exception
	  flags.  (This stuff is broken in other ways but...)
/sys/i386/include/float.h:
	o DBL_MAX was too large and might overflow (actually it doesn't).


I have not fixed the following library bugs.

/usr/src/lib/libc/i386/gen/fixunsdfsi.s:
	o Same bug as for fixdfsi.  Someone posted fixed versions of both.
/usr/src/lib/libc/i386/stdlib/atof.c:
	o atof() is inaccurate and allows overflow exceptions.
/usr/src/lib/libc/stdio/vfprintf.c:
	o Inaccurate.
/usr/src/lib/libc/stdio/vfscanf.c:
	o scanf uses atof() so it's broken too.
/usr/src/lib/libm/*:
	o STDC functions aren't STDC conformant (they allow exceptions, at
	  least with the current FP control word, and don't set errno).
/usr/libexec/cc1:
	o gcc uses atof() so it's broken too.
binaries:
	o Damaged FP constants may have been compiled into a lot of programs.
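
For comparison, here is a sketch of the behaviour the atof() and libm items
are asking for: out-of-range input is reported through errno rather than
through an overflow exception.  It uses the standard strtod(); parse_double()
is a hypothetical wrapper, not a proposed libc patch.

```c
#include <errno.h>
#include <stdlib.h>

/*
 * Hypothetical wrapper: convert s to a double.
 * Returns 0 and stores the value in *out on success.
 * Returns -1 if s does not start with a number or the value is out of
 * range (strtod reports that by setting errno to ERANGE).
 */
int parse_double(const char *s, double *out)
{
    char *end;
    double v;

    errno = 0;
    v = strtod(s, &end);
    if (end == s)
        return -1;          /* nothing was converted */
    if (errno == ERANGE)
        return -1;          /* overflow or underflow reported via errno */
    *out = v;
    return 0;
}
```

A caller such as a scanner or compiler front end can then reject a constant
like "1e999" cleanly instead of taking a SIGFPE or storing a damaged value.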
---
-- 
Bruce Evans  bde@runx.oz.au