*BSD News Article 4957

Newsgroups: comp.unix.bsd
Path: sserve!manuel!munnari.oz.au!spool.mu.edu!agate!dog.ee.lbl.gov!hellgate.utah.edu!fcom.cc.utah.edu!cs.weber.edu!terry
From: terry@cs.weber.edu (A Wizard of Earth C)
Subject: Re: Program dies with FP Exception
Message-ID: <1992Sep13.083846.6134@fcom.cc.utah.edu>
Sender: news@fcom.cc.utah.edu
Organization: Weber State University  (Ogden, UT)
References: <STARK.92Sep13002650@sbstark.cs.sunysb.edu>
Date: Sun, 13 Sep 92 08:38:46 GMT
Lines: 67

In article <STARK.92Sep13002650@sbstark.cs.sunysb.edu> stark@cs.sunysb.edu (Gene Stark) writes:
>Here's a tough one I've been trying to track down -- maybe somebody out there
>who knows more can guess what is going on.
>
>I am running 386BSD on a 486/33 system with 4MB RAM and a 210MB Connor IDE
>drive.  A program I was working on dies on Signal 8 (Floating point exception)
>in a perfectly repeatable fashion.  It is not so easy to tell where the
>exception actually comes from, though, because the signal seems to be getting
>delivered to the process much later, when it is leaving the system after
>a call to "write".  I haven't been able to get a small test program that
>repeats the bug, however there seem to be several crucial elements involved:
>
>	(1)  A call to "atof", which returns a double that is then
>		stored in a temporary on the stack.  Removing the call
>		removes the error.
>
>	(2)  The actual magnitude of the number being converted by "atof".
>		I found that the string "1e10" and "1e12" cause the error,
>		but "1e9", "1e6", and "0.0" do not.
>
>	(3)  Some later "write" system calls.  The signal is actually
>		delivered on the fourth call to write after the atof.
>		What is happening in the interim is just C code without
>		any other system calls.  I do not know what causes the
>		signal to get delivered when it actually does.

First of all, like all other signals, the SIGFPE gets delivered to a process
by way of the sigtrampoline code.  A pending signal is only handed to you on
the way back out of the kernel, which in practice means on return from a
system call; that is why yours shows up after a "write" rather than at the
point of the fault.  The problem is that there appears to be no code in the
library which forces a check for the exception *immediately* after the
floating point function call.  This is aggravated by the fact that GCC likes
to in-line 386 floating point (from what little experimentation I've done),
which defeats any fix made at the library level to hit the sigtrampoline
code and check for an exception.
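
To see exactly when the signal arrives, one option is to catch it yourself.
What follows is only a minimal sketch in standard C: the names on_fpe and
fpe_handler_demo are made up for illustration, and raise() stands in for a
real floating point fault, so it shows the handler plumbing and not the
386BSD delivery path itself.

```c
#include <assert.h>
#include <signal.h>

/* Set by the handler; sig_atomic_t is the type that is safe to write here. */
static volatile sig_atomic_t fpe_seen = 0;

/* Hypothetical handler: only records that SIGFPE was delivered. */
static void on_fpe(int signo)
{
    (void)signo;
    fpe_seen = 1;
}

/* Hypothetical helper: installs the handler, sends SIGFPE to this process
   with raise(), restores the previous handler, and returns 1 if the
   handler ran. */
int fpe_handler_demo(void)
{
    void (*old)(int) = signal(SIGFPE, on_fpe);

    assert(old != SIG_ERR);
    fpe_seen = 0;

    /* raise() does not return until the handler has finished. */
    raise(SIGFPE);

    signal(SIGFPE, old);
    return fpe_seen == 1;
}
```

In a real program the handler would be installed once at startup.  Note that
returning normally from a handler for a genuine hardware SIGFPE is undefined
in C, so a real handler should report and exit rather than carry on.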

Second, are you using a real FPU, or are you using the emulation?  I know
that I *could* try it myself, but I prefer to arrive at an expected answer
before experimenting (I guess my physics background shows).

Third, are you aware that when a 16 bit value is multiplied/divided, you
need a 32 bit area to receive the value, and that for a 32 bit value you
need a 64 bit receiver?  Perhaps you are truly getting an exception.
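
One quick way to test the "truly getting an exception" theory is a range
check.  The sketch below assumes (the original post does not say so) that
the value returned by atof() is at some point converted to an int, and that
int is 32 bits.  Under those assumptions 1e9 fits and 1e10 does not, which
would match the reported pattern.  The names double_fits_int and
checked_to_int are made up for illustration.

```c
#include <assert.h>
#include <limits.h>

/* Returns 1 if v lies within [INT_MIN, INT_MAX], 0 otherwise.
   NaN fails both comparisons and is rejected as well.  The check is
   slightly conservative: a value such as INT_MAX + 0.5 is refused even
   though truncation would bring it into range. */
int double_fits_int(double v)
{
    return v >= (double)INT_MIN && v <= (double)INT_MAX;
}

/* Converts only after the range check; an out-of-range value trips the
   assert instead of reaching the conversion. */
int checked_to_int(double v)
{
    assert(double_fits_int(v));
    return (int)v;
}
```

Calling double_fits_int() on each value right after atof() would show
whether the failing inputs are exactly the ones that overflow.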

Fourth, I believe that the math stuff is actually not being done at the
highest floating point precision (I read this in the newsgroup here, so
I could be totally wrong 8-)).  This would lend credence to the idea that
you are actually getting an exception.

Fifth, there is a well known problem that causes 'ps' to die with the same
exception -- the problem occurs when you assign the result of an undeclared
function (which the compiler therefore assumes returns int) to a double
lvalue.  Are you sure that atof() is declared "extern double atof();"
somewhere, or that you include the header that declares it?
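
For the declaration question, a minimal sketch of the safe form follows.
The wrapper name parse_magnitude is made up for illustration; the point is
only that <stdlib.h> is included, so the compiler knows atof() returns a
double rather than assuming int.

```c
#include <assert.h>
#include <stdlib.h>   /* declares: double atof(const char *); */

/* Hypothetical wrapper: with the header included, the result is read as a
   double.  Without a declaration, an old compiler would assume an int
   return value and fetch the result from the wrong place. */
double parse_magnitude(const char *text)
{
    assert(text != NULL);
    return atof(text);
}
```

If the program compiles cleanly with this in place and still dies, the
missing declaration can be ruled out.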

Hope this helps narrow the problem.


					Terry Lambert
					terry_lambert@gateway.novell.com
					terry@icarus.weber.edu
---
Any opinions in this posting are my own and not those of my present
or previous employers.
-- 
-------------------------------------------------------------------------------
                                        "I have an 8 user poetic license" - me
 Get the 386bsd FAQ from agate.berkeley.edu:/pub/386BSD/386bsd-0.1/unofficial
-------------------------------------------------------------------------------