*BSD News Article 1827


Xref: sserve comp.unix.sysv386:21236 comp.unix.bsd:1862 comp.os.mach:1814 news.answers:1986
Path: sserve!manuel!munnari.oz.au!news.hawaii.edu!ames!haven.umd.edu!darwin.sura.net!jvnc.net!rutgers!cbmvax!snark!eric
From: eric@snark.thyrsus.com (Eric S. Raymond)
Newsgroups: comp.unix.sysv386,comp.unix.bsd,comp.os.mach,news.answers
Subject: Known Bugs in the USL UNIX distribution
Message-ID: <1h1yhK#1O2sD67m6NKP370phM0N6vbN=eric@snark.thyrsus.com>
Date: 6 Jul 92 15:41:39 GMT
Expires: 4 Oct 92 23:00:00 GMT
Sender: eric@snark.thyrsus.com (Eric S. Raymond)
Followup-To: comp.unix.sysv386
Lines: 500
Approved: news-answers-request@MIT.Edu

Archive-name: usl-bugs
Last-update: Mon Jul  5 11:49:45 EDT 1992
Version: 5.0

What's new in this issue:
   * More on the alleged suid-root core dump bug.
   * IEEE standard conformance problems in SVr4 C.

I. Introduction

This posting lists known bugs in System V Release 4 implementations, and known
fixes applied by various porting houses.  It was formerly part of the
386-buyers-faq issues 1.0 through 4.0, and is still best read in conjunction
with the pc-unix/software FAQ descended from that posting.

This document is maintained and periodically updated as a service to the net by
Eric S. Raymond <eric@snark.thyrsus.com>, who began it for the very best
self-interested reason that he was in the market and didn't believe in plonking
down several grand without doing his homework first (no, I don't get paid for
this, though I have had a bunch of free software and hardware dumped on me as a
result of it!).  Corrections, updates, and all pertinent information are
welcomed at that address.

This posting is periodically broadcast to the USENET group comp.unix.sysv386
and to a list of vendor addresses.  If you are a vendor representative, please
check to make sure the information on your company is current and correct.  If
it is not, please email me a correction ASAP.  If you are a knowledgeable user
of any of these products, please send me a precis of your experiences for the
improvement of future issues.

The bug descriptions often include indications of fixes by the various porting
houses to their current releases.  These are:

Consensys UNIX Version 1.3			abbreviated as "Cons" below
Dell UNIX Issue 2.2				abbreviated as "Dell" below
Esix Revision A					abbreviated as "Esix" below
Micro Station Technology SVr4 UNIX		abbreviated as "MST" below
Microport System V Release 4.0 version 4	abbreviated as "uPort" below
UHC Version 3.6					abbreviated as "UHC" below
SCO Open DeskTop 1.1				abbreviated as "SCO" below

II. General Bugs

1. Dropout problems with tty devices
   The most serious problem anyone has reported is that the USL asy driver is
flaky and occasionally drops characters at speeds above 4800 baud.
   Microport, Dell, Esix, and UHC say that they believe they've fixed this.
However, Dell, at least, was mistaken when they first made this claim; a more
detailed description of the problem is given below.  I have been assured that
this is on the fix list for the next Dell release.
   Bela Lubkin at SCO comments "386 interrupt latency vs. unbuffered UARTs.
This is a tough problem.  Nobody's driver should drop characters with a
turned-on 16550.  It's not so easy with a 16450.  Anyone with 16450s or lower
should be able to solve their problems by dropping in a 16550."
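
To put rough numbers on Bela's point: with 8N1 framing each character takes 10
bit times on the wire, a 16450 can hold only one received character, and a
16550 has a 16-byte receive FIFO.  The sketch below estimates how long the
kernel has to service the receive interrupt before a character is lost.  The
function names are mine, not taken from any driver, and the model assumes the
interrupt fires on the first received character.

```c
#include <stdio.h>

/* Bits on the wire per character for 8N1 framing:
   1 start bit + 8 data bits + 1 stop bit. */
#define BITS_PER_CHAR 10

/* Microseconds occupied by one character at the given baud rate. */
double char_time_us(long baud)
{
    return 1e6 * BITS_PER_CHAR / (double)baud;
}

/* Approximate interrupt latency (in microseconds) that can be tolerated
   before an overrun, for a UART able to hold `depth` received characters.
   Assumes the interrupt is raised when the first character arrives. */
double latency_budget_us(long baud, int depth)
{
    return depth * char_time_us(baud);
}

/* Print the budget for an unbuffered part (depth 1, e.g. a 16450) and
   for a 16550 with its 16-byte FIFO turned on. */
void print_budgets(long baud)
{
    printf("%ld baud: 16450 ~%.0f us, 16550 ~%.0f us\n",
           baud, latency_budget_us(baud, 1), latency_budget_us(baud, 16));
}
```

At 9600 baud this works out to roughly 1 ms for the unbuffered part against
roughly 16.7 ms with the FIFO enabled, which is why dropping in a 16550 helps.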

2. Suid programs dump core when signalled
   Mark Snitily of SGCS says that under many SVr4s, signalling a
process that is running suid root will cause it to core-dump.  He says
Dell and MST have fixed this, and SCO doesn't suffer from this.
   On the other hand, David Wexelblat writes "In Known Bugs in the USL Code,
regarding core dumps from signalling suid-root SVR4 programs, Microport does
not display this problem (either 3.1 or 4.1).  I have reason to believe that
Mark Snitily is incorrect about this.  I am almost positive that he was seeing
a bug in X386 (both 1.1 and 1.2) that we have fixed in X386 1.2E that caused
X386 to dump core if you tried to kill it while it was not on the active VT (we
have provided him the fix)."
   More data on this as it becomes available.

3. DMAs on large ISA machines may fail
   On ISA machines with more than 16MB of RAM, SVr4 may try to do DMA
from outside the bus's address space, causing serious problems.  UNIX ought
to do an in-memory copy to within the low 16MB but the USL base code doesn't.
   Dell says they've fixed this, and that's been confirmed by a user.
   UHC says they've fixed this; they add that the special buffer-allocation
logic to handle the problem can be turned off with a tunable kernel parameter
if you've got less than 16M.
   Microport says they've fixed this in their new 4.1 release, shipping early
March.
   Esix offers a patch to correct this problem.
   SCO used to have a similar bug but fixed it long ago.
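
The in-memory copy described above is usually called a bounce buffer.  Here is
a minimal sketch of the idea, assuming ISA DMA can reach only the first 16MB of
physical memory.  All names here (needs_bounce, dma_source, low_buf) are
hypothetical; none of this is taken from the USL code or from any vendor's fix.

```c
#include <stddef.h>
#include <string.h>

#define ISA_DMA_LIMIT 0x1000000UL   /* 16MB: ISA DMA carries 24 address bits */

/* Nonzero if any byte of [phys, phys+len) lies at or above 16MB. */
int needs_bounce(unsigned long phys, size_t len)
{
    return len != 0 && (phys >= ISA_DMA_LIMIT || len > ISA_DMA_LIMIT - phys);
}

/* Hypothetical helper: pick the buffer to hand to the DMA controller for a
   transfer from memory to a device.  `phys` is the physical address of
   `data`; `low_buf` is a preallocated buffer of `low_size` bytes that is
   known to sit below 16MB.  Returns the buffer to transfer from, or NULL
   if a bounce is needed but the data does not fit in low_buf. */
const void *dma_source(const void *data, unsigned long phys, size_t len,
                       void *low_buf, size_t low_size)
{
    if (!needs_bounce(phys, len))
        return data;                 /* already reachable, no copy needed */
    if (len > low_size)
        return NULL;
    memcpy(low_buf, data, len);      /* copy down into the low 16MB */
    return low_buf;
}
```

A transfer from a device into memory would do the reverse: DMA into low_buf,
then copy up to the real destination.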

4. There is a cylinder limit on disk size
   Stock USL code is limited to 1,024 cylinders per Winchester, which
might cause problems with some disk drives.
   Microport, Dell, Esix, MST, and UHC have fixed this.
   Bela Lubkin says "SCO's boot filesystem must lie below 1024 cylinder mark;
anything else can be anywhere.  This is more-or-less a limitation of the BIOS
interface that the bootstrap loader must use.  Could be circumvented by going
directly to controller hardware in the bootstrap loader, but that would be
horrendously complex with all the controllers & host adapters to be supported."
This limit probably applies to all other UNIXes as well.
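
For the curious, the BIOS limit Bela mentions comes from the INT 13h register
layout: the cylinder number gets 10 bits (8 in CH plus the top 2 bits of CL),
and the sector number gets the low 6 bits of CL.  The sketch below packs a
cylinder and sector into CX that way.  The function name is mine, and I am not
claiming the stock USL limit has the same origin; the source doesn't say.

```c
#include <stdint.h>

#define BIOS_MAX_CYLINDERS 1024u   /* 10 bits of cylinder number */

/* Pack a cylinder and sector number into CX the way INT 13h expects:
   CH = low 8 bits of the cylinder, CL bits 6-7 = high 2 bits of the
   cylinder, CL bits 0-5 = sector (1..63).
   Returns 0 on success, -1 if the values cannot be represented. */
int pack_chs_cx(unsigned cylinder, unsigned sector, uint16_t *cx)
{
    unsigned ch, cl;

    if (cylinder >= BIOS_MAX_CYLINDERS || sector < 1 || sector > 63)
        return -1;
    ch = cylinder & 0xFFu;
    cl = ((cylinder >> 8) << 6) | sector;
    *cx = (uint16_t)((ch << 8) | cl);
    return 0;
}
```

Cylinder 1023 is the last one that fits; anything from 1024 up is rejected,
which is why a boot filesystem has to live below that mark.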

5. shmat(2) vs. vfork(2)
   The shmat(2) call is known to interact badly with vfork(2).  Specifically,
if you attach a shared-memory segment, vfork(), and then the child releases
the segment, the parent loses it too!  Workaround: use fork(2).  UHC and
Microport both suspect that they still have this bug and opine that anyone
who uses vfork deserves to lose.  Dell has no plans to fix it.
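
A minimal sketch of the fork(2) workaround, using only the standard System V
shared-memory calls.  The function name is mine.  With fork() the child gets
its own address space, so its shmdt() should leave the parent's attachment
alone; I have not run this on any of the affected ports.

```c
#include <sys/types.h>
#include <sys/ipc.h>
#include <sys/shm.h>
#include <sys/wait.h>
#include <unistd.h>

/* Attach a segment, fork(), let the child write to it and detach, then
   check that the parent can still see the segment.
   Returns 1 if the parent reads the child's write, 0 if it reads
   something else, -1 if the segment or the child could not be set up.
   (If the parent's mapping had really vanished, the read would fault.) */
int parent_keeps_segment_after_fork(void)
{
    int id, status, ok;
    char *seg;
    pid_t pid;

    id = shmget(IPC_PRIVATE, 4096, IPC_CREAT | 0600);
    if (id == -1)
        return -1;
    seg = shmat(id, NULL, 0);
    if (seg == (void *)-1) {
        shmctl(id, IPC_RMID, NULL);
        return -1;
    }
    seg[0] = 'A';

    pid = fork();                    /* fork(), not vfork() */
    if (pid == -1) {
        shmdt(seg);
        shmctl(id, IPC_RMID, NULL);
        return -1;
    }
    if (pid == 0) {
        seg[0] = 'B';                /* child writes through its mapping */
        shmdt(seg);                  /* ...and releases only its own copy */
        _exit(0);
    }
    waitpid(pid, &status, 0);
    ok = (seg[0] == 'B');            /* parent still attached? */
    shmdt(seg);
    shmctl(id, IPC_RMID, NULL);
    return ok;
}
```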

6. X11R4 performance problem
   Stock X11R4 is said to hog the processor if you use the
LOCALCONNECT option.

7. UFS file system problems
   In stock USL 4.0.3, you can't use a UFS file system as the root; the
system hangs if you try.  Dell, Esix, Microport, MST, and UHC have fixed this.
   David Aitken, the UNIX product manager at UHC, writes "The ufs as root file
system [problem] was not really a bug, just a little oversight on USL's part -
we have fixed it completely by adding one line to the /stand/boot script:
rootfstype=ufs!"  He adds that they've been using ufs on their lab machines for
over 10 months with no trouble, and the latest UHC release defaults to ufs if
you have more than 120MB of disk.

8. A security hole in login
   David Wexelblat <dwex@mtgzfs3.att.com> reports: "There is a HUGE security
hole in /bin/login in all USL derived SVR4s before 4.0.4.  Refer to CERT
advisory CA-91:08, dated 5/23/91.  This is known to be present in AT&T SVR4
2.1, and Microport SVR4 3.1.  ESIX claims to have fixed it, Microport reports
that it is fixed in 4.1.  I won't give any more details unless necessary.
Suffice to say that this bug allows any non-privileged user on an SVR4 system
to get read-write access to any file on the system."

9. COFF problems with long filenames
   A source at Dell urges: "Our SVR4v2 did some stuff that USL didn't get
around to until SVR4v4.  Try Dell UNIX 2.1 with a COFF program on a large UFS
filesystem in a directory with long names.  Runs on Dell UNIX.  Breaks on
others."  I don't have more definite info yet.

10. Flakeouts in the Wangtek device driver
   Dell reports that USL's Wangtek device driver is seriously flaky.  "How'd
you like a multi volume backup where the second and subsequent volumes don't
follow on from the previous volumes?"  UHC confirms this and is actively
working on the problem.
   An anonymous SCOer says "The QIC02 tape controller `standard' is seriously
flaky.  Our driver's in pretty good shape but nobody will ever have a truly
solid driver that supports every QIC02 controller you can find."

11. A kernel declaration bug
    A botch in Dell's /etc/conf/pack.d/kernel/space.c (which is present in
Microport 4.0.3 and 4.0.4 and may also be present in other SVr4s) can step on
the linesw[] table.  The problem is that the domain name array initialization
is wrong and too short; thus, when it's set, data past the end of the array can
be stomped.  To fix this, find the following near line 247:

	char srpc_domain[] = SRPC_DOMAIN;

and change it to

	char srpc_domain[256] = SRPC_DOMAIN;

then rebuild the kernel.  The value 256 is not magic; you just want to make
sure the array is sufficiently large to contain your domain name.
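
The reason the one-word change matters: an array declared with empty brackets
is sized by its initializer, so a short SRPC_DOMAIN leaves almost no room.
The sketch below shows the difference.  The value given to SRPC_DOMAIN here is
an assumption for illustration (the source doesn't say what it expands to),
and set_domain is a hypothetical bounded setter, not kernel code.

```c
#include <stddef.h>
#include <string.h>

#define SRPC_DOMAIN ""                     /* assumed value, illustration only */

char domain_unsized[] = SRPC_DOMAIN;       /* sized by the initializer */
char domain_sized[256] = SRPC_DOMAIN;      /* fixed size, rest zero-filled */

/* Bytes actually reserved for each declaration. */
size_t unsized_capacity(void) { return sizeof domain_unsized; }
size_t sized_capacity(void)   { return sizeof domain_sized; }

/* Hypothetical bounded setter: copies `name` (with its terminating NUL)
   into dst only if it fits in `cap` bytes.  Returns 0 on success, -1 if
   the name is too long.  An unchecked copy is what stomps on whatever
   follows the array. */
int set_domain(char *dst, size_t cap, const char *name)
{
    size_t n = strlen(name) + 1;

    if (n > cap)
        return -1;
    memcpy(dst, name, n);
    return 0;
}
```

With an empty initializer the unsized array holds a single byte, so any real
domain name written into it without a bounds check lands on its neighbors.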

12. fread(3) does the wrong thing on pipes and FIFOs
   Ed Hall <edhall@rand.org> writes: "Unlike the raw read() system call,
fread() is supposed to be able to make several partial reads to satisfy the
data requested by its arguments.  The exceptions are an EOF or an error on the
stream.  This characteristic is quite useful when moving data through pipes or
over network connections, since partial reads are quite common in these cases.
Well, the version of fread() in ESIX 4.0.3 (and likely other Sys5R4's) only
does a single physical read, and if it only satisfies part of the requested
number of bytes, that's all you get.  This can sting you even if you carefully
check the value returned by fread(), since the value returned is rounded down
to the number of complete "nitems" read, although your position in the stream
can be up to size-1 bytes beyond that point.  Neither ferror() nor feof()
indicate anything is wrong when this happens."
   This bug (which is also present in 4.0.4) is serious and nasty and should
be high on every porting house's list to fix.  It appears to be peculiar to
USL 4.0.3 and 4.0.4; 4.0.2 does *not* have it, nor does SCO.
   A USL source claims it has been fixed in 4.1.

13. Process accounting is broken
    In 4.0.3, process accounting doesn't work.  From examining the accounting
scripts, it appears that /usr/lib/acct/accton is supposed to set a return code
depending on whether accounting was switched on already or not.  However, it
always returns the same result - accounting switched off.  This means that the
/usr/lib/acct/ckpacct script, which is run every hour to keep the process
accounting log in check, instead turns off accounting the first time it is run
after booting.  The same happens with the nightly /usr/lib/acct/monacct
program.  I don't yet know whether this bug is present in 4.0.4.

III. SCSI Support Problems

1. sar is confused by SCSI
   Sar -d doesn't work on SCSI drives.  No report of any SVr4 having fixed
this yet.  SCO fixed it in 3.2.4.

2. A configuration problem
   Stock USL requires you to jumper your SCSI devices to fixed IDs
during installation (it can be changed to any other ID after).
   Dell and UHC have fixed this.  The requirement is definitely still present
in Esix.

3. Synchronous SCSI hang problem
   David Wexelblat <dwex@mtgzfs3.att.com> reports: "Stock SVR4.0.3 will hang
the SCSI bus with a 1542 in synchronous mode.  Dell fixed this, and this has
been given to Microport [ed note: Microport 4.0.4 fixed the problem; MST UNIX
and Esix 4.0.3 still have this problem; I have not yet been able to determine
if ESIX 4.0.4 does].  In the file /sbin/bcheckrc, change the line:

	echo MARK > /dev/rswap

to
		
	echo MARK | dd of=/dev/rswap bs=512 conv=sync > /dev/null 2>&1

The magic is apparently the conv=sync, which forces a 512 byte block
to be written.  The original echo writes 4 bytes, which apparently causes
synchronous SCSI to go out to lunch.

Now, you ask, how can I fix this, since the system won't boot?  There are
a couple of methods.  First, if possible, disable synchronous negotiation
(1542 jumper J5-1 removed, plus whatever you may need to do to your drive).
Then boot up, edit /sbin/bcheckrc, then shutdown, restrap for synchronous,
then reboot.  Everything should be OK.

That's the easy way.  Unfortunately, some hard drives will only work
in synchronous mode.  Well, you can still recover from this phenomenon.
Here's how:

        1) Install on your hard drive
        2) Boot from the first boot floppy.  When it tells you to, insert
           the second boot floppy.  At the first prompt, hit <DEL> to
           break out to a shell.
        3) Mount your hard drive under /mnt with the following command
           (replace FS-TYPE with s5, s52, or ufs, whichever you used for
           your root partition):

                /etc/fs/FS-TYPE/mount /dev/dsk/c0t0d0s1 /mnt

        4) Now edit /mnt/sbin/bcheckrc:

                ed /mnt/sbin/bcheckrc

           You may want the 'ed' man page handy (I barely remember how to
           use 'ed' :->).  For simplicity, you can delete/comment out
           the offending line, then replace it with the correct line later.
        5) Unmount the hard drive:

                umount /mnt

        6) Reboot from the hard drive.  Everything should come up OK, and
           you can finish editing /sbin/bcheckrc, if necessary.

Note that you perform these actions at your own risk.  The first version was
performed by me on Microport SVR4, and the second was performed by someone
else (on my suggestion) on ESIX SVR4."

IV. Development Tools Problems

1. General UCB library brokenness
   The BSD compatibility libraries were badly broken in USL code.  A Dell
source adds "That meant that almost all the apps derived from them were broken
too.  Most stuff like automount will die when you send a SIGHUP, instead of
rereading the map file.  You can get a system into very strange states when
that happens."  Esix and UHC's BSD libraries are USL stock.  I don't yet know
the status of other ports.  Microport has run into things they think may be
symptoms of this but have no fix yet.

   Ron Guilmette <rfg@ncd.com> writes "[Library lossage] may be easily
demonstrated by attempting to build and link the GNU C compiler with
`-L/usr/ucblib -lucb'.  The resulting compiler will most certainly
crash and die."

2. USL emulation of BSD signals doesn't work
   A different source reports that the USL implementation of BSD signals
is broken in both 4.0.3 and 4.0.4; in particular, the sigvec() family doesn't
work properly.  It is possible to make minor tweaks to source to make such apps
work properly with the native USL signals implementation.

   Here's more on the signals problem, thanks to Richard <rc@siesoft.co.uk>:
------------------------------------------------------------------------------
The problem is to do with the signal() function that is within the BSD
compatibility libc.

To reproduce the problem do the following:

#include <stdio.h>
#include <sys/types.h>
#include <signal.h>
#include <sys/siginfo.h>

main()
{
	signal(SIGPIPE,SIG_IGN);
	pause();
}

and compile it with cc xx.c -o xx /usr/ucblib/libucb.a

If you run the program and then signal it with a SIGPIPE, the program
will die, even though you've told it to ignore SIGPIPE.

The fix is difficult unless you've got source cos there's a missing 'else'
clause from the signal() code. This is the only signal fault I've found in
the BSD signal functions, details of the rumoured sigvec problem would be
useful?

If you're trying to compile an application you could change the application
code to do the following, this does work..

void
catch(s)
int	s;
{
	/* DO NOTHING */
	;
}

main()
{
	signal(SIGPIPE,catch);
	pause();
}

SUMMARY
You can only change a signal handler to a function handler, any number of
times.  Any attempt to set the handler to SIG_DFL, or SIG_IGN will fail.

This bug has given some people working with X11R5 aggro, causing the X server
to die when you close a client. 
------------------------------------------------------------------------------

3. Possible string library problems
   There are also persistent rumors of problems in the BSD-emulation string
libraries.  I have not been able to pin down specifics on this.

4. Compiler problems
   Ronald Guilmette <rfg@ncd.com> also reports the following:

------------------------------------------------------------------------------
/* Here is a bug in the original SVR4 C compiler (aka C Issue 5) which
   effectively prevents you from making good use of the `const' and
   `volatile' qualifiers defined by ANSI C in conjunction with pointer
   types and typedef statements.  Compile this code and you will get:

   "qualifiers.c", line 23: left operand must be modifiable lvalue: op "="

   ...if your copy of the svr4 C compiler still has the bug.  Note that
   given these declarations, the ANSI C standard says that the thing pointed
   to by the variable `pci' should be considered to be constant... not the
   variable `pci' itself.  (The GCC compiler, either version 1.x or version
   2.x, correctly compiles this example without complaint.)
*/

typedef const int *ptr_to_const_int;

ptr_to_const_int pci;

int i;

void main ()
{
  pci = &i;
}
------------------------------------------------------------------------------
/* Here is a subtle bug in the original SVR4 C compiler (aka C Issue 5)
   which prevents you from first declaring a tagged type (i.e. a struct
   type or a union type) in a parameter list, and then defining that tagged
   type later on within the same scope.  (Note that according to the ANSI C
   standard, the scope in which parameters get declared and the outermost
   block of a function body are one and the same scope.  Thus, this really
   is legal ANSI C code!)

   Try compiling this with your C compiler on SVR4.  If your compiler still
   has the bug, you will get:

   "tagged_type.c", line 24: warning: dubious tag declaration: struct S
   "tagged_type.c", line 28: warning: improper member use: i
   "tagged_type.c", line 28: warning: improper member use: i
   "tagged_type.c", line 31: warning: dubious tag declaration: struct S
   "tagged_type.c", line 35: warning: improper member use: i
   "tagged_type.c", line 35: warning: improper member use: i

   (The GCC compiler also had this bug in version 1.x, but it has been fixed
   in version 2.x.)
*/

void foobar1 (arg)		/* use old-style without prototypes */
    struct S *arg;
{
  struct S { int i; };		/* define the type `struct S' */

  arg->i = arg->i;		/* legal according to ANSI C rules! */
}

void foobar2 (struct S *arg)	/* use new-style with prototypes */
{
  struct S { int i; };		/* define the type `struct S' */

  arg->i = arg->i;		/* legal according to ANSI C rules! */
}
------------------------------------------------------------------------------
/* Here is a serious bug in the original SVR4 `dump' program which dumps
   out parts of object files in either plain hex form or symbolically.

   To see the `dump' program get a segfault and die, save this code under
   the name `dump-bug.c' and then do:

	cc -g -c dump-bug.c
	dump -v -D dump-bug.o

   The bug arises whenever `dump' tries to read Dwarf debugging information
   for an array of pointers to any "user defined" type (e.g. `struct S' in
   this example).  Past that point, `dump' is totally confused, so further
   Dwarf debugging information finally causes it to go belly-up.
*/

struct S { int i; };
struct S *array[10];
int j;
------------------------------------------------------------------------------
It appears that the svr4 C compiler (for x86 machines) doesn't conform real
well to either the letter or the spirit of the IEEE 754 floating-point
standard.  In particular, "unordered comparisons" and other operations on
NaNs don't always produce the result that the IEEE 754 standard calls
for.
------------------------------------------------------------------------------
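
The report doesn't say which operations go wrong, so here is a generic probe
rather than a reproduction of any specific failure.  IEEE 754 requires every
ordered comparison involving a NaN to be false, and a NaN to compare unequal
even to itself.  The function names are mine.  The NAN macro is a later C
addition; on a compiler of this vintage you would have to manufacture the NaN
at run time (for example by dividing 0.0 by 0.0).

```c
#include <math.h>

/* Each function returns 1 when the compiler and FPU together give the
   answer IEEE 754 calls for.  `volatile` keeps the compiler from folding
   the comparisons away at compile time. */

/* Every ordered comparison involving a NaN must be false. */
int ordered_compares_false(double x)
{
    volatile double n = NAN;

    return !(n < x) && !(n > x) && !(n <= x) && !(n >= x) && !(n == x);
}

/* A NaN must compare unequal to everything, including itself. */
int nan_not_equal_to_itself(void)
{
    volatile double n = NAN;

    return n != n;
}
```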

   Both 4.0.3 and 4.0.4 USL versions are missing the documented dial.h
file from their /usr/include directory.

V. The FUBYTE bug

(Thanks to Christoph Badura <bad@flatlin.ka.sub.org> for this info)

The kernel function fubyte() is documented to return a positive value when
given a valid user space address and -1 otherwise. In the latter case u.u_error
is set to EFAULT.  USL SysV R4.0.3 has a sign extension bug in the
implementation of fubyte() for local file descriptors (i.e. not opened via
RFS), which causes fubyte() to return negative values if the byte fetched has
its high bit set. This bug doesn't affect STREAMS drivers, as they don't call
(and in fact are normally unable to call) fubyte().  Thus writing a byte with
the high bit set to certain character device drivers returns with -1 and errno
set to EFAULT.

The bug may affect any character device driver that calls fubyte(). It's not
limited to serial card drivers. The bug is noticed most often with serial card
drivers, since uucp uses byte values > 127 very early during g-protocol setup
and drivers for serial cards tend to use fubyte() quite often.

Note also that the bug's effect is different if the driver checks for a -1
return value of fubyte() or just a negative one. In the former case it is
possible to pass bytes with the high bit set through fubyte(), except for 0xff
which is -1 in two's complement. That makes the bug more obscure.
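
The effect is easy to model in user-level C.  The sketch below is only a model
of the two widening behaviors, not kernel code, and the function names are
mine.  It assumes a two's complement machine, which is what the x86 is.

```c
/* Model of the buggy fetch: the byte is widened as a signed quantity,
   which is what movsbl does. */
int fetch_sign_extended(unsigned char byte)
{
    return (int)(signed char)byte;
}

/* Model of the correct fetch: the byte is zero-extended (movzbl). */
int fetch_zero_extended(unsigned char byte)
{
    return (int)byte;
}

/* The two styles of error check a driver might apply to the result. */
int looks_like_error_strict(int r) { return r == -1; }  /* only -1 is an error */
int looks_like_error_loose(int r)  { return r < 0; }    /* any negative is */
```

With the buggy fetch, 0x80 comes back as -128: a driver using the strict check
lets it through, while one using the loose check rejects it.  0xff comes back
as -1 and is rejected by both.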

The fix is easy.  First, make a backup copy of the kernel object file
/etc/conf/pack.d/kernel/vm.o!  A disassembly of vm.o(lfubyte) should reveal
*exactly* one mov[s]bl (move byte to long w/sign extend).  That one needs to be
patched into a movzbl (zero extend). The difference is one bit in the second
byte of the opcode.

The movsbl has the bit pattern 00001111 1011111w mod/rm-byte.
The movzbl has the bit pattern 00001111 1011011w mod/rm-byte.

The 'w' bit is 0 for the instruction in question. So the opcodes are 0f be and
0f b6. Here is the diff -c from dis -F lfubyte showing the patch applied to
the Dell 2.1 kernel (the patched vm.o is listed first, the original vm.o.org
second):

*** vm.o	Mon Mar  9 00:31:38 1992
--- vm.o.org	Mon Mar  9 00:32:40 1992
***************
*** 22,28 ****
  	11c90:  85 c0                 testl  %eax,%eax
  	11c92:  75 09                 jne    0x9 <11c9d>
  	11c94:  8b 45 08              movl   8(%ebp),%eax
! 	11c97:  0f b6 00              movzbl (%eax),%eax
  	11c9a:  89 45 fc              movl   %eax,-4(%ebp)
  	11c9d:  c7 05 d8 13 00 00 00 00 00 00 movl   $0x0,0x13d8
  	11ca7:  83 3d dc 13 00 00 00  cmpl   $0x0,0x13dc
--- 22,28 ----
  	11c90:  85 c0                 testl  %eax,%eax
  	11c92:  75 09                 jne    0x9 <11c9d>
  	11c94:  8b 45 08              movl   8(%ebp),%eax
! 	11c97:  0f be 00              movsbl (%eax),%eax
  	11c9a:  89 45 fc              movl   %eax,-4(%ebp)
  	11c9d:  c7 05 d8 13 00 00 00 00 00 00 movl   $0x0,0x13d8
  	11ca7:  83 3d dc 13 00 00 00  cmpl   $0x0,0x13dc

Of course there is a workaround at the driver level.  Canonically, one would do
this by checking for fubyte() returning -1 *and* u.u_error being set to EFAULT
(u.u_error is cleared upon entering a system call).  However, in R4.0.3
fubyte() does NOT set u.u_error.  It *does* set u.u_fault_catch.fc_errno.

Christoph reports that Dell V.4 can be object-patched successfully to fix this.
I do not know the status of the other ports.

Another poster (Marc Boucher <marc@cam.org>) adds:

On ESIX SVR4.0.3 Rev. A, the instruction movsbl in question can be changed to
movzbl (as described above) with a binary-editor on file
/etc/conf/pack.d/kernel/vm.o. At offset 0x11eb0, change 0xbe to 0xb6.

Before patching, verify that your /etc/conf/pack.d/kernel/vm.o is the same as
mine!  On my system, the /bin/sum generated checksum of vm.o was "4440 222".
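
If you don't have a binary editor handy, the byte change itself is small
enough to sketch in C.  The function below works on an in-memory copy of the
object file and refuses to touch anything unless it finds the expected 0f be
pair.  The name and interface are mine, and reading the file, checking its
checksum, and writing it back are left out.  Note that `off` here is the
offset of the 0f byte, i.e. one less than the offset of the be byte.

```c
#include <stddef.h>

/* Change a movsbl opcode (0f be) at offset `off` in `image` into movzbl
   (0f b6).  Returns 0 on success, -1 if the offset is out of range or
   the two bytes there are not 0f be (wrong file, wrong offset, or the
   patch has already been applied). */
int patch_movsbl_to_movzbl(unsigned char *image, size_t size, size_t off)
{
    if (size < 2 || off > size - 2)
        return -1;
    if (image[off] != 0x0f || image[off + 1] != 0xbe)
        return -1;
    image[off + 1] ^= 0x08;          /* 0xbe -> 0xb6: the one differing bit */
    return 0;
}
```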

The bug is still present in stock USL 4.0.4.  SCO doesn't have this problem,
which suggests it may be due to a compiler code generation error.
--
	Send your feedback to: Eric Raymond = eric@snark.thyrsus.com