*BSD News Article 70618



Path: euryale.cc.adfa.oz.au!newshost.anu.edu.au!harbinger.cc.monash.edu.au!news.rmit.EDU.AU!news.unimelb.EDU.AU!munnari.OZ.AU!news.mel.connect.com.au!news.mira.net.au!news.vbc.net!samba.rahul.net!rahul.net!a2i!olivea!charnel.ecst.csuchico.edu!psgrain!usenet.eel.ufl.edu!newsfeed.internetmci.com!in2.uu.net!news.dca.net!dca.net!awhite
From: Andrew White <awhite@dca.net>
Newsgroups: comp.unix.bsd.bsdi.misc
Subject: FIX (?!): BusLogic BT-946C firmware problem
Date: Tue, 11 Jun 1996 04:49:33 -0400
Organization: DCANet - Delaware Common Access Network
Lines: 154
Message-ID: <Pine.BSI.3.91.960611044017.2184A-100000-100000@dca.net>
NNTP-Posting-Host: dca.net
Mime-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII
To: bsdi-users@bsdi.com


SUMMARY: 
  The BusLogic PCI SCSI controller (BT-946C) has firmware problems
  that cause irrecoverable hangs on the SCSI bus under BSD/OS 2.x.
  This message outlines migrating from the BT-946C to another
  SCSI controller without reformatting the disk.

I am sending this message out to the BSDI-users mailing list and to
comp.unix.bsd.bsdi.misc in the hopes that it saves someone a lot of time and
trouble  -- I certainly found this to be a vexing problem.

The problem I experienced was that one of our BSD/OS servers, running
BSD/OS 2.01 at the time, would suddenly freeze (up to 5 times per day,
ouch!).  Shortly thereafter, I would see kernel messages on the console
which repeated every thirty seconds or so.  The messages were of the form: 

    bha0: command timed out for XX seconds

The only way to recover from this problem was a cold reboot.  Donn Seeley of
BSDI posted this explanation of what causes this type of error in January
1994:

--
From donn@BSDI.COM  Thu Jan 20 08:26:18 1994
Date: Thu, 20 Jan 94 13:26:18 -0500
From: donn@BSDI.COM (Donn Seeley)
Message-Id: <9401201826.AA05857@BSDI.COM>
To: gks1!greg%ucdavis.edugks1!greg@ucdavis.edu, bsdi-users@bsdi.com
Subject: Re: aha0: command timed out
Cc: bsdi-users@BSDI.COM
Status: OR

Currently the aha driver performs the following operations if a command
times out:

        detect stale command by checking timestamps
        issue a host adapter abort for the given command
        receive notification of the aborted command
        deliver notification to the machine-independent SCSI disk code

The machine-independent SCSI disk (sd) code does the following:

        observe an error
        call sderror() to print an error message
        use return value from sderror to classify the error
        if it's a retryable error and we haven't exceeded the retry count,
                retry the command
        otherwise return an EIO error

This procedure usually works fine if the problem lies with the SCSI
target rather than the SCSI host adapter.  If the host adapter
itself is completely wedged, then we never receive notification of
the aborted command, and we loop waiting for the host adapter to
acknowledge an abort.  At this stage, we could try a host adapter
reset if we notice that the host adapter is being uncooperative,
but experience says that this doesn't work -- the only cure is an
ISA bus reset (that is, a reboot).  If the host adapter is not a
critical component in your system, I suppose the driver could just
mark the adapter as 'dead' and continue.  Unfortunately most people
have their root disk on the host adapter and they really do need
to reboot when the timeout message cycles.  It's difficult to come
up with a good strategy for dealing with broken host adapters
because the possibilities for breakage are quite large and the
number of examples is quite small.

Ideally host adapters would never wedge :-),

Donn
--
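The sd retry policy Donn describes can be modeled in a few lines of sh.  This is a hypothetical sketch, not BSD/OS source: the status words, the sd_retryable function standing in for sderror()'s classification, and the retry limit of 4 are all assumptions made for illustration.

```shell
#!/bin/sh
# Hypothetical model of the sd-layer retry policy described above.
# NOT BSD/OS code: status words, classifier and retry limit are assumptions.

EIO=5           # errno value of EIO on BSD systems
MAX_RETRIES=4   # assumed limit; the real sd driver's value may differ

# Stand-in for sderror()'s classification: 0 = retryable, 1 = fatal.
sd_retryable() {
    case "$1" in
        timeout|busy) return 0 ;;
        *)            return 1 ;;
    esac
}

# sd_do_command FN: FN issues the "command" and sets STATUS to
# ok, timeout, busy, or some other (non-retryable) word.
sd_do_command() {
    tries=0
    while :; do
        "$1"
        [ "$STATUS" = ok ] && return 0
        echo "sd0: $STATUS" >&2                 # print an error message
        if sd_retryable "$STATUS" && [ "$tries" -lt "$MAX_RETRIES" ]; then
            tries=$((tries + 1))                # retry the command
        else
            return "$EIO"                       # give up with EIO
        fi
    done
}
```

Note what the model cannot show: with a wedged adapter, FN never returns at all, because the driver is still waiting for the abort to be acknowledged.  The retry/EIO logic is never reached, which is why the timeout message simply repeats until the machine is rebooted.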

In this case, the problem was indeed a "wedged" adapter.  Calls to BSDI tech
support confirmed that there is a firmware problem with the BusLogic adapter
(which BusLogic tech support naturally denies).  Apparently, the machine
that hosts BSDI's web site, www.bsdi.com, was switched from a BusLogic PCI
adapter to an NCR PCI controller because of this same problem!

The "fix" is to replace the SCSI adapter.  I decided to upgrade to BSD/OS
2.1 and use an Adaptec AHA-2940 PCI adapter, although one of the many NCR PCI
controllers would have worked just as well.

The catch in switching adapters was that I had chosen to use aha boot
blocks, which work only with BusLogic and Adaptec 15XX ISA controllers.
I also did not have an FDISK table on my boot disk, which is apparently
required for the AHA-2940.

I used the following procedure to create an FDISK table and write BIOS
boot blocks to my boot disk (while keeping my existing filesystems and data),
which allowed me to boot from the Adaptec controller.  I compiled this
procedure from several conversations with BSDI tech support and from a post
to bsdi-users by Paul Borman of BSDI.  Please note that while this
procedure worked for me, it may not work for you -- please call BSDI tech
support (and not me!) if you run into problems using it.

  1. Print the current config of your boot disk with this command 
     for reference before beginning:

     disksetup sd0

  2. Format a floppy and save the first 16 sectors of the boot disk
     to it, in case things get really messed up!

     fdformat /dev/rfd0c floppy
     dd if=/dev/rsd0c of=/dev/rfd0c bs=512 count=16

  3. Enter single user mode ("shutdown now"), and use disksetup -i
     to create the FDISK table etc.  Verbatim from Paul Borman's
     post to bsdi-users:

        Run disksetup -i to begin writing FDISK table.  When prompted,
        say you have coresidency.  Say that BSD/OS and DOS is *not* 
        your setup.  Once you get into the FDISK screen, add exactly
        one partition that starts at 0 and is the whole disk of 
        type BSDI, make sure you mark it active!  Use your old 
        BSD disklabel when asked (it should still work).  Install 
        bootany.sys, but when asked if BSD is bootable, say NO!!! 
        Install the appropriate boot blocks (almost always the 
        bios bootblocks these days) and write everything out.  
        The 2.1 version of bootany will not ask any questions and 
        should boot BSD directly.  If you had said "YES" that BSD 
        was bootable it would have prompted you to press <F1>.  
        See bootany(8) for more information about bootany. 

     This procedure worked fine for me (thanks Paul!).  Note that
     I had to go into my CMOS setup and into the setup of my
     BusLogic and Adaptec controllers to enable "Large disk access
     mode for >1GB disks (DOS only)" so that the BIOS and BSD/OS
     agree on the disk geometry, and hence on the size of the disk.

     Also note that after following the above procedure, I could
     no longer boot from the BT-946C controller, but I could boot
     from the Adaptec 2940.
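The floppy from step 2 is only a safety net if it really holds the same bytes as the disk, so it is worth checking it before going on to step 3.  A minimal sh sketch, assuming the device names used above and a writable temporary directory; sectors_match is a hypothetical helper name, and only the standard dd and "cmp -s" are relied on.

```shell
#!/bin/sh
# Hypothetical helper: succeed only if the first COUNT 512-byte sectors
# of SRC and DST are identical.  Defaults are the devices used above.
sectors_match() {
    src=${1:-/dev/rsd0c}
    dst=${2:-/dev/rfd0c}
    count=${3:-16}
    tmp=${TMPDIR:-/tmp}/sectors.$$

    # Copy exactly COUNT sectors from each side into scratch files,
    # then compare them silently (exit status 0 = identical).
    dd if="$src" of="$tmp.a" bs=512 count="$count" 2>/dev/null &&
    dd if="$dst" of="$tmp.b" bs=512 count="$count" 2>/dev/null &&
    cmp -s "$tmp.a" "$tmp.b"
    rc=$?
    rm -f "$tmp.a" "$tmp.b"
    return "$rc"
}
```

For example, run "sectors_match /dev/rsd0c /dev/rfd0c 16 && echo backup OK" right after the dd in step 2.  Comparing scratch copies rather than the devices directly keeps the check limited to the 16 sectors that were saved, since the floppy is much larger than that.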


If you need to restore the partition table, boot records, etc., you can
use the floppy created in step 2 to write this information back to sd0.
First, remove the write protection from this area of the disk:

     disksetup -W sd0

Then use dd to restore the information from the floppy:

      dd if=/dev/rfd0c of=/dev/rsd0c bs=512 count=16
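Before overwriting sd0 with the floppy copy, it can be worth snapshotting what is there now, so that the restore itself can be undone.  This is a sketch under stated assumptions: restore_boot_area and the default snapshot path /var/tmp/sd0-before-restore.img are hypothetical, and "disksetup -W sd0" must already have been run.  conv=notrunc is a standard dd option; it has no effect on a raw device, but it keeps the function from truncating a regular file if you rehearse the procedure on disk images first.

```shell
#!/bin/sh
# Hypothetical helper: snapshot the current boot area, restore the saved
# copy, then verify it.  Run "disksetup -W sd0" first.
restore_boot_area() {
    backup=${1:-/dev/rfd0c}                          # copy made in step 2
    disk=${2:-/dev/rsd0c}
    snapshot=${3:-/var/tmp/sd0-before-restore.img}   # assumed path

    # 1. Keep the current (possibly broken) 16 sectors, just in case.
    dd if="$disk" of="$snapshot" bs=512 count=16 || return
    # 2. Write the saved sectors back; notrunc only matters for image files.
    dd if="$backup" of="$disk" bs=512 count=16 conv=notrunc || return
    # 3. Read the boot area back and compare it with the backup.
    dd if="$disk" of="$snapshot.check" bs=512 count=16 || return
    dd if="$backup" bs=512 count=16 | cmp -s - "$snapshot.check"
    rc=$?
    rm -f "$snapshot.check"
    return "$rc"
}
```

If the restored sectors turn out to be worse than what was there, the snapshot can be written back the same way, with the snapshot file as the dd input.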


Be careful out there folks!

-Andrew White

Andrew White             | DCANet: Internet Access for the Delaware Valley
andrew@dca.net           | Offering dialup, ISDN, and dedicated Internet access
(302) 654-1019           | in the 215/302/610 area codes.  
http://andrew.white.org/ | e-mail: info@dca.net  web: http://www.dca.net/