*BSD News Article 46587


Path: sserve!newshost.anu.edu.au!harbinger.cc.monash.edu.au!simtel!lll-winken.llnl.gov!decwrl!svc.portal.com!news1.best.com!blob.best.net!not-for-mail
From: dillon@best.com (Matt Dillon)
Newsgroups: comp.unix.bsd.freebsd.misc
Subject: Re: two crash problems, anyone have any ideas? (similar experiences?)
Date: 8 Jul 1995 22:08:21 -0700
Organization: Best Internet Communications, Inc. (info@best.com)
Lines: 63
Distribution: world
Message-ID: <3tno85$3oo@blob.best.net>
References: <3tfjvp$hbe@blob.best.net> <3tnjan$35g@shell1.best.com>
NNTP-Posting-Host: blob.best.net

:In article <3tnjan$35g@shell1.best.com>,
:Russell Carter <rcarter@best.com> wrote:
:>In article <3tfjvp$hbe@blob.best.net>, Matt Dillon <dillon@best.com> wrote:
:>>Configuration:
:>>    
:>>    128M memory, 130+ users, three SCSI disks (barracudas), load averages
:>>    around 10, NCR PCI SCSI controller, Etherlink III (ISA) ethernet.
:>>    pentium-90.
:>>
:>>    FreeBSD 2.0.5-RELEASE-BEST (SHELL) #4: Thu Jun 29 01:57:18 PDT 1995
:>>    (with last set of patches patched in).
:>>
:>>Problem #1:
:>>
:>>    Heavily loaded machine is running along.  Then, for no good reason,
:>>    anything requiring disk I/O comes to a screaming halt... if I happen
:>>    to have a vmstat running, it continues to go, but attempting to do
:>>    anything (such as ^C or run a program from an existing shell prompt)
:>>    blocks forever.
:>>
:>>    The vmstat shows a large number of processes blocking, virtually none
:>>    running, and disk I/O going to zero.
:>>
:...
:>
:>Precise symptoms duplicatable on my lightly loaded P54C-100, 64 MB, ncr+cdrom
:>+st32550N+conner4326 DAT *WHENEVER* I try to backup 1GB+ of
:>stuff to the DAT *AND* the DAT is living inside the case.  I have watched
:>the tps/s go to zero using ncrcontrol, then the file systems just *ping*
:>vanish.  Any further access causes an input/output error.  My solution: pull
:>the DAT out of the case.  Your possible problem: a drive overheating?

    It's beginning to look like my problem #1 might be caused by an
    out-of-date loadable kernel module for NFS.

    I am crossing my fingers.  After talking with some of the FreeBSD
    gurus, we turned on the NFS option in the kernel config, recompiled
    the kernel and most of the NFS utilities, and fixed a few bugs that
    popped up.  Fortunately, these bugs generated panics and core dumps
    rather than just hanging the machine, so they were easy to track down.

    So far, so good.

    (I assume the bug fixes will be melded into the next set of patches.)

    --

    Re: your tape problems.  At some point just before the 2.0.5 release
    we had similar problems, where a tape error would lock up the whole
    SCSI subsystem.  However, it has not occurred since, and we back up
    about 10 GB of material every day.  We use NCR controllers too.  In
    our case, though, the kernel printed a bunch of SCSI tape error
    messages to the console just before the system would block on all
    SCSI ops.

    You may want to double-check your SCSI termination.  I had a
    problem on a Linux box about a year ago, and it turned out that my
    termination was wrong: the DAT drive was, for some reason, sensitive
    enough to error out even though the disk drives never did.

						-Matt