*BSD News Article 59309

Path: euryale.cc.adfa.oz.au!newshost.anu.edu.au!harbinger.cc.monash.edu.au!news.mel.connect.com.au!munnari.OZ.AU!news.ecn.uoknor.edu!paladin.american.edu!europa.chnt.gtegsc.com!gatech!news.mathworks.com!fu-berlin.de!news.belwue.de!news.uni-stuttgart.de!news.rhrz.uni-bonn.de!RRZ.Uni-Koeln.DE!zpr.uni-koeln.de!se
From: se@ZPR.Uni-Koeln.DE (Stefan Esser)
Newsgroups: comp.unix.bsd.freebsd.misc
Subject: Re: NCR810 SCSI problems under 2.1.0
Date: 17 Jan 1996 18:41:32 GMT
Organization: Institute for Mathematics, University of Cologne, Germany
Lines: 140
Sender: se@Sysiphos (Stefan Esser)
Message-ID: <4djfss$68c@news.rrz.uni-koeln.de>
References: <4dbr3s$oaa@neptune.niwa.cri.nz>
NNTP-Posting-Host: sysiphos.mi.uni-koeln.de
To: wdk@frc.niwa.cri.nz (Wayne Knowles)

In article <4dbr3s$oaa@neptune.niwa.cri.nz>, wdk@frc.niwa.cri.nz (Wayne Knowles) writes:
|> 
|> I have recently upgraded my old 486 DX 33 to a Pentium, and went with
|> the NCR810 PCI SCSI Controller cuz it was cheap, fast, nothing on the
|> board to go wrong, and everyone else running FreeBSD uses them.  Well
|> I have been having major problems getting the NCR810 board to work.

Sorry to hear that ...

|> As an act of desperation, I installed Windows NT 3.51 on the same machine,
|> and the NCR810 performed without fault, which makes me believe it is not
|> a hardware fault.

That is no proof. Win/NT may be different, but
according to an article I read recently, Win95
seems to keep only one disk access outstanding
at any time.

|> Any advice would be appreciated, otherwise I may have to buy an expensive
|> Adaptec controller :-(

Well, I'm afraid you'd be disappointed ...
(But if you try the Adaptec, please let me know what you find ...)

|> Platform:
|>    FreeBSD 2.1.0
|>    100MHz Pentium, 16MB RAM, AMI BIOS with SMDS Support
|>    Diamond Stealth 64 Video VRAM (PCI)
|>    "No Name" NCR 810 PCI SCSI Controller
|>    Adaptec 1542CF
|>    Toshiba 3401B SCSI CDROM
|>    Archive Viper 150MB Tape
|>    2 x Seagate ST3610N
|>    1 x DEC DSP 3105S

From this list, it is not obvious which devices
are connected to the NCR, and which to the AH1542.
If all were on the NCR's SCSI bus, then your cable
might be too long to reliably use 10MHz transfers.
(I've found 1.5m to be the maximum in that case,
with active terminators on both ends of the cable!)

|> Problem:
|>    SCSI Controller dies after a period of activity.  Appears to happen
|>    more with Reads than with writes.
|>    Reading from the Archive Viper tape will cause the problem almost
|>    instantly.
|> 
|>    I understand that the NCR 810 is fussy about termination and parity
|>    and have checked them several times.  To be sure of it I have
|>    connected all but one hard disk onto my old Adaptec 1542CF controller.

Ok. This answers my above question ...

(BTW: Your bug report is of premium quality. It
really contains all the information I could
imagine asking for :-)

|>    The following error occurs with only the DEC DSP3105 SCSI Disk connected
|>    to the NCR Board, Parity Enabled & Terminated.  I have also tried with
|>    an Active Terminator on the SCSI Bus, but get the same error.
|> 
|>      ncr0:2: ERROR (a0:10) (1-a3-0) (8/13) @ (e84:19000000).

Ok. Let's see: 

dstat=a0: DMA Fifo Empty + Bus Fault
sist =10: Reselected by another device
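
For readers who want to decode such status bytes
themselves, here is a minimal sketch. The bit names
are written from memory of the 53C810 register
layout and are an assumption, except for the three
bits named above (0x80 and 0x20 in dstat, 0x10 in
sist). The helper names are hypothetical and not
part of the driver.

```c
#include <stdio.h>
#include <stddef.h>

struct bit_name { unsigned char mask; const char *name; };

/* DSTAT bits. Assumed layout; only 0x80 and 0x20 are confirmed above. */
static const struct bit_name dstat_bits[] = {
    {0x80, "DMA FIFO empty"},
    {0x40, "master data parity error"},
    {0x20, "bus fault"},
    {0x10, "aborted"},
    {0x08, "single-step interrupt"},
    {0x04, "SCRIPTS interrupt instruction"},
    {0x01, "illegal instruction"},
};

/* SIST0 bits. Assumed layout; only 0x10 is confirmed above. */
static const struct bit_name sist0_bits[] = {
    {0x80, "phase mismatch"},
    {0x40, "function complete"},
    {0x20, "selected"},
    {0x10, "reselected"},
    {0x08, "SCSI gross error"},
    {0x04, "unexpected disconnect"},
    {0x02, "SCSI reset"},
    {0x01, "parity error"},
};

/* Append the name of every set bit to buf, joined by " + ".
 * Returns the number of names written; stops early if buf is full. */
static int decode_bits(unsigned value, const struct bit_name *tbl,
                       size_t n, char *buf, size_t buflen)
{
    int count = 0;
    size_t used = 0;

    if (buflen > 0)
        buf[0] = '\0';
    for (size_t i = 0; i < n; i++) {
        if (value & tbl[i].mask) {
            int w = snprintf(buf + used, buflen - used, "%s%s",
                             count ? " + " : "", tbl[i].name);
            if (w < 0 || (size_t)w >= buflen - used)
                break;              /* output would be truncated */
            used += (size_t)w;
            count++;
        }
    }
    return count;
}

static int decode_dstat(unsigned v, char *buf, size_t buflen)
{
    return decode_bits(v, dstat_bits,
                       sizeof dstat_bits / sizeof dstat_bits[0], buf, buflen);
}

static int decode_sist0(unsigned v, char *buf, size_t buflen)
{
    return decode_bits(v, sist0_bits,
                       sizeof sist0_bits / sizeof sist0_bits[0], buf, buflen);
}
```

Calling decode_dstat(0xa0, ...) fills the buffer with
"DMA FIFO empty + bus fault", and decode_sist0(0x10, ...)
gives "reselected", which matches the reading above.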

Well, this looks a lot like a PCI bus problem ...

Last time I had somebody report a Bus Fault, it 
was completely solved by disabling PCI burst mode
and PCI buffers ...

I do not expect the Triton to be that buggy (and
in fact a lot of people seem to have no problem
using the NCR with a Triton), but the Bus Fault
indicates that the NCR tried to access memory
through the Triton chip set and didn't succeed.

|> 	     script cmd = 89030000

The NCR did indeed fail in the data_in code. It
had just started a new scatter/gather segment, but
either got the wrong address from the segment
table, or failed doing the DMA write. Below is
the corresponding piece of NCR code (this is not
the official NCR SCRIPTS syntax).

0x19000000:	SCR_MOVE_TBL ^ SCR_DATA_IN,
0x????????:		offsetof (struct dsb, data[ i]),
0x89030000:	SCR_CALL ^ IFFALSE (WHEN (SCR_DATA_IN)),
0x????????:		PADDR (checkatn),

|>    The command being executed (19000000) is always the same, although the
|>    e84 value does change.

The above four lines are repeated for each of
33 possible scatter/gather segments. 
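
A minimal sketch of how such a repeated table could
be generated in C follows. The macro values are
quoted from memory of the driver's register header
and are an assumption; they are chosen so that the
two opcodes come out as 0x19000000 and 0x89030000,
as shown above. The struct layout, the function name
fill_data_in and the checkatn_addr parameter are
hypothetical stand-ins, not the driver's actual code.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumed opcode encodings (from memory, not verified against the header). */
#define SCR_MOVE_TBL  0x18000000u
#define SCR_DATA_IN   0x01000000u
#define SCR_CALL      0x88080000u
#define IFFALSE(arg)  (0x00080000u | (arg))
#define WHEN(phase)   (0x00030000u | (phase))

#define MAX_SCATTER   33   /* number of scatter/gather segments */

/* Hypothetical stand-ins for the driver's data structures. */
struct scr_tblmove { uint32_t size; uint32_t addr; };
struct dsb { struct scr_tblmove data[MAX_SCATTER]; };

/* Emit the four-word data_in sequence once per segment.
 * p must have room for 4 * MAX_SCATTER words.
 * checkatn_addr stands in for PADDR (checkatn).
 * Returns the number of 32-bit words written. */
static size_t fill_data_in(uint32_t *p, uint32_t checkatn_addr)
{
    uint32_t *start = p;

    for (size_t i = 0; i < MAX_SCATTER; i++) {
        /* Table-indirect block move in the DATA IN phase. */
        *p++ = SCR_MOVE_TBL ^ SCR_DATA_IN;
        /* Byte offset of segment i within struct dsb. */
        *p++ = (uint32_t)(offsetof(struct dsb, data)
                          + i * sizeof(struct scr_tblmove));
        /* Call checkatn unless the target is still in DATA IN. */
        *p++ = SCR_CALL ^ IFFALSE(WHEN(SCR_DATA_IN));
        *p++ = checkatn_addr;
    }
    return (size_t)(p - start);
}
```

Under these assumptions, the failing "script cmd"
value 89030000 is the third word of one of the 33
groups, that is, the conditional call that follows
the block move for a segment.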

|>    Setting the BIOS to the Failsafe options (Pipeline Burst disabled etc)
|>    gives exactly the same error - 

Well, I should have read on before suggesting
that this might be worth a try ... :)

This leaves cache and memory as possible causes
for the bus fault ...

|> Extra Information
|> 
|> Kernel messages (Booted with -v option)

Thanks! 
Everything seems to be configured correctly.


There is one more thing you may want to try:

Use "ncrcontrol" to disable tagged command 
queues:

# ncrcontrol -s tags=0

Even though I think you are suffering from a
hardware problem, it might be a useful test.


I've been using the 530MB version of your
drive (the DSP3053) with tags enabled as
my system drive, so I'd expect it to work
fine...

Regards, STefan
-- 
 Stefan Esser, Zentrum fuer Paralleles Rechnen		Tel:	+49 221 4706021
 Universitaet zu Koeln, Weyertal 80, 50931 Koeln	FAX:	+49 221 4705160
 ==============================================================================
 http://www.zpr.uni-koeln.de/~se			  <se@ZPR.Uni-Koeln.DE>