*BSD News Article 2326



Newsgroups: comp.unix.bsd
Path: sserve!manuel!munnari.oz.au!uunet!gatech!destroyer!caen!hellgate.utah.edu!fcom.cc.utah.edu!gateway.univel.com!gateway.novell.com!ithaca!terry
From: terry@ithaca.npd.Novell.COM (Terry Lambert)
Subject: Satanic boot problem tracked to CMOS, wd.c
Message-ID: <1992Jul23.152046.13374@gateway.novell.com>
Keywords: 386bsd wd.c boot CMOS satan
Sender: terry@ithaca (Terry Lambert)
Nntp-Posting-Host: ithaca.eng.sandy.novell.com
Organization: Novell NPD -- Sandy, UT
Date: Thu, 23 Jul 1992 15:20:46 GMT
Lines: 126


	Well, curiosity got the better of me... and I found what I believe
to be *the* boot problem... well, several boot problems, actually.


	The magic file? usr/src/sys.386bsd/i386/i386/machdep.c!


1)	The variable 'maxmem' is global, so it should be auto-initialized
	to 0, if the compiler is a compiler.  If either 'biosbasemem' or
	'biosextmem' is "invalid", then maxmem is set by
	"maxmem = min (maxmem, 640/4);", which yields 0.  That is clearly
	incorrect, as the boot code is obviously running in RAM
	somewhere... besides, maxmem is recalculated from 'Maxmem'
	directly after the if statement, blowing the value to 0-1, which
	puts us at 0xffffffff for our amount of memory.

	Correction:  First, this is incorrect; the value being set in
	the default case should be 'Maxmem', not 'maxmem'.  The min of 0
	and any non-negative value is zero, so why is the 'min()'
	function called at all in this case?  It can also be argued that
	a machine with less than 640K of base memory cannot boot 386BSD,
	so the forced default should be 640K in the "bad CMOS" case.  If
	the machine actually has less than 640K, it will fail anyway;
	but if the thing *has* 640K, this will allow it to boot.

2)	If the amount of extended memory is not greater than 0, or the
	biosbasemem is not equal to 640, 'Maxmem' is *never* set.  This
	is the missing "not handled" case which would more correctly be
	the second "else".
	
	Correction: I suggest prompting the user for the amount of memory
	in the machine at this point, and jumping to just after the
	"#endif" for "NDDB" to avoid repeating the boundary check code.

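	The two corrections above can be sketched in C roughly as follows.
This is not the actual machdep.c code: the names size_memory_pages(),
last_page(), FALLBACK_BASE_KB and PAGE_KB, and the arithmetic in the
normal case, are assumptions made for illustration.  For simplicity the
sketch forces the 640K default in both failure cases instead of prompting
the user.  Inputs are in KB and the result is in 4K pages, matching the
"640/4" in the original expression.

```c
#include <stdint.h>

#define FALLBACK_BASE_KB 640u   /* assumed default when the CMOS looks bad */
#define PAGE_KB          4u     /* 4K pages, as in "640/4" */

/*
 * Hypothetical helper (not from machdep.c): returns the amount of
 * physical memory in 4K pages, given the base and extended memory
 * sizes in KB as read from the CMOS.
 */
uint32_t size_memory_pages(uint32_t biosbasemem, uint32_t biosextmem)
{
    uint32_t Maxmem;

    if (biosbasemem == 640u && biosextmem > 0u) {
        /* Normal case (assumed): extended memory starts at 1 MB. */
        Maxmem = (1024u + biosextmem) / PAGE_KB;
    } else {
        /*
         * "Bad CMOS" and previously unhandled cases: force 640K
         * instead of min(0, 640/4), which is always 0.
         */
        Maxmem = FALLBACK_BASE_KB / PAGE_KB;
    }
    return Maxmem;
}

/*
 * The failure mode from item 1: with Maxmem == 0, the unsigned
 * expression "Maxmem - 1" wraps around to 0xffffffff.
 */
uint32_t last_page(uint32_t Maxmem)
{
    return Maxmem - 1u;
}
```

	With a CMOS that reports 0 for both values, size_memory_pages()
returns 160 pages (640K) rather than 0, so the later subtraction no longer
wraps.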

	I suspect that one of these two (fatal) cases is being triggered
by my CMOS having "incorrect" values.  There are several reasons this
might occur:

1)	The CMOS truly has "incorrect" values.  A diagnostic to this effect,
	along with what the values retrieved were, and a "Hit any key to
	continue" message immediately following the "degraded mode" message
	would greatly help debugging this.  This is, I believe, the case,
	although the reason the values are "incorrect" is that "_rtcin" is
	broken.

2)	The CMOS has the correct values, but the read of the CMOS fails
	due to timing; most likely, this is related to the reset rate of
	various items on my bus.  I suspect that the longest delay reset
	items, specifically the built-in bus mouse, are the most likely
	suspects if this is indeed the cause.  Again, the modified
	machdep.c would help me narrow this down.
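
	The diagnostic suggested in item 1 might be formatted along these
lines.  This is only a sketch: the function name format_cmos_diag() and
the message wording are made up for illustration, and the real kernel
would print to the console and then wait for a keypress rather than fill
a buffer.

```c
#include <stdio.h>
#include <stddef.h>

/*
 * Hypothetical helper: formats the CMOS values that were actually
 * read, followed by the "Hit any key to continue" prompt.  Returns
 * the value of snprintf(), i.e. the length the message needs.
 */
int format_cmos_diag(char *buf, size_t len,
                     unsigned basemem_kb, unsigned extmem_kb)
{
    return snprintf(buf, len,
        "CMOS reports base=%uK ext=%uK; running in degraded mode\n"
        "Hit any key to continue\n",
        basemem_kb, extmem_kb);
}
```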


	The HP Vectra problems could easily be related.  Dollars to donuts
says that my AT&T machines and the Vectra store their CMOS values in a
strange place, unexpected by BSDI.  There is code to the effect that
"probing breaks certain 386 AT relics"; I suppose *NOT* probing is the
cause of our problems.  I suspect that only one location is being used,
and that the entire memory is being listed there.  Again, without a boot
diagnostic with sufficient delay, I have no way of telling.


	Additional notes on boundary conditions:

	I would suggest that the expression "maxmem = Maxmem - 1;" be
checked against minimum and maximum bounds (it immediately follows the
"if" on line 876 of machdep.c).  Such a check is more likely what the
misused "min()" expression in the first case of the "if" was intended to do.
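
	A bounds check of this kind could look something like the
following.  The limits are assumptions, not values from machdep.c:
MIN_PAGES is the 640K floor argued for above, and MAX_PAGES is simply the
number of 4K pages in a 32-bit address space.  The function name
checked_maxmem() is also made up.

```c
#include <stdint.h>

#define MIN_PAGES (640u / 4u)            /* assumed floor: 640K in 4K pages */
#define MAX_PAGES (0xffffffffu / 4096u)  /* assumed ceiling: 32-bit space */

/*
 * Hypothetical replacement for the bare "maxmem = Maxmem - 1;":
 * clamp Maxmem into [MIN_PAGES, MAX_PAGES] first, so that the
 * subtraction can never wrap around to 0xffffffff.
 */
uint32_t checked_maxmem(uint32_t Maxmem)
{
    if (Maxmem < MIN_PAGES)
        Maxmem = MIN_PAGES;
    if (Maxmem > MAX_PAGES)
        Maxmem = MAX_PAGES;
    return Maxmem - 1u;
}
```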



	What I suspect: '_rtcin' in locore.s is broken.  Specifically,
it reads as follows:

		.globl  _rtcin
	_rtcin: movl    4(%esp),%eax
		outb    %al,$0x70
		subl    %eax,%eax       # clr eax
		inb     $0x71,%al       # Compaq SystemPro
		ret


	This should probably look like the following to guarantee that it
is more generic (and therefore more likely to work):

		.globl  _rtcin
	_rtcin: movl    4(%esp),%eax
		outb    %al,$0x70
		inb     $0x71,%al               # Compaq SystemPro/ATT/HP
		andl    $0x000000ff, %eax       # Fix big nasty bug
		ret


	I believe that the zeroing of eax is detrimental, and have removed
it; only a byte of the value returned is defined... the rest is undefined,
and is set by the setup program to whatever.
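
	To make the effect of the proposed sequence on eax concrete, here
is a small model of it in C.  It does not touch any hardware; the
function names are made up, and "port_value" stands in for whatever byte
the read of port 0x71 returns.

```c
#include <stdint.h>

/* Models "inb $0x71,%al": only the low byte of eax is replaced. */
uint32_t inb_into_al(uint32_t eax, uint8_t port_value)
{
    return (eax & 0xffffff00u) | port_value;
}

/*
 * Models the proposed _rtcin.  On entry eax holds the argument
 * (the CMOS register index); on return only the low byte survives.
 */
uint32_t rtcin_masked(uint32_t reg_index, uint8_t port_value)
{
    uint32_t eax = reg_index;            /* movl 4(%esp),%eax     */
    /* outb %al,$0x70 selects the register; no effect on eax.    */
    eax = inb_into_al(eax, port_value);  /* inb  $0x71,%al        */
    eax &= 0x000000ffu;                  /* andl $0x000000ff,%eax */
    return eax;
}
```

	Whatever was in the upper 24 bits of the argument, the value
returned is always in the range 0 to 255.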


One of the reasons I need these fixes is to rebuild the kernel: the machine
386BSD currently runs on at Weber State University (1 whole box) has a
problem with both memory and disk space.  The machine on which I was doing
file system development under 0.0 has been confiscated to teach NetWare
classes (somewhat ironic, considering that I work for Novell), and this has brought
me to a halt.  My cross-compilation environment has died on a couple of the
new header files; I really can't justify the time to fix this until I have
386bsd up on at least one real box, and I can't get it up on a real box until
I have fixed binaries 8-(.


	Any of the partial (machdep.c) or full (locore.s) fixes suggested
on a dist.fs disk would be greatly appreciated!  I'm sure that this would be
very helpful in diagnosing the HP Vectra problem, if it didn't fix it
outright, and would certainly serve to expose a lot of internals students
to BSD as well as System V.


					Regards,
					Terry Lambert
					terry_lambert@gateway.novell.com
					terry@icarus.weber.edu
---
Disclaimer:  Any opinions in this posting are my own and not those of
my present or previous employers.