*BSD News Article 9342

Received: by minnie.vk1xwt.ampr.org with NNTP
	id AA5605 ; Fri, 01 Jan 93 01:50:25 EST
Newsgroups: comp.unix.bsd
Path: sserve!manuel.anu.edu.au!munnari.oz.au!spool.mu.edu!darwin.sura.net!bogus.sura.net!pandora.pix.com!stripes
From: stripes@pix.com (Josh Osborne)
Subject: Re: S3 question - Amancio, are you there?
Message-ID: <Bzy9wD.9Ez@pix.com>
Sender: news@pix.com (The News Subsystem)
Nntp-Posting-Host: pandora.pix.com
Organization: Pix Technologies -- The company with no adult supervision
References: <VIXIE.92Dec26034105@cognition.pa.dec.com> <1992Dec27.081525.29228@netcom.com>
Date: Mon, 28 Dec 1992 03:33:47 GMT
Lines: 178

In article <1992Dec27.081525.29228@netcom.com> hasty@netcom.com (Amancio Hasty Jr) writes:
>In article <VIXIE.92Dec26034105@cognition.pa.dec.com> vixie@pa.dec.com (Paul A Vixie) writes:
[...]
>>I see that the two greatest bit-bangers of the average computer are available
>>as VESA cards: display, and disk.  I'm still formulating my disk controller
>>questions and perhaps I'll ask them in a future post.  Right now I'm trying
>>to solve the S3 mystery.

One problem with VESA LB and disk drives: (I think) VESA LB doesn't allow
bus mastering cards.  For SCSI (at least) bus mastering could be quite useful.
Of course with current tech disk drives you need 3 fast disks running at once
to use all of the ISA bus.  Or you need (say, IDE) controllers with cache on
them, but it would be better to have an auto-sizing disk cache in main memory
(like SunOS, or Linux), because (a) it would be faster, (b) it is usable as
core if that's more useful than disk cache, and (c) you know whether it has
been flushed to disk or not.

>>At work I have a EISA/SVGA/34020 board.  It is very fast when run under
>>Windows 3.1; however, Microsoft had access to the 34020 specs and I don't,
>>so I can't figure out how to port the X server to it and noone in this
>>newsgroup seems to have done that either.  It's too bad -- a 34020 with
>>a minimal BITBLT interpreter downloaded into it would make for a lightening
>>fast X11 server with the 34020 as almost a co-processor.  However, I'm
>>fairly sure that the 34020's days are numbered given something called "S3"
>>and the "GUI Accelerator" that seem to be taking the market by storm.

The 34020 docs are available from TI; I have a set somewhere.  The cross
compiler is quite expensive, and the old version makes poor code.  Someone
got an old gcc to work (more or less) with it.  The 34020 is fairly quick;
I would like to see X running on a 34020 :-)  (I know it would be faster
to do most of the X stuff on the [34]86 and let the TI bang bits).

The GUI accels are doing better than the 34020 cards because they are cheap.
However, I think you could build a 34020 card as cheaply as an S3 card, but
nobody has.

>>I know that SVGA is more or less a hack on the IBM VGA spec to allow more
>>pixels; what I don't know is what an "SVGA S3" is.  I have gathered from
>>context in posts on this newsgroup that it is some kind of graphics
>>accelerator chipset and that there are several different revisions of
>>it and that different board manufacturers have had different results.
>>Yet, VGA is fundamentally a frame buffer that has some hardware assist
>>for certain operations.  Where does S3 fit in?  Is it another IO port, or
>>just more opcodes to the existing VGA IO port?  Or just a faster implementation
>>of the VGA spec?

This is answered well below, but I thought I would point out that:
 * VGA only allows 64K of the video memory to be mapped into the PC's addr space
 at once.
 * Most SVGAs allow 128K at once, normally as 2 64K windows.
 * Some more useful, but more disgusting, ways of viewing video memory are also
 available.
 * A small number of SVGA chipsets can map all of the video memory into the
 PC's addr space, but I don't know if the video cards built on them can do it.
 The 386BSD kernel will need to be whacked to make it work anyway.
 * The S3 adds a bunch of IO addrs on top of a normal looking SVGA chipset.
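
To make the 64K window point concrete, here is a rough sketch of the
arithmetic a server has to do when only one window is mapped.  The names
(WINDOW_SIZE, linear_to_bank, pixel_offset) are made up for illustration,
and an 8-bit-per-pixel mode is assumed.  How the bank actually gets selected
is chipset specific and is not shown.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed for illustration: one 64K window, as on plain VGA. */
#define WINDOW_SIZE 65536u

struct bank_addr {
    uint32_t bank;    /* which 64K slice of video memory to select */
    uint32_t offset;  /* byte offset inside the mapped window */
};

/* Split a linear byte offset into video memory into (bank, offset). */
struct bank_addr linear_to_bank(uint32_t linear)
{
    struct bank_addr a;
    a.bank = linear / WINDOW_SIZE;
    a.offset = linear % WINDOW_SIZE;
    return a;
}

/* Linear offset of pixel (x, y), assuming 8 bits per pixel and a
 * scanline length of `pitch` bytes. */
uint32_t pixel_offset(uint32_t x, uint32_t y, uint32_t pitch)
{
    assert(x < pitch);
    return y * pitch + x;
}
```

With a 1024 byte pitch the bank changes every 64 scanlines, so any operation
that crosses such a boundary has to stop, switch banks, and continue.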

>>There are two reasons I need to know this.  First, if the VGA really is "just
>>a frame buffer", then given a fast CPU and VESA it should be trivial to get
>>the MIT CFB server running and have it run near the theoretical maximum
>>(though at some potentially unneccessary cost in main CPU cycles).  If on
>>the other hand VGA is like EGA in that you can only map certain parts into
>>memory at a time and it's generally cheaper to send high-level commands and
>>let the graphics hardware figure out how to achieve them, then I see a
>>problem.

In general you can only map part of the video memory at a time.

>>What problem?  Well, DEC did this really neat thing called the "Dragon" chip
>>set back on their MicroVAX II/GPX.  It was really really fast -- if you wrote
>>your application in FORTRAN on VMS.  On the other hand if you ran under X11,
>>things ran doggishly slow and the visual results were often less than perfect.
>>This is because the _only_ way to talk to a Dragon is in high-level op-codes,
>>and the model X11 lived in was incompatible with the one the Dragon used --
>>so achieving one X11 operation often took several, or hundreds, of Dragon
>>operations.  Since the Dragon's speed came from its economy of scale, the
>>speed was less than amazing.

I don't know much about the Dragon (is that the hardware made out of N
vipers?), but the S3, Mach8, Mach32, and even the 8514/a (or whatever it is)
have accel for short line segments, which I think matches up quite well with
the use of "spans" in the DDX MI code (not 100%: short lines have limited
length, spans do not).  So even when the exact graphics command X wants is not
supported by the hardware, this is (and should be) faster than just pushing
bits onto a dumb buffer, except for really small spans.
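
The span/short-line mismatch can be papered over by chopping each span into
pieces the hardware will accept.  A minimal sketch, assuming a made-up limit
of 16 pixels per segment (MAX_SEG_LEN is hypothetical, not a real S3 or
8514/a figure) and a hypothetical split_span helper:

```c
#include <assert.h>

/* Hypothetical hardware limit on one accelerated line segment. */
#define MAX_SEG_LEN 16

struct segment {
    int x;    /* starting column */
    int y;    /* scanline */
    int len;  /* length in pixels, at most MAX_SEG_LEN */
};

/* Break one horizontal span into segments no longer than MAX_SEG_LEN.
 * Returns the number of segments written, or -1 if `out` is too small. */
int split_span(int x, int y, int width, struct segment *out, int max_out)
{
    int n = 0;

    assert(width >= 0);
    while (width > 0) {
        int len = width < MAX_SEG_LEN ? width : MAX_SEG_LEN;

        if (n == max_out)
            return -1;
        out[n].x = x;
        out[n].y = y;
        out[n].len = len;
        n++;
        x += len;
        width -= len;
    }
    return n;
}
```

A 40 pixel span starting at x=10 comes out as segments of 16, 16 and 8
pixels.  Each segment costs a round of IO to the card, which is why very
small spans may be no faster than writing the bits directly.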

[...]
>>So here comes S3.  Is it the salvation to all the world's woes?  That depends.
>>Given VESA, one can access the VGA's "array" at memory speed (barring refresh
>>stalls -- that whole thing isn't dual-ported, is it?).  Is that enough?  Or,
>>if not, is it the S3 that gives one the extra performance and/or op-codes that
>>make X11 sing?  And, if that last is true, why isn't an S3 on EISA or even ISA
>>"fast enough" ?

I *think* (someone *please* correct me if I am wrong!) most of the numbers
(even the 70k+ ones) were with ISA S3 cards (they may have been in an EISA
system though).

[...now the Hasty-miester speekith...]
>The image write, read and fill operations' performance was increased by
>using vga banking. We experienced a 10x performance improvement when 
>we switched to vga banking. In the 8514/a architecture, all data transfer
>between the cpu and co-processor is done via the data transfer register.
>Also, we have to transfer the images a line at time inside a loop.
>If there is one area in which the S3 architecture suffers this is it!
>Ideally, I would like to see the chip do dma transfers from memory
>to the card and have it calculate the offsets into its memory and 
>the logical converse - have the chip  transfer a block of memory
>to consecutive region in the hosts memory.

How about XCopyPlane (in XOR mode)?  I don't have an S3 card (yet), but that's
the single most important thing for my application...

[...]
>The 801/805 and 928 architectures are capable of mapping their entire video
>memory to the host's address space. Currently, we only map 64k bytes at a
>time. This limitation is mostly imposed to us by the kernel!

Can the video cards do this?  I assume the problem w/ the kernel is allocating
physically contiguous RAM?  The best way to do this is to add a new flag to the
memory allocator.  The simplest way is to have the device probe allocate the
VM you need during boot, when most allocations will be contiguous, confirm that
it _is_ contiguous, and go on...
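
The "confirm that it is contiguous" step is just a walk over the physical
page addresses.  A sketch, assuming the probe routine has already collected
the physical address of each page into an array (how it gets them is kernel
specific and not shown; is_contiguous and PHYS_PAGE_SIZE are names invented
here, not 386BSD interfaces):

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* 4K pages, as on the i386. */
#define PHYS_PAGE_SIZE 4096u

/* Return 1 if the n physical page addresses form one contiguous run,
 * 0 otherwise.  An empty or single-page list counts as contiguous. */
int is_contiguous(const uint32_t *phys, size_t n)
{
    size_t i;

    assert(phys != NULL || n == 0);
    for (i = 1; i < n; i++) {
        if (phys[i] != phys[i - 1] + PHYS_PAGE_SIZE)
            return 0;
    }
    return 1;
}
```

If the check fails at boot, the probe can fall back to whatever windowed
mapping the driver uses today.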

>Further performance improvements were achieved by compiling the server
>with gcc-2.3.1. Some of the x11perf results were nearly twice as fast!
>Overall performance improvement, using xbench, proved to be around 15%.

Did you remember to use -m486 (to produce code that runs fast on the 486,
but still runs on the 386), or did you just have it generate 386 code?

[...]
>Slowly, the server is evolving from its pure 8514/a architecture to the
>S3 architecture. The next major jump will be when 16 bit or 24 bit
>color gets implemented :-)

I thought the next big jump would be when you can map in 1+M of video memory
and use it...

[...]
>Next, how does the S3 architecture fare against other accelerated cards?
>
>The January issue of Byte magazine voted the Actix's GraphicEngine32 (801)
>as one of the best overall graphic accelerated cards for window applications.
>At least on Byte's tests the 801 was faster than the ATI Ultra Pro (mach 32).
>And, I really doubt that the tests were executed at low clock frequencies.
>However, the article did not state the dot clock frequency which the tests 
>were executed at.  The other faster cards were based on the 34020 and cost
>more than $1400.

People have had the S3 for long enough to make good use of it; the Mach32 may
be too new for good drivers to be available yet.  If people decide that the
34020 cards don't need to emulate SVGA/EGA/CGA/Herc in hardware, the price
should drop by more than $1000; if they insist on doing that, the price may
drop by about $1000.  This would be the best card for X, because the 34020
is fully programmable and can be made more X oriented than Windows oriented.

Also, the 340xx has super great control over the display (size/shape/res/
borders).  The 34020 can even use the VRAM serial write regs...

[...]
>On the topic of local bus IDE cards:
>
>It takes about 6 and 50 seconds to recompile the kernel with  gcc-1.39.
>With an ISA IDE card, it takes about 7.5 minutes :-)
>
>How much does it cost? $89.

What does "6 and 50 seconds" mean?  Most IDE local bus cards mainly add lots
of cache.  We can do better by adding more RAM to the main system and using
it wisely...

[...]
-- 
           stripes@pix.com              "Security for Unix is like
      Josh_Osborne@Real_World,The          Multitasking for MS-DOS"
      "The dyslexic porgramer"                  - Kevin Lockwood
We all agree on the necessity of compromise.  We just can't agree on
when it's necessary to compromise.       - Larry Wall