*BSD News Article 54234


Path: euryale.cc.adfa.oz.au!newshost.anu.edu.au!harbinger.cc.monash.edu.au!news.uwa.edu.au!classic.iinet.com.au!swing.iinet.net.au!news.uoregon.edu!tank.news.pipex.net!pipex!news.mathworks.com!newsfeed.internetmci.com!bloom-beacon.mit.edu!cambridge-news.cygnus.com!meissner
From: meissner@cygnus.com (Michael Meissner)
Newsgroups: comp.unix.bsd.freebsd.misc,gnu.gcc.help,comp.os.linux.misc
Subject: Re: gcc optimisations when compiling the kernel
Date: 03 Nov 1995 20:41:00 GMT
Organization: Cygnus Support
Lines: 95
Message-ID: <MEISSNER.95Nov3154100@tiktok.cygnus.com>
References: <478mtj$e2v@plato.ucsalf.ac.uk>
NNTP-Posting-Host: tiktok.cygnus.com
In-reply-to: mark@plato.ucsalf.ac.uk's message of 1 Nov 1995 20:56:19 -0000
Xref: euryale.cc.adfa.oz.au comp.unix.bsd.freebsd.misc:8373 gnu.gcc.help:13739 comp.os.linux.misc:68865

In article <478mtj$e2v@plato.ucsalf.ac.uk> mark@plato.ucsalf.ac.uk (Mark Powell) writes:

| Looked in the handbook and the FAQ and couldn't find anything in there on
| which optimisations to use when compiling the kernel. Although I knew the
| -m486 flag would produce slightly faster code on a 486 with only slightly 
| larger binaries, I was informed that it would actually reduce performance on
| a Pentium. However, I did some tests on an unloaded 90MHz Pentium running
| FreeBSD 2.0.5 with the supplied gcc v2.6.3 used to compile Dhrystone v2.1
| 
| All figures in Dhrystones/second taken as an average of 3 x 10,000,000 run
| samples, when dhrystone is compiled with the options on the left.

First of all, let me say that Dhrystone is pretty worthless as a benchmark
these days.

| gcc                                      80290.1
| gcc -fomit-frame-pointer                 81700.4
| gcc -O2                                 111624.2
| gcc -O                                  112035.1
| gcc -O2 -m486                           114561.9
| gcc -O -m486                            117896.4
| gcc -O -fomit-frame-pointer             127490.0
| gcc -O2 -fomit-frame-pointer            129067.8
| gcc -O2 -m486 -fomit-frame-pointer      130799.1
| gcc -O -m486 -fomit-frame-pointer       131268.6
| 
| Strange that the -O seems to have the edge over -O2 in most of the tests,
| although the edge is only very slight and can probably be ignored due to
| the general variance in dhrystone results.

As I recall, dhrystone has one loop of the form:

	int *p, i;

	for (i = 0; i < n; i++)
		something (i, p[i]);

Strength reduction (which is turned on by default for -O2) converts this into
something like:

	int *p, i, tmp;

	for (i = 0, tmp = 0; i < n; i++, tmp += 4)
		something (i, *(int *)((char *)p + tmp));

However, on the x86 (and the 88k, which is where I noticed it), you have an
addressing mode that can do:

	reg + reg*4

The original loop could use that addressing mode to fetch p[i] directly, so the
strength-reduced version ends up with one extra instruction in the inner loop
(the add to tmp) and needs one more register.  Strength reduction doesn't
notice this special case.  There is also the infamous strength reduction bug
that caught several kernels....
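
To make the two loop shapes concrete, here is a minimal sketch of both written
out by hand in C.  The body of something() and the names walk_indexed and
walk_strength_reduced are made up for illustration; they are not taken from
Dhrystone.  A current compiler is free to rewrite either loop, so this only
shows the source-level shapes, not what gcc 2.6.3 emitted.

```c
#include <stddef.h>

/* Hypothetical stand-in for the loop body: combines the index and the element. */
static long something(int i, int value)
{
    return (long)i * value;
}

/* Original form: p[i] can map onto a scaled-index address (base + index*4). */
long walk_indexed(const int *p, int n)
{
    long total = 0;
    int i;

    for (i = 0; i < n; i++)
        total += something(i, p[i]);
    return total;
}

/* Hand-written strength-reduced form: a running byte offset replaces the
   implicit multiply, at the cost of one more live variable in the loop. */
long walk_strength_reduced(const int *p, int n)
{
    long total = 0;
    size_t off = 0;
    int i;

    for (i = 0; i < n; i++, off += sizeof(int))
        total += something(i, *(const int *)((const char *)p + off));
    return total;
}
```

Both functions return the same value for the same input; the second just
carries the extra offset variable that the addressing mode makes unnecessary.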

| Although -fomit-frame-pointer gives only a slight performance increase
| with no optimisation, it seems useful when optimisation is on.
| Compiling the FreeBSD 2.0.5-RELEASE kernel with my own configuration gives
| the following /kernel sizes:
| 
| gcc -O2 -m486 -fomit-frame-pointer	952K
| gcc -O2 -fomit-frame-pointer		888K
| gcc -O2					868K
| 
| It would seem that the default kernel compile flags should be:
| 
| -O2		( or -O, anyone with good gcc knowledge care to comment? )

The main difference in the code generated for -m386 and -m486 is that on the
386 the compiler believes it can push directly from memory, whereas on the 486
it loads the memory into a register and then pushes the register, which is a
faster sequence.  However, that can also force some other value to be spilled
out of a register in order to hold the temporary.
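
As a rough illustration of the push difference, consider a call whose argument
lives in memory.  The names counter, bump and call_with_memory_arg are
hypothetical, and the assembly in the comment is a sketch of the two sequences
described above, not verified output from any particular gcc release.

```c
/* A value that lives in memory rather than in a register. */
int counter = 41;

/* Hypothetical callee that takes one stack argument. */
int bump(int v)
{
    return v + 1;
}

int call_with_memory_arg(void)
{
    /*
     * 386-style sequence (push straight from memory):
     *     pushl counter
     *     call  bump
     *
     * 486-style sequence (load, then push the register):
     *     movl  counter,%eax
     *     pushl %eax
     *     call  bump
     *
     * The second form needs a scratch register, which is where the
     * possible spill of some other value comes from.
     */
    return bump(counter);
}
```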

| to get the smallest possible kernel, but:
| 
| -O2 -fomit-frame-pointer
| 
| to get good performance for only slightly more binary, and:
| 
| -O2 -m486 -fomit-frame-pointer
| 
| would seem okay for everyone except people with 4Mb RAM. Are there really
| a lot of these?
| Comments welcome.

Well, -fomit-frame-pointer can be a mixed blessing.  On one hand, it makes one
more register available, which for the x86 is helpful, but on the other hand,
if you look at the way the x86 encodes instructions, it actually takes larger
instructions to reference things relative to the stack pointer than relative
to the frame pointer.
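
To illustrate the size difference, here are the 32-bit encodings of the same
load done relative to each register, stored as byte arrays so the lengths can
be compared.  The offset of 8 is arbitrary, and in real code the same variable
would sit at a different offset from %esp than from %ebp; the array and
function names are made up for this sketch.

```c
#include <stddef.h>

/* movl 8(%ebp),%eax : opcode 8B, ModRM 45, disp8 08 */
static const unsigned char load_via_ebp[] = { 0x8B, 0x45, 0x08 };

/* movl 8(%esp),%eax : opcode 8B, ModRM 44, SIB 24, disp8 08
   An %esp base cannot be expressed in ModRM alone, so a SIB byte is needed. */
static const unsigned char load_via_esp[] = { 0x8B, 0x44, 0x24, 0x08 };

/* Extra bytes paid per stack reference when addressing off %esp. */
size_t extra_bytes_per_esp_reference(void)
{
    return sizeof load_via_esp - sizeof load_via_ebp;
}
```

So each such reference costs one more byte, which has to be weighed against
the benefit of the extra register.
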
-- 
Michael Meissner, Cygnus Support (East Coast)
Suite 105, 48 Grove Street, Somerville, MA 02144, USA
meissner@cygnus.com,	617-629-3016 (office),	617-629-3010 (fax)