*BSD News Article 58220


Return to BSD News archive

Newsgroups: comp.bugs.2bsd
Path: euryale.cc.adfa.oz.au!newshost.anu.edu.au!harbinger.cc.monash.edu.au!news.mel.connect.com.au!munnari.OZ.AU!news.ecn.uoknor.edu!news.cis.okstate.edu!news.ksu.ksu.edu!news.physics.uiowa.edu!math.ohio-state.edu!howland.reston.ans.net!newsfeed.internetmci.com!in1.uu.net!news.new-york.net!wlbr!sms
From: sms@wlv.iipo.gtegsc.com (Steven M. Schultz)
Subject: Extremely annoying inefficiency in the C compiler (#289)
Sender: news@wlbr.iipo.gtegsc.com (System Administrator)
Organization: GTE Government Systems, Thousand Oaks CA USA
Message-ID: <DKqtus.LC1@wlbr.iipo.gtegsc.com>
X-Nntp-Posting-Host: wlv.iipo.gtegsc.com
Date: Sat, 6 Jan 1996 04:59:16 GMT
Lines: 221

Subject: Extremely annoying inefficiency in the C compiler (#289)
Index:	lib/ccom/c00.c 2.11BSD

Description:
	The C compiler generates unnecessary copies of strings into the Data
	segment!

Repeat-By:
	Copy the following test program to /tmp/x.c 
======
#define MSG "This is a test message\n"

main()
	{
	write(2, MSG, sizeof(MSG));
	}
======

	and compile with "cc -O -S x.c".  

	Look at the generated code (reformatted slightly for brevity):

_main:
	jsr	r5,csv
	mov	$30,(sp)
	mov	$L4,-(sp)
	mov	$2,-(sp)
	jsr	pc,*$_write
	cmp	(sp)+,(sp)+
	jmp	cret
.data
L4:.byte 124,150,151,163,40,151,163,40,141,40,164,145,163,164
   .byte 40,155,145,163,163,141,147,145,12,0
L5:.byte 124,150,151,163,40,151,163,40,141,40,164,145,163,164
   .byte 40,155,145,163,163,141,147,145,12,0

	NOTE the extra (and completely unused) data generated at "L5:".  It
	is an entire copy of the 'ERR' string corresponding to the 
	"sizeof (ERR)" in the program.

	In fact, the simpler test case:

		int i;
		i = sizeof ("this is a test");

	causes the string to be generated into the object file when only
	the size is being asked for.

Fix:
	The C compiler is not only counting the characters in the string
	(the $30 in the generated code is the correct length of the string)
	but is also emitting the string into the object file.

	"sizeof string" should NOT cause a copy of the string to be 
	generated.

	This was initially spotted when I did a "strings /bin/csh" and
	noticed two instances of "longjmp botch" in the output.  Upon
	investigating further it was determined that the module setjmperr.c
	in lib/libc/pdp/gen used the construct "write(2, MSG, sizeof(MSG));"
	and that two copies of the error message were in each program that
	called setjmperr().

	There were a total of 4 modules in libc.a which could enter duplicate
	strings into applications.  I have no idea how many applications there
	are which use "sizeof (string)" in them but for each use of that
	form of 'sizeof' additional D space is taken up.

	The fix is fairly simple.  In phase 0 of the C compiler when a
	STRING is about to be generated the operator stack is scanned
	backwards for LPARN (left paren) and SIZEOF operators.  Both of these
	must be checked for in  order to cover all the following cases:

		sizeof "foo";
		sizeof ("foo");
		sizeof (("foo"));

	If it is determined that the string is inside a 'sizeof' operator
	then a new function is called which only counts the characters in
	the string but does not generate any intermediate object code.

	To install this fix cut where indicated and save to a file (/tmp/289).

	Then:

		patch -p0 < /tmp/289
		cd /usr/src/lib/ccom
		make
		make install
		make clean

	The old version of the C compiler is saved by the "make install"
	command (the phases of the compiler are backed to /lib/oc0 and 
	/lib/oc1 respectively).  If anything appears to be compiling wrong
	you can fall back to that old version.

	Thus far I've recompiled libc.a, the C compiler and 'csh' and
	all is well.

	You may want to wait to recompile libc.a, there will be an update
	coming out soon for libc.a.  But if you wish to recompile libc.a
	now:

		cd /usr/src/lib/libc
		make clean
		make
		make install
		make clean

==========================cut here=======================
*** /usr/src/lib/ccom/c00.c.old	Sat Jul  3 21:33:38 1993
--- /usr/src/lib/ccom/c00.c	Fri Jan  5 20:12:17 1996
***************
*** 1,7 ****
  /* C compiler
   *
   *
-  *
   * Called from cc:
   *   c0 source temp1 temp2 [ profileflag ]
   * temp1 gets most of the intermediate code;
--- 1,7 ----
  /* C compiler
   *
+  *	2.1	(2.11BSD)	1996/01/04
   *
   * Called from cc:
   *   c0 source temp1 temp2 [ profileflag ]
   * temp1 gets most of the intermediate code;
***************
*** 474,479 ****
--- 474,489 ----
  	strflg = 0;
  }
  
+ cntstr()
+ {
+ 	register int c;
+ 
+ 	nchstr = 1;
+ 	while ((c = mapch('"')) >= 0) {
+ 		nchstr++;
+ 	}
+ }
+ 
  /*
   * read a single-quoted character constant.
   * The routine is sensitive to the layout of
***************
*** 581,587 ****
  	int *op, opst[SSIZE], *pp, prst[SSIZE];
  	register int andflg, o;
  	register struct nmlist *cs;
! 	int p, ps, os;
  	char *svtree;
  	static struct cnode garbage = { CON, INT, (int *)NULL, (union str *)NULL, 0 };
  
--- 591,597 ----
  	int *op, opst[SSIZE], *pp, prst[SSIZE];
  	register int andflg, o;
  	register struct nmlist *cs;
! 	int p, ps, os, xo = 0, *xop;
  	char *svtree;
  	static struct cnode garbage = { CON, INT, (int *)NULL, (union str *)NULL, 0 };
  
***************
*** 634,640 ****
  
  	/* fake a static char array */
  	case STRING:
! 		putstr(cval, 0);
  		cs = (struct nmlist *)Tblock(sizeof(struct nmlist));
  		cs->hclass = STATIC;
  		cs->hoffset = cval;
--- 644,676 ----
  
  	/* fake a static char array */
  	case STRING:
! /*
!  * This hack is to compensate for a bit of simplemindedness I'm not sure how
!  * else to fix.  
!  *
!  *	i = sizeof ("foobar");
!  *
!  * or
!  *	i = sizeof "foobar";
!  *
!  * would generate ".byte 'f,'o','o,'b,'a,'r,0" into the data segment!
!  *
!  * What I did here was to scan to "operator" stack looking for left parens
!  * "(" preceeded by a "sizeof".  If both are seen and in that order or only
!  * a SIZEOF is sedn then the string is inside a 'sizeof' and should not 
!  * generate any data to the object file.
! */
! 		xop = op;
! 		while	(xop > opst)
! 			{
! 			xo = *xop--;
! 			if	(xo != LPARN)
! 				break;
! 			}
! 		if	(xo == SIZEOF)
! 			cntstr();
! 		else
! 			putstr(cval, 0);
  		cs = (struct nmlist *)Tblock(sizeof(struct nmlist));
  		cs->hclass = STATIC;
  		cs->hoffset = cval;
*** /VERSION.old	Sun Dec 31 12:11:36 1995
--- /VERSION	Fri Jan  5 19:38:01 1996
***************
*** 1,4 ****
! Current Patch Level: 288
  
  2.11 BSD
  ============
--- 1,4 ----
! Current Patch Level: 289
  
  2.11 BSD
  ============