*BSD News Article 50377


Path: euryale.cc.adfa.oz.au!newshost.anu.edu.au!harbinger.cc.monash.edu.au!simtel!zombie.ncsc.mil!cs.umd.edu!not-for-mail
From: torek@elf.bsdi.com (Chris Torek)
Newsgroups: comp.unix.bsd.bsdi.misc
Subject: Re: Major bug?  (probably programmer error :)
Date: 4 Sep 1995 18:41:40 -0700
Organization: Berkeley Software Design, Inc.
Lines: 79
Message-ID: <42g9sk$9ee@elf.bsdi.com>
References: <42ff8m$b3d@clarknet.clark.net> <VIXIE.95Sep4112251@wisdom.vix.com> <42fpad$rie@clarknet.clark.net>
Reply-To: torek@bsdi.com
NNTP-Posting-Host: elf.bsdi.com

In article <VIXIE.95Sep4112251@wisdom.vix.com> vixie@wisdom.vix.com
(Paul A Vixie) notes that seeking by ...
>>>       lseek(fd,-1L*sizeof(b),1);
>>>       write(fd,&b,sizeof(b));

creates a file with a large `hole' (a sparse file).

>>Try this:
>>        printf("offset: %ld\n", -1L*sizeof(b));

Actually, this should be printed with `%lu' on any current BSD/OS system.

In article <42fpad$rie@clarknet.clark.net> Alan Weiner <alweiner@clark.net>
asks:
>Doesn't -1L*sizeof(b) do an implicit conversion to long?  

No.  The type of -1L*sizeof(b) will typically be either `unsigned
long' or `long', depending on the platform.  The type of the result
of sizeof is size_t, which is (by definition in the ANSI C standard)
an unsigned integral type.  For various reasons, it will almost
always be either `unsigned int' or `unsigned long'.
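
One way to find out which type a particular compiler picks is to ask
it.  The following is only a sketch: it assumes a C11 compiler (for
`_Generic', which is much newer than this thread), and the function
name `product_type_name' and the 6-byte buffer are made up for
illustration.

```c
#include <stddef.h>

/* Hypothetical helper: names the type that the usual arithmetic
 * conversions give to -1L * sizeof(b) on this compiler.  The
 * controlling expression of _Generic is not evaluated; only its
 * type is inspected. */
const char *product_type_name(void)
{
    char b[6];  /* stand-in for the 6-byte object in the example */

    return _Generic(-1L * sizeof(b),
        long:          "long",
        unsigned long: "unsigned long",
        default:       "some other type");
}
```

Whatever name comes back tells you which printf format the expression
needs on that system.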

On platforms in which sizeof(int) < sizeof(long), if the type of
`size_t' is `unsigned int', `-1L * sizeof(b)' will convert sizeof(b)
from unsigned int to signed long and will result in a negative
number.  For instance, if sizeof(b) is 6U, on such a machine this
will compute -1L * 6U, giving -6L.  This might occur on a PC running
MS-DOS.
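
The same effect can be reproduced on a current machine by picking
two types with the same relationship.  This sketch assumes that
`long long' is wider than `unsigned int' (true on common platforms,
though the standard does not promise it); `scale_narrow_unsigned' is
a hypothetical name.

```c
/* Hypothetical helper.  Assumes long long can represent every
 * unsigned int value, so the usual arithmetic conversions turn n
 * into a signed long long before the multiplication. */
long long scale_narrow_unsigned(unsigned int n)
{
    return -1LL * n;   /* 6U becomes 6LL, and the product is -6LL */
}
```

Here the signed type wins the conversion, which is why the result
stays negative.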

On the other hand, on platforms on which sizeof(int) < sizeof(long)
but the type of `size_t' is `unsigned long', the multiplication
will be done in `unsigned long' and will result in a positive
number.  If ULONG_MAX is 0xffffffffffffffff and sizeof(b) is 6UL,
on this platform, -1L * sizeof(b) will be 0xfffffffffffffffa
or 18446744073709551610 (i.e., ULONG_MAX + 1 - 6).  This might
occur on, say, a DEC Alpha.
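
This case is easy to check on any machine, because unsigned
arithmetic is defined to wrap modulo ULONG_MAX + 1.  A minimal
sketch, with hypothetical function names:

```c
#include <limits.h>

/* Hypothetical helper: -1L is converted to unsigned long (becoming
 * ULONG_MAX) and the product wraps modulo ULONG_MAX + 1. */
unsigned long scale_unsigned_long(unsigned long n)
{
    return -1L * n;
}

/* The value the text predicts, ULONG_MAX + 1 - n, written so that
 * no intermediate step overflows (valid for n >= 1). */
unsigned long expected_wrapped(unsigned long n)
{
    return ULONG_MAX - n + 1UL;
}
```

For n == 6 both functions return ULONG_MAX - 5; with a 64-bit
`unsigned long' that is 18446744073709551610.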

On platforms on which sizeof(int) == sizeof(long), the type of
size_t can be either `unsigned int' or `unsigned long' without
affecting the result.  Since `long' cannot hold every `unsigned int'
value there, the multiplication will be done in `unsigned long'
arithmetic and will again result in ULONG_MAX + 1 - 6 (assuming
6 bytes for `b').

All existing BSD/OS 1.x and 2.x platforms have size_t defined as
`unsigned int' and sizeof(int) == sizeof(long).  We therefore fall
in this last category.  Since our ULONG_MAX is 0xffffffff
(4294967295), you get values like 4294967290.

You might legitimately wonder how this ever worked.  The answer is
that this bug was dormant -- it could only show up when we supported
larger files (and 9GB disks, etc.).

In BSD/OS, file sizes and offsets have type `off_t'.  In 1.x,
`off_t' was simply `long'.  This is how the bug was hidden:  you
would compute -1L * 6U as 4294967290, but then add the current file
offset (6) to the given lseek offset (4294967290), resulting in an
overflow (4294967296) which was truncated to 0.  The second write()
call would thus rewrite the desired bytes.

In BSD/OS 2.x (2.0 and 2.0.1), `off_t' is a 64-bit integral type.
The addition no longer overflows -- 4294967290 + 6 is simply
4294967296.  The seek therefore lands 4GB into the file, and the
write() that follows creates the hole.
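
The arithmetic in the last two paragraphs can be modelled with
fixed-width types.  This is a sketch only: the function names are
hypothetical, and the 32-bit `off_t' is modelled with an unsigned
type so that the wraparound is well defined in C (the real 1.x
`off_t' was a signed long).

```c
#include <stdint.h>

/* The lseek offset as computed in 32-bit unsigned arithmetic:
 * 0 - 6 wraps to 4294967290. */
uint32_t wrapped_offset32(uint32_t record_size)
{
    return (uint32_t)(0u - record_size);
}

/* Model of a 32-bit off_t: the sum is reduced modulo 2^32. */
uint32_t new_position32(uint32_t current, uint32_t offset)
{
    return (uint32_t)(current + offset);
}

/* Model of a 64-bit off_t: the same sum is representable, so
 * nothing wraps. */
int64_t new_position64(int64_t current, uint32_t offset)
{
    return current + (int64_t)offset;
}
```

With a current position of 6 and a 6-byte record, the 32-bit model
gives position 0 and the 64-bit model gives 4294967296.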

Because the type of the result of `sizeof' is implementation-defined,
code that intends to seek backwards over a record it just read should
say:

	lseek(fd, -(off_t)sizeof record, SEEK_CUR);

rather than:

	lseek(fd, (off_t)-sizeof record, SEEK_CUR);

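The first form converts to the signed type `off_t' before negating,
while the second negates in the unsigned type size_t and only then
converts.  Code written the first way will work anywhere the basic
idea works, regardless of the relative sizes and types of `off_t'
and `size_t'.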
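
Put together, a read-then-rewrite routine might look like the
sketch below.  It assumes a POSIX system; `struct record',
`backward_offset' and `rewrite_record' are hypothetical names, and
the 6-byte record only mirrors the size used in the examples.

```c
#include <sys/types.h>
#include <unistd.h>

struct record {
    char data[6];   /* hypothetical 6-byte record */
};

/* Convert to off_t first, then negate.  The negation happens in a
 * signed type, so the result is negative whatever size_t is. */
off_t backward_offset(size_t nbytes)
{
    return -(off_t)nbytes;
}

/* Read one record at the current offset, then overwrite it in
 * place.  Returns 0 on success, -1 on error or short transfer. */
int rewrite_record(int fd, const struct record *replacement)
{
    struct record old;

    if (read(fd, &old, sizeof old) != (ssize_t)sizeof old)
        return -1;

    /* Same as lseek(fd, -(off_t)sizeof old, SEEK_CUR). */
    if (lseek(fd, backward_offset(sizeof old), SEEK_CUR) == (off_t)-1)
        return -1;

    if (write(fd, replacement, sizeof *replacement)
            != (ssize_t)sizeof *replacement)
        return -1;
    return 0;
}
```

The helper exists only to make the conversion order explicit; the
inline cast does the same job.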
-- 
In-Real-Life: Chris Torek, Berkeley Software Design Inc
Berkeley, CA	Domain:	torek@bsdi.com	+1 510 549 1145
  `... if we wish to count lines of code, we should not regard them as
   ``lines produced'' but as ``lines spent.'' '  --Edsger Dijkstra