*BSD News Article 71018

Path: euryale.cc.adfa.oz.au!newshost.anu.edu.au!harbinger.cc.monash.edu.au!news.mel.connect.com.au!munnari.OZ.AU!news.ecn.uoknor.edu!news.ysu.edu!usenet.ins.cwru.edu!gatech!newsfeed.internetmci.com!news.uoregon.edu!vixen.cso.uiuc.edu!usenet
From: Vlad <roubtsov@uiuc.edu>
Newsgroups: comp.unix.bsd.misc
Subject: Re: Q: fast file system
Date: Fri, 14 Jun 1996 02:08:20 -0500
Organization: Baphomet's Throne
Lines: 194
Message-ID: <31C10FB0.41C67EA6@uiuc.edu>
References: <31BDB7F7.41C67EA6@uiuc.edu> <31BE2D8E.136F432E@lambert.org> <31BE51CC.41C67EA6@uiuc.edu> <31C0B472.37FEDA0E@lambert.org>
NNTP-Posting-Host: mossberg-74.slip.uiuc.edu
Mime-Version: 1.0
Content-Type: text/plain; charset=iso-8859-1
Content-Transfer-Encoding: 8bit
X-Mailer: Mozilla 3.0b4 (X11; I; FreeBSD 2.1.0-RELEASE i386)

Terry Lambert wrote:
> 
> Vlad wrote:
> ]
> ] Terry Lambert wrote:
> ] >
> ] > Vlad wrote:
> ] > ] (1) Is is true that only files without indirect blocks can have
> ] > ] the last bit allocated in a fragment ?
> ] >
> ] > If you allocate the last block in a frag, it is no longer a frag,
> ] > it's an ordinary block.
> ]
> ] ??? I really don't understand what you mean. My question was if
> ] the following scenario was true: fragments were introduced in
> ] BSD4.2 so that it was possible not to waste too much storage
> ] to internal fragmentation with larger block sizes (!40-50% with
> ] 4096-blocks with _no_ individually addressable fragments!) and
> ] yet have the benefit of higher throughput due to larger blocks.
> 
> Clusters were introduced to combat internal fragmentation.

No. Not by canonical definition of "internal fragmentation". Clusters
improve throughput + reduce the size of certain data structures.
Internal fragmentation means the remainder of last block goes unused
(logically it belongs to that one file only), it can't be given to
another file UNLESS you decide to break blocks into individually
addressable fragments. Just like paging removes _external_ memory
fragmentation, but introduces _internal_ instead. I think you may be
confused by the same stem in 2 words: fragmentation and fragments in
this context. A quote from my CS book (I confess I don't have proper BSD
manuals -- but then I wouldn't have these questions :):

"...The harware disk sector is usually 512 bytes. A block size larger
than 512 bytes is desirable for SPEED. However, because UNIX file
systems usually contain a very large number of SMALL files, much larger
blocks would cause excessive INTERNAL FRAGMENTATION...
	The 4.2BSD solution is to use two block sizes for files which have NO
INDIRECT BLOCKS: all the blocks of a file are of large _block_ size
(such as 8k), except the last. The last block is an appropriate multiple
of a smaller _fragment_ size (for example, 1024) to fill out the
file..."
<key words capitalized by me>


> There is *no such thing* as a frag in the middle of a file, so

I never said anything about the middle of a file !!!!

> there is nothing use of frags could do or not do to combat
> internal fragmentation.

Oh yes, there is: in the example above a 8193-byte file will waste
nearly one whole block (2nd) without fragments (using only 1 byte of
it), while with 1024-fragments it will only waste 1023 bytes of a
fragment of its second block.


> You seem bound and determined to have fragmentation, in the
> DOS sense of the word, apply to FFS.  It doesn't.  A "frag" is
NO!!! And I _do_ know about DOS -I've written my own assembler routines
to format DOS drives...But DOS is another story.

> a division of a block, not a division of allocability.  The
> reported "fragmentation" is not a measure of the unusable space,
> it's a measure of the use of fragment blocks.

Amen! That's what I've meant all along! (except I think reported
fragmentation relates to contigous/discontigous allocation, not to
fragments of blocks [unfortunate terminology]. Come on, on my FreeBSD
system I have <1% fragmentation everywhere -- that's because most of
files were installed exactly one after another at the same time, even
though each one of them may have a fragment at the end for all I
care...)
> 
> ] Well then, small files are the ones who contribute the most to
> ] the waste. Hence no real need to bother with _fragment_
> ] allocation overhead for large files.
> 
> No, no, no... each file contributes equally to "fragmentation".
> It doesn't matter if the partial block ("frag") is at the end of
> a bunch of whole blocks, or is the first thing in the file.

Excuse me, but you are wrong: 10^6 1-byte sized files will take up at
least 10^6 fragments, i.e. nearly a GIGABYTE of disk storage (assuming
that smallest fragment is 1024). But 1 10^6-byte sized file will use
~1Mbyte + 1fragment at most -- 3 orders of magnitude difference. In case
you think this example is extreme, have a look at table 1 in "A Fast
File System for UNIX" by McKusick, Joy, Leffler, and Fabry. (Yes, a
fragment is a fragment everywhere -- but RELATIVE contribution is
different for files of different sizes).

> ] You remember of course that after we've allocated 12 direct
> ] data bocks to a file we have to get out first singly-indirect
> ] block, then doubly-indirect blocks etc. Well then, 12*4096 is
> ] 48k and it seems like a natural threshold to stop allocation
> ] on fragment level (i.e. as soon as we begin using the first
> ] indirect pointer in the dinode structure) -- that was my
> ] impression from what I've read so far. Do you see what I am
> ] trying to say ? In other words small enough files can share
> ] fragments of blocks between themselves, but large ones can't.
> 
> This isn't true, really; it has to do with the block bitmap.
> 
> Yes, effectively, the FFS does fail to fully implement the
> idea: a file with indirect blocks actually can't have frags
> because of the use of the sign bit for indirect blocks.  Really,
> there should either be a seperate frag pointer, or the block
> allocation should know that the block is a frag block, and
> one of the "spare" char fields should be used as a bitmap.
> 
> ] I mean in the situation like
> ] this:
> ]
> ]         |__A__|__B__|__B__|__B__|
> ] (_one_ block with 4 fragments, all 4 allocated to 2 _distinct_ files
> ] A and B) -- how is this situation recorded ? I mean it's Ok with the
> ] free block bitmap (1 bit for each _fragment_, but that only indicates
> ] free/allocated status), but block pointers in _inodes_ point to whole
> ] _blocks_, don't they ? How are individual fragments resolved on inode
> ] level ?
> 
> It's recorded as a block pointer and an internal allocation bitmap.

But a GLOBAL bitmap can't tell _who_ owns the fragment, it can only tell
that's it's been taken (or not). Unless every inode somehow has a tiny
PRIVATE bitmap just for its own single fragment...

> 
> If you were to record a block as 8 * (block / 8), then you have
> either a frag +  a char (1 bit per frag) bitmap, or you have a

Would this char be stored in an inode then ?

> full allocation of block-(block/8) < size < block... in which case
> the last frag isn't a frag.
> 
> Maybe the confusing part is that a frag is not a physical disk

Nope.

> You should probably examine /usr/src/sys/ufs/ffs/fs.h:

I guess I'll have to. And I've tried to, maybe not enough effort :(


> /*
>  * Addresses stored in inodes are capable of addressing fragments
>  * of `blocks'.                  / ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
 |--------------------------------|
How, damn it !!??? Are some bits of 32-bit block pointers in inodes used
as fragment bitmaps of sorts or what ?

>  * be optionally broken into 2, 4, or 8 pieces, each of which is
>  * addressable; these pieces may be DEV_BSIZE, or some multiple of
>  * a DEV_BSIZE unit.
>  *
>  * Large files consist of exclusively large data blocks.  To avoid
>  * undue wasted disk space, the last data block of a small file may be

I was right about small files--------------------------^^^^^

>  * allocated as only as many fragments of a large block as are
>  * necessary.  The file system format retains only a single pointer
>  * to such a fragment, which is a piece of a single large block that
>  * has been divided.  The size of such a fragment is determinable from
>  * information in the inode, using the ``blksize(fs, ip, lbn)'' macro.
>  *
>  * The file system records space availability at the fragment level;

I need to know details of how this is done-------------^^^^^^^^^^^^^
That's the crux of my problem....I want to start mentally at an inode
and eventually get to all the blocks + this last fragment !

>  * to determine block availability, aligned fragments are examined.

> 
> You may also want to look at the frag caclulation macro's (later
> in the same file).
> 
> Frags are stored per cylinder group, as lists of fragged blocks
> (see cg_frotor usage in the per cylinder group structure).
> 
> You should probably read the FFS paper (on ftp.sage.usenix.org),
> or in the O'reilly published 4.4 books.
> 
> You should also read about the 4.4 changes to the fragging in
> "The Design and Implementation of the 4.4 BSD Operating System".


Thanx for the pointers. I may appear stubborn because I want to
_understand_ :)
Vlad.
До