*BSD News Article 70934

Path: euryale.cc.adfa.oz.au!newshost.anu.edu.au!harbinger.cc.monash.edu.au!news.rmit.EDU.AU!news.unimelb.EDU.AU!munnari.OZ.AU!spool.mu.edu!sgigate.sgi.com!uhog.mit.edu!news.mathworks.com!newsfeed.internetmci.com!in1.uu.net!news.artisoft.com!usenet
From: Terry Lambert <terry@lambert.org>
Newsgroups: comp.unix.bsd.misc
Subject: Re: Q: fast file system
Date: Thu, 13 Jun 1996 17:38:10 -0700
Organization: Me
Lines: 125
Message-ID: <31C0B472.37FEDA0E@lambert.org>
References: <31BDB7F7.41C67EA6@uiuc.edu> <31BE2D8E.136F432E@lambert.org> <31BE51CC.41C67EA6@uiuc.edu>
NNTP-Posting-Host: hecate.artisoft.com
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
X-Mailer: Mozilla 2.01 (X11; I; Linux 1.1.76 i486)

Vlad wrote:
] 
] Terry Lambert wrote:
] >
] > Vlad wrote:
] > ] (1) Is is true that only files without indirect blocks can have
] > ] the last bit allocated in a fragment ?
] >
] > If you allocate the last block in a frag, it is no longer a frag,
] > it's an ordinary block.
] 
] ??? I really don't understand what you mean. My question was if
] the following scenario was true: fragments were introduced in
] BSD4.2 so that it was possible not to waste too much storage
] to internal fragmentation with larger block sizes (!40-50% with
] 4096-blocks with _no_ individually addressable fragments!) and
] yet have the benefit of higher throughput due to larger blocks.

Clusters were introduced to combat internal fragmentation.

There is *no such thing* as a frag in the middle of a file, so
there is nothing use of frags could do or not do to combat
internal fragmentation.  Frags exist only at the end of files.

You seem bound and determined to have fragmentation, in the
DOS sense of the word, apply to FFS.  It doesn't.  A "frag" is
a division of a block, not a division of allocability.  The
reported "fragmentation" is not a measure of the unusable space,
it's a measure of the use of fragment blocks.

] Well then, small files are the ones who contribute the most to
] the waste. Hence no real need to bother with _fragment_
] allocation overhead for large files.

No, no, no... each file contributes equally to "fragmentation".
It doesn't matter if the partial block ("frag") is at the end of
a bunch of whole blocks, or is the first thing in the file.

] You remember of course that after we've allocated 12 direct
] data bocks to a file we have to get out first singly-indirect
] block, then doubly-indirect blocks etc. Well then, 12*4096 is
] 48k and it seems like a natural threshold to stop allocation
] on fragment level (i.e. as soon as we begin using the first
] indirect pointer in the dinode structure) -- that was my
] impression from what I've read so far. Do you see what I am
] trying to say ? In other words small enough files can share
] fragments of blocks between themselves, but large ones can't.

This isn't true, really; it has to do with the block bitmap.

Yes, effectively, the FFS does fail to fully implement the
idea: a file with indirect blocks actually can't have frags
because of the use of the sign bit for indirect blocks.  Really,
there should either be a seperate frag pointer, or the block
allocation should know that the block is a frag block, and
one of the "spare" char fields should be used as a bitmap.

] I mean in the situation like
] this:
] 
]         |__A__|__B__|__B__|__B__|
] (_one_ block with 4 fragments, all 4 allocated to 2 _distinct_ files
] A and B) -- how is this situation recorded ? I mean it's Ok with the
] free block bitmap (1 bit for each _fragment_, but that only indicates
] free/allocated status), but block pointers in _inodes_ point to whole
] _blocks_, don't they ? How are individual fragments resolved on inode
] level ?

It's recorded as a block pointer and an internal allocation bitmap.

If you were to record a block as 8 * (block / 8), then you have
either a frag +  a char (1 bit per frag) bitmap, or you have a
full allocation of block-(block/8) < size < block... in which case
the last frag isn't a frag.

Maybe the confusing part is that a frag is not a physical disk
block in size, inless it happens to accidently match it?

For a 4k FS, 4k/8 = 512 = disk block size.  This is just a
consequence of the 8 bit bitmap dividing into the 4k and giving
512.  for an 8k FS, the smallest frag size is 1k, even though the
disk block size is still 512.

You should probably examine /usr/src/sys/ufs/ffs/fs.h:

[ ... ]
/*
 * Addresses stored in inodes are capable of addressing fragments
 * of `blocks'. File system blocks of at most size MAXBSIZE can
 * be optionally broken into 2, 4, or 8 pieces, each of which is
 * addressable; these pieces may be DEV_BSIZE, or some multiple of
 * a DEV_BSIZE unit.
 *
 * Large files consist of exclusively large data blocks.  To avoid
 * undue wasted disk space, the last data block of a small file may be
 * allocated as only as many fragments of a large block as are
 * necessary.  The file system format retains only a single pointer
 * to such a fragment, which is a piece of a single large block that
 * has been divided.  The size of such a fragment is determinable from
 * information in the inode, using the ``blksize(fs, ip, lbn)'' macro.
 *
 * The file system records space availability at the fragment level;
 * to determine block availability, aligned fragments are examined.
 */
[ ... ]

You may also want to look at the frag caclulation macro's (later
in the same file).

Frags are stored per cylinder group, as lists of fragged blocks
(see cg_frotor usage in the per cylinder group structure).

You should probably read the FFS paper (on ftp.sage.usenix.org),
or in the O'reilly published 4.4 books.


You should also read about the 4.4 changes to the fragging in
"The Design and Implementation of the 4.4 BSD Operating System".


                                        Terry Lambert
                                        terry@lambert.org
---
Any opinions in this posting are my own and not those of my present
or previous employers.