*BSD News Article 24115


Xref: sserve comp.os.386bsd.questions:6947 comp.os.386bsd.bugs:1837 comp.os.386bsd.misc:1627 misc.test:29161
Path: sserve!newshost.anu.edu.au!munnari.oz.au!metro!sequoia!ultima!kralizec.zeta.org.au!godzilla.zeta.org.au!not-for-mail
From: bde@kralizec.zeta.org.au (Bruce Evans)
Newsgroups: comp.os.386bsd.questions,comp.os.386bsd.bugs,comp.os.386bsd.misc,misc.test
Subject: Re: [FreeBSD-1.0R] Epsilon -> Release patches - problems
Date: 10 Nov 1993 18:00:38 +1100
Organization: Kralizec Dialup Unix Sydney - +61-2-837-1183, v.32bis and v.42bis
Lines: 91
Message-ID: <2bq3imINN53k@godzilla.zeta.org.au>
References: <CG5LAE.4o3@agora.rain.com>
NNTP-Posting-Host: godzilla.zeta.org.au

In article <CG5LAE.4o3@agora.rain.com>,
David Greenman <davidg@agora.rain.com> wrote:
>>You are (again) incorrect.  It simply means that assuming the worst
>>possible memory fragmentation you can still allocate a 4k buffer.
>>That's fine if you only care about 4k file systems.  Given 8k file
>>systems, it is still not that difficult to get enough fragmentation to
>>cause noticeable performance degradation--the most likely case being if
>>you also try to use 4k file systems at the same time (say, on a floppy
>>disk).

There are much worse cases than that.  E.g., 4K block allocated, 28K hole,
4K block, 28K hole, ...  Only 1/8 of the address space is allocated, but
no request for > 28K can be satisfied and the kernel will panic.  Such
a pattern is very unlikely but it's not easy to prove that it is
impossible.  The buffer cache won't request > 28K unless you have
expanded MAXBSIZE to >= 32K, but other parts of the kernel might.

>Because of the limit on number of buffers, even if all of the headers
>point to 4k buffers, and even if all of the 4k buffers occupy every
>other page in the malloc area, as soon as you want to expand a buffer
>to be 8k, the FS cache releases one of the 4k buffers, and you then
>have an 8k hole. Like I said, even with worst-case fragmentation,
>there is no problem.

Sorry, unless you have changed vfs__bio.c, then the new space has to
be allocated before the old space can be freed so that the old space
can be copied.

My versions of vfs__bio.c and kern_malloc.c (almost) fix this by
releasing free buffers until malloc() succeeds.  It's still hard
to prove that this works in all cases of interest, because there
might be a lot of buffers in use or severe fragmentation in the
memory allocated for non-buffers.  However, if

    (virtual address space size in pages)
    >= N * ((max memory required for non-buffers) + allocbufspace)

then the worst case is every N'th page allocated, so free blocks
of size ((N - 1) * NBPG) are guaranteed.  Take N = (2 + 1) to
support MAXBSIZE = 8K, N = (4 + 1) to support MAXBSIZE = 16K, etc.
This leaves the problems of limiting the memory required for
non-buffers, and of allocbufspace being entirely used up by in-use
buffers.

>>The worst case would obviously be alternating allocations of 4k and 8k
>>blocks; it is easy to see why this would cause many unfillable 4k
>>fragments in the address space.  Assuming less adversarial timing, the

It's not that obvious.  If there are a lot of 8K blocks then allocbufspace
limits the total number of blocks (unless nbuf is a stronger limit, like
I think it is in FreeBSD).  I think the worst case is closer to alternating
allocations of 4K blocks and holes; then there's no way to allocate an
8K block.  This case is almost handled by the old hack to 0.1:

	bufpages = min( NKMEMCLUSTERS*2/5, bufpages );

This allows for NKMEMCLUSTERS*2/5 allocated blocks and the same number
of holes, so at most 4/5 of the address space is allocated for the
buffer cache.  The remaining 1/5 of the address space will usually
provide a free block to coalesce with one of the holes to produce an
8K hole.  The original version of the hack:

	bufpages = min( NKMEMCLUSTERS/2, bufpages );

is not so good because there is no remaining 1/5 of the address space.

>Again, the limit on the total number of buffers makes this problem null.
>
>FreeBSD's malloc code no longer holds on to freed page-sized allocations
>as the code in 386BSD did. This makes all the difference.

The bufpages limit should have been 5 times lower in 0.1 to handle this
problem for MAXBSIZE = 8K!  Memory was wasted in the malloc buckets for
sizes 512, 1K, 2K, 4K and 8K (5 sizes gives the factor of 5).

There's still the problem of internal fragmentation.  The worst case for
the buffer cache is nbuf pages allocated for 512-byte fragments.  Then
8 times as much space will be allocated as when 512-byte fragments are
packed, so the allocbufspace limit is not much help.  Most versions of
386BSD depend on the nbuf limit being much stronger than the allocbufspace
limit for this case.

I don't use 8K file systems but did a lot of testing on DOS file systems
with block sizes of 512 and 2K while testing my fixes for these problems.
The limit on nbuf is unacceptable when the block size is 512 and when
there are a lot of fragments.  E.g., nbuf = 128 to suit a 1M cache for
an 8K file system reduces you to a 64K "cache" for 512-byte file systems.
I use nbuf = allocbufspace / 512 (which is equivalent to no limit).
One problem with my version is that this results in a lot of empty
buffers that clog up the LRU list.
-- 
Bruce Evans  bde@kralizec.zeta.org.au