*BSD News Article 69421



#! rnews 8388 bsd
Path: euryale.cc.adfa.oz.au!newshost.anu.edu.au!harbinger.cc.monash.edu.au!news.mel.connect.com.au!munnari.OZ.AU!news.ecn.uoknor.edu!solace!nntp.uio.no!news.cais.net!bofh.dot!news.mathworks.com!newsfeed.internetmci.com!in1.uu.net!news.artisoft.com!usenet
From: Terry Lambert <terry@lambert.org>
Newsgroups: comp.unix.bsd.freebsd.misc
Subject: Re: Linux vs. FreeBSD ... (FreeBSD extremely mem/swap hungry)
Date: Sat, 25 May 1996 18:45:13 -0700
Organization: Me
Lines: 148
Message-ID: <31A7B7A9.EFD261C@lambert.org>
References: <3188C1E2.45AE@onramp.net> <4o3ftc$4rc@zot.io.org> <31A5A8F6.15FB7483@zeus.co.uk> <31A5D0A8.59E2B600@zeus.co.uk> <DrxB6M.Iyn@kithrup.com> <31A6D551.41C67EA6@zeus.co.uk>
NNTP-Posting-Host: hecate.artisoft.com
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
X-Mailer: Mozilla 2.01 (X11; I; Linux 1.1.76 i486)

Damian Reeves wrote:
]     Sean> You can roll your own malloc, of course, and several
]     Sean> packages provide their own.  The kernel-level memory
]     Sean> allocation system calls are brk() (sbrk() on some systems is
]     Sean> an actual system call; on others, it's a wrapper for the
]     Sean> brk() system call) and mmap().
] 
] Indeed, I already have.  However, it's not going to help my statically
] linked Netscape take less memory, is it?

So run the Linux Netscape (since FreeBSD runs Linux binaries).  The
BSDI Netscape uses a poor memory allocator because they used the
libc allocator.

Note: sbrk'ing pages back to the system, as the GNU malloc used
on Linux does, is less efficient than calling munmap.  This is
because the allocation area is linear when you use an sbrk-based
allocation/return mechanism.  As a result, frees using sbrk can
only return the pages above the highest page still in use.  If
there are unused pages in the middle, you are stuck carrying them
around.  The best approach is probably a pool-based allocator,
with zones for high-, medium-, and low-persistence objects.
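To make the sbrk-versus-munmap point concrete, here is a simplified model in C. It is not a real allocator: the page map and the function names `sbrk_returnable` and `munmap_returnable` are hypothetical. Each entry marks a heap page as in use (nonzero) or free (zero), with the last entry being the page just below the break. An sbrk-style allocator can only lower the break, so it can give back only the run of free pages at the top; an munmap-based allocator can release any free page wherever it sits.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model: in_use[i] != 0 means heap page i still holds live data.
 * Page n-1 is the highest address, i.e. the page just below the break. */

/* Pages an sbrk-style allocator can return: only the free pages above the
 * highest page still in use, since lowering the break is all it can do. */
size_t sbrk_returnable(const int *in_use, size_t n)
{
    assert(in_use != NULL || n == 0);
    size_t count = 0;
    while (count < n && !in_use[n - 1 - count])
        count++;
    return count;
}

/* Pages an munmap-based allocator can return: every free page, because
 * munmap accepts any page-aligned range, not just the top of the heap. */
size_t munmap_returnable(const int *in_use, size_t n)
{
    assert(in_use != NULL || n == 0);
    size_t count = 0;
    for (size_t i = 0; i < n; i++)
        if (!in_use[i])
            count++;
    return count;
}
```

For a five-page heap `{1, 0, 0, 1, 0}`, `sbrk_returnable` gives 1 (only the top page) while `munmap_returnable` gives 3; the two free pages in the middle are the ones you are stuck carrying around.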

Btw: since the Linux libc data is linked into the application
with Linux shared library technology, I can make a libc change
that alters the size of the data without changing the interface
(a minor rev bump instead of a major one), and then Netscape
no longer conforms to the LGPL's "relink" clause.  There are a
*lot* of reasons not to use the GNU malloc, besides the many
technical ones.

]     Sean> Historically, BSD has backed memory with swap space.
]     Sean> FreeBSD, however, does not -- it uses the same method you
]     Sean> ascribe to Linux.  It is called "lazy allocation."  In fact,
]     Sean> Net/2 used this same method -- it was a hold-over from the
]     Sean> Mach VM code that Net/2 and later used.  (I believe that
]     Sean> Lite or Lite2 tries to keep track of how much swap space is
]     Sean> used, so won't allow it.  FreeBSD, however, does allow it.)
] 
] OK, I didn't realise that FreeBSD already did this.  I was trying
] to come up with some reason why FreeBSD used so much swap; now it's
] even harder to explain.


In point of fact, this is called a memory overcommit architecture,
and even 386BSD had this when it was originally released because
it used the Mach VM.

Again, the reason you can't see why FreeBSD is "taking" so much
swap is that you are looking at the clean+dirty pages instead
of just the dirty pages.  The clean pages are in core and in swap
because it's faster to retrieve them from there, not because the
pages are unusable by another process (in fact, they will be LRU'ed
out on demand).  This is *normal* for a unified VM/buffer cache,
since any page in core is in fact in cache.

What's wasteful is that Linux has all this high-speed access RAM
(physical memory), all this medium-speed access RAM (swap), and
all this low-speed access RAM (the disk, and the program images
being used as swap store), and it preferentially discards
perfectly good medium-speed pages in favor of low-speed pages,
which makes the numbers you are misinterpreting look "good".

] Very true, no guarantees can be made whether a program will SEGV on
] Linux.  Important system daemons need to be very careful that they
] will malloc(), touch, then free() enough memory on startup to increase
] their data size so that subsequent memory allocated to the process
] later on in time will always be available.  Even then fragmentation
] makes this almost impossible to achieve.  It could be said that it is
] impossible to use a Linux-style memory manager for mission-critical
] applications, unless they do all their processing in fixed-size
] static buffers.  One could install a SEGV signal handler that tried to
] restart the process in a sensible state after a memory fault, but this
] is a gross hack.  Then again, how much UNIX code actually checks
] whether malloc() returned zero and handles it appropriately?

The traditional "fix" for this problem is to mmap some large
amount of /dev/zero copy-on-write (makes you wonder what sbrk is
for...), then have the daemon touch the pages.  The allocation
space is inherently sparse (unlike sbrk space), and can be
returned on a page-by-page basis via munmap.
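A minimal sketch of that technique in C, assuming a POSIX system with a `/dev/zero` device. The helpers `reserve_touched` and `release_page` are hypothetical names, not a standard API. The region is mapped private (copy-on-write) and every page is written once so it is actually instantiated up front; afterwards a single page in the middle can be handed back with `munmap`, which sbrk space cannot do.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical helper: map npages of zero-filled, private (copy-on-write)
 * memory from /dev/zero and write to each page so the pages are really
 * allocated now rather than lazily later.  Returns NULL on failure. */
char *reserve_touched(size_t npages, size_t pagesize)
{
    assert(npages > 0 && pagesize > 0);
    int fd = open("/dev/zero", O_RDWR);
    if (fd < 0)
        return NULL;
    void *p = mmap(NULL, npages * pagesize, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE, fd, 0);
    close(fd);                          /* mapping stays valid after close */
    if (p == MAP_FAILED)
        return NULL;
    char *base = p;
    for (size_t i = 0; i < npages; i++)
        base[i * pagesize] = 1;         /* a write forces a private copy */
    return base;
}

/* Hypothetical helper: give one page back to the system.  The region may
 * be left with a hole, so any page index works, not just the last one. */
int release_page(char *base, size_t index, size_t pagesize)
{
    assert(base != NULL);
    return munmap(base + index * pagesize, pagesize);
}
```

For example, after `reserve_touched(8, (size_t)sysconf(_SC_PAGESIZE))`, calling `release_page(base, 3, pagesize)` unmaps page 3 while pages 0-2 and 4-7 remain usable. Touching page 3 again afterwards would fault, so a real daemon would track which pages it has returned.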


> 
> One thing I noticed with our Linux box today is that apart from
> init/kswapd the minimum text size of all the other processes on the
> machine was 204k.  Now, a 'size' on /bin/bash (which is /bin/sh on
> Linux), shows almost exactly 204k of code.  It would thus appear that
> on a fork()/exec(), extraneous text pages are not freed back to the
> OS.  Hopefully those extra pages are still shared between the other
> processes; otherwise every unique program will take a minimum of 200k
> of VM (although it shouldn't swap it out to swapspace but re-read it
> from the filesystem).  I haven't checked this on BSD yet.
> 
>     >> I am not discussing load here, that is irrelevant.  Load and
>     >> paging are two totally different things.
> 
>     Sean> Not completely.  (I could back it up, but why bother?  You'd
>     Sean> just claim it was irrelevant.)
]     >> Do you know if the mount_mfs() process invokes the kernel
]     >> sbrk() call to manually return unused pages in its data segment
]     >> back to the OS?
] 
]     Sean> Well, actually, that's not how MFS works.  It allocates a
]     Sean> fixed size of memory (based on command line arguments, or
]     Sean> the default if none are given), "formats" that memory so
]     Sean> that it looks like a UFS filesystem, and then goes into
]     Sean> kernel mode and never comes back until the filesystem is
]     Sean> unmounted.  Since the VM allocation (in FreeBSD) is lazy,
]     Sean> neither physical nor swap memory is used until the memory is
]     Sean> actually touched (read or written).
] 
]     Sean> When a file is removed, however, it doesn't get "returned to
]     Sean> the system."  It will just sit there eating up space;
]     Sean> however, the filesystem will be able to reuse it, hopefully.
] 
] Aha, so MFS is going to eat my swap and never return it then.

Clean pages are returnable; it would be possible to build a dynamically
sized mfs, but why bother?  Why not just set the sticky bit on the
files if you want them to stay in core?

] My argument has nothing to do with a Linux/BSD war; Linux is merely
] something I can use to compare BSD against on the same hardware.  My
] question is why my xbiff should take 1.3MB of RSS.  Do you think that
] is a good use of memory?  Back in days gone by, one used to run a
] UNIX server with 4MB of RAM that supported 20-odd interactive users.
] Now you'd plump for 32MB or maybe even 64MB as a minimum to achieve
] this, yet the requirements of the users have hardly changed.

Again, you are misinterpreting the numbers.  Consider that a shared
libc may be counted against the RSS but not the VSZ; the size
you think you are using is incorrect.  Divide the resident pages
for the libc by the number of processes using it, subtract the
result from the libc resident pages, and then subtract that value
from the RSS of each process using the libc.

In other words, you are counting shared libraries once for every
process using them.  Stop it, it's the wrong thing to do.
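The correction described above can be written as a one-line calculation. This is a sketch with a hypothetical helper name, `adjusted_rss`, assuming you already know a process's resident pages, the libc's resident pages, and how many processes share that libc. Each process is charged its 1/N share of the library instead of the whole thing.

```c
#include <assert.h>

/* Hypothetical helper for the accounting correction above.
 *   rss          - resident pages reported for one process
 *   lib_resident - resident pages belonging to the shared libc
 *   nprocs       - number of processes sharing that libc
 * The over-count is the libc pages minus this process's fair share. */
double adjusted_rss(double rss, double lib_resident, int nprocs)
{
    assert(nprocs > 0);
    double overcount = lib_resident - lib_resident / nprocs;
    return rss - overcount;
}
```

For instance, if four processes each show 300 resident pages, 200 of which are the shared libc, the naive total is 1200 pages. `adjusted_rss(300, 200, 4)` gives 150 per process, or 600 in total, which is exactly four sets of 100 private pages plus a single 200-page copy of libc.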


                                        Terry Lambert
                                        terry@lambert.org
---
Any opinions in this posting are my own and not those of my present
or previous employers.