*BSD News Article 73079

Path: euryale.cc.adfa.oz.au!newshost.anu.edu.au!harbinger.cc.monash.edu.au!munnari.OZ.AU!news.mel.connect.com.au!news.syd.connect.com.au!gidora.zeta.org.au!not-for-mail
From: bde@zeta.org.au (Bruce Evans)
Newsgroups: comp.unix.bsd.freebsd.misc
Subject: Re: Parallel laplink abuse leads to death of kernel secondary timer
Date: 8 Jul 1996 06:56:52 +1000
Organization: Kralizec Dialup Unix
Lines: 63
Message-ID: <4rp8ak$1jp@godzilla.zeta.org.au>
References: <4rn4ip$kcr@harbinger.cc.monash.edu.au>
NNTP-Posting-Host: godzilla.zeta.org.au
Keywords: printer port laplink cable timer

In article <4rn4ip$kcr@harbinger.cc.monash.edu.au>,
Mike  Mc Gaughey <mmcg@cs.monash.edu.au> wrote:
>Hi, all,
>
>I'm running FreeBSD 2.1.0-R on two machines, connected by a parallel
>laplink cable.  The first is a 486/33, with a printer port on a multi
>...
>large FTP, `top' reports that 60-70% of CPU time (on each machine) is
>spent processing interrupts (whether or not the printer port on the P90
>is set up to generate interrupts :).

There are software interrupts and timeouts even if there are no hardware
interrupts.

>Here's the interesting bit: if an FTP lasts for more than about a
>minute, `systat' (when displaying vmstat) complains that `the kernel
>secondary timer has died' - and it stays dead, even after the link
>becomes quiescent.  As far as I can tell from the kernel sources, this
>timer runs at some multiple of the primary (real-time) timer, and is

It is supposed to be as independent as possible of the primary timer.
In FreeBSD, it uses an independent hardware timer with a normal
frequency of approximately 1.28 times the frequency of the primary
timer.
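
As a rough illustration of that ratio, here is a minimal sketch.  The
values 100 Hz and 128 Hz are assumptions (common defaults for the two
timers, not stated in this article), and the function names are
hypothetical.

```c
#include <assert.h>

/* Assumed defaults, not taken from the article. */
#define HZ      100   /* primary timer interrupts per second */
#define STATHZ  128   /* secondary timer interrupts per second */

/* Secondary-timer interrupts expected over `seconds` seconds. */
long stat_ticks_in(long seconds)
{
    assert(seconds >= 0);
    return seconds * STATHZ;
}

/* Ratio of secondary to primary frequency, in hundredths (128 = 1.28). */
int stat_to_hard_ratio_x100(void)
{
    return STATHZ * 100 / HZ;
}
```

Under these assumptions, a one-minute FTP would normally see several
thousand secondary timer interrupts, so a dead timer shows up quickly
in anything that samples it.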

>used mainly for profiling; after it dies, various statistics aren't
>updated (e.g. the CPU times and percentages on `top' become static).

It is used mainly for collecting statistics _not_ related to profiling.
If profiling is enabled for any process, then (in FreeBSD) the secondary
timer is run 8 times as fast for collecting profiling statistics and
normal statistics are collected on every 8th secondary timer interrupt
(so they aren't much affected by profiling).
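
The divide-by-8 arrangement can be sketched as follows.  This is a toy
model, not the kernel's code: the struct, its fields and the function
names are all hypothetical, and only the factor of 8 comes from the
description above.

```c
#include <assert.h>

#define PSRATIO 8   /* profiling rate / normal statistics rate */

struct statclock_state {
    int  profiling;     /* nonzero while any process is being profiled */
    int  countdown;     /* interrupts left until the next normal sample */
    long prof_samples;  /* profiling samples taken */
    long stat_samples;  /* normal statistics samples taken */
};

/* Switch the divider, as would happen when profiling starts or stops. */
void set_profiling(struct statclock_state *s, int on)
{
    s->profiling = on;
    s->countdown = on ? PSRATIO : 1;
}

/* Handle one secondary-timer interrupt. */
void statclock_tick(struct statclock_state *s)
{
    assert(s->countdown > 0);
    if (s->profiling)
        s->prof_samples++;      /* every interrupt feeds the profiler */
    if (--s->countdown > 0)
        return;                 /* not a normal-statistics interrupt */
    s->countdown = s->profiling ? PSRATIO : 1;
    s->stat_samples++;          /* every 8th interrupt while profiling */
}
```

Because the interrupt rate and the divider change together, the normal
statistics are sampled at the same rate whether or not profiling is
enabled.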

>What I want to know is: is this a major problem?  Is there anything
>else (besides statistics gathering) that relies on this secondary
>clock, which breaks if the clock stops?  Is there a more gentle way of

The statistics are used by the scheduler and who-knows-what applications.

>restarting the clock (than a reboot)?  Is there any problem with just

Faking a statclock interrupt might work:

	echo "set ipending=0x100" | gdb -k -w /kernel /dev/mem

>leaving it dead?  Does anyone know why it dies?  Does the (large) time

The (nonstandard) kernel options AUTO_EOI_2 and DUMMY_NOPS may break it.

>spent processing printer port interrupts mean that clock-related
>interrupts are lost, resulting in the (permanent) failure of the
>secondary clock?  Is there a software fix for this?  Should I be

The driver certainly delays clock interrupts for too long.  It should
use splimp() instead of splhigh() (and arrange for splimp() to mask tty
interrupts).  This might cause some packets to be lost, but delaying
clock interrupts for more than a few hundred microseconds isn't
acceptable.

Clock interrupts shouldn't be completely lost - multiple ones should
be coalesced.  Perhaps delaying the interrupt handler doesn't work.
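
What "coalesced" means can be shown with a toy model of a pending-
interrupt bitmask.  This is a sketch under assumptions, not FreeBSD's
implementation: the struct and function names are hypothetical, and the
bit value is simply the one used in the gdb command above, assumed here
to stand for the secondary timer.

```c
#include <assert.h>

/* Same value as in the gdb command; assumed to mark the statclock source. */
#define STATCLOCK_BIT 0x100u

struct intr_state {
    unsigned pending;   /* one bit per interrupt source */
    unsigned masked;    /* sources currently blocked by a raised priority */
    long     handled;   /* handler runs for the statclock source */
};

/* Record that an interrupt arrived from the given source. */
void post_interrupt(struct intr_state *s, unsigned bit)
{
    assert(bit != 0);
    s->pending |= bit;  /* setting an already-set bit changes nothing */
}

/* Run the statclock handler if it is pending and not masked. */
void run_pending(struct intr_state *s)
{
    unsigned ready = s->pending & ~s->masked;

    if (ready & STATCLOCK_BIT) {
        s->pending &= ~STATCLOCK_BIT;
        s->handled++;
    }
}
```

In this model, three interrupts that arrive while the source is masked
leave a single pending bit, so the handler runs once after unmasking
rather than three times or not at all.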
-- 
Bruce Evans  bde@zeta.org.au