Introduction:

Alpine4Linux is a userlevel FreeBSD 4.8 networking stack running on top
of a stock Linux kernel. The original idea is attributed to [1]. However,
I must point out that Alpine4Linux is *not* a port of the original Alpine
(referred to as Alpine4BSD henceforth). In fact, there is not a single line
of code common to the two implementations.

Alpine4Linux has two components:
1. A daemon (alpine_server), that runs the FreeBSD stack code and does
   network I/O on behalf of processes wishing to use the FreeBSD stack.
2. Shared libraries (libClientSocket.so and libAlpineSupport.so) that
   hijack networking related system calls and divert them to the
   alpine_server. 

Why Alpine4Linux:

I did this project because I was fascinated by the idea of running
kernel components in a userlevel process. But seriously, I don't know why 
anyone would want to run a FreeBSD stack in userspace on a Linux box. 

The authors of the original Alpine4BSD paper cite better debugging 
and faster compile-test cycles when doing network protocol development.
That seems to be as good a reason as any.

Supported versions:

The FreeBSD stack is the 4.8-RELEASE version (downloaded on April 8 2003).

The only requirements on the Linux side are support for the PF_PACKET
family of sockets and for makecontext() and swapcontext().
Other than that it should run on any Linux distribution.
uname -a on my Linux box:
Linux localhost 2.4.20-13.8 #1 Mon May 12 12:20:54 EDT 2003 i686 i686 i386 GNU/Linux

How does it work:

The alpine_server is a Linux program that acts like a FreeBSD kernel as
far as the networking stack is concerned. Just like the FreeBSD kernel it
provides client programs with a socket layer. It also does network I/O 
on behalf of client programs. But that is where the similarity ends. 

A kernel presents a system call interface to client programs. The
alpine_server presents an RPC interface. RPC here simply means that
it listens for requests over the network. 

For example, if the client program calls socket(), a message
is sent to the alpine_server. The message indicates the
type of the request (REQ_SOCKET) and its parameters (AF_INET, SOCK_STREAM).
The response contains the type of the response (RESP_SOCKET) and
the return value (socket_fd, errno).
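
The exchange described above can be sketched in C as follows. Only the
names REQ_SOCKET and RESP_SOCKET come from the text; the struct layouts,
field names, numeric values and helper functions are assumptions made for
illustration and are not the actual Alpine4Linux wire format.

```c
#include <errno.h>
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>

/* Hypothetical message type codes; the real values are not given in the text. */
enum { REQ_SOCKET = 1, RESP_SOCKET = 2 };

/* Hypothetical request layout: message type followed by the socket() arguments. */
struct req_socket {
    uint32_t type;      /* REQ_SOCKET */
    int32_t  domain;    /* e.g. AF_INET */
    int32_t  sock_type; /* e.g. SOCK_STREAM */
    int32_t  protocol;
};

/* Hypothetical response layout: message type, return value and errno. */
struct resp_socket {
    uint32_t type;      /* RESP_SOCKET */
    int32_t  socket_fd; /* value returned by socket() in the FreeBSD stack */
    int32_t  err;       /* errno to report to the client, 0 on success */
};

/* Fill in a socket() request that a client library could send to the server. */
void make_req_socket(struct req_socket *req, int domain, int type, int protocol)
{
    memset(req, 0, sizeof(*req));
    req->type = REQ_SOCKET;
    req->domain = domain;
    req->sock_type = type;
    req->protocol = protocol;
}

/* Turn a response into the usual libc result: an fd, or -1 with errno set. */
int finish_socket_call(const struct resp_socket *resp)
{
    if (resp->type != RESP_SOCKET) {
        errno = EPROTO; /* unexpected message type */
        return -1;
    }
    if (resp->socket_fd < 0) {
        errno = resp->err;
        return -1;
    }
    return resp->socket_fd;
}
```

A client library would send the request over its connection to the
alpine_server, read the response, and pass it to finish_socket_call() so
that the application sees ordinary socket() semantics.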

Client programs load two shared libraries, libClientSocket.so and
libAlpineSupport.so, using the LD_PRELOAD and LD_LIBRARY_PATH environment
variables. These libraries "intercept" the socket related functions before
they can be processed conventionally by the Linux libc. Instead each
socket related system call (e.g. socket(), bind(), connect()) is transformed
into a message to the alpine_server.
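
The interception idea can be illustrated with a minimal sketch. It is not
the code in libClientSocket.so: alpine_rpc_socket() is a hypothetical
helper, stubbed out here, standing in for whatever sends the request to
the alpine_server, and the fixed return value is a placeholder.

```c
#include <sys/socket.h>

/* Arguments of the last intercepted call, kept only so the sketch can be inspected. */
int last_domain = -1;
int last_type = -1;

/* Hypothetical stand-in for the code that would send a request to the
 * alpine_server and wait for its response. Here it returns a fixed value. */
int alpine_rpc_socket(int domain, int type, int protocol)
{
    (void)protocol;
    last_domain = domain;
    last_type = type;
    return 42; /* placeholder; a real library would return the server's answer */
}

/* A preloaded library is searched before libc, so a definition like this
 * one is what the application's socket() calls resolve to. */
int socket(int domain, int type, int protocol)
{
    return alpine_rpc_socket(domain, type, protocol);
}
```

Built as a shared object (for example with gcc -shared -fPIC) and named in
LD_PRELOAD, a file like this makes an unmodified, dynamically linked
program call the wrapper instead of the libc socket().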

The alpine_server and its clients communicate over a standard TCP socket
bound to 127.0.0.1.

Requirements:

An unmodified networking stack:

The sys/net, sys/netinet and certain files under sys/kern must be from
the stock FreeBSD-4.8 release. This requirement was mostly satisfied.
I had to make 3 changes to work around some differences between Linux 
and FreeBSD. The changes are trivial and do not change functionality. 
See Appendix A for more details on these changes.

Ability to use unmodified Linux binaries:

It should be possible to simply define the LD_PRELOAD and LD_LIBRARY_PATH
variables and run any dynamically linked Linux networking application
against the FreeBSD stack.

For example, it is possible to configure the FreeBSD stack interfaces using the
stock 'ifconfig' on Linux. I have also run 'telnet', 'nmap' and 'ping' 
against the alpine_server with no problems. Unfortunately it was too
difficult to use the Linux 'route' command against the FreeBSD stack.
Alpine4Linux provides the 'route' program from FreeBSD, ported to Linux,
that can be used to configure routing in the FreeBSD stack.

Reuse as much of the FreeBSD kernel code as possible:

This requirement is subjective but I think Alpine4Linux utilizes
a *lot* of unmodified FreeBSD kernel code. In fact the alpine_server 
defines only two non-trivial functions that are required by the kernel:
mi_switch() and scheduler().

scheduler() runs the main select() loop in the alpine_server.
mi_switch() deals with switching the FreeBSD kernel execution context.
These functions are described in detail later in this document.

For example, Alpine4Linux uses the unmodified sysinit, timeout, tsleep() and
wakeup(), and descriptor management code.

Implementation:

Sending and receiving packets:

The alpine_server is invoked with the name of the interface (on the host
OS) that it uses to send/receive packets. The IP address and subnet that
are used by the FreeBSD stack are also specified on the command line.

E.g. ./alpine_server eth0 10.11.12.13 255.255.255.0
This tells the alpine_server to use the "eth0" interface on Linux to
send/receive packets. It also assigns 10.11.12.13/24 as the IP address
of the FreeBSD stack.
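
The /24 above follows from the netmask given on the command line. As a
small sketch of that conversion (netmask_to_prefix() is a hypothetical
helper, not a function from the Alpine4Linux sources):

```c
#include <arpa/inet.h>
#include <stdint.h>

/* Return the prefix length for a dotted-quad netmask, or -1 if the string
 * is not a valid netmask with contiguous leading one bits. */
int netmask_to_prefix(const char *mask_str)
{
    struct in_addr mask;
    uint32_t m;
    int prefix = 0;

    if (inet_pton(AF_INET, mask_str, &mask) != 1)
        return -1;

    m = ntohl(mask.s_addr);
    while (m & 0x80000000u) { /* count the leading one bits */
        prefix++;
        m <<= 1;
    }
    return m == 0 ? prefix : -1; /* leftover bits mean a non-contiguous mask */
}
```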

The alpine_server first opens a socket of family PF_PACKET. This is the
recommended way to do raw packet I/O on Linux. A BPF program is compiled,
so that only packets destined for the FreeBSD stack are injected into the
stack. The BPF filter expression is "host <ip_specified_on_command_line>".
Let's call this file descriptor the 'linux_pcap_fd'.
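
To make the effect of the filter concrete, here is a user-space sketch of
roughly what a "host <ip>" expression accepts on Ethernet: IPv4 or ARP
packets whose source or destination address is the given host. This is an
illustration only; the alpine_server uses a compiled BPF program, and
frame_matches_host() is a hypothetical name.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define ETH_HDR_LEN 14

/* Return 1 if an Ethernet frame carries an IPv4 or ARP packet whose source
 * or destination address equals ip (given in network byte order), else 0. */
int frame_matches_host(const uint8_t *frame, size_t len, uint32_t ip)
{
    uint16_t ethertype;
    uint32_t a, b;

    if (len < ETH_HDR_LEN)
        return 0;
    ethertype = (uint16_t)((frame[12] << 8) | frame[13]);

    if (ethertype == 0x0800 && len >= ETH_HDR_LEN + 20) { /* IPv4 */
        memcpy(&a, frame + ETH_HDR_LEN + 12, 4); /* source address */
        memcpy(&b, frame + ETH_HDR_LEN + 16, 4); /* destination address */
        return a == ip || b == ip;
    }
    if (ethertype == 0x0806 && len >= ETH_HDR_LEN + 28) { /* ARP */
        memcpy(&a, frame + ETH_HDR_LEN + 14, 4); /* sender protocol address */
        memcpy(&b, frame + ETH_HDR_LEN + 24, 4); /* target protocol address */
        return a == ip || b == ip;
    }
    return 0;
}
```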

Next we open the "tap" pseudo-device in the FreeBSD stack. This device
presents an Ethernet device interface to the FreeBSD stack.
On the other side the "tap" device returns an 'fd' that can be read and
written to inject raw Ethernet packets into the FreeBSD stack. Let's call
this file descriptor the 'freebsd_tap_fd'.

The alpine_server now configures this "tap" device by setting its MAC
address to the MAC address of the interface specified on the command line.
It also sets the IP address of the "tap" device to that specified on the
command line.

Now the job of the alpine_server is simply to read a packet from
'linux_pcap_fd'; run the packet through the BPF filter, and write
the packet to 'freebsd_tap_fd'. In the other direction it reads from
'freebsd_tap_fd' and writes to 'linux_pcap_fd'.

Simulating interrupts:

The alpine_server puts the 'linux_pcap_fd' in its select() read fdset. 
Whenever a packet arrives at the interface, select() returns and 
the packet can be read, filtered and injected into the FreeBSD stack. 

Alpine4Linux acts like a true interrupt driven stack because we
inject packets into the FreeBSD stack as and when we get them.

Software interrupts:

The alpine_server has only one thread of control running at any point in
time. There is no need to lock data structures because this thread
of control cannot be preempted; it has to voluntarily relinquish CPU by
calling mi_switch(). Thus all the spl* functions are no-ops in Alpine4Linux.

setsoftnet() is also a no-op in Alpine4Linux because we run the netisrs
periodically, at every tick (1/HZ secs).

The function do_netisrs() defined in kern/kern_netisr.c is called periodically
by the alpine_server. This function calls all the netisrs ready to run,
and gives them the opportunity to drain packets from their packet queues.
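
A simplified sketch of this kind of dispatch is shown below. It is not the
code in kern/kern_netisr.c; every name prefixed with sketch_ is
hypothetical, and the pending bitmask and handler table are assumptions
about how such a dispatcher could be organized.

```c
#include <stddef.h>

#define MAX_NETISR 32

typedef void (*netisr_fn)(void);

static netisr_fn handlers[MAX_NETISR]; /* one handler slot per netisr number */
static unsigned int pending;           /* bit n set: netisr n is ready to run */

int drain_calls; /* counts calls to the example handler below */

/* Example handler; a real one would drain its protocol's packet queue. */
void sketch_ip_intr(void) { drain_calls++; }

/* Register a handler for netisr number num; returns 0 on success, -1 on error. */
int sketch_register_netisr(int num, netisr_fn fn)
{
    if (num < 0 || num >= MAX_NETISR || fn == NULL)
        return -1;
    handlers[num] = fn;
    return 0;
}

/* Mark netisr num as ready to run on the next pass. */
void sketch_schednetisr(int num)
{
    if (num >= 0 && num < MAX_NETISR)
        pending |= 1u << num;
}

/* Run every pending netisr once; returns how many handlers were called. */
int sketch_do_netisrs(void)
{
    unsigned int ready = pending;
    int n, ran = 0;

    pending = 0;
    for (n = 0; n < MAX_NETISR; n++) {
        if ((ready & (1u << n)) && handlers[n] != NULL) {
            handlers[n]();
            ran++;
        }
    }
    return ran;
}
```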

Initialization:

Alpine4Linux initializes the kernel data structures as if the kernel had
booted itself. The alpine_server contains main() that is the entry point
into the program. main() in turn calls init386() followed by mi_startup().

init386():
init386() was rewritten to only initialize the tunable variables in the
kernel like "hz" or "tick". It also initializes physical memory dependent
variables like "maxusers" and "maxproc". Alpine4Linux makes the FreeBSD
kernel believe that it is running on a machine with 1Gbytes of physical
memory.

mi_startup():
This is the stock mi_startup() from the FreeBSD kernel, since we support
sysinit in libAlpineSys.so. This function does not return and control
ends up in the scheduler() function. Alpine4Linux defines the scheduler()
function in alpine_server. Eventually control lands in the main select()
loop defined in sched_main_loop().

Timer management:

Timeout:
Timer management in Alpine4Linux is very simple. In the main select() loop,
we call hardclock() every 10 msec (this interval is based on kern.hz).
If there is any event in the current timer wheel bucket, softclock() is 
called from hardclock(). At that point the stock FreeBSD code is used to deal
with timeout events. slowtimo() and fasttimo() are indirectly called using 
this mechanism.
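
The tick arithmetic can be sketched as follows. With hz = 100 a tick is
1000000 / 100 = 10000 microseconds, which is the 10 msec interval mentioned
above. sketch_hardclock() and run_due_ticks() are hypothetical names, and
catching up on several missed ticks in one pass is an assumption; the text
does not say how the real loop handles a late wake-up.

```c
#include <stdint.h>

int hz = 100;                  /* ticks per second */
unsigned long hardclock_calls; /* counter so the sketch can be observed */

/* Stand-in for the kernel's hardclock(). */
void sketch_hardclock(void) { hardclock_calls++; }

/* Call sketch_hardclock() once for every whole tick that has elapsed since
 * *last_usec, and advance *last_usec by the ticks consumed. Assumes
 * now_usec >= *last_usec. Returns the number of ticks processed, or -1 if
 * hz is not positive. */
int run_due_ticks(uint64_t *last_usec, uint64_t now_usec)
{
    uint64_t tick;
    int ran = 0;

    if (hz <= 0)
        return -1;
    tick = 1000000u / (uint64_t)hz; /* length of one tick in microseconds */

    while (now_usec - *last_usec >= tick) {
        sketch_hardclock();
        *last_usec += tick;
        ran++;
    }
    return ran;
}
```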

tsleep() and wakeup():

Alpine4Linux uses the stock tsleep() and wakeup() functions from FreeBSD
without any modifications. The blocking behavior of a process in the kernel
is implemented by mi_switch() that is defined outside the kernel.

Multiple execution contexts in the stack:

The alpine_server provides networking services to multiple clients at the
same time. It is thus imperative that the alpine_server not block in the
kernel. This is the same constraint that the FreeBSD kernel itself operates
under. Anytime a client process does an action that causes it to block (e.g.
a blocking read() on a socket), the alpine_server must store the execution
context and switch to another client process that is ready to run. If there
are no client processes ready to run, the alpine_server blocks in select().
The select() loop of alpine_server is analogous to the idle loop of a Unix
kernel.

The alpine_server provides multiple execution contexts (one for each client),
using the makecontext(3) function available in Linux. It switches between
execution contexts in the FreeBSD kernel using swapcontext(3).

The alpine_server itself executes in a 'ucontext_t' that is accessible as
a global variable (sched_thread->ut_ctx). The alpine_server (and hence 
the FreeBSD stack) executes in this context for system level events like 
timeouts, network I/O etc. 

The alpine_server executes in a 'ucontext_t' associated with a client process
whenever it is executing code in the FreeBSD kernel on behalf of the client
process. For example, if the client process sends a message to the alpine_server
to read() from a socket, the alpine_server first creates a 'ucontext_t' and
switches execution to the newly created context. If all goes well and there
is data to be read, we reply to the client process; the newly
created 'ucontext_t' is destroyed and control passes back to the
main 'sched_thread' context. If there is not enough data to be read, then
the ucontext_t needs to block in tsleep(), which calls mi_switch().
mi_switch() does a swapcontext() to the main 'sched_thread'. The sched_thread
either idles on select() or does a swapcontext() to process a request
from another client.
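
The block-and-resume pattern can be shown with a small, self-contained
sketch using getcontext(), makecontext() and swapcontext(). All names here
are hypothetical stand-ins (sketch_mi_switch() for mi_switch(),
client_read() for a blocking read()); the real server keeps one context
per client rather than the single static context used here.

```c
#include <ucontext.h>

#define STACK_SIZE (64 * 1024)

static ucontext_t sched_ctx;  /* plays the role of the sched_thread context */
static ucontext_t client_ctx; /* context created for one client request */
static char client_stack[STACK_SIZE];

int data_available; /* set by the scheduler side when data "arrives" */
int steps;          /* progress marker: 1 = blocked, 2 = read finished */

/* Stand-in for mi_switch(): hand control back to the scheduler context. */
void sketch_mi_switch(void)
{
    swapcontext(&client_ctx, &sched_ctx);
}

/* Stand-in for a blocking read(): yields until data is available. */
void client_read(void)
{
    steps = 1;
    while (!data_available)
        sketch_mi_switch();
    steps = 2;
}

/* Create a context for the request and switch to it. Returns 0 once the
 * request has either blocked or finished, -1 on error. */
int start_client_request(void)
{
    if (getcontext(&client_ctx) == -1)
        return -1;
    client_ctx.uc_stack.ss_sp = client_stack;
    client_ctx.uc_stack.ss_size = sizeof(client_stack);
    client_ctx.uc_link = &sched_ctx; /* where to go when client_read() returns */
    makecontext(&client_ctx, client_read, 0);
    return swapcontext(&sched_ctx, &client_ctx);
}

/* Resume a request that previously blocked in sketch_mi_switch(). */
int resume_client_request(void)
{
    return swapcontext(&sched_ctx, &client_ctx);
}
```

Calling start_client_request() with no data available leaves the request
parked inside client_read(); setting data_available and calling
resume_client_request() lets it run to completion, after which control
returns to the scheduler context through uc_link.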

The alpine_server also has to take care of "woken-up" execution contexts.
For example, consider an execution context that was put to sleep because there
was not enough data to satisfy a read(). When data arrives on that socket,
that execution context becomes runnable (i.e. p->p_stat == SRUN). We check
for such "woken-up" processes just before the sched_thread sleeps in select().
The sched_thread traverses all the ucontexts that are in the sleeping state
*but* whose proc structure is runnable, i.e.
(ut->ut_state==UTS_SLEEPING && p->p_stat==SRUN).
If such a ucontext is found, it does a swapcontext() to it. The blocked
ucontext resumes execution after the mi_switch() statement in tsleep() just
like it would in a stock FreeBSD kernel.
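
The scan for woken-up contexts can be sketched like this. The condition is
the one quoted above; the struct layouts, the enum values and the array
representation are simplified assumptions, not the actual data structures.

```c
#include <stddef.h>

/* Simplified stand-ins for the states named in the text; values are arbitrary. */
enum sketch_ut_state { UTS_RUNNING, UTS_SLEEPING };
enum sketch_p_stat { SSLEEP, SRUN };

struct sketch_proc {
    enum sketch_p_stat p_stat;
};

struct sketch_uthread {
    enum sketch_ut_state ut_state;
    struct sketch_proc *ut_proc; /* hypothetical link to the proc structure */
};

/* Return the first context that is marked sleeping but whose proc has been
 * made runnable by wakeup(), or NULL if there is none. */
struct sketch_uthread *find_woken(struct sketch_uthread *uts, size_t n)
{
    size_t i;

    for (i = 0; i < n; i++) {
        if (uts[i].ut_state == UTS_SLEEPING && uts[i].ut_proc->p_stat == SRUN)
            return &uts[i];
    }
    return NULL;
}
```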

Interaction with the Linux kernel:

Alpine4Linux is a pure userlevel process and requires no Linux kernel
modifications. But Alpine4Linux is handing out file descriptors to
client programs (fd = socket()); it needs to ensure it does not step on
the Linux kernel's toes when it does so.

Therefore we need to map file descriptors between the FreeBSD stack and the
host OS. To see why we need this, consider a case where an application
issues a socket() system call. This call is intercepted by the libClientSocket
library and a file descriptor is assigned to the newly created socket by
the FreeBSD stack. Let's call this file descriptor the 'alpine_fd'. We need
to ensure that the value we return to the client application is an fd
that has not already been allocated and will not be allocated in the future.
Hence we open("/dev/null") on the host OS and create a mapping between
linux_fd and alpine_fd. The fd that is returned to the client from the
socket() system call is the linux_fd.

When the client comes back to do read() or write() with the linux_fd, we
will map that fd to the alpine_fd and use it in the FreeBSD stack.
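
A minimal sketch of such a mapping follows. The table size, the function
names and the choice of a flat array are assumptions for illustration;
only the idea of reserving a descriptor by opening /dev/null comes from
the text.

```c
#include <fcntl.h>
#include <unistd.h>

#define MAX_FDS 1024 /* assumed table size */

/* Indexed by linux_fd; 0 means unused, otherwise the value is alpine_fd + 1. */
static int alpine_fd_for[MAX_FDS];

/* Reserve a Linux descriptor by opening /dev/null and record which
 * alpine_fd it stands for. Returns the linux_fd, or -1 on error. */
int map_alpine_fd(int alpine_fd)
{
    int linux_fd;

    if (alpine_fd < 0)
        return -1;
    linux_fd = open("/dev/null", O_RDWR);
    if (linux_fd < 0)
        return -1;
    if (linux_fd >= MAX_FDS) {
        close(linux_fd);
        return -1;
    }
    alpine_fd_for[linux_fd] = alpine_fd + 1;
    return linux_fd;
}

/* Translate a linux_fd back to its alpine_fd, or -1 if it is not mapped. */
int lookup_alpine_fd(int linux_fd)
{
    if (linux_fd < 0 || linux_fd >= MAX_FDS || alpine_fd_for[linux_fd] == 0)
        return -1;
    return alpine_fd_for[linux_fd] - 1;
}

/* Drop the mapping and release the reserved descriptor. Returns 0 or -1. */
int unmap_linux_fd(int linux_fd)
{
    if (lookup_alpine_fd(linux_fd) < 0)
        return -1;
    alpine_fd_for[linux_fd] = 0;
    return close(linux_fd);
}
```

Because the placeholder descriptor stays open, the Linux kernel cannot hand
out the same number for an unrelated file while the mapping exists.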

Alpine4Linux compared to Alpine4BSD:

Look at the file "differences_from_alpine4bsd.txt" under the docs/ directory
for salient differences between Alpine4Linux and Alpine4BSD.

Performance:

I have not done any performance measurements for Alpine4Linux, because I
am confident that its performance sucks! Since Alpine4Linux does message
passing between the client program and alpine_server, there are a lot of
data copies when reading or writing data.

Future work:

Support IPv6, IPSEC etc.

References:
[1] Alpine: A User-Level Infrastructure for Network Protocol Development
    David Ely, Stefan Savage, David Wetherall
    http://alpine.cs.washington.edu/

Appendix A:

The following are the descriptions of the 3 changes I had to make in the
FreeBSD stack to make it work on Linux. All changes are trivial and do
not affect functionality.

netinet/if_ether.c: 
printf on Linux does not have the %D modifier

netinet/in.c: 
ifconfig on Linux does not zero out sin_zero in sockaddr_in when doing
SIOCSIFADDR, SIOCSIFDSTADDR and SIOCSIFBRDADDR. This causes problems 
when binding to that IP address, because ifa_ifwithaddr() compares the 
entire 16 bytes; but there is garbage in the last 8 bytes of ifa->ifa_addr. 
The fix was to zero out sin_zero of ia->ia_addr, ia->ia_dstaddr and 
ia->ia_broadaddr in in_ifinit().
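
The failure mode and the fix can be reproduced with a short sketch. It
uses the Linux struct sockaddr_in (which, unlike the FreeBSD one, has no
sin_len field), and both function names are hypothetical; it illustrates
the byte-wise comparison problem rather than the actual in_ifinit() change.

```c
#include <string.h>
#include <netinet/in.h>

/* Compare two addresses over the whole structure, including the sin_zero
 * padding, in the way described for ifa_ifwithaddr(). Returns 1 on a match. */
int sockaddr_in_bytes_equal(const struct sockaddr_in *a,
                            const struct sockaddr_in *b)
{
    return memcmp(a, b, sizeof(*a)) == 0;
}

/* The workaround: clear the padding bytes before the address is stored. */
void clear_sin_zero(struct sockaddr_in *sin)
{
    memset(sin->sin_zero, 0, sizeof(sin->sin_zero));
}
```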

net/if.c:
We map the Linux SIOCSIFHWADDR to the FreeBSD SIOCSIFLLADDR. However 
Linux does not define the sa_len member in its sockaddr structure.
We should not return EINVAL if the 'sa_len' does not match
'sdl->sdl_alen' in if_setlladdr().