Introduction:

Alpine4Linux is a user-level FreeBSD 4.8 networking stack that runs on top of a stock Linux kernel. The original idea is attributed to [1]. However, I must point out that Alpine4Linux is *not* a port of the original Alpine (referred to as Alpine4BSD henceforth). In fact, the two implementations do not share a single line of code.

Alpine4Linux has two components:

1. A daemon (alpine_server) that runs the FreeBSD stack code and does network I/O on behalf of processes that wish to use the FreeBSD stack.
2. Shared libraries (libClientSocket.so and libAlpineSupport.so) that hijack networking-related system calls and divert them to the alpine_server.

Why Alpine4Linux:

I did this project because I was fascinated by the idea of running kernel components in a user-level process. But seriously, I don't know why anyone would want to run a FreeBSD stack in userspace on a Linux box. The authors of the original Alpine4BSD paper cite better debugging and faster compile-test cycles when doing network protocol development. That seems to be as good a reason as any.

Supported versions:

The FreeBSD stack is the 4.8-RELEASE version (downloaded on April 8, 2003). The only requirements the code places on the Linux system are that it support the PF_PACKET socket family and provide makecontext() and swapcontext(). Other than that, it should run on any Linux distribution. Output of uname -a on my Linux box:

    Linux localhost 2.4.20-13.8 #1 Mon May 12 12:20:54 EDT 2003 i686 i686 i386 GNU/Linux

How does it work:

The alpine_server is a Linux program that acts like a FreeBSD kernel as far as the networking stack is concerned. Just like the FreeBSD kernel, it provides client programs with a socket layer and does network I/O on their behalf. But that is where the similarity ends. A kernel presents a system call interface to client programs; the alpine_server presents an RPC interface. RPC here simply means that it listens for requests over the network. For example,
if the client program calls socket(), a message is sent to the alpine_server. The message indicates the type of the request (REQ_SOCKET) and its parameters (AF_INET, SOCK_STREAM). The response contains the type of the response (RESP_SOCKET) and the return value (socket_fd, errno).

Client programs load two shared libraries, libClientSocket.so and libAlpineSupport.so, at run time through the LD_PRELOAD and LD_LIBRARY_PATH environment variables. These libraries "intercept" the socket-related functions before the Linux libc can process them conventionally. Instead, each socket-related system call (e.g. socket(), bind(), connect()) is transformed into a message to the alpine_server. The alpine_server and its clients communicate over a standard TCP socket bound to 127.0.0.1.

Requirements:

An unmodified networking stack:
The files under sys/net and sys/netinet, and certain files under sys/kern, must be taken from the stock FreeBSD 4.8 release. This requirement was mostly satisfied: I had to make 3 changes to work around some differences between Linux and FreeBSD. The changes are trivial and do not change functionality; see Appendix A for details.

Ability to use unmodified Linux binaries:
It should be possible to simply define the LD_PRELOAD and LD_LIBRARY_PATH variables and run any dynamically linked Linux networking application against the FreeBSD stack. For example, the interfaces of the FreeBSD stack can be configured with the stock Linux 'ifconfig'. I have also run 'telnet', 'nmap' and 'ping' against the alpine_server with no problems. Unfortunately, it was too difficult to use the Linux 'route' command against the FreeBSD stack, so Alpine4Linux provides the FreeBSD 'route' program, ported to Linux, for configuring routing in the FreeBSD stack.

Reuse as much of the FreeBSD kernel code as possible:
This requirement is subjective, but I think Alpine4Linux reuses a *lot* of unmodified FreeBSD kernel code.
In fact, the alpine_server defines only two non-trivial functions that the kernel requires: mi_switch() and scheduler(). scheduler() runs the main select() loop in the alpine_server, and mi_switch() switches the FreeBSD kernel execution context. Both functions are described in detail later in this document. Alpine4Linux uses the unmodified FreeBSD code for sysinit, timeouts, tsleep() and wakeup(), and descriptor management, for example.

Implementation:

Sending and receiving packets:
The alpine_server is invoked with the name of the interface (on the host OS) that it uses to send and receive packets. The IP address and subnet mask used by the FreeBSD stack are also specified on the command line, e.g.

    ./alpine_server eth0 10.11.12.13 255.255.255.0

This tells the alpine_server to use the "eth0" interface on Linux to send and receive packets, and assigns 10.11.12.13/24 as the IP address of the FreeBSD stack.

The alpine_server first opens a socket of family PF_PACKET, which is the recommended way to do raw packet I/O on Linux. Let's call this file descriptor the 'linux_pcap_fd'. A BPF program is compiled so that only packets destined for the FreeBSD stack are injected into the stack. The BPF filter expression is "host" followed by the IP address of the FreeBSD stack (for the command line above, "host 10.11.12.13").

Next, we open the "tap" pseudo-device in the FreeBSD stack. This device presents an Ethernet device interface to the FreeBSD stack. On the other side, the "tap" device returns an fd that can be both read and written: writing to it injects raw Ethernet packets into the FreeBSD stack, and reading from it returns the packets the stack transmits. Let's call this file descriptor the 'freebsd_tap_fd'. The alpine_server then configures the "tap" device, setting its MAC address to the MAC address of the interface specified on the command line and its IP address to the IP address specified on the command line.

Now the job of the alpine_server is simply to read a packet from 'linux_pcap_fd', run it through the BPF filter, and write it to 'freebsd_tap_fd'.
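As an illustration of what the "host" filter lets through, here is a hand-written C approximation of its effect on Ethernet frames. It is a sketch, not the compiled BPF program the alpine_server actually uses: the name frame_matches_host() is hypothetical, it handles only untagged IPv4 and ARP frames (libpcap's "host" primitive also covers RARP), and, like "host", it accepts the address as either source or destination.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define ETH_HDR_LEN    14      /* dst MAC (6) + src MAC (6) + EtherType (2) */
#define ETHERTYPE_IPV4 0x0800
#define ETHERTYPE_ARP  0x0806

/* Read a big-endian (network order) 16-bit field. */
static uint16_t get_be16(const uint8_t *p)
{
    return (uint16_t)((p[0] << 8) | p[1]);
}

/* Compare a 4-byte IPv4 address in the frame with 'host' (network order). */
static int addr_is(const uint8_t *p, const uint8_t host[4])
{
    return memcmp(p, host, 4) == 0;
}

/*
 * Hypothetical approximation of "host <addr>" for untagged Ethernet frames:
 * accept IPv4 packets whose source or destination is 'host', and ARP packets
 * whose sender or target protocol address is 'host'.
 * Returns 1 to inject the frame into the stack, 0 to drop it.
 */
int frame_matches_host(const uint8_t *frame, size_t len, const uint8_t host[4])
{
    if (len < ETH_HDR_LEN)
        return 0;

    uint16_t type = get_be16(frame + 12);
    const uint8_t *l3 = frame + ETH_HDR_LEN;
    size_t l3_len = len - ETH_HDR_LEN;

    /* IPv4 header: source address at offset 12, destination at offset 16. */
    if (type == ETHERTYPE_IPV4 && l3_len >= 20)
        return addr_is(l3 + 12, host) || addr_is(l3 + 16, host);

    /* ARP (IPv4 over Ethernet): sender IP at offset 14, target IP at 24. */
    if (type == ETHERTYPE_ARP && l3_len >= 28)
        return addr_is(l3 + 14, host) || addr_is(l3 + 24, host);

    return 0;
}
```

Accepting ARP as well as IPv4 matters here: without the ARP requests for its own address, the FreeBSD stack could never answer them, and no peer could reach it.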
In the other direction, the alpine_server reads from 'freebsd_tap_fd' and writes to 'linux_pcap_fd'.

Simulating interrupts:
The alpine_server puts 'linux_pcap_fd' in its select() read fd_set. Whenever a packet arrives at the interface, select() returns and the packet can be read, filtered and injected into the FreeBSD stack. Alpine4Linux therefore behaves like a true interrupt-driven stack, because packets are injected into the FreeBSD stack as soon as they arrive.

Software interrupts:
The alpine_server has only one thread of control running at any point in time. There is no need to lock data structures, because this thread of control cannot be preempted; it has to relinquish the CPU voluntarily by calling mi_switch(). Thus all the spl*() functions are no-ops in Alpine4Linux. setsoftnet() is also a no-op, because the netisrs are run periodically instead, at every tick (1/HZ seconds): the alpine_server periodically calls do_netisrs(), defined in kern/kern_netisr.c, which calls all the netisrs that are ready to run and gives them the opportunity to drain packets from their packet queues.

Initialization:
Alpine4Linux initializes the kernel data structures as if the kernel had booted itself. The alpine_server contains main(), the entry point into the program, which calls init386() followed by mi_startup().

init386():
init386() was rewritten to initialize only the tunable kernel variables, like "hz" or "tick", and the variables that depend on physical memory, like "maxusers" and "maxproc". Alpine4Linux makes the FreeBSD kernel believe that it is running on a machine with 1 GB of physical memory.

mi_startup():
This is the stock mi_startup() from the FreeBSD kernel, since we support sysinit in libAlpineSys.so. This function does not return; control ends up in the scheduler() function, which Alpine4Linux defines in the alpine_server. Eventually control lands in the main select() loop defined in sched_main_loop().
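Putting the pieces above together, the following C sketch shows one plausible shape for a select()-driven loop like sched_main_loop(). It is not the actual Alpine4Linux code: sketch_main_loop(), forward_frame() and sketch_hardclock() are hypothetical names, hz is assumed to be 100, the per-tick work (hardclock() and the netisrs) is reduced to a tick counter, the BPF filter and the check for woken-up contexts are left out, and watching 'freebsd_tap_fd' in the same select() call is an assumption rather than something stated above.

```c
#include <assert.h>
#include <errno.h>
#include <string.h>
#include <sys/select.h>
#include <sys/socket.h>
#include <sys/time.h>
#include <unistd.h>

#define SKETCH_HZ 100                       /* assumed kern.hz: one tick = 10 ms */
#define TICK_USEC (1000000L / SKETCH_HZ)
#define MAX_FRAME 2048                      /* hypothetical frame buffer size */

/* Hypothetical stand-in for hardclock(); a real server would also run the
 * netisrs on each tick. Here it only counts ticks. */
static unsigned long ticks_seen;
static void sketch_hardclock(void)
{
    ticks_seen++;
}

/* Move one frame from 'from' to 'to'. With packet-oriented descriptors each
 * read() returns exactly one frame. Returns 0 on success, -1 on failure. */
static int forward_frame(int from, int to)
{
    char buf[MAX_FRAME];
    ssize_t n = read(from, buf, sizeof(buf));
    if (n <= 0)
        return -1;
    return write(to, buf, (size_t)n) == n ? 0 : -1;
}

static long usec_between(const struct timeval *a, const struct timeval *b)
{
    return (b->tv_sec - a->tv_sec) * 1000000L + (b->tv_usec - a->tv_usec);
}

/* Run 'passes' iterations of the loop: forward frames in both directions as
 * they arrive and deliver one clock tick per elapsed 1/HZ seconds. */
int sketch_main_loop(int linux_pcap_fd, int freebsd_tap_fd, int passes)
{
    struct timeval last_tick, now;
    int maxfd = linux_pcap_fd > freebsd_tap_fd ? linux_pcap_fd : freebsd_tap_fd;

    gettimeofday(&last_tick, NULL);
    for (int i = 0; i < passes; i++) {
        fd_set rfds;
        FD_ZERO(&rfds);
        FD_SET(linux_pcap_fd, &rfds);
        FD_SET(freebsd_tap_fd, &rfds);

        /* Never sleep longer than one tick, so timers keep running. */
        struct timeval timeout = { 0, TICK_USEC };
        int ready = select(maxfd + 1, &rfds, NULL, NULL, &timeout);
        if (ready < 0) {
            if (errno == EINTR)
                continue;
            return -1;
        }
        if (ready > 0) {
            /* The "interrupt": a frame arrived from the wire. */
            if (FD_ISSET(linux_pcap_fd, &rfds))
                forward_frame(linux_pcap_fd, freebsd_tap_fd);
            /* A frame transmitted by the FreeBSD stack goes out to the wire. */
            if (FD_ISSET(freebsd_tap_fd, &rfds))
                forward_frame(freebsd_tap_fd, linux_pcap_fd);
        }

        /* Catch up on every tick that fell due while we were in select(). */
        gettimeofday(&now, NULL);
        while (usec_between(&last_tick, &now) >= TICK_USEC) {
            sketch_hardclock();
            last_tick.tv_usec += TICK_USEC;
            if (last_tick.tv_usec >= 1000000L) {
                last_tick.tv_sec++;
                last_tick.tv_usec -= 1000000L;
            }
        }
    }
    return 0;
}
```

For a quick experiment, SOCK_DGRAM socketpairs can stand in for the PF_PACKET socket and the tap descriptor, since they preserve frame boundaries: a frame written to the "wire" end comes out of the "tap" end after one pass of the loop. Capping the select() timeout at one tick is what keeps the clock advancing even when no packets arrive.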
Timer management:

Timeout:
Timer management in Alpine4Linux is very simple. In the main select() loop we call hardclock() every 10 msec (the interval is derived from kern.hz). If there is any event in the current timer wheel bucket, hardclock() calls softclock(), and from that point on the stock FreeBSD code deals with the timeout events. slowtimo() and fasttimo() are called indirectly through this mechanism.

tsleep() and wakeup():
Alpine4Linux uses the stock tsleep() and wakeup() functions from FreeBSD without any modifications. The blocking behavior of a process in the kernel is implemented by mi_switch(), which is defined outside the kernel.

Multiple execution contexts in the stack:
The alpine_server provides networking services to multiple clients at the same time, so it is imperative that the alpine_server not block in the kernel. This is the same constraint that the FreeBSD kernel itself operates under. Any time a client process does something that causes it to block (e.g. a blocking read() on a socket), the alpine_server must save the execution context and switch to another client process that is ready to run. If no client process is ready to run, the alpine_server blocks in select(). The select() loop of the alpine_server is thus analogous to the idle loop of a Unix kernel.

The alpine_server provides multiple execution contexts (one for each client) using the makecontext(3) function available in Linux, and switches between execution contexts in the FreeBSD kernel using swapcontext(3). The alpine_server itself executes in a 'ucontext_t' that is accessible as a global variable (sched_thread->ut_ctx); the alpine_server (and hence the FreeBSD stack) executes in this context for system-level events like timeouts and network I/O. Whenever the alpine_server executes code in the FreeBSD kernel on behalf of a client process, it runs in a 'ucontext_t' associated with that client. For example,
if the client process sends a message to the alpine_server to read() from a socket, the alpine_server first creates a 'ucontext_t' and switches execution to the newly created context. If all goes well and there is data to be read, we reply to the client process, the newly created 'ucontext_t' is destroyed, and control passes back to the main 'sched_thread' context. If there is not enough data to be read, the context has to block in tsleep(), which calls mi_switch(). mi_switch() does a swapcontext() back to the main 'sched_thread'. The sched_thread then either idles in select() or does a swapcontext() to process a request from another client.

The alpine_server also has to take care of "woken-up" execution contexts. Consider an execution context that was put to sleep because there was not enough data to satisfy a read(). When data arrives on that socket, that execution context becomes runnable (i.e. p->p_stat == SRUN). We check for such "woken-up" processes just before the sched_thread sleeps in select(): the sched_thread traverses all the ucontexts that are in the sleeping state *but* whose proc structure is runnable (i.e. ut->ut_state == UTS_SLEEPING && p->p_stat == SRUN). If such a ucontext is found, it does a swapcontext() to it. The blocked ucontext then resumes execution after the mi_switch() call in tsleep(), just as it would in a stock FreeBSD kernel.

Interaction with the Linux kernel:
Alpine4Linux is a pure user-level process and requires no Linux kernel modifications. But since Alpine4Linux hands out file descriptors to client programs (fd = socket()), it must make sure it does not step on the Linux kernel's toes when doing so. Therefore we need to map file descriptors between the FreeBSD stack and the host OS. To see why this is needed, consider an application that issues a socket() system call. This call is intercepted by the libClientSocket library, and the FreeBSD stack assigns a file descriptor to the newly created socket.
Let's call this file descriptor the 'alpine_fd'. We need to ensure that the value we return to the client application is an fd that has not already been allocated and will not be allocated in the future. Hence we open("/dev/null") on the host OS and create a mapping between the resulting linux_fd and the alpine_fd; because the /dev/null descriptor stays open, the Linux kernel will not hand out the same number again. The fd returned to the client from the socket() system call is the linux_fd. When the client later calls read() or write() with the linux_fd, we map that fd to the alpine_fd and use the latter in the FreeBSD stack.

Alpine4Linux compared to Alpine4BSD:

See the file "differences_from_alpine4bsd.txt" in the docs/ directory for the salient differences between Alpine4Linux and Alpine4BSD.

Performance:

I have not done any performance measurements for Alpine4Linux, because I am confident that its performance sucks! Since Alpine4Linux does message passing between the client program and the alpine_server, data is copied several times whenever it is read or written.

Future work:

Support for IPv6, IPsec, etc.

References:

[1] David Ely, Stefan Savage, David Wetherall. "Alpine: A User-Level Infrastructure for Network Protocol Development". http://alpine.cs.washington.edu/

Appendix A:

The following are descriptions of the 3 changes I had to make to the FreeBSD stack to make it work on Linux. All changes are trivial and do not affect functionality.

netinet/if_ether.c:
printf() on Linux does not support the FreeBSD kernel's %D format specifier.

netinet/in.c:
ifconfig on Linux does not zero out sin_zero in the sockaddr_in it passes with SIOCSIFADDR, SIOCSIFDSTADDR and SIOCSIFBRDADDR. This causes problems when binding to that IP address, because ifa_ifwithaddr() compares all 16 bytes of the address, but the last 8 bytes of ifa->ifa_addr contain garbage. The fix was to zero out sin_zero of ia->ia_addr, ia->ia_dstaddr and ia->ia_broadaddr in in_ifinit().

net/if.c:
We map the Linux SIOCSIFHWADDR to the FreeBSD SIOCSIFLLADDR. However, Linux does not define the sa_len member in its sockaddr structure.
Hence if_setlladdr() should not return EINVAL if 'sa_len' does not match 'sdl->sdl_alen'.
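The netinet/in.c problem can be reproduced in a few lines of plain C. This sketch uses the host's struct sockaddr_in (which, unlike FreeBSD's, has no sin_len field) and a memcmp() over the whole structure as a stand-in for the byte-wise address comparison in ifa_ifwithaddr(); the helper names sockaddr_in_bytes_equal() and clear_sin_zero() are hypothetical.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <netinet/in.h>
#include <string.h>

/* Byte-for-byte comparison of two IPv4 socket addresses, standing in for the
 * whole-structure comparison that ifa_ifwithaddr() performs. */
int sockaddr_in_bytes_equal(const struct sockaddr_in *a, const struct sockaddr_in *b)
{
    return memcmp(a, b, sizeof(*a)) == 0;
}

/* The essence of the in_ifinit() fix: clear the 8 padding bytes so that
 * whatever the caller left in sin_zero cannot break later comparisons. */
void clear_sin_zero(struct sockaddr_in *sin)
{
    memset(sin->sin_zero, 0, sizeof(sin->sin_zero));
}
```

Two addresses that agree in family, port and IP still compare unequal while sin_zero holds garbage, and compare equal once the padding is cleared, which is why zeroing it once at configuration time is enough.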