Years ago, some people thought the solution for writing efficient server software to handle large numbers of connections was to spawn a thread for each connection. Threads were supposed to be extremely lightweight, but the implications of a machine trying to schedule and manage thousands of them were not deeply considered. As it turns out, not even the best thread implementations have ever been lightweight enough to make "one thread per client" a particularly efficient mechanism.
Enter even-driven programming, another trick with much greater success. Here, rather than having linear program flow (as one does in a conventional or threaded program), the software is driven by which external events occur. Each "event" (such as a timer expiring or I/O depending on a particular descriptor) gets a callback routine bound registered for it; whenever an event occurs, the associated function is called. Most GUI-operated programs are heavily event-driven, and application programmers are often far more familiar with event-driven programming than are server-software authors.
At the heart of any event-driven program is the event dispatcher, in charge of noticing that an event has occurred and calling the appropriate callback function to handle it. The event dispatcher in most event-driven programs needs a good way of determining what I/O events have taken place recently, or it won't work efficiently. The traditional Unix way to determine which file descriptors have I/O pending is the poll(2)-or in more traditional systems, select(2)-system call. Poll(2) lets you specify a set of file descriptors associated with sockets and block until I/O is pending on one of them. It then tells you on which one I/O is pending. (Note that poll and select don't work on descriptors associated with disk files.).
Server software using poll(2) or select(2) at the heart of an event-dispatch loop has been common for a long time, and the technique works fairly well for a moderate number of client programs. Unfortunately, both poll(2) and select(2) have significant defects: They are stateless and require that you construct and pass down the complete list of file descriptors to check each time they are called. The kernel then must parse the list of file descriptors and perform the same task over and over each time one of these system calls is executed. If the heart of your program is calling select(2), and you call it thousands of times a second with thousands of file descriptors under examination, you become inefficient-bad enough to burn a significant portion of your CPU simply processing the interest list.
Many people have tried to solve this problem. Sun Microsystems has added an experimental interface to Solaris to handle it: a pseudo-device called /dev/poll that lets you register a set of descriptors just once [using ioctl(2)], and read for I/O events from the device from then on without having to reregister the descriptors. The mechanism is fairly straightforward; quite a bit of high-performance server software under Solaris has started to use it.
Another proposed mechanism to get file-descriptor information is the POSIX real-time signals mechanism. POSIX real-time signals can be queued (unlike normal Unix signals) and a program can request that a signal be delivered when I/O is pending; this facility would seem to be a good way to drive an event-driven program. However, as Niels Provos at the University of Michigan has shown, the POSIX real-time signals mechanism is fatally flawed and does not scale nearly as well as /dev/poll-style mechanisms.
Among other problems, signal delivery is not guaranteed and the queues can overflow, causing you to effectively double-check lots of work. (Provos's paper appears in the proceedings of the 2000 Usenix conference.) Select(2), poll(2) and /dev/poll have another interesting flaw, however.
It's been clear for some time now that programs might be interested in many kinds of events other than I/O pending on a socket. For example, consider file-management GUIs: Typically, they display the contents of a directory to the user as a set of icons. If someone changes the underlying directory out from under the GUI (for example, by removing a file), you want the GUI to immediately reflect that change by zapping the icon. It would be nice to be able to request that the kernel inform you when a directory gets updated so that you don't need to reread the directory every few seconds to find out. Unfortunately, most Unix kernels have no way to send you an event when a directory change occurs.
Consider also if you want a program that fires every time someone logs in to a system. An obvious way to do that is to get your program to wake up whenever the utmp file gets updated. Unfortunately, there is no way to get an event whenever something touches a particular file on disk. The only way to achieve that is to constantly check whether the file gets updated, which is actually quite inefficient.
Consider a program that is interested in learning when a machine is plugged into a docking station, when a machine is awakened after sleeping, when a system's IP address has changed, or when a host of other things happen. It appears that there is a wide variety of events in which a program might be interested, not simply when I/O is pending on a socket. While many people have talked about building a general mechanism in the kernel to allow
registration for a wide variety of events, but not much has happened.
| KEY POINTS |
- Thread implementations haven't proved to be efficient. Event-driven programming has, and it's now used heavily in GUI-operated systems.
- The POSIX signals mechanism is fatally flawed.
- The promising "kqueues" mechanism is expected to show up in the next NetBSD release.
|
One breakthrough: Jonathan Lemon of the FreeBSD project has implemented a mechanism he calls kqueues-a set of system calls designed specifically to deliver events of a variety of types to the user. (For further details, see Lemon's papers at http://people.freebsd.org/~jlemon/)
Kqueues let you wait on all sorts of events besides just I/O on socket descriptors, such as changes to directories or files. Since the mechanism is completely extensible, in the long run it can incorporate things such as informing programs of power events, devices appearing or disappearing.
The kqueues mechanism is, I think, one of the more interesting recent API additions in Unix-like systems. Lemon's work appears in recent versions of FreeBSD and is expected to be in the next major release of NetBSD (probably to be named NetBSD 1.6). Already, software authors are starting to take advantage of this mechanism; it will be interesting to see how widely it is adopted.
-Perry Metzger is a key member of the NetBSD Project.
|