Contents
1 What is Klogger?
2 How does Klogger Work?
3 Klogger Schemata
    3.1 Klogger Configuration File
        3.1.1 Configuration File Naming
        3.1.2 Declaring Events
        3.1.3 Declaring Enumerations
        3.1.4 Event Inheritance
    3.2 Logging Events
    3.3 Using Hardware Performance Counters
    3.4 Binding Hardware Performance Counters
    3.5 Reading Hardware Performance Counters
    3.6 Adding Support for Other PentiumIV Performance Counters
4 Compiling a Klogger Enabled Kernel
5 Using Klogger at Runtime
    5.1 Enable/Disable Logging
    5.2 Binary to Text Conversion
    5.3 Enable/Disable Specific Events
    5.4 Default Events
    5.5 Buffer Size and Low Water Mark
    5.6 Event Benchmarks
6 Klogger Perl Module
    6.1 Control Functions
    6.2 Log Analysis Functions
7 Current Schemata
    7.1 Stopwatch Schema
    7.2 Scheduler Schema
    7.3 Locking Schema
    7.4 Networking Schema
9 Future Work
1 What is Klogger?
Klogger is a framework for post mortem analysis of Linux kernel behavior. It is designed so that developers can easily define the events they want logged and log them from the kernel code. At runtime, the Klogger infrastructure is enabled for the period of time the developer wishes to analyze, logging all events into a special log file, which can later be analyzed. Klogger's strengths are its low overhead and its flexibility, which allow researchers to analyze performance bottlenecks that are very difficult to reach with standard logging tools, without requiring the user to become a kernel hacker. This document is an attempt to allow the research community to gain experience with Klogger, while possibly providing feedback about the tool and its features.

Klogger's flexibility is embodied in its support for different schemata: disjoint event sets, each dedicated to the analysis of a different subsystem. Schemata enable users to analyze only the subsystem that interests them, and to create new Klogger schemata specific to their research (read more about schemata in sections 3 and 7).

We are trying to collect a host of schemata which will allow both developers and researchers to study the various kernel subsystems without having to dive into the actual implementations. The best example is a storage systems researcher, who does not have to fully understand the implementation of the subsystem and driver in one kernel or another, as long as they can get information about the blocks read from and written to storage, their order, their timings, etc. By supplying a schema that logs the major events of the storage subsystem, we can allow more people to be involved in cutting edge research. The same goes for most parts of modern kernels: networking, scheduling, synchronization, file systems, etc.
3 Klogger Schemata
A Klogger schema defines a specific facet of Klogger. One way of looking at it is that the Klogger framework supplies the mechanism (how to log) while the schemata supply the policy (what to log). Technically, when we refer to a Klogger schema we refer to two files: a configuration file defining the events to be logged and their payload, and a kernel patch containing the invocations of klogger at the points where the events defined in the first file actually get logged. The next two sections describe how to define and log events. The first describes the syntax of the Klogger configuration file, and the second describes how to log events.
    Klogger Type    C Type
    short           short
    int             int
    long            long
    longlong        long long
    ushort          unsigned short
    uint            unsigned int
    ulong           unsigned long
    ulonglong       unsigned long long
    stringN         char[N]

Table 1: The basic types available in a Klogger event, and their C equivalents.

    {
        header => {
            "type"      => "SCHEDOUT",
            "serial"    => "119",
            "timestamp" => "1032071755760",
        },
        "pid" => "1073",
    },

Though extracting the text based log is described in Section 5.2, for now simply note that each event appears as a Perl hash. The hash contains all the fields described in the event definition and a header containing the event type (as text), its serial number in the log and a timestamp taken from the hardware's cycle counter.

3.1.3 Declaring Enumerations

Enumerations are declared in a syntax similar to C enums. The basic template is:

    enum enumerationtype {
        one,
        minusone = -1,
        four = 4,
        TWO = 2,
        Three
    }

The type enumerationtype may later be used as a field type in an event. Integer values will automatically be converted to their equivalent strings when decoding Klogger's output.

3.1.4 Event Inheritance

Event inheritance is an advanced feature of Klogger that lets analysts define inclusive groups of events, thereby defining different log levels. The inheritance model allows schema authors to specify the levels of the various events logged by Klogger, and to include log levels in other log levels using the include command. Similar to C++ syntax, the log level currently in use is declared before all log level declarations with the using command. Furthermore, when a schema author would like to omit an event from an included log level, the omit command can be used. This makes it possible to filter out information that is not needed in a specific log level, thereby reducing the amount of information generated by Klogger.
Event inheritance syntax is as follows:

    using two

    loglevel one <
        event ONE_DO { ... }
        event ONE_SIMPLE { ... }
    >

    loglevel two <
        include one
        omit ONE_SIMPLE
        event TWO_DETAILED { ... }
    >

A more detailed example of log levels can be found in the network schema's configuration file.
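The include and omit rules can be modeled in a few lines of C. The sketch below is only an illustration of the semantics described in the text, not Klogger's own implementation: the demo_loglevel structure, the demo_ function names and the fixed array size are all assumptions made for this example. An event is considered logged at a level if the level declares it or inherits it through include, unless the level omits it.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define DEMO_MAX_ITEMS 8

/* Hypothetical in-memory model of one "loglevel" block (not a Klogger type).
 * Each array is NULL-terminated unless completely full. */
struct demo_loglevel {
    const char *name;
    const char *events[DEMO_MAX_ITEMS];                   /* declared with "event"   */
    const struct demo_loglevel *includes[DEMO_MAX_ITEMS]; /* pulled in by "include"  */
    const char *omits[DEMO_MAX_ITEMS];                    /* dropped with "omit"     */
};

/* Return 1 if name appears in the list, 0 otherwise. */
static int demo_list_has(const char *const *list, const char *name)
{
    for (size_t i = 0; i < DEMO_MAX_ITEMS && list[i] != NULL; i++)
        if (strcmp(list[i], name) == 0)
            return 1;
    return 0;
}

/* Return 1 if the event is logged at this level, 0 otherwise. */
int demo_level_logs(const struct demo_loglevel *lvl, const char *event)
{
    assert(lvl != NULL && event != NULL);
    if (demo_list_has(lvl->omits, event))
        return 0;                       /* "omit" wins over inheritance */
    if (demo_list_has(lvl->events, event))
        return 1;                       /* declared directly in this level */
    for (size_t i = 0; i < DEMO_MAX_ITEMS && lvl->includes[i] != NULL; i++)
        if (demo_level_logs(lvl->includes[i], event))
            return 1;                   /* inherited through "include" */
    return 0;
}

/* The two log levels from the syntax example above. */
static const struct demo_loglevel demo_one = {
    .name = "one",
    .events = { "ONE_DO", "ONE_SIMPLE" },
};

static const struct demo_loglevel demo_two = {
    .name = "two",
    .events = { "TWO_DETAILED" },
    .includes = { &demo_one },
    .omits = { "ONE_SIMPLE" },
};
```

With level two in use, demo_level_logs(&demo_two, "ONE_DO") and demo_level_logs(&demo_two, "TWO_DETAILED") return 1, while demo_level_logs(&demo_two, "ONE_SIMPLE") returns 0 because the event is omitted.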
    Counter Name            Counter Description
    dtlb_miss               Page walks for a data TLB miss, all protection levels.
    dtlb_miss_os            Page walks for a data TLB miss, OS protection level.
    dtlb_miss_user          Page walks for a data TLB miss, user protection level.
    itlb_miss               Page walks for an instruction TLB miss, all protection levels.
    itlb_miss_os            Page walks for an instruction TLB miss, OS protection level.
    itlb_miss_user          Page walks for an instruction TLB miss, user protection level.
    l1_cache_misses         L1 cache misses, all protection levels.
    l1_cache_misses_os      L1 cache misses, OS protection level.
    l1_cache_misses_user    L1 cache misses, user protection level.
    l2_cache_misses         L2 cache misses, all protection levels.
    l2_cache_misses_os      L2 cache misses, OS protection level.
    l2_cache_misses_user    L2 cache misses, user protection level.

Table 2: Hardware performance counters currently supported for the PentiumIV architecture. For full descriptions and the list of all counters supported by the hardware please refer to [5].

through the number of retired instructions, and sometimes even the number of memory operations queued on the processor. Needless to say, these counters are processor dependent. Moreover, these counters may not even be backward compatible with previous versions of the same architecture.

To make Klogger as efficient as possible, all pieces of code accessing the hardware performance counters are inlined. As a result, all the abstractions used are resolved at compile time, incurring no runtime overhead from indirection (as opposed to the approach taken by other tools, such as PAPI [2]). In that context, users should be warned that since the counters are processor model dependent, code compiled with performance counter support for one model may not run on another! Simply put, kernels compiled to utilize the PentiumIII's performance counters might not even boot on a PentiumIV machine.
Let's return to our SCHEDOUT example for a moment. Imagine we want Klogger to log not only the preempted process's pid, but also the overall number of L2 cache misses it has suffered. Since we only want to count misses that occurred while the process itself was running (regardless of any interrupts handled by the kernel during that time), we need to bind the first virtual counter to the underlying hardware counter of L2 cache misses occurring at the user protection level: the l2_cache_misses_user counter. As such, the architecture dependent section in the configuration file would look like this:

    arch PentiumIV {
        counter1 l2_cache_misses_user;
    }

Also, we would add another 64-bit integral field to the SCHEDOUT event, saving the counter's value at the moment of preemption:

    event SCHEDOUT {
        int pid
        ulonglong l2_cache_misses_user
    }

which means the textual log will hold the event as

    {
        header => {
            "type"      => "SCHEDOUT",
            "serial"    => "119",
            "timestamp" => "1032071755760",
        },
        "pid"                  => "1073",
        "l2_cache_misses_user" => "35678014",
    },

The next section describes how to read the hardware's performance counters, and how to log our extended SCHEDOUT event.
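To make the type mapping concrete, the following sketch mirrors the extended SCHEDOUT event as a C structure and renders it in the textual form shown above. It is an illustration only: the demo_ names are hypothetical, the C types of the header fields are assumptions (the text names the fields but not their types), and Klogger's real binary layout and converter are not reproduced here.

```c
#include <assert.h>
#include <stdio.h>

/* Assumed header layout; field types are guesses made for this example. */
struct demo_header {
    const char *type;              /* event type, as text        */
    unsigned long serial;          /* serial number in the log   */
    unsigned long long timestamp;  /* cycle counter reading      */
};

/* Payload of the extended SCHEDOUT event, following the Table 1 mappings. */
struct demo_schedout {
    struct demo_header header;
    int pid;                                  /* Klogger type: int       */
    unsigned long long l2_cache_misses_user;  /* Klogger type: ulonglong */
};

/* Write the event as a single-line Perl-style hash.
 * Returns what snprintf returns: the length needed, excluding the NUL. */
int demo_format_schedout(const struct demo_schedout *ev, char *buf, size_t len)
{
    assert(ev != NULL && buf != NULL);
    return snprintf(buf, len,
        "{ header => { \"type\" => \"%s\", \"serial\" => \"%lu\", "
        "\"timestamp\" => \"%llu\", }, \"pid\" => \"%d\", "
        "\"l2_cache_misses_user\" => \"%llu\", },",
        ev->header.type, ev->header.serial, ev->header.timestamp,
        ev->pid, ev->l2_cache_misses_user);
}
```

Filling the structure with the values from the example (serial 119, timestamp 1032071755760, pid 1073, counter value 35678014) and calling demo_format_schedout with a 256-byte buffer yields the same fields as the hash above, on one line.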
    klogger_get_EVENTNAME()
    klogger_get_EVENTNAME_ll()
where EVENTNAME is the actual name used in the configuration file. Returning once again to our SCHEDOUT example, we can read the counter we defined using either function type. We bound logical counter 1 to the l2_cache_misses_user event. Thus, using the first function type, logging would take place using the following line:

    klogger(SCHEDOUT, task->pid, klogger_get_counter(KLOGGER_COUNTER1));

If we want to use the function based on the event name, we just use the following line:

    klogger(SCHEDOUT, task->pid, klogger_get_l2_cache_misses_user());

Sometimes the logging is not completely straightforward, and involves some code that prepares the data to be logged. In such cases, it may be important to know for which architecture the kernel was compiled. For that reason we use the KLOGGER_ARCH_COMPILED macro, which is set to one of the klogger_arch_t enumeration values defined in include/asm/klogger.h.
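One way to picture how a name-based accessor can cost nothing at runtime is a macro that expands a counter binding into an inline function. The sketch below is a user-space illustration under stated assumptions, not Klogger's generated code: the demo_ names and the DEMO_BIND_COUNTER macro are hypothetical, and an ordinary array stands in for the hardware counter registers.

```c
#include <assert.h>

#define DEMO_NUM_COUNTERS 4

/* Simulated virtual counters; kernel code would read hardware registers. */
static unsigned long long demo_counters[DEMO_NUM_COUNTERS];

/* Read a virtual counter by its 1-based slot number (as in "counter1"). */
static inline unsigned long long demo_read_counter(int slot)
{
    assert(slot >= 1 && slot <= DEMO_NUM_COUNTERS);
    return demo_counters[slot - 1];
}

/* Bind an event name to a slot by generating a named inline accessor.
 * The slot is a compile-time constant, so no lookup happens at runtime. */
#define DEMO_BIND_COUNTER(slot, event)                           \
    static inline unsigned long long demo_read_##event(void)     \
    {                                                            \
        return demo_read_counter(slot);                          \
    }

/* Counterpart of "counter1 l2_cache_misses_user;" in the configuration file. */
DEMO_BIND_COUNTER(1, l2_cache_misses_user)
```

After this binding, demo_read_l2_cache_misses_user() and demo_read_counter(1) return the same value, which parallels the two logging lines shown above.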
    foo[ linux-2.6.9 ] patch -p1 < KLOGGER/klogger-2.6.9/patch/klogger-2.6.9.patch
    foo[ linux-2.6.9 ] cp KLOGGER/klogger-2.6.9/patch/klogger.conf.base .klogger.conf.ba

2. Adding schemata

Now you can either create your own schema, or use a ready made one supplied with the distribution (for example, the scheduler schema):

    foo[ linux-2.6.9 ] patch -p1 < \
        KLOGGER/klogger-2.6.9/schemata/sched/klogger-scheduler-schema-2.6.9.patch
    foo[ linux-2.6.9 ] cp KLOGGER/klogger-2.6.9/schemata/sched/klogger.conf.sched \
        .klogger.conf.sched
3. Enable KLogger in the kernel configuration

Enable [Kernel hacking > KLogger] in the kernel configuration menus. A second option, [Kernel hacking > KLogger benchmarks], also enables runtime measurements of each event's logging overhead (more on benchmarks in Section 5.6). Now compile the kernel, and log like the wind...
To avoid stressing the memory by allocating large, physically contiguous buffers, Klogger uses virtually contiguous buffers. However, there is a limit on the kernel's virtual memory size, which is set to 128MB on Linux 2.6.9 kernels. If the total memory allocated exceeds this limit, you can boot the machine with the special vmalloc boot time flag, resetting the kernel's virtual memory limit. For example, when using 128MB buffers on a 4-way machine, we added the vmalloc=640MB boot parameter so that the kernel has enough virtual memory for the 128MB × 4 = 512MB needed by Klogger, alongside the original virtual memory space it uses.

Resetting the low-water mark is done in a similar manner, using the /proc/sys/klogger/lowwater file. The number written to that file is the percentage of the memory buffer space below which the buffer will be emptied (default is 10%). For example:

    foo[ /proc/sys/klogger ] cat lowwater
    10%
    foo[ /proc/sys/klogger ] echo 5 > lowwater
    foo[ /proc/sys/klogger ] cat lowwater
    5%
    foo[ /proc/sys/klogger ]
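The sizing arithmetic above can be written down as two small helpers. This is a back-of-the-envelope sketch rather than anything Klogger provides: the function names are hypothetical, and it assumes one buffer per CPU (as the 4-way example implies) and that the low-water mark is a plain percentage of one buffer's size.

```c
#include <assert.h>

/* Virtual memory to request with the vmalloc boot parameter, in MB:
 * one buffer per CPU plus the space the kernel already uses. */
unsigned long demo_vmalloc_mb(unsigned long buffer_mb, unsigned int cpus,
                              unsigned long base_mb)
{
    assert(cpus > 0);
    return buffer_mb * cpus + base_mb;
}

/* Low-water threshold in bytes for a buffer of the given size. */
unsigned long long demo_lowwater_bytes(unsigned long long buffer_bytes,
                                       unsigned int percent)
{
    assert(percent <= 100);
    return buffer_bytes * percent / 100;
}
```

For the example in the text, demo_vmalloc_mb(128, 4, 128) gives 640, matching the vmalloc=640MB parameter: 512MB for Klogger plus the original 128MB.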
Algorithm 1: The basic skeleton of a Klogger analysis script.

    #!/usr/bin/perl -w
    use klogger;

    # Open the log file
    my $logh = klogger_open_log("/some/log/file.txt");

    # While events keep coming, process them.
    for (my $event = klogger_next_event($logh);
         defined($event);
         $event = klogger_next_event($logh)) {
        # Process the event...
    }

    # Close the log
    klogger_close_log($logh);
7 Current Schemata
As previously described, a major goal when designing Klogger was to separate the mechanism from the policy. While the core of the framework is regarded as the mechanism, the schemata are the available policies. In this section we describe the schemata that are supplied with Klogger itself. We hope users will develop more schemata and share them with us and the community, so that future versions of Klogger will ship with schemata for a variety of subsystems.
3. Big Kernel Lock (BKL), a relic from early SMP support in Linux, which was meant to be a transitional solution from the monolithic support (only one CPU running kernel code at any given time) to the fine grained support (separate locks for each global data structure). Although the BKL has been deemed a deprecated feature, it is still widely used in some parts of the kernel.

A more detailed discussion of locking in Linux can be found in [1] (and a general discussion in [6]).

The events included in this schema are:

1. SPINLOCK_INIT: A spinlock was initialized.
2. SPINLOCK_FINISH: A spinlock was released. This event also saves the time at which the lock was acquired.
3. PREEMPT_SPINLOCK_FINISH: A spinlock was released on kernel preemption. This event also saves the time at which the lock was acquired.
4. RWLOCK_INIT: A read/write lock was initialized.
5. READLOCK_FINISH: A spinlock was released from reader context. This event also saves the time at which the lock was acquired.
6. WRITELOCK_FINISH: A spinlock was released from writer context. This event also saves the time at which the lock was acquired.
7. PREEMPT_WRITELOCK_FINISH: A spinlock was released from writer context on kernel preemption. This event also saves the time at which the lock was acquired.
8. BKL_LOCK_FINISH: The BKL was released (again, also saving the time at which it was acquired).
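Because each FINISH event carries the acquisition time alongside its own timestamp, the time a lock was held can be derived from a single event. The helpers below sketch that calculation. They assume that both values are readings of the same cycle counter and that the CPU frequency in MHz is known to the analyst; the function names are hypothetical and not part of Klogger.

```c
#include <assert.h>

/* Cycles during which the lock was held, from one *_FINISH event. */
unsigned long long demo_hold_cycles(unsigned long long acquired,
                                    unsigned long long released)
{
    assert(released >= acquired);
    return released - acquired;
}

/* Convert a cycle count to microseconds: cycles / (cycles per microsecond). */
double demo_cycles_to_us(unsigned long long cycles, double cpu_mhz)
{
    assert(cpu_mhz > 0.0);
    return (double)cycles / cpu_mhz;
}
```

For instance, a lock acquired at cycle 1000 and released at cycle 4000 was held for 3000 cycles, which is one microsecond on a 3000MHz processor.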
5. SOCKET_ACCEPT: A socket has been accepted to connect to another socket.
6. SOCKET_WRITE: A socket is writing data to its target.
7. SOCKET_RECEIVE: A socket is receiving (processed) data from its target.

Connection level:

1. SOCKET_BIND (REPLACES SOCKET_ASSIGNADDR): A socket has been bound to an IP address and port and was set to communicate using a specified protocol.
2. SOCKET_CONNECT (REPLACES SOCKET_ASSIGNADDR): A socket has connected to an IP address and port and was set to communicate using a specified protocol.
3. CONNECTION_HANDSHAKE: A socket has begun a protocol connection handshake.
4. CONNECTION_CLOSE: A socket has closed its connection with the other socket it was connected to.

Protocol level:

1. TCP_SENDSYN: A SYN packet has been sent from the specified TCP socket.
2. TCP_RECVSYN: A SYN packet has been received on the specified TCP socket.
3. TCP_SENDSYNACK: A SYN-ACK packet has been sent from the specified TCP socket.
4. TCP_RECVSYNACK: A SYN-ACK packet has been received on the specified TCP socket.
5. TCP_SENDACK: An ACK packet has been sent from the specified TCP socket.
6. TCP_RECVACK: An ACK packet has been received on the specified TCP socket.
7. TCP_SEND: The specified amount of data is being sent from the specified TCP socket.
8. TCP_SENDPACKET: A fragmented/whole (unspecified) TCP packet is being queued for sending in the specified TCP socket.
9. TCP_RECEIVE: The specified amount of data (processed) is being received on the specified TCP socket.
10. TCP_RECEIVEPACKET: A fragmented/whole (specified) TCP packet has been received on the specified TCP socket and is queued for processing.
11. TCP_RECV_URGENTDATA: The specified amount of data (processed) is being received on the specified TCP socket. The packet carries the URG (urgent) flag and is processed before other non-urgent data.
12. TCP_DISCONNECT: A TCP socket has disconnected from its target.
13. UDP_PUSHPENDINGFRAMES: A UDP socket is pushing all of its pending frames to a specified socket buffer.
14. UDP_FLUSHPENDINGFRAMES: There has been a cork in the specified UDP socket and therefore all of the pending frames are being flushed immediately.
15. UDP_QUEUESOCKETBUFFER: A UDP socket is queueing IP packets received from a specified socket buffer for processing.
16. UDP_RECEIVE: The specified amount of data (processed) is being received on the specified UDP socket.
17. UDP_SEND: The specified amount of data is being sent from the specified UDP socket.
18. UDP_CONNECT: A UDP socket has connected to its target using the UDP protocol.
19. UDP_DISCONNECT: A UDP socket has disconnected from its target.
20. ICMP_SEND: An ICMP packet, containing constant data (specified as enumerated code types), is being sent.
21. ICMP_REPLY: An ICMP reply packet is being sent back to the ICMP packet sender.
22. ICMP_RECEIVE: An ICMP packet, containing constant data (specified as enumerated code types), has been received.
23. ICMP_PUSHPENDINGFRAMES: An ICMP socket is pushing all of its pending frames to a specified socket buffer.

IP level:

1. IP_PUSHPENDINGFRAMES: Pushed pending frames from a protocol have been received from a specified socket into the specified socket buffer.
2. IP_FLUSHPENDINGFRAMES: There has been a cork in the specified socket buffer and therefore all of the pending frames are being flushed immediately and won't be sent.
3. IP_SENDPACKET: The specified IP socket buffer has sent a packet. Whether it is a fragment or not, its fragment offset, packet size, total size, the requesting PID and the current MTU are all logged.
4. IP_RECEIVEPACKET: The specified IP socket buffer has received a packet (unprocessed). Whether it is a fragment or not, its fragment offset, packet size, total size, the requesting PID and the current MTU are all logged. NOTE: The total size of the de-fragmented data is not applicable until the last fragment arrives.
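The note about the total size follows from how IP fragmentation works: only the last fragment (the one without the "more fragments" flag) reveals where the datagram ends. The sketch below illustrates this with a hypothetical demo_frag record. The field names are assumptions, not the schema's actual event fields, and offsets are taken to be in bytes.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-fragment record, loosely based on the logged fields. */
struct demo_frag {
    unsigned int offset;   /* fragment offset within the datagram, in bytes */
    unsigned int size;     /* payload size of this fragment, in bytes       */
    int more_fragments;    /* nonzero if further fragments follow           */
};

/* Return the de-fragmented payload size, or -1 while it is still unknown
 * because the last fragment has not been seen yet. */
long demo_total_size(const struct demo_frag *frags, size_t n)
{
    assert(frags != NULL || n == 0);
    for (size_t i = 0; i < n; i++)
        if (!frags[i].more_fragments)
            return (long)frags[i].offset + (long)frags[i].size;
    return -1;
}
```

An analysis script working on IP_RECEIVEPACKET events could apply the same rule: ignore the total size field until a fragment without the flag has been logged.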
9 Future Work
1. Complete this document.
2. More schemata.
3. Add support for hardware performance counters on a variety of architectures.
4. Make Klogger available at boot time.
5. Let the user change the output log file name.
6. Verify cycle synchronization on SMPs.
References
[1] D. P. Bovet and M. Cesati. Understanding the Linux Kernel. O'Reilly & Associates, 2nd edition, 2003.
[2] S. Browne, J. Dongarra, N. Garner, K. London, and P. Mucci. A scalable cross-platform infrastructure for application performance tuning using hardware counters. In Supercomputing, Nov 2000.
[3] Y. Etsion, D. Tsafrir, S. Kirkpatrick, and D. G. Feitelson. Fine grained kernel logging with Klogger: Experience and insights. In 2nd ACM EuroSys, pages 259-272, Mar 2007.
[4] B. O. Gallmeister. POSIX.4: Programming for the Real World. O'Reilly & Associates, January 1995.
[5] Intel Corp. IA-32 Intel Architecture Software Developer's Manual. Vol. 3: System Programming Guide.
[6] C. Schimmel. UNIX Systems for Modern Architectures. Addison Wesley, 1994.