The standard kernel profiler (as used by readprofile(1)) is rather rudimentary, but it is commonly used because it comes packaged with the kernel for most architectures; at the time of writing, only the cris, s390 and parisc ports don't implement it. It is a simple statistical profiler: on every timer tick it takes the kernel PC value and increments the corresponding bucket of a scaled histogram buffer. Note that only the kernel image is profiled: user-space code, kernel modules and so on are not. It is also incapable of profiling code that runs with interrupts disabled.
The profiler is enabled by rebooting with the kernel command-line option "profile=2". The value passed is a shift that sets each histogram bucket to cover 2^N bytes of kernel text (so profile=2 gives 4-byte buckets); it therefore determines the spatial resolution of the histogram, as described in the man page. The histogram can be cleared by simply writing to the profile buffer device, /proc/profile. Some architectures also allow you to set the sampling frequency by writing a value to this file. For example, on x86 writing a value sets the APIC timer appropriately; the value acts as a multiplier of the timer frequency, so higher values mean more frequent interrupts.
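The bookkeeping behind this is small: the sampled PC has the start of kernel text subtracted, is shifted right by the profile= value, and the resulting bucket is incremented. The following is a user-space sketch of that idea, not the kernel's own code; TEXT_START, TEXT_SIZE, profile_tick() and profile_reset() are hypothetical names, and this sketch simply discards PCs outside the kernel image (the real kernel code differs in detail).

```c
#include <string.h>

/* Hypothetical stand-ins for the kernel's profiling state. */
#define TEXT_START 0xc0100000UL          /* assumed start of kernel text */
#define TEXT_SIZE  0x1000UL              /* assumed (tiny) size of kernel text */
#define PROF_SHIFT 2                     /* as if booted with profile=2: 4-byte buckets */
#define PROF_LEN   (TEXT_SIZE >> PROF_SHIFT)

static unsigned int prof_buffer[PROF_LEN];

/* Called once per timer tick with the interrupted kernel PC. */
static void profile_tick(unsigned long pc)
{
    if (pc < TEXT_START || pc >= TEXT_START + TEXT_SIZE)
        return;                          /* outside the kernel image: not profiled here */
    prof_buffer[(pc - TEXT_START) >> PROF_SHIFT]++;
}

/* Analogue of writing to /proc/profile: clear the histogram. */
static void profile_reset(void)
{
    memset(prof_buffer, 0, sizeof(prof_buffer));
}
```

With a shift of 2, PCs TEXT_START and TEXT_START + 3 land in bucket 0, while TEXT_START + 4 starts bucket 1.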
Once some profile data has been collected, readprofile(1) can be used to print a simple function-based summary, as shown in this excerpt:
    ...
       743 kmalloc                  1.6294
       491 handle_IRQ_event         5.3370
       371 __rdtsc_delay           13.2500
       348 kmem_cache_alloc         0.8878
    ...
Each line gives the number of samples against a function, the function's name, and its normalised load. The normalised load is calculated as raw_count(f) / size_in_bytes(f); the idea is that larger functions can be expected to collect more samples. However, due to a number of issues, this normalisation isn't particularly useful.
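To produce this summary, readprofile maps histogram buckets back to functions using the kernel symbol table (System.map) and divides each function's total by its size. The sketch below shows that aggregation under simple assumptions: a symbol array sorted by address, each function ending where the next begins, and a final sentinel entry marking the end of text. struct sym, func_samples() and normalised_load() are hypothetical names. As a check against the excerpt, 371 samples in a 28-byte function gives 371 / 28 = 13.25; the 28-byte size is back-computed from the excerpt, not read from a real System.map.

```c
#include <stddef.h>

struct sym {
    unsigned long addr;   /* start address of the function */
    const char *name;
};

/* Sum the buckets covering function i, i.e. [syms[i].addr, syms[i+1].addr).
 * i must be < nsyms - 1, since the last entry only marks the end of text.
 * This is approximate when functions are not aligned to the bucket size. */
static unsigned long func_samples(const unsigned int *hist, unsigned long text_start,
                                  unsigned int shift, const struct sym *syms, size_t i)
{
    unsigned long total = 0;
    unsigned long start = (syms[i].addr - text_start) >> shift;
    unsigned long end = (syms[i + 1].addr - text_start) >> shift;

    for (unsigned long b = start; b < end; b++)
        total += hist[b];
    return total;
}

/* Normalised load as described above: raw count divided by size in bytes. */
static double normalised_load(unsigned long samples, unsigned long size_bytes)
{
    return size_bytes ? (double)samples / (double)size_bytes : 0.0;
}
```

A function's size is then syms[i + 1].addr - syms[i].addr, which is the denominator passed to normalised_load().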
Red Hat ship a simple patch in their kernels that allows readprofile(1) to use NMIs (non-maskable interrupts). NMIs cannot be masked by clearing the IF flag in EFLAGS, so you can get profile data from interrupt handlers and from code that runs with interrupts disabled (such as code protected by spin_lock_irq()). Note that this mode only works when the NMI watchdog facility is in use, as it relies on the watchdog code to generate the NMIs. The patch can be found here.
The user-space readprofile(1) utility has some incompatibilities with recent Linux kernels, caused by mis-parsing of the nm output for vmlinux. If you are having problems, upgrading to a recent util-linux version is recommended.
A patch that enables readprofile(1) to profile kernel modules can be found here. (Note that I have not tested whether this patch works, and it has bit-rotted.)
[kerneltop] is a simple modification of the tool described in Section 4.1.1 that displays the counts in a top-style view, clearing the counters at each iteration. It can be very useful for observing the time spent in the kernel while a particular operation is under way.
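A top-style view boils down to collecting per-interval counts and ranking entries by them. kerneltop clears the counters each time; the sketch below instead takes the difference between two snapshots, which yields the same per-interval figures without resetting anything. It is not kerneltop's code, and struct entry, by_count_desc() and rank_interval() are hypothetical names.

```c
#include <stddef.h>
#include <stdlib.h>

struct entry {
    const char *name;
    unsigned long count;   /* samples during the last interval */
};

/* qsort comparator: busiest entries first. */
static int by_count_desc(const void *a, const void *b)
{
    const struct entry *x = a, *y = b;

    if (x->count != y->count)
        return x->count < y->count ? 1 : -1;
    return 0;
}

/* Per-interval counts are cur - prev (the counters are assumed to only grow
 * between snapshots); the result is sorted so the hottest entries come first. */
static void rank_interval(struct entry *out, const unsigned long *prev,
                          const unsigned long *cur, const char *const names[], size_t n)
{
    for (size_t i = 0; i < n; i++) {
        out[i].name = names[i];
        out[i].count = cur[i] - prev[i];
    }
    qsort(out, n, sizeof(*out), by_count_desc);
}
```

A display loop would call rank_interval() once per refresh, print the first few entries, and keep the current snapshot as the next prev.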
[minilop] is another utility derived from the tool described in Section 4.1.1. It adds the ability to show disassembly for each histogram bin that has samples against it, calculates the relative load, and remembers the peak count for each entry. The project appears to have been abandoned.
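Remembering a peak count needs only one extra field per entry, updated after every interval. The sketch below shows that, together with a relative load that is assumed here to mean an entry's percentage of all samples in the interval; the text does not define minilop's exact formula. struct stat_entry, update_peaks() and relative_load() are hypothetical names.

```c
#include <stddef.h>

struct stat_entry {
    unsigned long count;   /* samples in the current interval */
    unsigned long peak;    /* highest per-interval count seen so far */
};

/* Record one interval's counts and update each entry's peak. */
static void update_peaks(struct stat_entry *e, const unsigned long *counts, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        e[i].count = counts[i];
        if (counts[i] > e[i].peak)
            e[i].peak = counts[i];
    }
}

/* Assumed definition: entry i's share of this interval's samples, in percent. */
static double relative_load(const struct stat_entry *e, size_t n, size_t i)
{
    unsigned long total = 0;

    for (size_t j = 0; j < n; j++)
        total += e[j].count;
    return total ? 100.0 * (double)e[i].count / (double)total : 0.0;
}
```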
SGI's kernprof [kernprof] is a powerful kernel-only profiler for the ia32, ia64, sparc64 and mips64 architectures. It cannot profile modules or code that runs with interrupts disabled. Profiling can be enabled and controlled at runtime. It comes as a large kernel patch plus associated user-space tools, and some users may find they need to patch gcc before the profiler can be built.
Kernprof supports a number of different profiling techniques. Its simplest mode builds a histogram of kernel PC values. Both standard timer-interrupt-based sampling and sampling based on the hardware performance counters are supported, although the hardware counters cannot be used on all systems. Support for the performance counters gives kernprof considerable power, as relevant performance events such as cache misses can be analysed.
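Counter-based sampling generally works by arming a counter so that it overflows after N events; the overflow interrupt records the current PC in the same kind of histogram used for timer sampling, so each sample stands for roughly N events (for example, N cache misses). The simulation below is a generic sketch of that idea, not kernprof's implementation; NBUCKETS, struct counter_sampler and the sampler_* functions are hypothetical.

```c
#include <stddef.h>

#define NBUCKETS 8

struct counter_sampler {
    unsigned long period;     /* events per sample, N (must be at least 1) */
    unsigned long remaining;  /* events left until the next overflow */
    unsigned int hist[NBUCKETS];
};

static void sampler_init(struct counter_sampler *s, unsigned long period)
{
    s->period = period;
    s->remaining = period;
    for (size_t i = 0; i < NBUCKETS; i++)
        s->hist[i] = 0;
}

/* Called for every counted event (e.g. a cache miss) occurring at pc_bucket. */
static void sampler_event(struct counter_sampler *s, size_t pc_bucket)
{
    if (--s->remaining > 0)
        return;                       /* no overflow yet */
    s->remaining = s->period;         /* re-arm for the next N events */
    if (pc_bucket < NBUCKETS)
        s->hist[pc_bucket]++;         /* overflow: charge the sample to this PC */
}
```

Choosing N trades overhead against resolution: a small period gives more samples but more interrupts.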
Kernprof also supports several other profiling modes. The kernel can be built with support for collecting (annotated) call graphs, although this carries a significant overhead. Exact function call counts can also be collected via mcount(). Some of these modes can be combined to gather more information without hurting performance too badly. Most of these modes write their data in gprof's gmon.out format. Per-CPU profiles can also be created, which can prove useful for analysing SMP performance.
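Exact call counts rely on compiler instrumentation: code built with gcc -pg calls mcount() on entry to each function, and mcount records the arc from the call site to the called function. The sketch below shows only the kind of bookkeeping such a hook might do; record_arc(), struct arc and the fixed-size table are hypothetical, and a real mcount obtains both addresses from the stack frame rather than receiving them as arguments.

```c
#include <stddef.h>

#define MAX_ARCS 64

struct arc {
    unsigned long from;   /* call site in the caller */
    unsigned long self;   /* entry address of the called function */
    unsigned long count;  /* times this arc has been traversed */
};

static struct arc arcs[MAX_ARCS];
static size_t narcs;

/* Count one traversal of the arc from -> self.
 * Returns 0 on success, -1 if the arc table is full. */
static int record_arc(unsigned long from, unsigned long self)
{
    for (size_t i = 0; i < narcs; i++) {
        if (arcs[i].from == from && arcs[i].self == self) {
            arcs[i].count++;
            return 0;
        }
    }
    if (narcs == MAX_ARCS)
        return -1;
    arcs[narcs].from = from;
    arcs[narcs].self = self;
    arcs[narcs].count = 1;
    narcs++;
    return 0;
}
```

The linear scan keeps the sketch short; since this runs on every function call, a practical implementation would hash on the call site instead.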
Because the sampling interrupt cannot be delivered while interrupts are disabled, results must be taken with a pinch of salt in some cases. In particular, when the hardware counters are used, events that occur with interrupts disabled are still counted, but tend to show up in the profile at the code points directly after interrupts are re-enabled. However, a patch enabling NMI-based profiling for kernprof can be found here.
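The skew can be reproduced with a small simulation: if a counter overflows while interrupts are disabled, the sample stays pending until they are re-enabled and is then charged to whatever instruction is running at that point. The model below treats a trace as a list of instructions, each flagged with the interrupt state and the number of events it causes; struct insn and simulate() are hypothetical names, and the model is illustrative only.

```c
#include <stdbool.h>
#include <stddef.h>

struct insn {
    unsigned long pc;
    bool irqs_enabled;     /* interrupt state while this instruction runs */
    unsigned long events;  /* counter events caused by this instruction */
};

/* Walk the trace with a counter that overflows every `period` events
 * (period must be nonzero). An overflow marks a sample as pending; it is only
 * delivered, and charged to the current PC, on an instruction that runs with
 * interrupts enabled. Overflows while masked collapse into one pending sample.
 * Returns the number of sampled PCs written to out_pc. */
static size_t simulate(const struct insn *trace, size_t n, unsigned long period,
                       unsigned long *out_pc, size_t max_out)
{
    unsigned long acc = 0;
    bool pending = false;
    size_t nout = 0;

    for (size_t i = 0; i < n; i++) {
        acc += trace[i].events;
        if (acc >= period) {
            acc %= period;
            pending = true;
        }
        if (pending && trace[i].irqs_enabled && nout < max_out) {
            out_pc[nout++] = trace[i].pc;
            pending = false;
        }
    }
    return nout;
}
```

In a trace where only the instructions inside an interrupts-disabled region generate events, every sample lands on the first instruction after the re-enable point. With NMI-based sampling the interrupt is never held back, which is why it avoids this effect.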