<?xml version="1.0" encoding='ISO-8859-1'?>
<!DOCTYPE book PUBLIC "-//OASIS//DTD DocBook XML V4.1.2//EN" "http://www.oasis-open.org/docbook/xml/4.1.2/docbookx.dtd">

<!--
 
FIXME: vtune remote profiling
 
FIXME: glibc pcprofile[dump]
gcc value profiling by Zdenek Dvorak
sprof is in debian libc6-prof
 
http://people.redhat.com/drepper/ 
http://kernelnewbies.org/kernels/rh73/SOURCES/linux-2.4.9-nmiprofiling.patch 
http://www-106.ibm.com/developerworks/java/library/j-javaopt/?loc=j 
FIXME: ikd ktrace - where is it now ? kdb patch ? 
http://vtad.sourceforge.net/vtad-pod.html 
FIXME: http://freshmeat.net/projects/iprobetoolsuite/ 
http://caml.inria.fr/ocaml/htmlman/manual031.html 
http://sourceforge.net/projects/profilejs/ 
http://sourceforge.net/projects/compaqphoenix/ 
http://freshmeat.net/projects/hot-profile/

-->
 
<article id="linux-profiling">
<artheader>
	<title>Profiling in Linux HOWTO</title>

	<authorgroup>
		<author>
			<firstname>John</firstname>
			<surname>Levon</surname>
			<affiliation>
				<address><email>moz@compsoc.man.ac.uk</email></address>
			</affiliation>
		</author>
	</authorgroup>

	<copyright>
		<year>2002</year>
		<holder>John Levon</holder>
	</copyright>

	<revhistory>
		<revision>
			<revnumber>0.1</revnumber>
			<date>FIXME</date>
			<revremark>Initial version</revremark>
		</revision>
	</revhistory>

	<legalnotice>
	<para>This document can be freely translated and distributed. It's released
	under the LDP License.</para>
	</legalnotice>

	<keywordset>
		<keyword>Linux</keyword>
		<keyword>Profiling</keyword>
		<keyword>performance</keyword>
		<keyword>optimisation</keyword>
		<keyword>optimization</keyword>
		<keyword>gprof</keyword>
	</keywordset>
</artheader>

<chapter id="introduction"><title>Introduction</title> 
 
<section id="profilingcode"><title>Profiling code</title>
<para>
As systems get more complex, the need for
machine-assisted performance analysis grows. Kernighan and
Pike note <quote>Measurement is a crucial component of
performance improvement since reasoning and intuition are fallible guides and
must be supplemented with tools like timing commands and profilers.</quote>
<citation>tpop</citation>.
</para>
<para>
Linux is generally well-served in terms of development tools, and there is a
wide selection of profiling packages available. However, several of these
tools are not well-publicised, and many are under-documented. To date, there
has been no comprehensive survey of the choices available: this document hopes
to fill this gap.
</para>
<para>
It is worth mentioning some guidelines that should be followed when doing
performance analysis. Probably the number one rule is: <emphasis>analyse</emphasis>
the results. Think about what the results could be implying; don't take them
at face value. Consider whether the profiling technique could be harming
the accuracy of the profiling data.
</para>
<para>
Pay close attention to your profiling environment. Are you running realistic
tests? Have you avoided narrowing in on a particular workload at the expense
of the common case? Amdahl's law indicates that the analyst should avoid focussing
on a small part of the system until it is established that optimising it will
benefit the common case.
</para>
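<para>
As a rough illustration of Amdahl's law, the following sketch computes the
overall speedup obtained when a fraction <literal>p</literal> of the run time is
made <literal>s</literal> times faster. The function name and the workload
figures are invented for this example. With these figures, a tenfold
improvement to code that accounts for 5% of the run time yields an overall
speedup of about 1.05, whereas a modest 1.5 times improvement to code that
accounts for 60% of the run time yields 1.25.
</para>
<programlisting><![CDATA[
#include <stdio.h>

/* Overall speedup when a fraction p of the run time is made s times faster. */
double amdahl_speedup(double p, double s)
{
    return 1.0 / ((1.0 - p) + p / s);
}

int main(void)
{
    /* Hypothetical workloads, chosen only to show the contrast. */
    printf("5%% of run time, 10x faster:   %.3f\n", amdahl_speedup(0.05, 10.0));
    printf("60%% of run time, 1.5x faster: %.3f\n", amdahl_speedup(0.60, 1.5));
    return 0;
}
]]></programlisting>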
<para>
Are you profiling production code? Performance analysis of unoptimised code
peppered with debug statements carries the risk of mis-optimisation. Make sure
your optimisation decisions are governed by realistic data, not intuition.
</para>
<para>
Everyone knows Knuth's famous maxim <quote>Premature optimisation is the root
of all evil</quote>, but it is still ignored all too frequently (this maxim
is similar to the Extreme Programming rule "you aren't going to need it").
Too much developer time is spent optimising code that doesn't need optimisation.
This leaves open the question as to when the right time to do performance
analysis is. Commonly this is done during the alpha or beta phase of a release's 
lifecycle, and often in parallel with unstable development for far-reaching
changes. This can prove to be a problem with a development tree in high flux,
as profiling data can quickly become outdated - this is, of course, a development
management issue, and need not concern us here.
</para>
<para>
When you identify a bottleneck in your program, there are two principal ways
to view it. First, it can be considered on the procedural level: this is the sort
of analysis that leads to, for example, inner loop optimisation, inlining decisions,
and other such transformations. Second, an architectural point of view can be taken: here
the underlying algorithms are considered. Why does the particular algorithm used
not work efficiently enough for the important cases, and how can the system be
re-worked to fix this?
</para>
<para>
Both points of view are of use, though it is probably fair to say that the
architectural considerations are more important. Re-workings on this level
more often than not lead to more significant gains than procedural analysis,
although they are offset by higher development costs. Procedural analyses are
most useful when tweaking the performance of a system approaching the end of a
release cycle, and are generally cheap to implement. The majority of premature
optimisation is a result of procedural changes guided by intuition. Procedural
changes often make code harder to read; this accretion of junk code can
easily turn into a significant maintenance burden, especially with large
projects. In general the developer should avoid making micro-optimisations that
could affect code readability until they have proven their worth in extensive
analysis work.
</para>
</section> 
 

<section id="about"><title>About this document</title>
<para>
This HOWTO describes the methods and software a developer can use for
performance analysis on the Linux platform. This document in general focuses
on the Linux/x86 platform, although much of it applies to other architectures
as well.
</para>
<para>
<xref linkend="techniques" /> is a brief survey of the basic methods which are used
for profiling. If you already have some familiarity with the basic profiling
terminology, you may skip this section.
</para>
<para>
<xref linkend="support" /> discusses the kernel and user space facilities found
in the Linux environment that provide support for profilers. You can probably
skip this section if you're not interested in profiler implementations.
</para>
<para>
<xref linkend="profilers" /> provides an overview of the available profilers
on Linux, providing brief synopses of their implementation, and considering their
relative merits.
</para>
<para>
The bibliography at the end of this document collects several relevant websites,
articles, and research papers, and an exhaustive list of the software
described in <xref linkend="profilers" />.
</para>
<para>
This document is actively maintained; please contact the author with any
suggestions, corrections, or confusions. Note that the primary author of this
document is also the project lead for <citation>oprofile</citation>, though all
efforts have been made to provide a disinterested review.
</para>
</section>
 
</chapter>

<chapter id="techniques">
<title>Profiling techniques</title>

<para>
There has been a large number of profiler implementations, and there is
a significant body of literature on performance analysis. This chapter
briefly covers some of the terminology used in <xref linkend="profilers" />,
and describes some profiler design parameters.
</para>
 
<section id="designaims"><title>Design aims of a profiler</title>
<para> 
As an important part of a programmer's armoury, a profiler should avoid
getting in the way of the human analyst. This leads to a number of design
goals that every profiler should aim for:
</para> 
<variablelist>
<varlistentry><term>Unobtrusive</term>
	<listitem>
		A profiler should not require a significant expenditure of
		developer effort. The need for recompiles, preprocessors,
		special modifications to the toolchain and the like should
		be avoided, as they are inconvenient to the developer. An
		ideal solution should allow profiling at will, without needing
		such changes.
	</listitem>
</varlistentry> 
<varlistentry><term>Accurate</term>
	<listitem>
		The data and reports generated by the profiler system should
		aim towards accuracy. Inaccurate data runs the risk of mis-informing
		the developer of the true situation, leading to wasted effort
		and maintainability problems.
	</listitem>
</varlistentry> 
<varlistentry><term>Complete</term>
	<listitem>
		Profilers should aim towards a complete data set. If a system
		component or facet is not represented in the results, the
		developer may not be aware of its impact on the system as a
		whole.
	</listitem>
</varlistentry> 
<varlistentry><term>General</term>
	<listitem>
		Profilers should avoid special-purpose techniques where possible.
	</listitem>
</varlistentry> 
<varlistentry><term>Fast</term>
	<listitem>
		If the profiling method is too slow, it will impinge on the 
		developer's hacking time. Slow profilers often can't be
		used in realistic environments, which makes collecting
		meaningful data hazardous and prone to error. 
	</listitem>
</varlistentry> 
</variablelist>

</section>

<section id="profiledata"><title>Profile data</title>
<para>
Profile data covers a wide range of data types, including event logs,
execution counts, resource attributions, and more. Any tool that can
generate data as input to a performance analysis can, in some sense,
be considered to be a profiling technique. By definition, profiling data
must be collected at runtime; this fact restricts the available methods
to a few main techniques.
</para>
<para> 
Profile data can be produced in a number of different forms. At the simplest end
are accumulated event counts, which can be used for a broad understanding
of the workloads. Event logging, which is a form of tracing, is another
related form. Generally event logs require some form of processing in order
to reveal interesting performance data. Time-based data characterises
how long operations, or sections of code, execute for, typically measured
in real time, or virtual per-process CPU time. <glossterm>Call-graph</glossterm>
data records the call path that leads to the code in question. For example,
a periodic stack trace is a simple form of call-graph information. More typically,
call-graph information is represented in an accumulated form at function granularity.
This allows the developer to determine more easily the focal point of
any performance problems in the source code.
</para>
<para>
All such data can be classified as either <glossterm>exact</glossterm>
or <glossterm>statistical</glossterm>. Exact data tells the whole story:
no elements are missing from the data. For example, function call counts are
usually calculated at <emphasis>every</emphasis> function call, so the
total counts are 100% accurate.
</para>
<para>
Statistical data, in contrast, is not 100% accurate. Rather, for the data to be
useful, it is expected that it is a realistic representation of the true data set.
The data set is a sample: some fraction
of the data that an exact measurement of the same run would have produced. For example,
a CPU time histogram of functions is rarely exact. Estimating the time spent in each function
by regularly sampling the program counter (PC) is a very common profiling technique. There are two
sub-types of statistical data. First there is data that is inaccurate due to the inherent
uncertainty of certain measurements: for example, a cache line miss data point may
accurately represent the number of actual cache line misses, but the lack of context
means that some misses due to other system processes are not filtered out from
the result set. A more common example is the granularity of certain timing tools.
The second type of inaccuracy is usually a result of examining profiling methods,
and the inherent limitations of their resolution. For example, a profiler that
records which function is executing every 10ms can easily skew the results
in favour of functions that take longer, even when there are faster functions
that are called far more often.
</para>
<para>
One of the main reasons statistical profiling is so common is that collecting exact
data often incurs a cost in overhead, and often that cost is prohibitive. Thus
this design choice is a tradeoff between speed/obtrusiveness and accuracy.
</para>
<para>We have mentioned <glossterm>examining</glossterm> profilers.
These constitute one of the main classes of profiling techniques. They are characterised
by a periodic collection of profiling data. This technique inevitably gives
statistically-bound results, unless an <glossterm>accounting</glossterm> technique
is used in concert with the periodic collection. An accounting profiler
collects exact counts for some particular data item, for example, number of
major page faults. The exact nature of accounting profilers implies more reliable
data, but there can often be costs in terms of obtrusiveness of the technique used.
</para>
</section>
 
<section id="instrumentation"><title>Instrumentation methods</title>
<para>
Commonly, the target application must include some <glossterm>instrumentation</glossterm>
to enable the profiling mechanisms to operate. Sometimes only preliminary
start-up code needs to be added, and this is easily achieved via mechanisms
such as <constant>LD_PRELOAD</constant>. Accounting profilers often need to add instrumentation
at a fine-grained level, and there are a number of different techniques in use:
</para>
<variablelist>
<varlistentry><term>Simulation</term><listitem>
A simulator can easily collect detailed data as part of the simulation
run. Such techniques tend to be very obtrusive and slow, so are best
used when the level of detail is critical.
</listitem></varlistentry> 
<varlistentry><term>Source-level instrumentation</term><listitem>
Source-level instrumentation involves altering the source code
that eventually becomes the application by inserting profiling code.
This can happen semi-automatically via a pre-processor, or may
require a programmer to add explicit calls to some profiling API.
</listitem></varlistentry> 
<varlistentry><term>Compile-time instrumentation</term><listitem>
The compiler itself can be used to insert profiling code. This has
the advantage over source-level instrumentation of being more convenient,
but of course requires the source code to be recompiled, which is
not always practicable.
</listitem></varlistentry> 
<varlistentry><term>Offline binary instrumentation</term><listitem>
Binary images that contain the text sections for shared libraries
or applications can be rewritten to add instrumentation. This technique
is complex to implement, but is relatively unobtrusive unless system-wide
performance data is needed.
</listitem></varlistentry> 
<varlistentry><term>Online binary instrumentation</term><listitem>
Mapped binary images are rewritten to add instrumentation. To some degree,
just-in-time compiling environments are in this class of techniques.
</listitem></varlistentry> 
</variablelist>
</section>
 
<section id="relatedtools"><title>Related tools</title>
<para>
Profiling belongs to a class of runtime program examination techniques that
also includes tracing and runtime debugging. Tracing is very similar to
profiling, but differs in focus. Tracing is most commonly used as a method of
examining program logic, rather than application performance. Tools such as
<command>strace</command>, <command>ltrace</command>, Electric Fence, and 
garbage-collection in leak trace
mode exemplify typical tracing systems. However, event-based profiling is
concerned with examination of particular event data, so is strongly related to
tracing. Runtime debugging utilities such as <command>gdb</command> are only used for examining
program logic on a detailed level.
</para><para>
Where these methods coincide with profiling is mainly in the implementation
techniques used. Function call counts can be implemented with the same basic
mechanism as tracing utilities; instrumentation is another technique commonly
used throughout this area. Both tracing and debugging are complex areas, and
deserve separate discussion of their own.  
</para>
</section>
 
</chapter>

<chapter id="support">
<title>Support mechanisms</title>

<section id="hardwaresupport"><title>Hardware support mechanisms</title>

<para>
CPU manufacturers recognised several years ago that profiling was increasing
in importance, and as a result many CPUs, such as the MIPS R10000, the Alpha/AXP,
the Intel Pentium series, and more, provide at least some hardware support to
assist a software-based profiler. At one extreme, bolt-on hardware has been
produced to assist in profiling, for example <citation>profileme</citation>. 
</para>
<para>
One of the simplest things a CPU can provide is a high-resolution timestamp
counter such as the Pentium's TSC. This allows interstitial timing harnesses
for measuring operation latency to a high degree of accuracy.
</para>
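<para>
A minimal sketch of such a timing harness on x86 follows. It assumes a GCC or
Clang toolchain that provides the <function>__rdtsc()</function> intrinsic in
<filename>x86intrin.h</filename>; the functions <function>work()</function> and
<function>time_work()</function> are placeholders invented for this example.
</para>
<programlisting><![CDATA[
#include <stdint.h>
#include <stdio.h>
#include <x86intrin.h>   /* __rdtsc(); GCC and Clang on x86 only */

/* Hypothetical operation under test: sum the integers below n. */
static uint64_t work(uint64_t n)
{
    volatile uint64_t sum = 0;
    for (uint64_t i = 0; i < n; i++)
        sum += i;
    return sum;
}

/* Return the number of TSC ticks that elapse while work(n) runs. */
uint64_t time_work(uint64_t n)
{
    uint64_t start = __rdtsc();
    (void)work(n);
    uint64_t end = __rdtsc();
    return end - start;
}

int main(void)
{
    printf("work(1000000) took %llu TSC ticks\n",
           (unsigned long long)time_work(1000000));
    return 0;
}
]]></programlisting>
<para>
The figure is in timestamp ticks, not seconds. Out-of-order execution,
frequency scaling, and migration of the process between CPUs can all disturb
such measurements, so it is prudent to repeat the measurement several times
and compare the results.
</para>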
<para>
At the next level of complexity there are performance counters. These are typically
registers that count events of interest such as cache line misses. The benefits of such
counters are well known<citation>mipsr10000</citation><citation>monitor</citation>:
actual data from the hardware that can be attributed to sections of source code
removes a lot of the black magic previously associated with performance analysis.
</para>
<para>
Typically, software using such counters either periodically checks the value of
the counter or, if possible, uses counter overflow events to generate an interrupt,
whose handler then logs the overflow event against the currently executing code.
</para>
<para>
More recent architectures have gone even further in terms of support, providing
much of the data collection machinery in hardware<citation>ia64</citation><citation>ia32</citation>.
</para>

</section>
 
<section id="kernelsupport"><title>Kernel support mechanisms</title>

<para>
Many UNIX systems support the <command>profil(2)</command> system call. This is
an examining profiler that forms part of the kernel. The timer interrupt, or some
other periodic timer, collects the PC value at the time of interrupt, and stores
this in the relevant bin in a histogram buffer supplied by user space. This simple
technique is reasonably fast, but it offers limited resolution and is inflexible.
</para>
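<para>
The binning step can be sketched as follows. This is not the
<command>profil(2)</command> interface itself: the bin size, buffer length,
and the function <function>record_sample()</function> are assumptions made
for illustration. It shows why resolution is limited, since all PC values
that fall within one bin are indistinguishable in the resulting histogram.
</para>
<programlisting><![CDATA[
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

#define BIN_SHIFT 2    /* hypothetical: each bin covers 1 << 2 = 4 bytes */
#define NBINS     64   /* hypothetical histogram length */

static unsigned int histogram[NBINS];

/* Count one sample against the bin covering pc.
   Returns the bin index, or -1 if pc lies outside the profiled region. */
int record_sample(uintptr_t pc, uintptr_t text_start)
{
    if (pc < text_start)
        return -1;
    size_t bin = (pc - text_start) >> BIN_SHIFT;
    if (bin >= NBINS)
        return -1;
    histogram[bin]++;
    return (int)bin;
}

int main(void)
{
    uintptr_t base = 0x1000;       /* made-up start of the profiled text */

    record_sample(0x1000, base);
    record_sample(0x1003, base);   /* falls in the same bin as 0x1000 */
    record_sample(0x1004, base);   /* falls in the next bin */

    for (size_t i = 0; i < 2; i++)
        printf("bin %zu: %u samples\n", i, histogram[i]);
    return 0;
}
]]></programlisting>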
<para>
Linux does not implement <command>profil(2)</command>, preferring a user space
solution (see <xref linkend="librarysupport" />). For reference, version 4 of the 
GNU C library included a patch for a kernel implementation of
<command>profil(2)</command>.
</para>
<para>
The kernel provides the necessary support for POSIX interval timers, via
<command>setitimer(2)</command>. The timer type <constant>ITIMER_PROF</constant>
counts both user-space time and the time the target process spends in the
kernel, and delivers a <constant>SIGPROF</constant> signal on expiration.
A profiler may install a signal handler for <constant>SIGPROF</constant>
and read the interrupted PC value from the <structname>ucontext_t</structname>
structure passed as the handler's third argument, using it to build a PC value
histogram (the <structfield>si_addr</structfield> field of <structname>siginfo_t</structname>
is not filled in for timer signals).
Unfortunately this technique is low-resolution, and the use of signals
adds to profiler overhead.
</para>
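<para>
The following sketch shows the timer side of this mechanism. To stay portable
it only counts <constant>SIGPROF</constant> deliveries; a real profiler would
record the interrupted PC in the handler instead. The 10ms interval, the busy
loop, and the function <function>count_samples()</function> are assumptions
made for this example.
</para>
<programlisting><![CDATA[
#define _XOPEN_SOURCE 700
#include <signal.h>
#include <stdio.h>
#include <string.h>
#include <sys/time.h>

static volatile sig_atomic_t samples;

/* Runs on every SIGPROF; a real profiler would record the interrupted PC. */
static void on_sigprof(int sig)
{
    (void)sig;
    samples++;
}

/* Arm a 10ms profiling timer, burn CPU time, disarm the timer, and
   return the number of samples taken (or -1 on error). */
int count_samples(unsigned long iterations)
{
    struct sigaction sa;
    struct itimerval on = {
        .it_interval = { .tv_sec = 0, .tv_usec = 10000 },
        .it_value    = { .tv_sec = 0, .tv_usec = 10000 },
    };
    struct itimerval off;
    volatile unsigned long sink = 0;

    memset(&off, 0, sizeof off);   /* an all-zero value disarms the timer */
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_sigprof;
    sa.sa_flags = SA_RESTART;
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGPROF, &sa, NULL) != 0)
        return -1;

    samples = 0;
    if (setitimer(ITIMER_PROF, &on, NULL) != 0)
        return -1;
    for (unsigned long i = 0; i < iterations; i++)
        sink += i;                 /* the workload being sampled */
    setitimer(ITIMER_PROF, &off, NULL);
    return (int)samples;
}

int main(void)
{
    printf("%d samples taken\n", count_samples(300000000UL));
    return 0;
}
]]></programlisting>
<para>
Because <constant>ITIMER_PROF</constant> counts CPU time rather than
wall-clock time, a process that sleeps for most of its run collects very
few samples.
</para>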
<para>
The IA-64 port provides an interface<citation>ia64</citation>
to the hardware performance mechanisms
with the <command>perfmonctl(2)</command> system call. The standard IA-32 kernel
features drivers for user-space access to the machine-specific registers,
which can be used to set up the hardware performance counting mechanisms
<citation>ia32</citation>,<citation>athlon</citation>.
</para>
<para>
Linux kernels from 2.5.43 onwards provide the OProfile profiler interface, discussed later.
</para>
<para>
Text-format information is available for every process in the system via the
<filename class="directory">/proc</filename> file system, with a directory
for each process named by its process ID. You can collect page fault data,
memory usage data, and similar statistics from these files, which may be
useful for characterising performance.
</para>
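<para>
As an example, the sketch below reads the minor and major page fault counts
of the calling process from <filename>/proc/self/stat</filename>. The field
positions follow the <command>proc(5)</command> manpage; the function name
<function>read_fault_counts()</function> is invented for this example.
</para>
<programlisting><![CDATA[
#include <stdio.h>
#include <string.h>

/* Read the minor and major page fault counts of the calling process.
   Returns 0 on success, -1 on failure. */
int read_fault_counts(unsigned long *minflt, unsigned long *majflt)
{
    char buf[1024];
    FILE *f = fopen("/proc/self/stat", "r");
    if (f == NULL)
        return -1;
    size_t n = fread(buf, 1, sizeof buf - 1, f);
    fclose(f);
    buf[n] = '\0';

    /* Field 2, the command name, is parenthesised and may itself contain
       spaces or parentheses, so resume parsing after the last ')'. */
    char *p = strrchr(buf, ')');
    if (p == NULL)
        return -1;

    /* Skip fields 3 to 9; field 10 is minflt, 11 cminflt, 12 majflt. */
    if (sscanf(p + 1, " %*c %*d %*d %*d %*d %*d %*u %lu %*lu %lu",
               minflt, majflt) != 2)
        return -1;
    return 0;
}

int main(void)
{
    unsigned long minflt, majflt;

    if (read_fault_counts(&minflt, &majflt) != 0) {
        fprintf(stderr, "could not parse /proc/self/stat\n");
        return 1;
    }
    printf("minor faults: %lu, major faults: %lu\n", minflt, majflt);
    return 0;
}
]]></programlisting>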
<para>
The <filename class="directory">/proc</filename> file formats are mostly described
in the <command>proc(5)</command> manpage (make sure you have a recent
<filename>man-pages</filename> package installed<citation>manpages</citation>). 
When in doubt, look in
the kernel source (<filename class="directory">fs/proc/</filename>), and
the source for <command>top(1)</command>, <command>ps(1)</command>, etc.
</para>
</section>

<section id="compilersupport"><title>Compiler support mechanisms</title>

 
<para>
As mentioned previously, an instrumenting profiler needs to modify the
profiled code. Doing this at compile-time is one reasonable method: it is
simple to implement, and can provide exact data. Its main drawback is
the inconvenience of recompilation of the target code, and the risk of skew
as a result of the introduction of profiling code.
</para>
<para>
The GNU C compiler provides a small number of mechanisms which a profiler can
use to support itself. When <command>gcc</command> is given the <option>-pg</option> option,
the compiler inserts calls to <function>_mcount()</function> into each
function prologue (for details see <filename>final.c</filename>:<function>profile_function()</function>
in the <command>gcc</command> sources). This function is eventually supplied
by the C library, and records the caller and callee PC values in a data structure,
which can then be used to construct call-graph information. The same mechanism
is used with the <option>-a</option> option, which is intended to allow
<glossterm>basic-block profiling</glossterm>, although it is reputed to work poorly
or not at all in a large number of cases.
</para>
<para>
The GNU C compiler provides another mechanism that can be used for profiling, with
the <option>-finstrument-functions</option> option<citation>gcc</citation>.
This will generate calls at the start and end of each function to the following
functions:
</para>
<programlisting>
void __cyg_profile_func_enter(void (*fn)(), void (*parent)());
void __cyg_profile_func_exit(void (*fn)(), void (*parent)());
</programlisting> 
<para>
You can implement these functions, and use the function pointer values to
construct profiling data. Typically, a profiler would use the PC values
passed to look up the function names in the binary image, so a user-readable
call-graph report can be generated. Note that these are weak symbols so profiling
via this method can be done via <constant>LD_PRELOAD</constant>.
</para>
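<para>
A minimal sketch of such an implementation follows. It prints an indented
trace of raw addresses rather than looking up function names. The hooks are
declared here with <literal>void *</literal> parameters, as in the GCC manual;
the variable <varname>trace_depth</varname> and the function
<function>square()</function> are invented for this example. The hooks
carry the <literal>no_instrument_function</literal> attribute so that
they are not themselves instrumented.
</para>
<programlisting><![CDATA[
#include <stdio.h>

#define NO_INSTRUMENT __attribute__((no_instrument_function))

int trace_depth;   /* current nesting depth of instrumented calls */

/* Called on entry to every function built with -finstrument-functions. */
NO_INSTRUMENT void __cyg_profile_func_enter(void *fn, void *call_site)
{
    fprintf(stderr, "%*senter %p (called from %p)\n",
            trace_depth * 2, "", fn, call_site);
    trace_depth++;
}

/* Called just before every instrumented function returns. */
NO_INSTRUMENT void __cyg_profile_func_exit(void *fn, void *call_site)
{
    (void)call_site;
    trace_depth--;
    fprintf(stderr, "%*sexit  %p\n", trace_depth * 2, "", fn);
}

/* Hypothetical function to be traced. */
static int square(int x)
{
    return x * x;
}

int main(void)
{
    printf("%d\n", square(7));
    return 0;
}
]]></programlisting>
<para>
The trace appears on standard error only when the file is compiled with
<option>-finstrument-functions</option>; without that option the hooks are
never called and the program simply prints its result.
</para>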
<para>
GCC provides increasing support for profile-directed optimization. This technique
uses program profile data in order to guide compilation decisions, in the hope
that the compiled program will behave similarly, improving overall performance.
This feature is enabled by the <option>-fprofile-arcs</option> option,
which then produces a <filename>.da</filename> profile, containing arc traversal
data (in this context, an arc represents a program branch to a basic block,
a straight-line section of code). This can then be fed back in for a second
compile run, this time additionally using <option>-fbranch-probabilities</option>.
See the GCC manual and <citation>gccprofiledriven</citation> for more information.
</para> 
 
</section>

<section id="librarysupport"><title>Library support mechanisms</title>
<para>
The GNU C library provides a user-space implementation of <command>profil(3)</command>,
which internally uses <command>setitimer(2)</command> with <constant>ITIMER_PROF</constant>
to populate the PC value histogram. The <command>times(2)</command> and
<command>getrusage(2)</command> library calls allow collection of some data
that may prove relevant to a performance analysis.
</para>
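<para>
For instance, <command>getrusage(2)</command> can bracket a piece of work to
find out how much user-mode CPU time it consumed. The sketch below assumes
that only the calling process is of interest
(<constant>RUSAGE_SELF</constant>); the function
<function>user_cpu_seconds()</function> and the busy loop are invented for
this example.
</para>
<programlisting><![CDATA[
#include <stdio.h>
#include <sys/resource.h>
#include <sys/time.h>

/* Convert a struct timeval to seconds. */
static double tv_seconds(struct timeval tv)
{
    return (double)tv.tv_sec + (double)tv.tv_usec / 1e6;
}

/* User-mode CPU seconds consumed so far by this process, or -1.0 on error. */
double user_cpu_seconds(void)
{
    struct rusage ru;

    if (getrusage(RUSAGE_SELF, &ru) != 0)
        return -1.0;
    return tv_seconds(ru.ru_utime);
}

int main(void)
{
    volatile unsigned long sink = 0;
    double before = user_cpu_seconds();

    for (unsigned long i = 0; i < 50000000UL; i++)
        sink += i;                 /* hypothetical workload */

    double after = user_cpu_seconds();
    printf("workload used %.3f s of user CPU time\n", after - before);
    return 0;
}
]]></programlisting>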
<para>
The GNU C library provides hooks into its memory allocation routines<citation>glibc</citation>.
You can use these hooks to collect allocation lifetime data, size distributions,
and so on. Particularly for object-oriented code, allocation can become a crucial part of
an application's performance, and sometimes it is necessary to fine-tune the application's
behaviour in this respect.
</para>
<para>
Dietlibc<citation>dietlibc</citation> supports <constant>ITIMER_PROF</constant> for
<command>setitimer(2)</command>, but does not implement <command>profil(3)</command>
as of this writing.
</para>
</section>

</chapter>

<chapter id="profilers"><title>Available profiling packages</title>
 
<section id="kernelprofilers"><title>Kernel profilers</title>

<section id="readprofile"><title>readprofile</title>

<para> 
<!-- FIXME: hot-profile as in src/ -->
The standard kernel profiler (as used by <command>readprofile(1)</command>) is rather rudimentary
but commonly used as it comes packaged with the kernel for most architectures; at
the time of writing, only the cris, s390 and parisc ports don't implement it.
It is a simple statistical profiler that stores the kernel PC value into a scaled
histogram buffer on every timer tick. Note that only the kernel image is profiled;
no user-space, no kernel modules etc. It is also incapable of profiling code where
interrupts are disabled.
</para>
<para>
The profiler requires a reboot with the option "profile=2" to enable it. The value
passed is the multiplier, which determines the spatial resolution of the histogram,
as described in the man page. The histogram can be cleared by simply writing to
the profile buffer device <filename>/proc/profile</filename>. Some architectures
allow you to set the sampling frequency by writing a value to this file. For example
on x86, writing a value will set the APIC timer appropriately (lower values
mean more frequent interrupts).
</para>
<para>
Once some profile data has been collected, <command>readprofile(1)</command> can be used to print out
a simple function-based summary, as shown in this excerpt:
</para>
<programlisting>
...
   743 kmalloc                                    1.6294
   491 handle_IRQ_event                           5.3370
   371 __rdtsc_delay                             13.2500
   348 kmem_cache_alloc                           0.8878
...
</programlisting>
<para>
Each line consists of the number of samples against
that function, the name of the function, and the normalised load. The normalised load
is calculated as raw_count(f)/size_in_bytes(f): larger functions can be expected to attract
more samples, so dividing by the size is meant to compensate. However, due to a number of issues, this normalisation
isn't particularly useful.
</para>
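<para>
The calculation can be reproduced with a few lines of C. The function sizes
used below (456 and 28 bytes) are not taken from a kernel image; they are
back-computed from the sample counts and loads in the excerpt, and the
function <function>normalised_load()</function> is invented for this example.
</para>
<programlisting><![CDATA[
#include <stdio.h>

/* Samples per byte of function text. */
double normalised_load(unsigned long samples, unsigned long size_in_bytes)
{
    return (double)samples / (double)size_in_bytes;
}

int main(void)
{
    /* Sizes inferred from the excerpt's figures, not measured. */
    printf("%6d %-20s %8.4f\n", 743, "kmalloc", normalised_load(743, 456));
    printf("%6d %-20s %8.4f\n", 371, "__rdtsc_delay", normalised_load(371, 28));
    return 0;
}
]]></programlisting>
<para>
Note how a very small function such as <function>__rdtsc_delay</function>
receives a high normalised load from comparatively few samples.
</para>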
<para>
Red Hat ship a simple patch in their kernels to allow <command>readprofile(1)</command>
to use NMI interrupts. NMI interrupts cannot be disabled by the <acronym>IF</acronym>
bit in <literal>eflags</literal>; this means that you can get profile data from interrupt handlers
and code that runs with interrupts disabled (such as code protected by  
<function>spin_lock_irq()</function>). Note that it only works in this mode
when using the NMI watchdog facility, as it relies on the watchdog code to generate
the NMIs. The patch can be found
<ulink url="http://kernelnewbies.org/kernels/rh80/SOURCES/linux-2.4.9-nmiprofiling.patch">here</ulink>.
</para>
<para>
The user-space <command>readprofile(1)</command> utility has some incompatibilities
with recent Linux kernels: if you are having problems, it is recommended that you
upgrade to a recent util-linux version. The problem is related to mis-parsing of
<filename>vmlinux</filename>'s <command>nm</command> output.
</para> 
<para>
A patch to enable <command>readprofile(1)</command> to profile kernel modules
can be found <ulink url="http://groups.google.com/groups?hl=en&amp;lr=&amp;ie=UTF-8&amp;oe=UTF-8&amp;safe=off&amp;threadm=linux.kernel.2761.1016686968%40kao2.melbourne.sgi.com&amp;rnum=1&amp;prev=/groups%3Fq%3Dlinux.kernel%2Bmodule%2Bprofile%26hl%3Den%26lr%3D%26ie%3DUTF-8%26oe%3DUTF-8%26safe%3Doff%26selm%3Dlinux.kernel.2761.1016686968%2540kao2.melbourne.sgi.com%26rnum%3D1">here</ulink>. (Note that I have not
tested if this patch works, and it is bit-rotted).
</para>
 
</section>
 
<section id="kerneltop"><title>kerneltop</title>

<para>
<citation>kerneltop</citation> is a simple modification of <xref
linkend="readprofile">readprofile</xref> that displays the counts in a
<command>top</command> style, clearing the counters at each iteration. It can be
very useful for observing the time spent in the kernel while a particular operation
is in progress.
</para>
 
</section>
 
<section id="minlop"><title>minilop</title>

<para>
<citation>minilop</citation> is another <xref linkend="readprofile">readprofile</xref>-derived
utility. It adds a feature to show disassembly for each histogram bin of code
that has a sample against it, and calculation of the relative load, as well
as remembering the peak count for each entry. This project seems to be abandoned.
</para>

</section>

<section id="kernprof"><title>Kernprof</title>

<para>
SGI's <command>kernprof</command><citation>kernprof</citation>
is a powerful kernel-only profiler for the ia32, ia64,
sparc64 and mips64 architectures. It cannot profile modules or interrupts-disabled code.
Profiling can be enabled and controlled at runtime.
It comes in the form of a large kernel patch and associated user-space tools. Some users
may find they need to apply a patch to <command>gcc</command> before the profiler can
be built.
</para>
<para>
<command>Kernprof</command>
supports a number of different profiling techniques. Its simplest mode creates a PC value
histogram for the kernel. Both standard timer-interrupt based sampling, and sampling based
on the hardware performance counters, are supported (use of the hardware counters is
not supported on all systems). Support for the performance counters makes
<command>kernprof</command> considerably more powerful, as relevant performance events such as cache misses can
be analysed.
</para> 
<para>
Kernprof also supports a number of other profiling modes. The kernel can be built with
support for collecting (annotated) call graphs, although this has a significant overhead.
There is also the ability to collect exact function call counts via 
<function>mcount()</function>. Some of these modes can be combined in order to improve
the information collected without impinging on performance too badly.
Most of these modes generate their data in <command>gprof</command>'s 
<filename>gmon.out</filename> format. Per-CPU profiles can be created, which
can prove useful for analysing SMP performance.
</para>
<para>
Note that because the sampling method cannot be triggered whilst interrupts are disabled,
results must be taken with a pinch of salt in some cases. In particular, if the hardware 
counters are used, events will still be counted, but will have a tendency to appear
in the profiles at code points directly after interrupts become re-enabled. 
However, a patch to enable NMI-based profiling for <command>kernprof</command>
can be found <ulink url="http://marc.theaimsgroup.com/?l=linux-kernel&amp;m=102429774129115&amp;w=2">here</ulink>. 
</para>
 
</section>
 
<section id="oprofile"><title>OProfile</title>
</section>
 
<section id="ltt"><title>Linux Trace Toolkit</title>
</section>
 
<section id="dprobes"><title>Dynamic probes</title>
</section>
 
<section id="kip"><title>KIP</title>
</section>
 
<section id="timepegs"><title>Timepegs</title>
</section>
 
<section id="latency"><title>Interrupt latency measurement</title>
</section>
 
<section id="preemptionlatency"><title>Pre-emption latency measurement</title>
</section>

<section id="lockmetering"><title>SGI Lockmeter</title>
</section>

<section id="sar"><title>I/O statistics</title>
</section>

<section id="cacheinfo"><title>Cacheinfo</title>
</section>

<section id="mct"><title>MCT</title>
<para>
MCT<citation>MCT</citation> is a very simple test harness useful
for comparing the low-level performance characteristics of
kernel mutual exclusion primitives.
</para>
</section>

<section id="strongarm"><title>StrongARM profiler</title>
</section>
 
 
 
</section>
 
<section id="binaryprofilers"><title>Binary profilers</title>

<section id="vprof"><title>VProf</title>
</section>
 
<section id="eazelprof"><title>Eazel prof</title>
</section>
 
<section id="valgrind"><title>Valgrind</title>
<para>
Valgrind<citation>valgrind</citation> is an excellent debugging
system that simulates an x86 in order to catch memory allocation
and access errors. The source code indicates some preliminary
support for PC value and memory access profiling, which must
be explicitly enabled at compile time.
</para>
</section>
 
<section id="oprofile2"><title>OProfile</title>
</section>
 
<section id="jiti86"><title>JiTI86</title>
</section>

<section id="tsprof"><title>tsprof</title>
</section>

<section id="sprof"><title>Paderborn sprof</title>
</section>
 
</section>
 
<section id="sourceprofilers"><title>Source/compile-time profilers</title>

 
<section id="gprof"><title>gprof</title>
</section>
 
<section id="perfctr"><title>Perfctr</title>
</section>
 
<section id="eazelprofiler"><title>Eazel profiler (cprof)</title>
</section>
 
<section id="functioncheck"><title>FunctionCheck</title>
</section>
 
<section id="hrprof"><title>High-resolution Profiler</title>
</section> 
 
<section id="gnusprof"><title>GNU sprof</title>

<para>
This tool is unrelated to the Paderborn tool of the same name
(<xref linkend="sprof" />).
GNU <command>sprof</command> is packaged with the GNU C library
(on Debian, for example, it is in the <filename>libc6-prof</filename> package).
</para>
 
</section>

<section id="pcl"><title>Performance Counter Library</title>
</section>

<section id="papi"><title>Performance API</title>
</section>

<section id="tau"><title>TAU</title>
</section>
 
<section id="lfp"><title>Low-fat Profiler</title>
</section>
 
<section id="hendriks"><title>Erik Hendriks' performance counter package</title> 
<para>
This code provided virtualised access to the x86 performance counters in a 
similar manner to <xref linkend="perfctr" /> for 2.2 kernels. It is no longer
maintained, and is listed here only for historical interest. You can find
the manual and the source <ulink url="http://www.scyld.com/products/beowulf/software/perf.html">
here</ulink>.
</para>
</section>
 
<section id="bprof"><title>bprof</title>
<para>
<command>bprof</command> is a very old tool that provided instruction-level
profiling data via <function>setitimer(2)</function>. If you are lucky, you
will still be able to find a source RPM at <ulink url="http://rpmfind.net/">rpmfind.net</ulink>,
but the code is merely of historical interest now.
</para>
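<para>
The timer-driven sampling that tools of this kind rely on can be illustrated
with a short C sketch. This is not <command>bprof</command>'s code. The region
identifiers and the helper names <function>start_sampling()</function>,
<function>stop_sampling()</function> and <function>busy_work()</function> are
hypothetical and exist only for illustration. A real instruction-level profiler
would record the interrupted program counter from the signal context, which is
architecture-specific and is left out here; the sketch merely attributes each
<function>SIGPROF</function> tick to a coarse region flag.
</para>
<programlisting><![CDATA[
#include <signal.h>
#include <string.h>
#include <sys/time.h>

/* Hypothetical region identifiers; a real profiler records the PC instead. */
enum { REGION_IDLE, REGION_WORK, REGION_COUNT };

static volatile sig_atomic_t current_region = REGION_IDLE;
static volatile sig_atomic_t samples[REGION_COUNT];

/* SIGPROF handler: charge one sample to whichever region was running. */
static void on_sigprof(int signo)
{
	(void)signo;
	samples[current_region]++;
}

/* Deliver SIGPROF after every interval_usec microseconds of CPU time. */
static int start_sampling(long interval_usec)
{
	struct sigaction sa;
	struct itimerval timer;

	memset(&sa, 0, sizeof(sa));
	sa.sa_handler = on_sigprof;
	sa.sa_flags = SA_RESTART;
	sigemptyset(&sa.sa_mask);
	if (sigaction(SIGPROF, &sa, NULL) == -1)
		return -1;

	timer.it_interval.tv_sec = interval_usec / 1000000;
	timer.it_interval.tv_usec = interval_usec % 1000000;
	timer.it_value = timer.it_interval;
	return setitimer(ITIMER_PROF, &timer, NULL);
}

/* Disarm the profiling timer. */
static int stop_sampling(void)
{
	struct itimerval zero;

	memset(&zero, 0, sizeof(zero));
	return setitimer(ITIMER_PROF, &zero, NULL);
}

/* Burn CPU until at least `wanted` samples have landed in REGION_WORK. */
static unsigned long busy_work(int wanted)
{
	volatile unsigned long spin = 0;

	current_region = REGION_WORK;
	while (samples[REGION_WORK] < wanted)
		spin++;
	current_region = REGION_IDLE;
	return spin;
}
]]></programlisting>
<para>
A caller would typically invoke <function>start_sampling(10000)</function> for
a sample every 10ms of consumed CPU time, run the code of interest, call
<function>stop_sampling()</function>, and then inspect the
<varname>samples</varname> array. Because <function>ITIMER_PROF</function>
counts CPU time rather than wall-clock time, a process that is sleeping
collects no samples.
</para>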
</section>
 
</section>
 
<section id="analysistools"><title>Analysis tools</title>

<section id="profileviewer"><title>Profileviewer</title>
</section>

<section id="kprof"><title>KProf</title>
</section> 

<section id="cgprof"><title>cgprof</title>
</section>
 
</section>
 
<section id="specialisedprofilers"><title>Specialised profilers</title>
 
 
<section id="cacheprof"><title>Cacheprof</title>
</section>
 
<section id="fireprofile"><title>Fireprofile</title>
</section>
 
<section id="allocationprofilers"><title>Allocation profilers</title>
<para>
A number of projects exist for debugging allocation problems such as leaks
and invalid accesses. Some of these can produce allocation statistics
that may be useful for later analysis 
(<ulink url="http://dmalloc.com/">dmalloc</ulink>,
<ulink url="http://www.cbmamiga.demon.co.uk/mpatrol/">mpatrol</ulink>,
<ulink url="http://freshmeat.net/projects/mpr/">mpr</ulink>,
<ulink url="http://freshmeat.net/projects/MemProf">MemProf</ulink>).
</para>
</section>
 
</section>
 
<section id="langspecificprofilers"><title>Language-specific profilers</title>
 
<section id="java"><title>Java performance analysis</title>
</section>
 
<section id="php"><title>PHP profilers</title>
</section>
 
<section id="python"><title>Python profilers</title>
</section>
 
<section id="tcltk"><title>Tcl/Tk profilers</title>
<para>
Tcl/Tk comes with a built-in profiler, as described 
<ulink url="http://mini.net/tcl/1106.html">here</ulink>.
</para>
</section>
 
<section id="lisp"><title>Lisp profilers</title>
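<para>
See the <ulink url="http://www.cons.org/cmucl/doc/index.html">CMU Common Lisp
documentation</ulink> and the Lisp entries in the bibliography.
</para>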
</section>
 
<section id="ruby"><title>Ruby profiler</title>
<para>
Ruby comes with its own profiling system. Its use is briefly
covered <ulink url="http://www.rubycentral.com/book/trouble.html">here</ulink>.
</para>
</section>
 
<section id="kylix"><title>ProKylix</title>
<para>
ProKylix<citation>kylix</citation> provides a profiler for Kylix code. It is not free software.
</para>
</section>

<section id="perl"><title><structname>Devel::DProf</structname> (Perl)</title>
<para>
Perl comes with a profiling package called <structname>Devel::DProf</structname>,
described <ulink url="http://www.perldoc.com/perl5.6/lib/Devel/DProf.html">here</ulink>.
</para>
</section> 

</section>
 
</chapter>

<chapter id="summary">
<title>Summary</title> 
</chapter>

<bibliography id="bibliography">
<bibliodiv><title>Articles and research papers</title>
<biblioentry>
	<abbrev>shende</abbrev>
	<citetitle>Profiling and Tracing in Linux</citetitle>
	<author><firstname>Sameer</firstname><surname>Shende</surname></author>
	<abstract>
		Appeared in the USENIX '99 Extreme Linux Workshop; no longer available
		on the web. A short and outdated introduction to Linux profiling
	</abstract>
</biblioentry>
<biblioentry>
	<abbrev>linuxjournal</abbrev>
	<citetitle>Take Command: gprof, bprof and Time Profilers</citetitle>
	<author><firstname>Andy</firstname><surname>Vaught</surname></author>
	<pubdate>1998-05-01</pubdate> 
	<abstract>
		An old and brief article on profiling in Linux
	</abstract>
</biblioentry>
<biblioentry>
	<abbrev>profilingphp</abbrev>
	<citetitle><ulink url="http://www.onlamp.com/pub/a/php/2002/02/28/profilingphp.html">
		Improving Performance by Profiling PHP Applications
	</ulink></citetitle>
</biblioentry>
<biblioentry>
	<abbrev>hprofarticle</abbrev>
	<citetitle><ulink url="http://www.javaworld.com/javaworld/jw-12-2001/jw-1207-hprof.html">
		 Diagnose common runtime problems with hprof
	</ulink></citetitle>
	<abstract>
		Short article on profiling Java with hprof
	</abstract>
</biblioentry>
<biblioentry>
	<abbrev>mipsr10000</abbrev>
	<citetitle><ulink url="http://www.supercomp.org/sc96/proceedings/SC96PROC/ZAGHA/INDEX.HTM">
		 Performance Analysis Using the MIPS R10000 Performance Counters</ulink></citetitle>
	<author><firstname>Marco</firstname><surname>Zagha</surname> et al.</author>
	<abstract>
		An interesting paper on using hardware performance counters on MIPS
	</abstract>
</biblioentry>
<biblioentry>
	<abbrev>profileme</abbrev>
	<citetitle>FIXME</citetitle>
</biblioentry> 
<biblioentry>
	<abbrev>monitor</abbrev>
	<citetitle><ulink url="http://citeseer.nj.nec.com/buck00using.html">
		Using Hardware Performance Counters to Isolate Memory Bottlenecks
	</ulink></citetitle>
	<authorgroup>
	<author><firstname>Bryan</firstname><surname>Buck</surname></author>
	<author><firstname>Jeffrey</firstname><surname>Hollingsworth</surname></author>
	</authorgroup>
	<abstract>
		A paper on using performance counters for finding performance problems
	</abstract>
</biblioentry>
<biblioentry>
	<abbrev>gccprofiledriven</abbrev>
	<citetitle><ulink url="http://gcc.gnu.org/news/profiledriven.html">
		Infrastructure for Profile-Driven Optimizations
	</ulink></citetitle>
	<author>The GCC team</author>
	<abstract>
		A short news item on continuing efforts to provide GCC compiler optimizations
		based on program profiles.
	</abstract>
</biblioentry>
</bibliodiv>
 
<bibliodiv><title>Manuals and documentation</title>
<biblioentry>
	<abbrev>gprof</abbrev>
	<citetitle><ulink url="http://sources.redhat.com/binutils/docs-2.12/gprof.info/index.html">
		gprof(1)</ulink></citetitle>
	<abstract>The manual for GNU gprof</abstract>
</biblioentry>
<biblioentry>
	<abbrev>manpages</abbrev>
	<citetitle><ulink url="http://freshmeat.net/projects/man-pages/">The Linux manpages collection</ulink></citetitle>
	<abstract>Linux manpages. The distributed package often has
		updates that are not in your distribution's package, so make sure to
		check the latest version of this package if you need to refer to a
		man page.
	</abstract>
</biblioentry>
<biblioentry>
	<abbrev>gcc</abbrev>
	<citetitle><ulink url="http://gcc.gnu.org/onlinedocs/gcc/">
		gcc(1)</ulink></citetitle>
	<abstract>The user manual for gcc</abstract>
</biblioentry>
<biblioentry>
	<abbrev>gcc-internal</abbrev>
	<citetitle><ulink url="http://gcc.gnu.org/onlinedocs/gccint/">
		GCC Internals</ulink></citetitle>
	<abstract>The manual describing GCC internals</abstract>
</biblioentry>
<biblioentry>
	<abbrev>glibc</abbrev>
	<citetitle><ulink url="http://www.gnu.org/manual/glibc/">
		GNU C Library Manual</ulink></citetitle>
	<abstract>The extensive manual for glibc</abstract>
</biblioentry>
<biblioentry>
	<abbrev>ia32</abbrev>
	<citetitle><ulink url="http://developer.intel.com/">
		IA-32 Intel Architecture Software Developer's Manual</ulink></citetitle>
	<authorgroup><author>Intel Corporation</author></authorgroup>
	<abstract>Volume 3 of this manual describes the
		performance counter mechanisms for the Pentium Classic, the P6 family,
		and the Pentium 4 CPU families</abstract>
</biblioentry>
<biblioentry>
	<abbrev>ia64</abbrev>
	<citetitle><ulink url="http://www.lia64.org/book/">
		IA-64 Linux Kernel: Design and Implementation</ulink></citetitle>
	<authorgroup><author><firstname>David</firstname> <surname>Mosberger</surname></author>
	<author><firstname>Stephane</firstname><surname>Eranian</surname></author>
	<author><firstname>Bruce</firstname><surname>Perens</surname></author>
	</authorgroup> 
	<isbn>0-13-061014-3</isbn>
	<publisher>Prentice Hall</publisher>
	<pubdate>2002-01-30</pubdate>
</biblioentry>
<biblioentry>
	<abbrev>athlon</abbrev>
	<citetitle><ulink url="http://www.amd.com/products/cpg/athlon/techdocs/pdf/22007.pdf">
	AMD Athlon Processor x86 Code Optimization Guide</ulink></citetitle>
	<author>AMD Corporation</author>
	<abstract>A brief description of the performance counters
		for AMD Athlon and Duron processors</abstract>
</biblioentry>
<biblioentry>
	<abbrev>jvmpi</abbrev>
	<citetitle><ulink url="http://java.sun.com/j2se/1.3/docs/guide/jvmpi/jvmpi.html">
		Java Virtual Machine Profiler Interface</ulink></citetitle>
	<author>Sun Microsystems, Inc.</author>
	<abstract>The Java API for collecting performance data from
		 a virtual machine</abstract>
</biblioentry> 
</bibliodiv>
<bibliodiv><title>Linux software</title>
<biblioentry>
	<abbrev>kernprof</abbrev>
	<citetitle><ulink url="http://oss.sgi.com/projects/kernprof/">Kernprof</ulink></citetitle>
	<abstract>Kernel profiler patch that runs on a number
		of architectures
	</abstract>
</biblioentry>
<biblioentry>
	<abbrev>oprofile</abbrev>
	<citetitle><ulink url="http://oprofile.sf.net/">OProfile</ulink></citetitle>
	<abstract>
		Performance counter based system-wide statistical profiling for x86 Linux
		systems
	</abstract>
</biblioentry>
<biblioentry>
	<abbrev>kip</abbrev>
	<citetitle><ulink url="http://kip.sourceforge.net/">KIP</ulink></citetitle>
	<abstract>
		Detailed tracing/logging of the kernel via source instrumentation
	</abstract>
</biblioentry>
<biblioentry>
	<abbrev>timepegs</abbrev>
	<citetitle><ulink url="http://www.zip.com.au/~akpm/linux/#timepegs">Timepegs
		</ulink></citetitle>
	<abstract>
		Interstitial time measurement of the kernel via source instrumentation
	</abstract>
</biblioentry>
<biblioentry> 
	<abbrev>ltt</abbrev> 
	<citetitle><ulink url="http://www.opersys.com/LTT/">Linux Trace Toolkit</ulink></citetitle>
	<abstract>
		Kernel and part-userspace event logging and tracing system
	</abstract>
</biblioentry>
<biblioentry>
	<abbrev>dprobes</abbrev>
	<citetitle><ulink url="http://oss.software.ibm.com/developerworks/oss/linux/projects/dprobes/">
		Dynamic probes</ulink></citetitle>
	<abstract>
		A powerful tracing and event notification system
	</abstract>
</biblioentry>
<biblioentry>
	<abbrev>schedlat</abbrev>
	<citetitle><ulink url="http://www.zip.com.au/~akpm/linux/schedlat.html">Schedlat</ulink></citetitle>
	<abstract>
		Measures kernel scheduling latency
	</abstract>
</biblioentry>
<biblioentry>
	<abbrev>preemptstats</abbrev>
	<citetitle><ulink url="http://www.tech9.net/rml/linux/">Pre-empt statistics</ulink></citetitle>
	<abstract>
		Another scheduling latency measurement tool for the kernel
	</abstract>
</biblioentry>
<biblioentry>
	<abbrev>intlat</abbrev>
	<citetitle><ulink url="http://www.zip.com.au/~akpm/linux/#intlat">Intlat</ulink></citetitle>
	<abstract>
		Measures the time the kernel has interrupts disabled
	</abstract>
</biblioentry> 
<biblioentry>
	<abbrev>lockmeter</abbrev>
	<citetitle><ulink url="http://oss.sgi.com/projects/lockmeter/">SGI Lockmeter</ulink></citetitle>
	<abstract>
		Detailed statistics on mutual exclusion primitive usage in the kernel
	</abstract>
</biblioentry>
<biblioentry>
	<abbrev>sar</abbrev>
	<citetitle><ulink url="ftp://ftp.uk.linux.org/pub/linux/sct/fs/profiling">SAR patches</ulink></citetitle>
	<abstract>
		Old patches implementing more detailed I/O accounting. Also see
		<ulink url="http://perso.wanadoo.fr/sebastien.godard/">http://perso.wanadoo.fr/sebastien.godard/</ulink>
		for user-space tools
	</abstract>
</biblioentry>
<biblioentry>
	<abbrev>reqlog</abbrev>
	<citetitle><ulink url="http://users.ox.ac.uk/~mbeattie/linux-kernel.html">Reqlog</ulink></citetitle>
	<abstract>
		Logging for I/O requests in the kernel
	</abstract>
</biblioentry>
<biblioentry>
	<abbrev>cacheinfo</abbrev>
	<citetitle><ulink url="http://ds9a.nl/cacheinfo/">Cacheinfo</ulink></citetitle>
	<abstract>
		Module to provide statistics on the various kernel caches
	</abstract>
</biblioentry>
<biblioentry>
	<abbrev>mct</abbrev>
	<citetitle><ulink url="http://www.moses.uklinux.net/mct/">MCT</ulink></citetitle>
	<abstract>
		Test harness for comparing different kernel mutual exclusion primitives
	</abstract>
</biblioentry>
<biblioentry>
	<abbrev>strongarm</abbrev>
	<citetitle><ulink url="http://www.handhelds.org/pipermail/linux/2000-December/000705.html">
		StrongARM profiler</ulink></citetitle>
	<abstract>
		System-wide statistical kernel profiler for the StrongARM CPU series
	</abstract>
</biblioentry>
<biblioentry> 
	<abbrev>kerneltop</abbrev>
	<citetitle><ulink url="http://www.xenotime.net/linux/kerneltop/">
		Kerneltop</ulink></citetitle>
	<abstract>
		Periodic profile statistics for the kernel based on /proc/profile
	</abstract>
</biblioentry>
<biblioentry>
	<abbrev>minilop</abbrev>
	<citetitle><ulink url="http://sourceforge.net/projects/minilop/">
		Mini Linux Optimizing Project</ulink></citetitle>
	<abstract>
		Another /proc/profile based utility, with disassembly support
	</abstract>
</biblioentry>

<biblioentry>
	<abbrev>profileviewer</abbrev>
	<citetitle><ulink url="http://www.capital.net/~dittmer/profileviewer/index.html">
		ProfileViewer</ulink></citetitle>
	<abstract>
		Java-based viewer for gprof output
	</abstract>
</biblioentry>
<biblioentry>
	<abbrev>kprof</abbrev>
	<citetitle><ulink url="http://kprof.sourceforge.net/">KProf</ulink></citetitle>
	<abstract>
		A KDE-based profile viewer for gprof and FunctionCheck
	</abstract>
</biblioentry>
<biblioentry>
	<abbrev>cgprof</abbrev>
	<citetitle><ulink url="http://mvertes.free.fr/">cgprof</ulink></citetitle>
	<abstract>
		A utility to display call graphs from gprof data
	</abstract>
</biblioentry>
<biblioentry>
	<abbrev>perfctr</abbrev>
	<citetitle><ulink url="http://www.csd.uu.se/~mikpe/linux/perfctr/">Perfctr</ulink></citetitle>
	<abstract>
		A library providing virtualised access to the x86 hardware performance
		counters
	</abstract>
</biblioentry>
<biblioentry>
	<abbrev>cacheprof</abbrev>
	<citetitle><ulink url="http://www.cacheprof.org/">Cacheprof</ulink></citetitle>
	<abstract>
		A simulation-based cache impact profiler
	</abstract>
</biblioentry>
<biblioentry>
	<abbrev>valgrind</abbrev>
	<citetitle><ulink url="http://developer.kde.org/~sewardj/">Valgrind</ulink></citetitle>
	<abstract>
		An excellent allocation debugger for x86 binaries with some profiling support
	</abstract>
</biblioentry> 
<biblioentry>
	<abbrev>fireprofiler</abbrev>
	<citetitle><ulink url="http://ares.penguinhosting.net/~ian/">FireProfiler</ulink></citetitle>
	<abstract>
		Produces data on MySQL queries an application makes
	</abstract>
</biblioentry>
<biblioentry>
	<abbrev>vprof</abbrev>
	<citetitle><ulink url="http://aros.ca.sandia.gov/~cljanss/perf/vprof/index.html">
		VProf</ulink></citetitle>
	<abstract>
		Binary profiler that can use x86 performance counters
	</abstract>
</biblioentry>
<biblioentry>
	<abbrev>eazel</abbrev>
	<citetitle><ulink url="http://www.mozilla.org/performance/eazel.html">Eazel profilers
		</ulink></citetitle>
	<abstract>
		Two simple profilers, one instrumenting (based on Corel's defunct cprof),
		and one not
	</abstract>
</biblioentry>
<biblioentry>
	<abbrev>functioncheck</abbrev>
	<citetitle><ulink url="http://www710.univ-lyon1.fr/~yperret/fnccheck/profiler.html">
		FunctionCheck</ulink></citetitle>
	<abstract>
		Instrumenting profiler that accounts for time per function
	</abstract>
</biblioentry>
<biblioentry>
	<abbrev>hrprof</abbrev>
	<citetitle><ulink url="http://hrprof.sourceforge.net/">HRProf</ulink></citetitle>
	<abstract>
		Realtime instrumenting profiler that uses the x86 TSC register
	</abstract>
</biblioentry>
<biblioentry>
	<abbrev>jiti86</abbrev>
	<citetitle><ulink url="http://www.elis.rug.ac.be/~ronsse/jiti/">JitI86</ulink></citetitle>
	<abstract>
		Offline binary instrumentation system
	</abstract>
</biblioentry>
<biblioentry>
	<abbrev>pcl</abbrev>
	<citetitle><ulink url="http://www.fz-juelich.de/zam/PCL/">Performance Counter Library</ulink></citetitle>
	<abstract>
		Userspace library API for accessing hardware counters over a wide range
		of platforms
	</abstract>
</biblioentry>
<biblioentry>
	<abbrev>papi</abbrev>
	<citetitle><ulink url="http://icl.cs.utk.edu/projects/papi/">Performance API</ulink></citetitle>
	<abstract>
		Another attempt at a platform-independent hardware counter API
	</abstract>
</biblioentry>
<biblioentry>
	<abbrev>tau</abbrev>
	<citetitle><ulink url="http://www.acl.lanl.gov/tau/">TAU</ulink></citetitle>
	<abstract>
		C++-based source instrumentation package on top of PCL or PAPI
	</abstract>
</biblioentry>
<biblioentry>
	<abbrev>lfp</abbrev>
	<citetitle><ulink url="http://sourceforge.net/projects/lfp">Low-fat Profiler</ulink></citetitle>
	<abstract>
		A simple API to provide interstitial TSC-based timing analysis
	</abstract>
</biblioentry>
<biblioentry>
	<abbrev>apictimers</abbrev>
	<citetitle><ulink url="http://vincent.oberle.com/apic_timer-index.html">
		APIC timer module for Linux</ulink></citetitle>
	<abstract>
		Kernel module for access to high-resolution APIC timers
	</abstract>
</biblioentry>
<biblioentry> 
	<abbrev>tsprof</abbrev>
	<citetitle><ulink url="http://www.bitwagon.com/tsprof/tsprof.html">tsprof</ulink></citetitle>
	<abstract>
		Binary profiler with access to the x86 performance counters. Not free software
	</abstract>
</biblioentry>
<biblioentry>
	<abbrev>sprof</abbrev>
	<citetitle><ulink url="http://www.uni-paderborn.de/pc2/projects/warp/sproftool/">sproftool</ulink></citetitle>
	<abstract>
		Portable profiler utilising hardware performance counters. Not free software
	</abstract>
</biblioentry>
 
<biblioentry>
	<abbrev>kylix</abbrev>
	<citetitle><ulink url="http://www.prodelphi.de/">ProKylix</ulink></citetitle>
	<abstract>
		Kylix profiler
	</abstract>
</biblioentry>
<biblioentry>
	<abbrev>jprobe</abbrev>
	<citetitle><ulink url="http://www.sitraka.com/software/jprobe/jprobeprofiler.html">
		JProbe Profiler</ulink></citetitle>
	<abstract>
		Profiler and analysis tools for Java. Not free software
	</abstract>
</biblioentry>
<biblioentry>
	<abbrev>optimizeit</abbrev>
	<citetitle><ulink url="http://www.borland.com/optimizeit/">
		OptimizeIt suite</ulink></citetitle>
	<abstract>
		Profiler and analysis tools for Java. Not free software
	</abstract>
</biblioentry>
<biblioentry>
	<abbrev>javaprofilingtool</abbrev>
	<citetitle><ulink url="http://javaprofiler.sourceforge.net/">Java Profiling Tool</ulink></citetitle>
	<abstract>
		Supposed profiling tool via JVMPI. No code available
	</abstract>
</biblioentry>
<biblioentry>
	<abbrev>xdprof</abbrev>
	<citetitle><ulink url="http://xdprof.sourceforge.net/">xdProf</ulink></citetitle>
	<abstract>
		A stack trace collection facility for Java via JVMPI
	</abstract>
</biblioentry>
<biblioentry>
	<abbrev>sourcetracer</abbrev>
	<citetitle><ulink url="http://sourceforge.net/projects/codewitness/">SourceTracer</ulink></citetitle>
	<abstract>
		Another JVMPI-based Java profiler
	</abstract>
</biblioentry>
<biblioentry>
	<abbrev>jperfanal</abbrev>
	<citetitle><ulink url="http://jperfanal.sourceforge.net/">JPerfAnal</ulink></citetitle>
	<abstract>
		Post-profile viewer for Java
	</abstract>
</biblioentry>
<biblioentry>
	<abbrev>jmp</abbrev>
	<citetitle><ulink url="http://www.d.kth.se/~d94-rol/jmp/">JMP</ulink></citetitle>
	<abstract>
		Java memory allocation profiler via JVMPI
	</abstract>
</biblioentry>
<biblioentry>
	<abbrev>jmocha</abbrev>
	<citetitle><ulink url="http://www-124.ibm.com/developerworks/oss/jmocha/index.html">
		jMocha micro-benchmark suite</ulink></citetitle>
	<abstract>
		Measurement harness for detailed micro-analysis
	</abstract>
</biblioentry>
<biblioentry>
	<abbrev>hpjmeter</abbrev>
	<citetitle><ulink url="http://www.hp.com/products1/unix/java/hpjmeter/index.html">
		HPjmeter Performance Analysis Tool</ulink></citetitle>
	<abstract>
		Profiler viewer for Java. Not free software
	</abstract>
</biblioentry>
<biblioentry>
	<abbrev>jlouiss</abbrev>
	<citetitle><ulink url="http://www.sax.de/~adlibit/index.html">jLouiss</ulink></citetitle>
	<abstract>
		Tracing tool for Java via JVMPI
	</abstract>
</biblioentry>
	
<biblioentry>
	<abbrev>phpapd</abbrev>
	<citetitle><ulink url="http://apd.communityconnect.com/">Advanced PHP Debugger</ulink></citetitle>
	<abstract>
		PHP debugger that can generate profile data
	</abstract>
</biblioentry>
<biblioentry>
	<abbrev>califa</abbrev>
	<citetitle><ulink url="http://califa.sourceforge.net/">Califa</ulink></citetitle>
	<abstract>
		Simple profiler for PHP 3
	</abstract>
</biblioentry>
<biblioentry>
	<abbrev>lispdebug</abbrev>
	<citetitle><ulink url="http://www.marclisp.bewoner.antwerpen.be/intro.html">Lisp Debug</ulink></citetitle>
	<abstract>
		Lisp debugger and profiler that can run on several Lisp implementations
	</abstract>
</biblioentry>
<biblioentry>
	<abbrev>pyprof</abbrev>
	<citetitle><ulink url="http://www.azstarnet.com/~donut/programs/index_s.html#pyprof">PyProf</ulink></citetitle>
	<abstract>
		Convenient wrapper for the Python profiler
	</abstract>
</biblioentry> 

</bibliodiv>
<bibliodiv><title>Related links</title>
<biblioentry>
	<abbrev>linuxperf</abbrev>
	<citetitle><ulink url="http://linuxperf.nl.linux.org/">Linux Performance Tuning
		</ulink></citetitle>
	<abstract>
		A portal for various performance analysis and tuning tools
	</abstract>
</biblioentry>
<biblioentry>
	<abbrev>javaperf</abbrev>
	<citetitle><ulink url="http://www.javaperformancetuning.com/">Java Performance Tuning</ulink></citetitle>
	<abstract>
		Portal for Java performance tuning and analysis
	</abstract>
</biblioentry>
<biblioentry>
	<abbrev>devtools</abbrev>
	<citetitle><ulink url="http://www.hotfeet.ch/~gemi/LDT/">Linux Development Tools
		</ulink></citetitle>
	<abstract>
		Portal for various debugging and development tools under Linux
	</abstract>
</biblioentry>
<biblioentry id="tpop">
	<abbrev>tpop</abbrev>
	<citetitle><ulink url="http://netlib.bell-labs.com/cm/cs/tpop/">The Practice
		of Programming</ulink></citetitle>
	<authorgroup><author><firstname>Brian</firstname><othername role="mi">W</othername><surname>Kernighan</surname></author>
		<author><firstname>Rob</firstname><surname>Pike</surname></author></authorgroup>
	<publisher>Addison-Wesley, Inc.</publisher>
	<pubdate>1999</pubdate>
	<isbn>0-201-61586-X</isbn>
		 
	<abstract>
		An excellent practical guide to program development
	</abstract>
</biblioentry>
</bibliodiv>
</bibliography>
	
</article>
