Profile data covers a wide range of data types, including event logs, execution counts, resource attributions, and more. Any tool that generates data as input to a performance analysis can, in some sense, be considered a profiling tool. By definition, profiling data must be collected at runtime; this restricts the available methods to a few main techniques.
Profile data can be produced in a number of different forms. At the simplest end are accumulated event counts, which give a broad understanding of a workload. Event logging, a form of tracing, is a related form. Generally, event logs require some processing before they reveal interesting performance data. Time-based data characterises how long operations, or sections of code, execute for, typically measured in real time or in virtual per-process CPU time. Call-graph data records the call path that led to the code in question. For example, a periodic stack trace is a simple form of call-graph information. More typically, call-graph information is represented in an accumulated form at function granularity. This allows the developer to determine more easily the focal point of any performance problems in the source code.
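As a rough illustration of how periodic stack traces can be accumulated into call-graph data at function granularity, consider the following sketch. The stack samples and the function names in them are invented for the example, and the three tables (self counts, inclusive counts, and caller-to-callee arcs) are just one plausible way of organising the accumulated data.

```python
from collections import Counter

# Hypothetical periodic stack samples, outermost frame first.
# These are made-up inputs for illustration, not real profiler output.
samples = [
    ("main", "parse", "read_token"),
    ("main", "parse", "read_token"),
    ("main", "parse"),
    ("main", "render"),
]

self_counts = Counter()       # samples where the function was on top of the stack
inclusive_counts = Counter()  # samples where the function appeared anywhere
arc_counts = Counter()        # samples containing a given caller -> callee edge

for stack in samples:
    self_counts[stack[-1]] += 1
    for name in set(stack):   # set() so that recursion would not double count
        inclusive_counts[name] += 1
    for caller, callee in zip(stack, stack[1:]):
        arc_counts[(caller, callee)] += 1

for name, count in inclusive_counts.most_common():
    print(f"{name:12} inclusive={count} self={self_counts[name]}")
```

Here `parse` appears in three of the four samples but is on top of the stack in only one of them, which suggests that most of the time attributed to it is actually spent in its callee `read_token`.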
All such data can be classified as either exact or statistical. Exact data tells the whole story: no elements are missing from the data. For example, function call counts are usually incremented at every function call, so the total counts are 100% accurate.
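A minimal sketch of exact call counting follows. It assumes the instrumentation is done with a Python decorator; the names `call_counts`, `counted`, `square`, and `sum_of_squares` are hypothetical and exist only for this example. Because the counter is incremented on every call, no call can be missed.

```python
import functools
from collections import Counter

# Hypothetical table: function name -> exact number of calls.
call_counts = Counter()

def counted(func):
    """Wrap func so that every single call increments its counter."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        call_counts[func.__name__] += 1
        return func(*args, **kwargs)
    return wrapper

@counted
def square(x):
    return x * x

@counted
def sum_of_squares(n):
    return sum(square(i) for i in range(n))

total = sum_of_squares(5)
print(total, dict(call_counts))
```

After the call, `call_counts` records exactly one call to `sum_of_squares` and five to `square`. The price is an extra function call and a table update on every instrumented call.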
Statistical data, in contrast, is not 100% accurate. Rather, for the data to be useful, it must be a realistic representation of the true data set. What is collected is only some fraction of the data that exact collection would have generated. For example, a CPU time histogram of functions is rarely exact: estimating the time spent in each function by regularly sampling the program counter (PC) is a very common profiling technique. There are two sub-types of statistical data. First, there is data that is inaccurate due to the inherent uncertainty of certain measurements. For example, a cache line miss data point may accurately represent the number of actual cache line misses, but the lack of context means that misses caused by other system processes are not filtered out of the result set. A more common example is the granularity of certain timing tools. The second type of inaccuracy is usually a result of examining profiling methods and the inherent limitations of their resolution. For example, a profiler that recorded the function being executed every 10 ms could easily skew the results in favour of functions that take longer, even when there are faster functions that are called far more often.
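The resolution problem can be reproduced with a small deterministic simulation. This is a contrived sketch rather than a real profiler: the workload timings, the function names `slow` and `fast`, and both sampling intervals are assumptions chosen to make the effect obvious. The workload repeats every 10 ms, with one 6 ms call to `slow` followed by eight 0.5 ms calls to `fast`.

```python
from collections import Counter

PERIOD_US = 10_000   # assumed: the workload repeats every 10 ms
SLOW_US = 6_000      # one call to slow() at the start of each period
FAST_US = 500        # calls to fast() fill the remaining 4 ms

def running_at(t_us):
    """Return the name of the function executing at time t_us."""
    return "slow" if t_us % PERIOD_US < SLOW_US else "fast"

def sample(interval_us, duration_us):
    """Record which function is running at every multiple of interval_us."""
    return Counter(running_at(t) for t in range(0, duration_us, interval_us))

DURATION_US = 70_000
periods = DURATION_US // PERIOD_US
true_calls = {
    "slow": periods,
    "fast": periods * (PERIOD_US - SLOW_US) // FAST_US,
}

lockstep = sample(10_000, DURATION_US)  # sampling in step with the workload
offbeat = sample(700, DURATION_US)      # interval not aligned with the workload

print("true calls:", true_calls)
print("10 ms sampling:", dict(lockstep))
print("0.7 ms sampling:", dict(offbeat))
```

With the 10 ms interval, all seven samples land in `slow`, so `fast` appears never to run even though it is called 56 times and accounts for 40% of the time. With the 0.7 ms interval, the 100 samples split 60 to 40, matching the true distribution. Real workloads are rarely this regular, but the same bias appears in milder forms.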
One of the main reasons statistical profiling is so common is that collecting exact data incurs an overhead cost, and that cost is often prohibitive. The design choice is thus a trade-off between speed and unobtrusiveness on the one hand and accuracy on the other.
We have mentioned examining profilers. These constitute one of the main classes of profiling techniques. They are characterised by a periodic collection of profiling data. This technique inevitably gives statistical rather than exact results, unless an accounting technique is used in concert with the periodic collection. An accounting profiler collects exact counts for some particular data item, for example, the number of major page faults. The exact nature of accounting profilers implies more reliable data, but the technique used is often more obtrusive.
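As a small example of accounting data, the sketch below reads the exact major page fault count that a Unix kernel maintains for the current process, using `resource.getrusage` from the Python standard library. It assumes a Unix-like system, since the `resource` module is not available elsewhere; the helper names `major_faults` and `count_major_faults` are hypothetical.

```python
import resource

def major_faults():
    """Exact count of major page faults for this process so far."""
    return resource.getrusage(resource.RUSAGE_SELF).ru_majflt

def count_major_faults(func, *args):
    """Run func(*args) and return its result with the faults it incurred."""
    before = major_faults()
    result = func(*args)
    return result, major_faults() - before

result, faults = count_major_faults(sum, range(1000))
print(f"result={result}, major page faults during call={faults}")
```

The count is exact because the kernel increments it on every fault rather than sampling. The difference reported here covers the whole process during the call, so faults taken by other threads in the same process would be included.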