General Performance Analysis Tools

There are a number of general tools for measuring CPU and elapsed time either from within the code itself or via the UNIX environment.

UNIX Tools

time and timex

These return the elapsed, user and system time used by a given command (executable). Note that under the C shell and Korn shell, time is a shell built-in function. Full details on its usage may be found in the appropriate shell man page. For details on timex, see man timex.

Example:

  OSCA$ time a.out 	                 seconds          clocks 	elapsed          0.27913       125606662 	user             0.00135          608352 	sys              0.03821        17196053 	

The user CPU time is time spent executing the instructions that make up the user code, including e.g. calls to subroutine libraries linked to the code.

The system time is time spent in "kernel" mode, processing I/O requests, for example, or other situations which require the intervention of the operating system.

Together these comprise the total CPU time. Note that the user time is usually significantly greater than the system time. Indeed, a high ratio of system to user CPU time may indicate a problem. Exceptions such as page faults, misaligned memory references, and floating-point exceptions consume a large amount of system time. Note that time spent doing things like waiting for input from the terminal, seeking on a disk, or rewinding a tape does not appear in the CPU time; these situations do not require the CPU and thus it can (and generally does) work on something else while waiting.

The elapsed (or "wallclock") time is simply the total real time that passed during the execution of the program. This will typically be greater than the total CPU time due primarily to sharing the CPU with other programs. In addition, programs that perform a large number of I/O operations, require more memory bandwidth than is available on the machine (i.e., the CPU spends significant time waiting for data to arrive from memory), or is paged or swapped will show a relatively lower ratio of CPU time used to elapsed time.

ja - "Job Accounting" (Cray SV1 system only)

This utility reports job- or session-related accounting information with very low system overhead. It can be used to measure

  • Elapsed time
  • User CPU time used
  • System CPU time
  • I/O wait tine
  • I/O requests
  • Maximum memory size (the memory "high water mark")

as well as many other quantities.

To enable job accounting, type:

 		ja

Accounting information is then collected for all processes that are subsequently run. Additional ja commands generate accounting reports, the content of which are controlled by options to ja. The type of report you receive is determined by supplying one of the following options

  • -c - Produces command report
  • -f - Produces command flow report (provides information on the parent/child relationships of processes)
  • -s - Produces summary report

    In addition, there are various modifying options which allow finer control over the displayed information. These include

    • -l - When used with -c or -f, generate a "long" listing (132 columns)
    • -h - Replaces the physics I/O data with the largest amount of memory used by the process at any time.

The accounting report is written to the standard output unless redirected. Further ja commands generate additional (updated) reports, until the -t option is supplied or the shell is terminated. This option disables job accounting.

Example:

 OSCA$ ja OSCA$ a.out OSCA$ ja -chl > ja.out OSCA$ cut -c1-72 ja.out   Job Accounting - Command Report ===============================  Command  Started    Elapsed    User CPU   Sys CPU   I/O Wait I/O Wait      Name      At      Seconds     Seconds   Seconds   Sec Lck  Sec Unlck   ======== ======== =========== ========== ========== ======== ==========  ja       17:17:19      0.1119     0.0014     0.0084     0.00     0.0907  a.out    17:17:21      0.9822     0.5277     0.0448     0.00     0.1838  OSCA$ cut -c1-9,73-132 ja.out   Job Accou =========  Command   CPU MEM   I/O WMem Kwords   Log I/O Memory   Ex   Name    Avg Mwds  Avg Mwds Xferred  Request HiWater  St Ni Fl  SBU' ======== ========= ========= ======== ======= ======= === == == ===== ja          0.2002    0.0000     0.00       0     512   0 20       0. a.out       1.1975    0.0000     0.03      20    2496   0 20       0. OSCA$      

Memory HiWater is the maximum amount of memory used by the process, in units of 512 words (note that one word is 8 bytes on the SV1). This data is very useful when determining resource requirements for batch processing.

Profiling

Profiling is a general technique for obtaining timing information on a program at the level of subroutines and below. It is very useful for identifying in which subroutines or loops the code spends the majority of its time, as an aid to focusing optimization efforts. An advantage of profiling is that it incurs only a relatively modest overhead

Profiling works via a statistical sampling process. The program counter is periodically sampled to determine which piece of code is being executed, and statistics are collected over the course of the run. The resulting data may be statistically reliable even if the code runs for as little as a few CPU seconds.

Note, however, that the profiling "resolution" (i.e., whether one obtains information at the level of subroutines or code blocks) and the optimization level are usually inversely related. General statistics, such as total number of calls to a subroutine, or the total CPU time, synchronization time, or I/O time spent within a routine, are easily and reliably collected. Obtaining timing data at a finer resolution allows one to see more clearly what is happening within subroutines, which may be necessary to determine exactly where a problem or performance bottleneck lies. But in general this can be achieved only by reducing the optimization level, so that the sampled code is different from the optimized "production" code. In addition, the instrumentation added to the code (the calls to profiling subroutines) themselves affect the timing characteristics of the code. Ultimately, as the instrumentation exposes finer and finer detail, the data collected become less accurate.

The general procedure for profiling is:

  1. Compile and/or link the code to enable profiling (not necessary on all systems). Generally the resolution of the profiling as well as the sampling frequency are under the control of the user.
  2. Run the instrumented code to generate the data.
  3. Collate and view the data.

For details on how to profile programs on the various OSC systems, see the relevant performance analysis summaries.

Fortran 90 Intrinsics, etc.

STOP statement

When a code terminates at a STOP statement, the system returns the total CPU and elapsed times, and the memory and stack high-water marks.

Example:

 OSCA$ a.out  STOP   executed at line 31 in Fortran routine 'VECTOR'  CPU: 0.528s,  Wallclock: 0.539s,  24.5% of 4-CPU Machine  Memory HWM: 1236446, Stack HWM: 20620, Stack segment expansions: 0 OSCA$ 
  • TIME()
  • SECOND
  • DATE_AND_TIME
  • SYSTEM_CLOCK
  • RTC/IRTC

These intrinsic functions give various types of access to the system clock, and may be used to provide detailed timing of individual code sections. A basic technique is to sample the clock before and after a critical section of code, and then compute the difference to obtain the time spent in that section.

See the man pages for detailed information on the above.

C/C++ Functions

  • time
  • rtclock (Cray Research Inc. extension to ANSI C)
  • rtc/irtc

(Remember to #include <time.h> where necessary!)

These functions give various types of access to the system clock, and may be used to provide detailed timing of individual code sections. A basic technique is to sample the clock before and after a critical section of code, and then compute the difference to obtain the time spent in that section.

See the man pages for detailed information on the above.