1. Overview
The Intel VTune performance analyzer helps identify bottlenecks and sources of suboptimal CPU utilization in a given application. It also provides a convenient way to create a call graph of the application and to collect numerous hardware performance events, which give both programmers and system integrators detailed insight into the actual code execution.
2. Using VTune
2.1. GUI mode (Eclipse)
To start the GUI for VTune just type in a terminal:
# /opt/intel/vtune/bin/vtlec &
NB: DO NOT RUN SAMPLING WITH MANY (MORE THAN 50-60) EVENTS AT A TIME. YOU CAN EASILY EXCEED THE MEMORY AVAILABLE ON THE MACHINE, AND SYSTEM CRASHES ARE POSSIBLE!
2.2. Command line mode
Before being able to use VTune from the command line or in a script, one should extend the $PATH environment variable:
# export PATH=$PATH:/opt/intel/vtune/bin
# which vtl
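If the export is placed in a script that may be sourced more than once, the directory can end up in $PATH several times. A minimal sketch that avoids this is given here; the function name add_to_path is hypothetical (it is not part of VTune), and /opt/intel/vtune/bin is the install location used above:

```shell
# Hypothetical helper: append a directory to PATH only if it is not
# already listed, so repeated sourcing does not grow PATH.
add_to_path() {
    case ":$PATH:" in
        *":$1:"*) ;;                 # already present, nothing to do
        *) PATH="$PATH:$1" ;;        # append once
    esac
}

add_to_path /opt/intel/vtune/bin
export PATH

# Warn (without aborting) if the vtl binary still cannot be found.
command -v vtl >/dev/null 2>&1 || echo "vtl not found in PATH" >&2
```

The surrounding colons in the case pattern make the match exact, so a directory such as /opt/intel/vtune/bin64 would not be mistaken for /opt/intel/vtune/bin.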
A simple sampling activity can be created and run, and its data then displayed, as follows:
# vtl activity -d 2 -c sampling -app ls run
# vtl show -all
# vtl view a1::r1
The output of the last two commands should look something like the following:
a1__Activity1
r1___Sat Jun 26 10:11:57 2010 - Sampling Results [127.0.0.1]
r2_______Run 1
r3_________CPU_CLK_UNHALTED.THREAD
r4_________INST_RETIRED.ANY
a2__Activity2
r1___Sat Jun 26 10:13:34 2010 - Sampling Results [127.0.0.1]
r2_______Run 1
r3_________CPU_CLK_UNHALTED.THREAD
r4_________INST_RETIRED.ANY
a3__Activity3
r1___Sat Jun 26 10:15:55 2010 - Sampling Results [127.0.0.1]
r2_______Run 1
r3_________CPU_CLK_UNHALTED.THREAD
r4_________INST_RETIRED.ANY

Module                      Process                       CPU_CLK_UNHALTED.THREAD samples  INST_RETIRED.ANY samples  Clocks per Instructions Retired - CPI  Process Path                      Process ID  Original Module Path
vmlinux-2.6.18-194.3.1.el5  pid_0x0                       893                              18                        49.611                                                                   0x0         /boot/
vmlinux-2.6.18-194.3.1.el5  pid_0x349                     16                               3                         5.333                                                                    0x349       /boot/
libpthread-2.5.so           vtl.bin                       8                                0                         0.000                                  /opt/intel/vtune/shared/bin/      0x2f47      /lib/
vmlinux-2.6.18-194.3.1.el5  vtl.bin                       7                                1                         7.000                                  /opt/intel/vtune/shared/bin/      0x2f47      /boot/
vmlinux-2.6.18-194.3.1.el5  ntd                           5                                0                         0.000                                  /opt/sag/exx/v721/bin/            0x1241      /boot/
ntd                         ntd                           3                                1                         3.000                                  /opt/sag/exx/v721/bin/            0x1241      /opt/sag/exx/v721/bin/
libmutant.so                vtl.bin                       2                                1                         2.000                                  /opt/intel/vtune/shared/bin/      0x2f47      /opt/sag/exx/v721/lib/
vmlinux-2.6.18-194.3.1.el5  irqbalance                    1                                1                         1.000                                  /usr/sbin/                        0x117f      /boot/
cpufreq_ondemand            pid_0x349                     1                                0                         0.000                                                                    0x349
libc-2.5.so                 vtserver.bin                  1                                0                         0.000                                  /opt/intel/vtune/rdc/shared/bin/  0x2fc0      /lib64/
Other32                     dsm_sa_datamgr32d.5.9.1.6284  1                                0                         0.000                                  /opt/dell/srvadmin/dataeng/bin/   0x196a
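In the listing above, the CPI column matches the ratio of the two sample counts (for example, 893 / 18 gives 49.611), and 0.000 is shown where no INST_RETIRED.ANY samples were collected. A minimal sketch of that arithmetic follows. It assumes both events were sampled with the same sample-after value, and the function name cpi is hypothetical (it is not a VTune command):

```shell
# Hypothetical helper, not part of VTune: reproduce the CPI column
# from the two sample counts shown by "vtl view".
# Usage: cpi CLOCK_SAMPLES INSTRUCTION_SAMPLES
cpi() {
    awk -v clk="$1" -v inst="$2" 'BEGIN {
        # Mirror the listing, which shows 0.000 when there are
        # no instruction samples (avoids a division by zero).
        if (inst == 0) printf "%.3f\n", 0
        else           printf "%.3f\n", clk / inst
    }'
}

cpi 893 18   # sample counts from the first vmlinux row
cpi 16 3     # sample counts from the second vmlinux row
```

With only a handful of samples per module, as in most rows here, the resulting ratio is statistically weak; a longer sampling duration would be needed before reading much into it.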
3. Exporting and analyzing data
4. Results
The following sections present the results for various High Energy Physics (HEP) applications. Central to these results is the floating-point percentage of each application compared with that of the HEP-SPEC benchmark. Values close to the HEP-SPEC result support the applicability of the HEP-SPEC all_cpp benchmarks for simulating HEP-application-like workloads, and thus for providing a consistent basis for testing the power of computing resources.