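I. Callgrind
Callgrind is part of the Valgrind tool suite, which will be familiar to most developers working on Linux. It is mainly used to analyze function call relationships and CPU cache behavior in a program. Its core features include:
1. Call graph analysis: traces function call chains (the levels of the call stack) and reports each function's share of the cost, measured in executed instructions rather than wall-clock time
2. Cache simulation: models the first-level instruction and data caches (I1/D1) and the last-level cache (LL) and reports their misses
3. Multithreaded analysis: can record profile data separately for each thread, which helps locate bottlenecks in multithreaded programs
That said, Callgrind has some limitations:
1. The program under test should be compiled with the -g debug option so that results can be mapped back to source lines (anyone with a little experience will know this)
2. It slows the program under test down noticeably, typically by a factor of several tens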
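II. Installation
Callgrind ships as part of the valgrind package, so installing it is simple. On Debian/Ubuntu, run the following commands:
sudo apt update
sudo apt install valgrind kcachegrind  # kcachegrind is the graphical viewer and is optional
Once installation finishes, Callgrind is ready to use for profiling.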
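III. Usage
1. The basic Callgrind workflow is as follows:
# Start profiling
valgrind --tool=callgrind ./app_name [args]
# Generate the report
callgrind_annotate callgrind.out.[pid] --auto=yes > report.txt
kcachegrind callgrind.out.[pid]  # graphical viewer
Notes:
--auto=yes: annotates the source code automatically (requires debug symbols, i.e. compiling with -g)
--inclusive=yes: reports inclusive cost, i.e. a function's own cost plus the cost of everything it calls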
2. Handling multithreaded data
To keep the data from different threads from being mixed together, add the following option:
valgrind --tool=callgrind --separate-threads=yes ./app_name
3. Cache analysis
valgrind --tool=callgrind --cache-sim=yes ./app_name
IV. A worked example
Use the following test program:
void test() {
    for (int num = 0; num < 1000000; num++);
}

int main() {
    test();
    return 0;
}
1. Compile
g++ -g -o ct callgrindTest.c
2. Run the profiler
valgrind --tool=callgrind ./ct
3. Generate and view the report
# Generate
callgrind_annotate callgrind.out.281233 --auto=yes > myreport.txt
# View
cat myreport.txt
--------------------------------------------------------------------------------
Profile data file 'callgrind.out.281233' (creator: callgrind-3.19.0)
--------------------------------------------------------------------------------
I1 cache:
D1 cache:
LL cache:
Timerange: Basic block 0 - 1032208
Trigger: Program termination
Profiled target: ./ct (PID 281233, part 1)
Events recorded: Ir
Events shown: Ir
Event sort order: Ir
Thresholds: 99
Include dirs:
User annotated:
Auto-annotation: on
--------------------------------------------------------------------------------
Ir
--------------------------------------------------------------------------------
3,149,006 (100.0%) PROGRAM TOTALS
--------------------------------------------------------------------------------
Ir file:function
--------------------------------------------------------------------------------
3,000,011 (95.27%) callgrindTest.c:test() [/home/fpc/qt65_project/ct]
29,297 ( 0.93%) ./elf/./elf/dl-tunables.c:__GI___tunables_init [/usr/lib/x86_64-linux-gnu/ld-linux-x86-64.so.2]
22,444 ( 0.71%) ./elf/./elf/dl-lookup.c:do_lookup_x [/usr/lib/x86_64-linux-gnu/ld-linux-x86-64.so.2]
19,725 ( 0.63%) ./elf/./elf/dl-lookup.c:_dl_lookup_symbol_x [/usr/lib/x86_64-linux-gnu/ld-linux-x86-64.so.2]
17,371 ( 0.55%) ./elf/../sysdeps/x86_64/dl-machine.h:_dl_relocate_object
9,783 ( 0.31%) ./elf/./elf/do-rel.h:_dl_relocate_object
5,055 ( 0.16%) ./string/../sysdeps/x86_64/strcmp.S:strcmp [/usr/lib/x86_64-linux-gnu/ld-linux-x86-64.so.2]
4,648 ( 0.15%) ./elf/./elf/dl-lookup.c:check_match [/usr/lib/x86_64-linux-gnu/ld-linux-x86-64.so.2]
3,085 ( 0.10%) ./elf/../sysdeps/x86/dl-cacheinfo.h:intel_check_word.constprop.0 [/usr/lib/x86_64-linux-gnu/ld-linux-x86-64.so.2]
2,406 ( 0.08%) ./elf/./elf/dl-tunables.h:__GI___tunables_init
2,222 ( 0.07%) ./elf/../bits/stdlib-bsearch.h:intel_check_word.constprop.0
2,170 ( 0.07%) ./elf/./elf/dl-version.c:_dl_check_map_versions [/usr/lib/x86_64-linux-gnu/ld-linux-x86-64.so.2]
--------------------------------------------------------------------------------
-- Auto-annotated source: callgrindTest.c
--------------------------------------------------------------------------------
Ir
.
3 ( 0.00%) void test() {
3,000,004 (95.27%) for (int num=0; num<1000000; num++);
4 ( 0.00%) }
.
3 ( 0.00%) int main() {
1 ( 0.00%) test();
3,000,011 (95.27%) => callgrindTest.c:test() (1x)
1 ( 0.00%) return 0;
2 ( 0.00%) }
--------------------------------------------------------------------------------
The following files chosen for auto-annotation could not be found:
--------------------------------------------------------------------------------
./elf/../bits/stdlib-bsearch.h
./elf/../sysdeps/x86/dl-cacheinfo.h
./elf/../sysdeps/x86_64/dl-machine.h
./elf/./elf/dl-lookup.c
./elf/./elf/dl-tunables.c
./elf/./elf/dl-tunables.h
./elf/./elf/dl-version.c
./elf/./elf/do-rel.h
./string/../sysdeps/x86_64/strcmp.S
--------------------------------------------------------------------------------
Ir
--------------------------------------------------------------------------------
3,000,018 (95.27%) events annotated
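Legend:
a) Ir (instruction reads): the number of instructions executed, a proxy for the CPU work done
b) Dr (data reads) and Dw (data writes): the number of data reads and writes; collected when --cache-sim=yes is enabled
c) Bc (conditional branches) and Bi (indirect branches): the number of branches executed, with mispredictions reported as Bcm and Bim; collected when --branch-sim=yes is enabled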
4. Graphical analysis
Run the command:
kcachegrind callgrind.out.281233
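This opens the profile in KCachegrind's graphical view:
[Figure: KCachegrind view of callgrind.out.281233]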
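V. Summary
Most developers are probably already familiar with Callgrind. It is revisited here mainly so that it can be compared with the performance tools covered earlier, since the differences only become clear when the tools are set side by side. Good luck to us all!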