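perf stat runs a command and reports counter statistics for it. perf top can also target a pid, but the application must already be running before you can look at it. perf stat, by contrast, can cover the entire lifetime of the application.

The command format is:

```
perf stat [-e <EVENT> | --event=EVENT] [-a] <command>
perf stat [-e <EVENT> | --event=EVENT] [-a] -- <command> [<options>]
```

Here is a quick look at the output of perf stat:

```
al@al-System-Product-Name:~/perf$ sudo perf stat
^C
 Performance counter stats for 'system wide':

      40904.820871      cpu-clock (msec)          #    5.000 CPUs utilized
            18,132      context-switches          #    0.443 K/sec
             1,053      cpu-migrations            #    0.026 K/sec
             2,420      page-faults               #    0.059 K/sec
     3,958,376,712      cycles                    #    0.097 GHz                      (49.99%)
       574,598,403      stalled-cycles-frontend   #   14.52% frontend cycles idle     (49.98%)
     9,392,982,910      stalled-cycles-backend    #  237.29% backend cycles idle      (50.00%)
     1,653,185,883      instructions              #    0.42  insn per cycle
                                                  #    5.68  stalled cycles per insn  (50.01%)
       237,061,366      branches                  #    5.795 M/sec                    (50.02%)
        18,333,168      branch-misses             #    7.73% of all branches          (50.00%)

       8.181521203 seconds time elapsed
```

The output fields are explained below.

cpu-clock: the processor time the task actually consumed, in ms (the ls example at the end of this section reports the same quantity as task-clock). CPUs utilized = task-clock / time elapsed, i.e. the CPU utilization. In the system-wide run above, 40904.82 ms / 8181.52 ms ≈ 5.000.

context-switches: the number of context switches that occurred while the program ran.

cpu-migrations: the number of times the program was migrated between processors. To keep the load balanced across processors, Linux will under certain conditions move a task from one CPU to another.

CPU migrations vs. context switches: a context switch does not necessarily involve a CPU migration, whereas a CPU migration always involves a context switch. A context switch may simply swap the context out of the current CPU, and the scheduler may place the process on the same CPU the next time it runs.

page-faults: the number of page faults. A page fault is raised when the page the application requests has not been set up yet, when the page is not in memory, or when the page is in memory but the mapping between its physical and virtual address has not been established. An access that violates the page's permissions also raises a page fault. A TLB miss by itself does not: on architectures with a hardware page-table walker (such as x86) it only turns into a page fault if the walk finds no valid mapping.

cycles: the number of processor cycles consumed. If the cycles used by ls are treated as coming from a single processor, the effective clock frequency is cycles / task-clock. For the ls run at the end of this section that is 2,142,386 cycles / 2.319422 ms ≈ 0.924 GHz.

stalled-cycles-frontend: cycles in which the instruction fetch or decode stage stalled and could not deliver its ideal degree of parallelism.

stalled-cycles-backend: cycles in which the instruction execution stage stalled.

instructions: the number of instructions executed. IPC is the average number of instructions executed per CPU cycle.

branches: the number of branch instructions encountered. branch-misses is the number of branch instructions that were mispredicted.

The percentage in parentheses at the end of some rows, such as (49.99%), is the share of the measurement time during which that counter was actually running. A value below 100% means the counters were multiplexed and the reported count was scaled up.

Other common options

```
-a, --all-cpus              show statistics for all CPUs
-C, --cpu <cpu>             show statistics for the specified CPUs
-c, --scale                 scale/normalize counters
-D, --delay <n>             ms to wait before starting measurement after program start
-d, --detailed              detailed run - start a lot of events
-e, --event <event>         event selector. use 'perf list' to list available events
-G, --cgroup <name>         monitor event in cgroup name only
-g, --group                 put the counters into a counter group
-I, --interval-print <n>    print counts at regular interval in ms (>= 10)
-i, --no-inherit            child tasks do not inherit counters
-n, --null                  null run - dont start any counters
-o, --output <file>         write the statistics to a file
-p, --pid <pid>             stat events on existing process id
-r, --repeat <n>            repeat command and print average + stddev (max: 100, forever: 0)
-S, --sync                  call sync() before starting a run
-t, --tid <tid>             stat events on existing thread id
...
```

Examples

The example above collected statistics system-wide. The next one collects statistics for a single CPU.

Run sudo perf stat -C 0 to collect statistics for CPU 0, and press Ctrl+C when you want to stop. The set of counters is the same; only the object being measured has changed.

```
al@al-System-Product-Name:~/perf$ sudo perf stat -C 0
^C
 Performance counter stats for 'CPU(s) 0':

       2517.107315      cpu-clock (msec)          #    1.000 CPUs utilized
             2,941      context-switches          #    0.001 M/sec
               109      cpu-migrations            #    0.043 K/sec
                38      page-faults               #    0.015 K/sec
       644,094,340      cycles                    #    0.256 GHz                      (49.94%)
        70,425,076      stalled-cycles-frontend   #   10.93% frontend cycles idle     (49.94%)
       965,270,543      stalled-cycles-backend    #  149.86% backend cycles idle      (49.94%)
       623,284,864      instructions              #    0.97  insn per cycle
                                                  #    1.55  stalled cycles per insn  (50.06%)
        65,658,190      branches                  #   26.085 M/sec                    (50.06%)
         3,276,104      branch-misses             #    4.99% of all branches          (50.06%)

       2.516996126 seconds time elapsed
```

To collect more events, use -e, for example:

```
perf stat -e task-clock,context-switches,cpu-migrations,page-faults,cycles,stalled-cycles-frontend,stalled-cycles-backend,instructions,branches,branch-misses,L1-dcache-loads,L1-dcache-load-misses,LLC-loads,LLC-load-misses,dTLB-loads,dTLB-load-misses ls
```

The result is shown below. The extra events of interest now appear in the report, although in this run they are listed as <not counted>, most likely because ls finished before the multiplexed counters were ever scheduled.

```
al@al-System-Product-Name:~/perf$ sudo perf stat -e task-clock,context-switches,cpu-migrations,page-faults,cycles,stalled-cycles-frontend,stalled-cycles-backend,instructions,branches,branch-misses,L1-dcache-loads,L1-dcache-load-misses,LLC-loads,LLC-load-misses,dTLB-loads,dTLB-load-misses ls

 Performance counter stats for 'ls':

          2.319422      task-clock (msec)         #    0.719 CPUs utilized
                 0      context-switches          #    0.000 K/sec
                 0      cpu-migrations            #    0.000 K/sec
                89      page-faults               #    0.038 M/sec
         2,142,386      cycles                    #    0.924 GHz
           659,800      stalled-cycles-frontend   #   30.80% frontend cycles idle
           725,343      stalled-cycles-backend    #   33.86% backend cycles idle
         1,344,518      instructions              #    0.63  insn per cycle
                                                  #    0.54  stalled cycles per insn
     <not counted>      branches
     <not counted>      branch-misses
     <not counted>      L1-dcache-loads
     <not counted>      L1-dcache-load-misses
     <not counted>      LLC-loads
     <not counted>      LLC-load-misses
     <not counted>      dTLB-loads
     <not counted>      dTLB-load-misses

       0.003227507 seconds time elapsed
```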
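
The derived values that perf stat prints after the `#` sign can be recomputed by hand from the raw counters. The following is a minimal Python sketch of that arithmetic, using the numbers from the first system-wide run in this section. The function name `derived_metrics` and the dictionary layout are made up for illustration; they are not part of perf.

```python
def derived_metrics(counters, elapsed_s):
    """Recompute the right-hand-column values of a perf stat report.

    counters  -- raw counts keyed by event name (hypothetical layout)
    elapsed_s -- the "seconds time elapsed" value
    """
    # perf reports the clock row as task-clock or cpu-clock, in milliseconds.
    clock_ms = counters.get("task-clock", counters.get("cpu-clock"))
    cycles = counters["cycles"]
    return {
        # CPUs utilized = clock time / wall-clock time
        "cpus_utilized": clock_ms / (elapsed_s * 1000.0),
        # effective frequency in GHz = cycles / clock time in ns
        "ghz": cycles / (clock_ms * 1e6),
        # IPC = instructions / cycles
        "ipc": counters["instructions"] / cycles,
        # share of branches that were mispredicted, in percent
        "branch_miss_pct": 100.0 * counters["branch-misses"] / counters["branches"],
    }


# Raw counts copied from the system-wide example.
system_wide = {
    "cpu-clock": 40904.820871,
    "cycles": 3_958_376_712,
    "instructions": 1_653_185_883,
    "branches": 237_061_366,
    "branch-misses": 18_333_168,
}

metrics = derived_metrics(system_wide, elapsed_s=8.181521203)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Rounded the same way perf does, these values match the annotations in the report: 5.000 CPUs utilized, 0.097 GHz, 0.42 insn per cycle and 7.73% branch misses.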