What is perf?
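perf is a Linux performance-tuning tool for software performance analysis that ships with the kernel. It is included in Linux 2.6.31 and later, and on those kernels perf is easy to install.
It can handle almost every kind of performance-related event.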
What is a performance event?
A hardware or software event, occurring in the processor or the operating system, that may affect a program's performance.
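What does it focus on?
Algorithm optimization (space and time complexity) and code optimization (improving execution speed, reducing memory footprint).
Evaluating how a program uses hardware resources, for example the number of accesses and misses at each cache level, pipeline stall cycles, and front-side bus accesses.
Evaluating how a program uses operating-system resources: the number of system calls, context switches, and task migrations.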
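How does it work?
Hardware events are counted by the CPU's PMU (performance monitoring unit), whose performance monitoring counters (PMCs) detect whether a given event occurs under specified conditions and how many times it occurs.
Software events are built into the kernel: counters spread across its functional modules collect statistics on operating-system-related performance events.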
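How do you use high-precision sampling?
To sample with high precision, append the suffix ":p" or ":pp" to the event name when specifying it (for example, cycles:pp). The precision levels are: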
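    0: no precision guarantee
    1: the skid between the sampled instruction and the instruction that triggered the event is constant (:p)
    2: the skid is requested to be 0, on a best-effort basis (:pp)
    3: the skid must be 0 (:ppp)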
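The suffix rule above can be sketched in Python. `precise_level` is a hypothetical helper, not part of perf: it assumes the level is simply the number of trailing `p` modifiers (perf passes this level to the kernel as the `precise_ip` field of `perf_event_attr`), and it only recognizes the `u`, `k`, `h` and `p` modifier letters.

```python
import re
from typing import Tuple

# Modifier letters accepted after the last ':' in this sketch. perf understands
# more (for example G, H, P); limiting it to u, k, h and p is an assumption.
_MODIFIERS = re.compile(r"^[ukhp]+$")


def precise_level(event: str) -> Tuple[str, int]:
    """Split an event spec such as 'cycles:pp' into ('cycles', 2)."""
    name, sep, suffix = event.rpartition(":")
    if not sep or not _MODIFIERS.match(suffix):
        # No modifier suffix, or a tracepoint name such as 'sched:sched_switch'.
        return event, 0
    level = suffix.count("p")  # each 'p' raises the precision level by one
    if level > 3:
        raise ValueError(f"precision level {level} is above the maximum of 3")
    return name, level


if __name__ == "__main__":
    for spec in ("cycles", "cycles:p", "cycles:pp", "instructions:ppp", "cycles:u"):
        print(spec, "->", precise_level(spec))
```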
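What are the commonly used commands?
1. perf list: lists every event that can serve as a perf sampling point (the performance events supported by the current hardware).
The events fall into three broad classes: hardware (generated by the hardware), software (generated by kernel software), and tracepoint (events triggered by static tracepoints in the kernel).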
    List of pre-defined events (to be used in -e):
      cpu-cycles OR cycles                               [Hardware event] (processor cycle event)
      stalled-cycles-frontend OR idle-cycles-frontend    [Hardware event]
      stalled-cycles-backend OR idle-cycles-backend      [Hardware event]
      instructions                                       [Hardware event]
      cache-references                                   [Hardware event]
      cache-misses                                       [Hardware event]
      branch-instructions OR branches                    [Hardware event]
      branch-misses                                      [Hardware event]
      bus-cycles                                         [Hardware event]
      cpu-clock                                          [Software event]
      task-clock                                         [Software event]
      page-faults OR faults                              [Software event]
      minor-faults                                       [Software event]
      major-faults                                       [Software event]
      context-switches OR cs                             [Software event]
      cpu-migrations OR migrations                       [Software event]
      alignment-faults                                   [Software event]
      emulation-faults                                   [Software event]
      L1-dcache-loads                                    [Hardware cache event]
      L1-dcache-load-misses                              [Hardware cache event]
      L1-dcache-stores                                   [Hardware cache event]
      L1-dcache-store-misses                             [Hardware cache event]
      L1-dcache-prefetches                               [Hardware cache event]
      L1-dcache-prefetch-misses                          [Hardware cache event]
      L1-icache-loads                                    [Hardware cache event]
      L1-icache-load-misses                              [Hardware cache event]
      L1-icache-prefetches                               [Hardware cache event]
      L1-icache-prefetch-misses                          [Hardware cache event]
      LLC-loads                                          [Hardware cache event]
      LLC-load-misses                                    [Hardware cache event]
      LLC-stores                                         [Hardware cache event]
      LLC-store-misses                                   [Hardware cache event]
      LLC-prefetches                                     [Hardware cache event]
      LLC-prefetch-misses                                [Hardware cache event]
      dTLB-loads                                         [Hardware cache event]
      dTLB-load-misses                                   [Hardware cache event]
      dTLB-stores                                        [Hardware cache event]
      dTLB-store-misses                                  [Hardware cache event]
      dTLB-prefetches                                    [Hardware cache event]
      dTLB-prefetch-misses                               [Hardware cache event]
      iTLB-loads                                         [Hardware cache event]
      iTLB-load-misses                                   [Hardware cache event]
      branch-loads                                       [Hardware cache event]
      branch-load-misses                                 [Hardware cache event]
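Because the listing is plain text, it is easy to post-process. The following sketch assumes the `<name> OR <alias>   [<category>]` layout shown above; `parse_perf_list` is a hypothetical helper, and other perf versions may format the list differently.

```python
import re
from collections import defaultdict
from typing import Dict, List

# An event line looks like "<name> [OR <alias> ...]   [<category>]".
_EVENT_LINE = re.compile(r"^\s*(?P<names>\S.*?)\s+\[(?P<category>[^\]]+)\]")


def parse_perf_list(text: str) -> Dict[str, List[List[str]]]:
    """Group `perf list` output by category; each entry is [name, *aliases]."""
    groups: Dict[str, List[List[str]]] = defaultdict(list)
    for line in text.splitlines():
        match = _EVENT_LINE.match(line)
        if match is None:
            continue  # headers such as "List of pre-defined events ..."
        names = [n.strip() for n in match.group("names").split(" OR ")]
        groups[match.group("category")].append(names)
    return dict(groups)


if __name__ == "__main__":
    sample = """List of pre-defined events (to be used in -e):
  cpu-cycles OR cycles                               [Hardware event]
  context-switches OR cs                             [Software event]
  LLC-load-misses                                    [Hardware cache event]
"""
    for category, events in parse_perf_list(sample).items():
        print(category, events)
```

On a real machine the input would come from something like `subprocess.run(["perf", "list"], capture_output=True, text=True).stdout`.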
2. perf stat: analyzes a program's overall performance
By default it profiles the application with 10 typical events.
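task-clock: the time the target task actually spent on the processor, in milliseconds, which we call the task execution time. It is followed by the task's processor utilization (the ratio of execution time to elapsed time). Elapsed time is the total time from task start to task end, and it is printed when perf stat finishes.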
context-switches: the number of context switches. The first figure is the count and the second is the average rate per second (M means 10^6).
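cpu-migrations: processor migrations. To keep the load balanced across processors, Linux moves a task from one processor to another under certain conditions; each such move counts as one processor migration.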
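page-faults: page faults. The Linux memory-management subsystem uses paging, and a page fault is raised when a requested page has not been set up yet, is not in memory, or is in memory but has no virtual-to-physical address mapping established yet.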
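cycles: the number of processor cycles consumed by the task.
instructions: the number of instructions the task executed during the run. Dividing it by cycles gives IPC (instructions per cycle), an important metric for evaluating both the processor and the application, since many instructions take several cycles to complete. A higher IPC is generally better: it indicates that the program makes good use of the processor's capabilities.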
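branches: the number of branch instructions encountered during execution.
branch-misses: the number of mispredicted branch instructions.
cache-misses: the number of cache misses.
cache-references: the number of cache accesses (references), not the number of hits; cache-misses divided by cache-references gives the miss ratio.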
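The ratios described above can be computed directly from raw counter values. A minimal sketch, assuming the counters have already been collected into a dictionary keyed by perf's event names; `derived_metrics` is a hypothetical helper:

```python
from typing import Dict, Optional


def _ratio(counts: Dict[str, int], num: str, den: str) -> Optional[float]:
    """counts[num] / counts[den], or None if a counter is missing or den is zero."""
    if num not in counts or not counts.get(den):
        return None
    return counts[num] / counts[den]


def derived_metrics(counts: Dict[str, int]) -> Dict[str, Optional[float]]:
    return {
        # IPC: instructions executed per processor cycle (higher is usually better).
        "ipc": _ratio(counts, "instructions", "cycles"),
        # Share of branch instructions that were mispredicted.
        "branch_miss_rate": _ratio(counts, "branch-misses", "branches"),
        # Share of cache accesses (references) that missed.
        "cache_miss_rate": _ratio(counts, "cache-misses", "cache-references"),
    }


if __name__ == "__main__":
    # Raw counts copied from the perf stat sample output in this article; it did
    # not include cache counters, so cache_miss_rate comes out as None.
    sample = {
        "cycles": 82_341_400_508,
        "instructions": 44_023_301_495,
        "branches": 8_137_448_528,
        "branch-misses": 430_957_756,
    }
    m = derived_metrics(sample)
    print(f"IPC {m['ipc']:.2f}, branch misses {m['branch_miss_rate']:.2%}")
```

With those counters it prints `IPC 0.53, branch misses 5.30%`, which matches the `insns per cycle` and `of all branches` annotations in the perf stat sample output that follows.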
Common options:
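    -e: specify the performance events
    -p: PID of the process to analyze
    -t: TID of the thread to analyze
    -r N: run the analysis N times
    -d: detailed analysis using more performance events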
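When perf stat is driven from a script, the options above map directly onto an argument list. A sketch with a hypothetical `perf_stat_argv` helper; `./myprog` stands in for whatever workload you want to measure:

```python
from typing import List, Optional, Sequence


def perf_stat_argv(events: Sequence[str] = (), pid: Optional[int] = None,
                   tid: Optional[int] = None, repeat: Optional[int] = None,
                   detailed: bool = False,
                   command: Sequence[str] = ()) -> List[str]:
    """Assemble a perf stat command line from the options listed above."""
    argv = ["perf", "stat"]
    if events:
        argv += ["-e", ",".join(events)]  # several events, comma-separated
    if pid is not None:
        argv += ["-p", str(pid)]
    if tid is not None:
        argv += ["-t", str(tid)]
    if repeat is not None:
        argv += ["-r", str(repeat)]
    if detailed:
        argv.append("-d")
    if command:
        argv += ["--", *command]  # workload to launch and measure
    return argv


if __name__ == "__main__":
    print(perf_stat_argv(["cycles", "instructions"], pid=21787))
    print(perf_stat_argv(repeat=5, detailed=True, command=["./myprog"]))
```

The list can be handed to `subprocess.run`. perf stat writes its report to stderr, and when it is attached to an existing process with `-p` and no command, it keeps counting until interrupted (Ctrl-C).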
Sample output from one run:
    Performance counter stats for process id '21787':

          42677.253367 task-clock                #    0.142 CPUs utilized
               587,906 context-switches          #    0.014 M/sec
                29,209 CPU-migrations            #    0.001 M/sec
                   117 page-faults               #    0.000 M/sec
        82,341,400,508 cycles                    #    1.929 GHz                     [83.48%]
        61,262,984,952 stalled-cycles-frontend   #   74.40% frontend cycles idle    [83.28%]
        43,113,701,768 stalled-cycles-backend    #   52.36% backend  cycles idle    [66.72%]
        44,023,301,495 instructions              #    0.53  insns per cycle
                                                 #    1.39  stalled cycles per insn [83.50%]
         8,137,448,528 branches                  #  190.674 M/sec                   [83.22%]
           430,957,756 branch-misses             #    5.30% of all branches         [83.34%]

         300.393753095 seconds time elapsed
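The `#` annotations are derived from the raw counts. Recomputing them from the numbers above shows that `CPUs utilized` is task-clock divided by elapsed time, while `GHz` and `M/sec` are normalized by task-clock rather than by wall-clock time. A sketch with a hypothetical `stat_annotations` helper; the bracketed percentages (the share of time each counter was actually scheduled when counters are multiplexed) are not reproduced:

```python
from typing import Dict


def stat_annotations(task_clock_ms: float, elapsed_s: float,
                     counts: Dict[str, int]) -> Dict[str, float]:
    """Recompute the rate-style '#' annotations of perf stat from raw values."""
    task_s = task_clock_ms / 1000.0  # task-clock is printed in milliseconds
    result = {"CPUs utilized": task_s / elapsed_s}
    if "cycles" in counts:
        # Cycles per second of task-clock, in GHz.
        result["GHz"] = counts["cycles"] / task_s / 1e9
    for name in ("context-switches", "cpu-migrations", "page-faults", "branches"):
        if name in counts:
            # Event rate per second of task-clock, in millions (M/sec).
            result[f"{name} M/sec"] = counts[name] / task_s / 1e6
    return result


if __name__ == "__main__":
    # Values taken from the perf stat sample output above.
    sample = {"cycles": 82_341_400_508, "context-switches": 587_906,
              "cpu-migrations": 29_209, "page-faults": 117,
              "branches": 8_137_448_528}
    for label, value in stat_annotations(42677.253367, 300.393753095, sample).items():
        print(f"{value:10.3f} {label}")
```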
3. perf top: displays system/process performance statistics in real time
By default it profiles the whole system using the cycles (CPU cycles) event.
Common options:
    -p: PID of the process
    -t: TID of the thread
    -a: analyze the performance of the whole system (default)
    -d: screen refresh interval, 2 seconds by default
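In the output, the percentage is the share of performance events attributed to that symbol across the whole monitored domain, usually called its heat.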
    samples  pcnt function                     DSO
    _______ _____ ____________________________ _________
      61.00 19.4% native_write_msr_safe        [kernel]
      18.00  5.7% JVM_InternString             libjvm.so
      17.00  5.4% find_busiest_group           [kernel]
      17.00  5.4% _spin_lock                   [kernel]
      12.00  3.8% dev_hard_start_xmit          [kernel]
      11.00  3.5% tg_load_down                 [kernel]
       9.00  2.9% futex_wake                   [kernel]
       8.00  2.5% do_futex                     [kernel]
       7.00  2.2% load_balance_fair            [kernel]
       7.00  2.2% weighted_cpuload             [kernel]
       7.00  2.2% update_cfs_shares            [kernel]
       7.00  2.2% JVM_LatestUserDefinedLoader  libjvm.so
       6.00  1.9% update_cfs_load              [kernel]
       5.00  1.6% _ZN16SystemDictionary30resolve_instance_class_or_nullE12symbolHandle6HandleS1_P6Thread libjvm.so
       5.00  1.6% br_sysfs_delbr               [bridge]
       5.00  1.6% futex_wait
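The heat column is each symbol's share of all samples in the monitored domain. A minimal sketch of that calculation; `heat` is a hypothetical helper and the sample counts are made up so that they total 100 (the output above is truncated, so its real total is unknown):

```python
from typing import Dict, List, Tuple


def heat(samples: Dict[str, int], top: int = 5) -> List[Tuple[str, int, float]]:
    """Return (symbol, samples, percent of all samples) for the hottest symbols."""
    total = sum(samples.values())
    if total == 0:
        return []
    ranked = sorted(samples.items(), key=lambda item: item[1], reverse=True)
    return [(sym, n, 100.0 * n / total) for sym, n in ranked[:top]]


if __name__ == "__main__":
    # Hypothetical per-symbol sample counts, chosen to add up to 100.
    counts = {"native_write_msr_safe": 61, "JVM_InternString": 18,
              "find_busiest_group": 17, "_spin_lock": 4}
    for sym, n, pct in heat(counts, top=3):
        print(f"{n:7d} {pct:5.1f}% {sym}")
```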
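4. perf record/report: records the performance events of the system or a process over a period of time
perf record writes its data to perf.data in the current directory by default.
perf report reads the generated perf.data file; use -i to specify a different path.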
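A sketch of the record/report round trip as command lines, using hypothetical helpers `perf_record_argv` and `perf_report_argv`. Beyond the `perf.data` default and `-i` described above, it uses `-o` (output file for perf record), `-g` (record call chains) and `--stdio` (plain-text report instead of the interactive interface), which are standard perf options; `./myprog` is a placeholder workload:

```python
from typing import List, Sequence


def perf_record_argv(command: Sequence[str], output: str = "perf.data",
                     call_graph: bool = False) -> List[str]:
    """perf record command line that runs `command` and writes to `output`."""
    argv = ["perf", "record", "-o", output]
    if call_graph:
        argv.append("-g")  # also record call chains
    return argv + ["--", *command]


def perf_report_argv(input_path: str = "perf.data") -> List[str]:
    """perf report command line that reads `input_path` and prints plain text."""
    return ["perf", "report", "-i", input_path, "--stdio"]


if __name__ == "__main__":
    print(perf_record_argv(["./myprog"], output="run1.data", call_graph=True))
    print(perf_report_argv("run1.data"))
```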
Understanding perf is the starting point of performance analysis.
http://www.ibm.com/developerworks/cn/linux/l-cn-perf1/