Description
Bugzilla Link | 50211 |
Version | trunk |
OS | Linux |
CC | @Arnaud-de-Grandmaison-ARM,@dwblaikie,@efriedma-quic,@pcc,@smithp35,@stephenhines |
Extended Description
When linking the 64-bit Android library libqcrilNr.so with an in-house LLVM compiler (code base similar to community release/12.X), I observe that the ThinLTO link takes much longer than the full LTO link.
With full LTO, the link takes around 5 minutes.
With ThinLTO, the link takes around 17 minutes.
I can't figure out why ThinLTO takes so much longer. Is anyone more familiar with this issue, or does anyone know which part of LLVM could be contributing to the slowdown?
I can't share any object files, but I am sharing the time reports in the hope that they help.
Time report for full LTO:
===-------------------------------------------------------------------------===
... Pass execution timing report ...
===-------------------------------------------------------------------------===
Total Execution Time: 444.9100 seconds (444.2988 wall clock)
---User Time--- --System Time-- --User+System-- ---Wall Time--- --- Name ---
69.7080 ( 22.6%) 28.3786 ( 20.8%) 98.0866 ( 22.0%) 98.0891 ( 22.1%) AArch64 Assembly Printer
33.5838 ( 10.9%) 13.1880 ( 9.7%) 46.7717 ( 10.5%) 46.6892 ( 10.5%) AArch64 Instruction Selection
14.8625 ( 4.8%) 7.5726 ( 5.6%) 22.4351 ( 5.0%) 22.4490 ( 5.1%) Machine Module Information
13.1707 ( 4.3%) 6.7261 ( 4.9%) 19.8967 ( 4.5%) 19.9138 ( 4.5%) Dominator Tree Construction #7
12.8369 ( 4.2%) 6.6886 ( 4.9%) 19.5255 ( 4.4%) 19.5272 ( 4.4%) Function Alias Analysis Results #4
12.8099 ( 4.2%) 6.6547 ( 4.9%) 19.4646 ( 4.4%) 19.4668 ( 4.4%) Basic Alias Analysis (stateless AA impl) #4
6.9480 ( 2.3%) 3.2763 ( 2.4%) 10.2243 ( 2.3%) 10.2027 ( 2.3%) Greedy Register Allocator
6.2687 ( 2.0%) 3.1288 ( 2.3%) 9.3975 ( 2.1%) 9.3346 ( 2.1%) Prologue/Epilogue Insertion & Frame Finalization
5.1671 ( 1.7%) 2.3538 ( 1.7%) 7.5209 ( 1.7%) 7.4856 ( 1.7%) Live DEBUG_VALUE analysis
4.7978 ( 1.6%) 2.1008 ( 1.5%) 6.8986 ( 1.6%) 6.8701 ( 1.5%) Live Variable Analysis
5.3652 ( 1.7%) 0.9511 ( 0.7%) 6.3163 ( 1.4%) 6.2932 ( 1.4%) Module Verifier #2
5.1952 ( 1.7%) 0.5094 ( 0.4%) 5.7046 ( 1.3%) 5.6843 ( 1.3%) Module Verifier
3.6007 ( 1.2%) 1.5525 ( 1.1%) 5.1533 ( 1.2%) 5.1344 ( 1.2%) Live Interval Analysis
3.4184 ( 1.1%) 1.5104 ( 1.1%) 4.9288 ( 1.1%) 4.8973 ( 1.1%) Simple Register Coalescing
3.2224 ( 1.0%) 1.6417 ( 1.2%) 4.8641 ( 1.1%) 4.8603 ( 1.1%) Machine Natural Loop Construction #2
2.8867 ( 0.9%) 1.4917 ( 1.1%) 4.3784 ( 1.0%) 4.3753 ( 1.0%) MachineDominator Tree Construction #5
3.0209 ( 1.0%) 1.0482 ( 0.8%) 4.0691 ( 0.9%) 4.0510 ( 0.9%) Memory SSA
2.8420 ( 0.9%) 1.1124 ( 0.8%) 3.9544 ( 0.9%) 3.9477 ( 0.9%) Free MachineFunction
2.0593 ( 0.7%) 1.0078 ( 0.7%) 3.0671 ( 0.7%) 3.0524 ( 0.7%) Insert stack protectors
1.8488 ( 0.6%) 0.8942 ( 0.7%) 2.7430 ( 0.6%) 2.7401 ( 0.6%) Slot index numbering #2
1.5921 ( 0.5%) 0.7925 ( 0.6%) 2.3846 ( 0.5%) 2.3749 ( 0.5%) MachineDominator Tree Construction
1.4949 ( 0.5%) 0.7285 ( 0.5%) 2.2234 ( 0.5%) 2.2182 ( 0.5%) Machine Block Frequency Analysis #3
1.5021 ( 0.5%) 0.7089 ( 0.5%) 2.2110 ( 0.5%) 2.2009 ( 0.5%) Branch Probability Analysis #2
1.3685 ( 0.4%) 0.7354 ( 0.5%) 2.1039 ( 0.5%) 2.0826 ( 0.5%) Scalar Evolution Analysis
1.3959 ( 0.5%) 0.6686 ( 0.5%) 2.0645 ( 0.5%) 2.0621 ( 0.5%) MachineDominator Tree Construction #2
Time report for ThinLTO. During this link, a single CPU is busy for the first 13 minutes; only the last 4 minutes are multi-threaded.
===-------------------------------------------------------------------------===
... Pass execution timing report ...
===-------------------------------------------------------------------------===
Total Execution Time: 122903.7639 seconds (7749.4743 wall clock)
---User Time--- --System Time-- --User+System-- ---Wall Time--- --- Name ---
304.7405 ( 3.8%) 2.1033 ( 0.0%) 306.8439 ( 0.2%) 306.8722 ( 4.0%) Lower type metadata
289.0256 ( 3.6%) 0.8153 ( 0.0%) 289.8409 ( 0.2%) 289.8686 ( 3.7%) Branch Probability Basic Block Placement
190.9626 ( 2.4%) 0.4888 ( 0.0%) 191.4514 ( 0.2%) 191.4685 ( 2.5%) Branch relaxation pass
76.6860 ( 0.9%) 1136.7987 ( 1.0%) 1213.4848 ( 1.0%) 62.8574 ( 0.8%) AArch64 Assembly Printer #116
42.9683 ( 0.5%) 2.5747 ( 0.0%) 45.5430 ( 0.0%) 45.5567 ( 0.6%) AArch64 Instruction Selection
42.5507 ( 0.5%) 0.1631 ( 0.0%) 42.7139 ( 0.0%) 42.7174 ( 0.6%) Control Flow Optimizer
52.7947 ( 0.7%) 782.3718 ( 0.7%) 835.1665 ( 0.7%) 41.8995 ( 0.5%) AArch64 Assembly Printer #7
47.6952 ( 0.6%) 704.5987 ( 0.6%) 752.2939 ( 0.6%) 37.7206 ( 0.5%) AArch64 Assembly Printer #40
30.7870 ( 0.4%) 504.1012 ( 0.4%) 534.8883 ( 0.4%) 36.6904 ( 0.5%) AArch64 Assembly Printer #23
26.1035 ( 0.3%) 449.4197 ( 0.4%) 475.5232 ( 0.4%) 33.7455 ( 0.4%) AArch64 Assembly Printer #137
32.0621 ( 0.4%) 261.0043 ( 0.2%) 293.0665 ( 0.2%) 33.0863 ( 0.4%) AArch64 Assembly Printer #11
36.7061 ( 0.5%) 575.6889 ( 0.5%) 612.3949 ( 0.5%) 31.3014 ( 0.4%) AArch64 Assembly Printer #121
40.1155 ( 0.5%) 522.4378 ( 0.5%) 562.5532 ( 0.5%) 28.2140 ( 0.4%) AArch64 Assembly Printer #114
25.4224 ( 0.3%) 216.7932 ( 0.2%) 242.2156 ( 0.2%) 27.9494 ( 0.4%) AArch64 Assembly Printer #139
34.6752 ( 0.4%) 490.1866 ( 0.4%) 524.8617 ( 0.4%) 26.3398 ( 0.3%) Branch Probability Analysis #231
34.2679 ( 0.4%) 441.6632 ( 0.4%) 475.9312 ( 0.4%) 23.8663 ( 0.3%) AArch64 Assembly Printer #106
35.0161 ( 0.4%) 427.0860 ( 0.4%) 462.1021 ( 0.4%) 23.2253 ( 0.3%) AArch64 Assembly Printer #107
22.1234 ( 0.3%) 128.0110 ( 0.1%) 150.1344 ( 0.1%) 22.4227 ( 0.3%) AArch64 Assembly Printer #92
20.8856 ( 0.3%) 138.7600 ( 0.1%) 159.6456 ( 0.1%) 22.3985 ( 0.3%) AArch64 Assembly Printer #95
18.4032 ( 0.2%) 189.8183 ( 0.2%) 208.2215 ( 0.2%) 21.9948 ( 0.3%) AArch64 Assembly Printer #100
16.5243 ( 0.2%) 387.3604 ( 0.3%) 403.8847 ( 0.3%) 20.2050 ( 0.3%) AArch64 Assembly Printer #28
18.5907 ( 0.2%) 90.6694 ( 0.1%) 109.2601 ( 0.1%) 20.0015 ( 0.3%) AArch64 Assembly Printer #140
19.3601 ( 0.2%) 72.3643 ( 0.1%) 91.7244 ( 0.1%) 17.7730 ( 0.2%) AArch64 Assembly Printer #141
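The ThinLTO report is hard to read because each backend instance of a pass gets its own "#N" row. A minimal sketch of how the rows could be merged per pass name is below. The names `ROW`, `wall_time_by_pass`, and `sample` are made up for illustration, and the row layout is assumed to match the excerpts pasted above. Note that instances running in parallel overlap in time, so a summed wall-time figure is only a rough ranking, not elapsed link time.

```python
import re
from collections import defaultdict

# One report row: four "seconds ( percent%)" columns, then the pass name.
ROW = re.compile(r"^\s*" + r"([\d.]+)\s*\(\s*[\d.]+%\)\s+" * 4 + r"(.+?)\s*$")


def wall_time_by_pass(report):
    """Sum the wall-time column per pass, merging "Name #N" instances."""
    totals = defaultdict(float)
    for line in report.splitlines():
        match = ROW.match(line)
        if match is None:
            continue  # banner, column header, or blank line
        # Drop the trailing instance counter, e.g. "Printer #116" -> "Printer".
        name = re.sub(r"\s+#\d+$", "", match.group(5))
        totals[name] += float(match.group(4))  # group 4 is the wall-time column
    return dict(totals)


# A few rows copied from the ThinLTO report above.
sample = """\
---User Time--- --System Time-- --User+System-- ---Wall Time--- --- Name ---
304.7405 ( 3.8%) 2.1033 ( 0.0%) 306.8439 ( 0.2%) 306.8722 ( 4.0%) Lower type metadata
190.9626 ( 2.4%) 0.4888 ( 0.0%) 191.4514 ( 0.2%) 191.4685 ( 2.5%) Branch relaxation pass
76.6860 ( 0.9%) 1136.7987 ( 1.0%) 1213.4848 ( 1.0%) 62.8574 ( 0.8%) AArch64 Assembly Printer #116
52.7947 ( 0.7%) 782.3718 ( 0.7%) 835.1665 ( 0.7%) 41.8995 ( 0.5%) AArch64 Assembly Printer #7
"""

# Print passes from largest to smallest summed wall time.
for name, seconds in sorted(wall_time_by_pass(sample).items(), key=lambda kv: -kv[1]):
    print(f"{seconds:10.4f}  {name}")
```

Running the same function over the complete report (not only the top rows pasted here) would show whether a handful of passes account for the single-threaded phase.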