content/english/hpc/compilation/situational.md
36 additions & 16 deletions
@@ -3,9 +3,19 @@ title: Situational Optimizations
3
3
weight: 3
4
4
---
5
5
6
+
<!--
7
+
6
8
Generally, you always want to specify the exact platform you are running on and turn on `-O3`, but other optimizations, like the ones discussed [in the previous section](../assembly), are far more situational and require some input from the programmer.
7
9
8
-
**Loop unrolling** is disabled by default, unless the loop takes a small constant number of iterations known at compile time (in which case it will be replaced with a completely jump-free repeated sequence of instructions). It can be enabled with `-funroll-loops` flag, which will unroll all loops whose number of iterations can be determined at compile time or upon entry to the loop.
10
+
-->
11
+
12
+
Most compiler optimizations enabled by `-O2` and `-O3` are expected to either improve or at least not seriously hurt performance. Those that aren't included in `-O3` are either not strictly standard-compliant, or highly circumstantial and require some additional input from the programmer to help decide whether using them is beneficial.
13
+
14
+
Let's discuss the most frequently used ones that we've also previously covered in this book.
15
+
16
+
### Loop Unrolling
17
+
18
+
[Loop unrolling](/hpc/architecture/loops#loop-unrolling) is disabled by default, unless the loop takes a small constant number of iterations known at compile time — in which case it will be replaced with a completely jump-free, repeated sequence of instructions. It can be enabled globally with the `-funroll-loops` flag, which will unroll all loops whose number of iterations can be determined at compile time or upon entry to the loop.
9
19
10
20
You can also use a pragma to target a specific loop:
11
21
@@ -18,7 +28,27 @@ for (int i = 0; i < n; i++) {
18
28
19
29
Loop unrolling makes the binary larger, and may or may not make it run faster. Don't use it fanatically.
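As a minimal sketch of targeting a single loop, assuming GCC 8 or later (which accepts `#pragma GCC unroll n`); the function name `sum` and the unroll factor of 4 are illustrative choices, not taken from the text above:

```c++
// Sums an array; the pragma asks GCC to unroll only this loop by a factor of 4.
// Compilers that do not recognize the pragma simply ignore it.
int sum(const int *a, int n) {
    int s = 0;
    #pragma GCC unroll 4
    for (int i = 0; i < n; i++)
        s += a[i];
    return s;
}
```

Comparing binary size and timings with and without the pragma is the only reliable way to tell whether unrolling pays off for a given loop.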
20
30
21
-
**Likeliness of branches** can be hinted by `[[likely]]` and `[[unlikely]]` attributes in ifs and switches:
31
+
### Function Inlining
32
+
33
+
[Inlining](/hpc/architecture/functions#inlining) is best left for the compiler to decide, but you can influence it with the `inline` keyword:
34
+
35
+
```c++
36
+
inline int square(int x) {
37
+
    return x * x;
38
+
}
39
+
```
40
+
41
+
The compiler may still ignore the hint if it decides that the potential performance gains are not worth it. You can force inlining by adding the `always_inline` attribute:
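As a minimal sketch, assuming GCC or Clang (both accept the `__attribute__((always_inline))` syntax); `FORCE_INLINE` and `sum_of_squares` are hypothetical names used only for illustration:

```c++
// Hypothetical convenience macro combining the keyword and the attribute.
#define FORCE_INLINE inline __attribute__((always_inline))

// The compiler is asked to inline this function at every call site.
FORCE_INLINE int square(int x) {
    return x * x;
}

// A caller in which the call to square() should be replaced by its body.
int sum_of_squares(const int *a, int n) {
    int s = 0;
    for (int i = 0; i < n; i++)
        s += square(a[i]);
    return s;
}
```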
There is also the `-finline-limit=n` option, which lets you set a specific threshold on the size of functions that may be inlined (in terms of the number of instructions). The closest Clang equivalent is the LLVM option `-inline-threshold`, passed as `-mllvm -inline-threshold=n`.
48
+
49
+
### Likeliness of Branches
50
+
51
+
[Likeliness of branches](/hpc/architecture/layout#unequal-branches) can be hinted with the `[[likely]]` and `[[unlikely]]` attributes in `if` and `switch` statements:
22
52
23
53
```c++
24
54
int factorial(int n) {
@@ -29,7 +59,7 @@ int factorial(int n) {
29
59
}
30
60
```
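For both statement kinds, a short sketch of where the attributes go, assuming a compiler in C++20 mode; `safe_div` and `classify` are hypothetical functions invented for this illustration:

```c++
// Division by zero is assumed to be rare, so that branch is marked unlikely.
int safe_div(int a, int b) {
    if (b == 0) [[unlikely]] {
        return 0; // rare error path
    }
    return a / b; // common path
}

// In a switch, the attribute goes in front of the case label.
int classify(int c) {
    switch (c) {
    [[likely]] case 0:
        return 10;
    case 1:
        return 20;
    default:
        return -1;
    }
}
```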
31
61
32
-
This is a new feature that only appeared in C++20. Before that, there were GCC-specific `likely` and `unlikely` macros similarly used to wrap condition expressions:
62
+
This feature first appeared in C++20. Before that, compiler-specific intrinsics were used in a similar way to wrap condition expressions. The same example for older versions of GCC:
33
63
34
64
```c++
35
65
int factorial(int n) {
@@ -40,20 +70,10 @@ int factorial(int n) {
40
70
}
41
71
```
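One such intrinsic is `__builtin_expect`, available in GCC and Clang. A hedged sketch of how it is commonly wrapped; the macro names `LIKELY` and `UNLIKELY` and the function `count_negative` are assumptions made for illustration, not part of any standard:

```c++
// __builtin_expect(expr, value) tells the compiler which value expr usually has.
// The double negation normalizes the condition to 0 or 1.
#define LIKELY(x)   __builtin_expect(!!(x), 1)
#define UNLIKELY(x) __builtin_expect(!!(x), 0)

// Counts negative elements, assuming they are rare in the input.
int count_negative(const int *a, int n) {
    int count = 0;
    for (int i = 0; i < n; i++) {
        if (UNLIKELY(a[i] < 0))
            count++;
    }
    return count;
}
```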
42
72
43
-
What it usually does is it swaps the branches so that the more likely one goes immediately after jump (recall that "don't jump" branch is taken by default). The performance gain is usually rather small, because for most hot spots hardware branch prediction works just fine.
44
-
45
-
**Inlining** is best left for the compiler to decide, but you can influence it with `inline` keyword:
46
-
47
-
```c++
48
-
inline int square(int x) {
49
-
    return x * x;
50
-
}
51
-
```
73
+
<!--
52
74
53
-
The hint may be ignored though if the compiler thinks that the performance gains are not be worth it. You can force inlining in GCC by adding `always_inline` attribute:
75
+
What this usually does is swap the branches so that the more likely one goes immediately after the jump (recall that the "don't jump" branch is taken by default). The performance gain is usually rather small, because for most hot spots hardware branch prediction works just fine.
There are many other cases like this when you need to point the compiler in the right direction, but we will get to them later when they become more relevant.