Commit fe2510a

situational optimizations edits

1 parent 868a2e5 commit fe2510a

1 file changed: 36 additions & 16 deletions

content/english/hpc/compilation/situational.md
````diff
@@ -3,9 +3,19 @@ title: Situational Optimizations
 weight: 3
 ---
 
+<!--
+
 Generally, you always want to specify the exact platform you are running and turn on `-O3`, but other optimizations, like the ones discussed [in the previous section](../assembly), are far more situational and require some input from the programmer.
 
-**Loop unrolling** is disabled by default, unless the loop takes a small constant number of iterations known at compile time (in which case it will be replaced with a completely jump-free repeated sequence of instructions). It can be enabled with `-funroll-loops` flag, which will unroll all loops whose number of iterations can be determined at compile time or upon entry to the loop.
+-->
+
+Most compiler optimizations enabled by `-O2` and `-O3` are guaranteed to either improve or at least not seriously hurt performance. Those that aren't included in `-O3` are either not strictly standard-compliant, or highly circumstantial and require some additional input from the programmer to help decide whether using them is beneficial.
+
+Let's discuss the most frequently used ones that we've also previously covered in this book.
+
+### Loop Unrolling
+
+[Loop unrolling](/hpc/architecture/loops#loop-unrolling) is disabled by default, unless the loop takes a small constant number of iterations known at compile time — in which case it will be replaced with a completely jump-free, repeated sequence of instructions. It can be enabled globally with the `-funroll-loops` flag, which will unroll all loops whose number of iterations can be determined at compile time or upon entry to the loop.
 
 You can also use a pragma to target a specific loop:
 
````
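The pragma itself falls inside the lines this diff collapses. As a minimal sketch, assuming GCC 8 or newer (where `#pragma GCC unroll n` is available), targeting a single loop could look like this; the function `sum` is a hypothetical example, not code from the chapter:

```cpp
// Hypothetical example: sum an array while asking GCC (8 or newer) to unroll
// this one loop by a factor of 4. The pragma must appear immediately before
// the loop it applies to; it changes code generation, not the result.
int sum(const int *a, int n) {
    int s = 0;
#pragma GCC unroll 4
    for (int i = 0; i < n; i++)
        s += a[i];
    return s;
}
```

The function returns the same value with or without the pragma; comparing the output of `g++ -O2 -S` in both cases shows the difference in the generated loop. Compilers that don't know the pragma may ignore it with an unknown-pragma warning.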

````diff
@@ -18,7 +28,27 @@ for (int i = 0; i < n; i++) {
 
 Loop unrolling makes binary larger, and may or may not make it run faster. Don't use it fanatically.
 
-**Likeliness of branches** can be hinted by `[[likely]]` and `[[unlikely]]` attributes in ifs and switches:
+### Function Inlining
+
+[Inlining](/hpc/architecture/functions#inlining) is best left for the compiler to decide, but you can influence it with `inline` keyword:
+
+```c++
+inline int square(int x) {
+    return x * x;
+}
+```
+
+The hint may be ignored though if the compiler thinks that the potential performance gains are not worth it. You can force inlining by adding the `always_inline` attribute:
+
+```c++
+#define FORCE_INLINE inline __attribute__((always_inline))
+```
+
+There is also the `-finline-limit=n` option which lets you set a specific threshold on the size of inlined functions (in terms of the number of instructions). Its Clang equivalent is `-inline-threshold`.
````
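A short sketch putting the hint and the forced version side by side, assuming GCC or Clang (both accept `__attribute__((always_inline))`). `square` mirrors the diff's example and `FORCE_INLINE` repeats its macro; `cube` is a hypothetical helper added for illustration:

```cpp
// Assumes GCC or Clang. FORCE_INLINE repeats the macro from the diff;
// `cube` is a hypothetical helper used only for illustration.
#define FORCE_INLINE inline __attribute__((always_inline))

// Plain `inline` is only a hint: the compiler may still emit a real call.
inline int square(int x) {
    return x * x;
}

// always_inline asks the compiler to inline this at every call site;
// GCC reports an error if it cannot.
FORCE_INLINE int cube(int x) {
    return square(x) * x;
}
```

The practical difference is that a failed `always_inline` is a compile-time error in GCC rather than a silently emitted call, so it is best reserved for small functions on hot paths.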
````diff
+
+### Likeliness of Branches
+
+[Likeliness of branches](/hpc/architecture/layout#unequal-branches) can be hinted by `[[likely]]` and `[[unlikely]]` attributes in `if`-s and `switch`-es:
 
 ```c++
 int factorial(int n) {
@@ -29,7 +59,7 @@ int factorial(int n) {
 }
 ```
 
-This is a new feature that only appeared in C++20. Before that, there were GCC-specific `likely` and `unlikely` macros similarly used to wrap condition expressions:
+This is a new feature that only appeared in C++20. Before that, there were compiler-specific intrinsics similarly used to wrap condition expressions. The same example in older GCC:
 
 ```c++
 int factorial(int n) {
@@ -40,20 +70,10 @@ int factorial(int n) {
 }
 ```
````
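The bodies of both `factorial` examples fall in the collapsed part of the diff. As a sketch of where the C++20 attributes attach, here is a recursive factorial written for illustration (an assumption about its shape, not necessarily the chapter's exact code), meant to be compiled with C++20 support such as `-std=c++20`:

```cpp
// Illustrative recursive factorial (assumed shape, not the chapter's verbatim code).
// [[likely]] / [[unlikely]] mark which branch is expected to run most often;
// they only influence code layout, not the computed value.
int factorial(int n) {
    if (n > 1) [[likely]]
        return n * factorial(n - 1);
    else [[unlikely]]
        return 1;
}
```

Per the C++ standard, an attribute the implementation does not recognize is ignored, so with an older `-std` setting this should still compile (possibly with a warning) and return the same values.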
````diff
 
-What it usually does is it swaps the branches so that the more likely one goes immediately after jump (recall that "don't jump" branch is taken by default). The performance gain is usually rather small, because for most hot spots hardware branch prediction works just fine.
-
-**Inlining** is best left for the compiler to decide, but you can influence it with `inline` keyword:
-
-```c++
-inline int square(int x) {
-    return x * x;
-}
-```
+<!--
 
-The hint may be ignored though if the compiler thinks that the performance gains are not be worth it. You can force inlining in GCC by adding `always_inline` attribute:
+What it usually does is it swaps the branches so that the more likely one goes immediately after jump (recall that "don't jump" branch is taken by default). The performance gain is usually rather small, because for most hot spots hardware branch prediction works just fine.
 
-```c++
-#define FORCE_INLINE inline __attribute__((always_inline))
-```
+-->
 
 There are many other cases like this when you need to point the compiler in the right direction, but we will get to them later when they become more relevant.
````
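For code that can't rely on C++20, GCC and Clang provide the `__builtin_expect(expr, c)` intrinsic, one of the compiler-specific intrinsics the diff mentions: it returns `expr` and tells the compiler that `expr` is expected to equal `c`. A minimal sketch, where the `LIKELY`/`UNLIKELY` macro names, `factorial_expect`, and `safe_div` are all hypothetical:

```cpp
// Hypothetical convenience macros over the GCC/Clang intrinsic __builtin_expect.
// `!!(x)` normalizes any scalar condition to 0 or 1 before comparing it
// with the expected value.
#define LIKELY(x)   __builtin_expect(!!(x), 1)
#define UNLIKELY(x) __builtin_expect(!!(x), 0)

// Same recursion as a C++20 version, but hinted with the intrinsic.
int factorial_expect(int n) {
    if (LIKELY(n > 1))
        return n * factorial_expect(n - 1);
    return 1;
}

// A common pattern: mark a rare error path so the compiler can lay it out
// away from the hot code. Returning 0 on b == 0 is an arbitrary choice here.
int safe_div(int a, int b) {
    if (UNLIKELY(b == 0))
        return 0;
    return a / b;
}
```

As with the attributes, the hint affects only code layout, so the speedup is usually small when the hardware branch predictor already handles the branch well.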
