SHAOJIE'S BOOK

Posted 2023-10-12 | Updated 2025-01-30 | Tutorials | 12 minutes read (About 1771 words)

GCC Compiler Option 1 : Optimization Options

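Manual

# List the target-specific options and the values they take with these flags
g++ -march=native -m32 ... -Q --help=target 
# List which optimizations -O3 enables and disables
g++ -O3 -Q --help=optimizers

When compiling, it helps to group the flags by category. For example: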

g++ 
# Warning Options
-Wall -Werror -Wno-unknown-pragmas -Wno-dangling-pointer 
# Program Instrumentation Options
-fno-stack-protector
# Code Generation Options
-fno-exceptions -funwind-tables -fasynchronous-unwind-tables -fPIC
# C++ Dialect Options
-fabi-version=2 -faligned-new -fno-rtti
# Preprocessor defines
-DPIN_CRT=1 -DTARGET_IA32E -DHOST_IA32E -DTARGET_LINUX 
# include
-I../../../source/include/pin 
-I../../../source/include/pin/gen 
-isystem /staff/shaojiemike/Download/pin-3.28-98749-g6643ecee5-gcc-linux/extras/cxx/include 
-isystem /staff/shaojiemike/Download/pin-3.28-98749-g6643ecee5-gcc-linux/extras/crt/include 
-isystem /staff/shaojiemike/Download/pin-3.28-98749-g6643ecee5-gcc-linux/extras/crt/include/arch-x86_64 
-isystem /staff/shaojiemike/Download/pin-3.28-98749-g6643ecee5-gcc-linux/extras/crt/include/kernel/uapi 
-isystem /staff/shaojiemike/Download/pin-3.28-98749-g6643ecee5-gcc-linux/extras/crt/include/kernel/uapi/asm-x86 
-I../../../extras/components/include 
-I../../../extras/xed-intel64/include/xed 
-I../../../source/tools/Utils 
-I../../../source/tools/InstLib 
# Optimization Options
-O3 -fomit-frame-pointer -fno-strict-aliasing 
-c -o obj-intel64/inscount0.o inscount0.cpp

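Common options

-Wxxx enables the warning xxx; -Wno-xxx disables it.
-fxxx enables the compiler feature xxx; -fno-xxx disables it.
-gxxx controls debug information.
-mxxx selects options specific to a machine architecture.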

Name	Meaning
-Wall	Enables a large set of commonly useful warnings (not literally all of them; -Wextra adds more).
-Werror	Treats warnings as errors.
-std=	Selects the C or C++ language standard, e.g. 'c++11' (development codename 'c++0x') or 'c++17' (development codename 'c++1z').
-Wunknown-pragmas	Warns when a #pragma directive is not recognized by GCC (-Wno-unknown-pragmas suppresses the warning).
-fomit-frame-pointer	Does not keep a frame pointer in a register for functions that do not need one. Enabled at -O1 and higher.
-Wstack-protector	Warns about functions that are not protected against stack smashing. Only active when -fstack-protector is on (the example above turns protection off with -fno-stack-protector).
-MMD	Like -MD, but lists only user header files, not system header files.
-fexceptions	Enable exception handling.
-funwind-tables	Generates unwind tables, which contain the frame information needed to unwind the stack, for example when handling exceptions.
-fasynchronous-unwind-tables	Generate unwind tables in DWARF format. The table is exact at each instruction boundary, so it can be used for stack unwinding from asynchronous events.
-fabi-version=n	Use version n of the C++ ABI. The default is version 0. (Version 2 is the version of the C++ ABI that first appeared in G++ 3.4, and was the default through G++ 4.9.) ABI: an application binary interface (ABI) is an interface between two binary program modules. Often, one of these modules is a library or operating system facility, and the other is a program that is being run by a user.
-fno-rtti	Disable generation of information about every class with virtual functions for use by the C++ run-time type identification features (dynamic_cast and typeid). If you don’t use those parts of the language, you can save some space by using this flag
-faligned-new	Enable support for C++17 new of types that require more alignment than `void* ::operator new(std::size_t)` provides. A numeric argument such as `-faligned-new=32` can be used to specify how much alignment (in bytes) is provided by that function, but few users will need to override the default of `alignof(std::max_align_t)`. This flag is enabled by default for `-std=c++17`.
-Wl,xxx	Passes option xxx to the linker (note: no space after the comma). For example, `-Wl,-R/staff/shaojiemike/github/MultiPIM_icarus0/common/libconfig/lib` adds a runtime search path for shared libraries, which the linker records in the output binary.

General Optimization Options

-O, -O2, -O3

-O3 turns on all optimizations specified by -O2

and also turns on the -finline-functions, -funswitch-loops, -fpredictive-commoning, -fgcse-after-reload, -ftree-loop-vectorize, -ftree-loop-distribute-patterns, -ftree-slp-vectorize, -fvect-cost-model, -ftree-partial-pre and -fipa-cp-clone options

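-ffast-math

Relaxes the IEEE/ISO floating-point rules so the compiler can optimize floating-point code more aggressively. This gives higher performance but may slightly reduce accuracy.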
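To make the accuracy caveat concrete, here is a minimal sketch of Kahan (compensated) summation, a routine whose correctness depends on the exact order of floating-point operations. The function name `kahan_sum` is hypothetical and chosen for illustration. With -ffast-math the compiler is allowed to reassociate the arithmetic, so it may treat the compensation term as algebraically zero and quietly turn this into a plain sum.

```cpp
#include <vector>

// Kahan (compensated) summation: tracks the rounding error of each addition
// in `c` and feeds it back into the next step.
// Assumption: compiled WITHOUT -ffast-math. With -ffast-math the compiler may
// simplify (t - sum) - y to zero and drop the compensation entirely.
double kahan_sum(const std::vector<double>& values) {
    double sum = 0.0;
    double c = 0.0;  // running compensation for lost low-order bits
    for (double v : values) {
        double y = v - c;
        double t = sum + y;
        c = (t - sum) - y;  // recovers what was lost when adding y to sum
        sum = t;
    }
    return sum;
}
```

Code like this is a reason to apply -ffast-math per translation unit rather than globally, and to compare results against a build without it.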

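-Ofast

Faster, but correctness is not guaranteed: -Ofast enables everything in -O3 plus optimizations that are not valid for all standard-compliant programs, such as -ffast-math.

-flto

(GNU only) Link-time optimization, a step that examines function calls across files when the program is linked. The flag must be used both when compiling and when linking. Compile times with this flag are long, but depending on the application there may be noticeable performance improvements when it is combined with the -O* flags. This flag and any optimization flags must be passed to the linker, and gcc/g++/gfortran should be invoked for linking rather than calling ld directly.

-mtune=processor

This flag performs additional tuning for a specific processor type, but it does not generate extra SIMD instructions, so there are no architecture compatibility issues. Tuning involves optimizations for processor cache sizes, preferred instruction ordering, and so on.

On AMD Bulldozer nodes the value to use is bdver1; on AMD Epyc nodes it is znver2, short for "Zen version 2".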

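Optimization Options: Data Prefetching

-fprefetch-loop-arrays
1. If the target machine supports it, generates instructions to prefetch memory, to improve the performance of loops that access large arrays. This option may produce better or worse code; the result depends heavily on the structure of the loops in the source code.
2. Disabled at -Os.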
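For comparison, the same idea can be written by hand with the GCC/Clang builtin `__builtin_prefetch`. The sketch below is illustrative only: the function name `sum_with_prefetch` and the distance `kPrefetchDistance` are assumptions, and for a simple linear scan like this one the hardware prefetcher usually does the job already.

```cpp
#include <cstddef>
#include <vector>

// Sums an array while manually prefetching a fixed distance ahead.
// kPrefetchDistance is a hypothetical tuning value, not a recommendation.
long long sum_with_prefetch(const std::vector<int>& data) {
    constexpr std::size_t kPrefetchDistance = 16;  // elements ahead (assumed)
    long long total = 0;
    const std::size_t n = data.size();
    for (std::size_t i = 0; i < n; ++i) {
        if (i + kPrefetchDistance < n) {
            // Arguments: address, rw (0 = read), locality (3 = keep in cache).
            __builtin_prefetch(&data[i + kPrefetchDistance], 0, 3);
        }
        total += data[i];
    }
    return total;
}
```

Manual prefetching tends to pay off only for irregular access patterns the hardware cannot predict, which is also why the compiler option can make code slower as easily as faster.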

Optimization Options: Memory Access

https://zhuanlan.zhihu.com/p/496435946

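Unless noted otherwise, the options below are enabled by default at -O3.

Reordering data accesses

-ftree-loop-distribution
1. Allows one large, complex loop to be split into several loops, each of which can then be parallelized and vectorized on its own.
-ftree-loop-distribute-patterns
1. A special case of the option above: splits out loop patterns that can be replaced by library calls, for example turning an initialization loop into a call to memset.
-floop-interchange
1. Allows the order of nested loops to be swapped so that memory is accessed contiguously.
-floop-unroll-and-jam
1. For nested loops, allows the outer loop to be unrolled by some factor and the resulting copies of the inner loop to be fused.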
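As an illustration of what loop interchange achieves, the sketch below shows the same reduction written with both loop orders over a row-major matrix. The function names are hypothetical; the second version is what one would hope the compiler derives from the first when -floop-interchange applies.

```cpp
#include <cstddef>
#include <vector>

// Row-major matrix in a flat vector: element (r, c) lives at r * cols + c.

// Column-first traversal: the inner loop strides by `cols` elements, so
// consecutive iterations touch different cache lines.
long long sum_column_first(const std::vector<int>& m,
                           std::size_t rows, std::size_t cols) {
    long long total = 0;
    for (std::size_t c = 0; c < cols; ++c)
        for (std::size_t r = 0; r < rows; ++r)
            total += m[r * cols + c];
    return total;
}

// After interchange: the inner loop walks contiguous memory.
long long sum_row_first(const std::vector<int>& m,
                        std::size_t rows, std::size_t cols) {
    long long total = 0;
    for (std::size_t r = 0; r < rows; ++r)
        for (std::size_t c = 0; c < cols; ++c)
            total += m[r * cols + c];
    return total;
}
```

Integers are used on purpose: swapping the loops changes the order of the additions, which is always safe for integer sums but would change rounding for floating-point ones.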

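Code alignment

(This concerns the instructions themselves, not the data the computation accesses.)

-falign-functions=n:m:n2:m2
1. Enabled at levels -O2, -O3.
2. There is a family of similar options: -falign-labels, -falign-loops, and -falign-jumps.

Basic-block layout

-freorder-blocks
1. Reorders the basic blocks of a function to reduce the number of taken branches and improve code locality.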

Optimization Options: Unroll Flags

-funroll-loops

Unroll loops whose number of iterations can be determined at compile time or upon entry to the loop. -funroll-loops implies -frerun-cse-after-loop. This option makes code larger, and may or may not make it run faster.

-funroll-all-loops

Unroll all loops, even if their number of iterations is uncertain when the loop is entered. This usually makes programs run more slowly. -funroll-all-loops implies the same options as -funroll-loops.

--param max-unrolled-insns

The maximum number of instructions that a loop may have to be unrolled. If a loop is unrolled, this parameter also determines how many times the loop code is unrolled.

--param max-average-unrolled-insns

The maximum number of instructions, weighted by the probabilities of their execution, that a loop may have to be unrolled. If a loop is unrolled, this parameter also determines how many times the loop code is unrolled.

--param max-unroll-times

The maximum number of unrollings of a single loop.
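Besides the global flags and parameters, GCC 8 and later also accept a per-loop request, `#pragma GCC unroll n`. The sketch below is a small illustration under that assumption; the function name `dot` is hypothetical, and other compilers may ignore the pragma.

```cpp
#include <cstddef>

// Dot product with a per-loop unroll request.
// `#pragma GCC unroll 4` asks for this one loop to be unrolled 4 times,
// independently of -funroll-loops.
double dot(const double* a, const double* b, std::size_t n) {
    double acc = 0.0;
#pragma GCC unroll 4
    for (std::size_t i = 0; i < n; ++i) {
        acc += a[i] * b[i];
    }
    return acc;
}
```

The command-line counterpart would be along the lines of `g++ -O2 -funroll-loops --param max-unroll-times=4 dot.cpp`, which caps unrolling for every loop in the file instead of one.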

Optimization Options: SIMD Instructions

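-march=native

Detects the host CPU automatically, although the detection can occasionally be wrong.

-march=arch

Generates SIMD instructions for the given architecture and also applies the corresponding -mtune optimizations. Useful values of arch are the same as for the -mtune flag above.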

g++ -march=native -m32 ... -Q --help=target

-mtune=                               skylake-avx512 

 Known valid arguments for -march= option:
    i386 i486 i586 pentium lakemont pentium-mmx winchip-c6 winchip2 c3 samuel-2 c3-2 nehemiah c7 esther i686 pentiumpro pentium2 pentium3 pentium3m pentium-m pentium4 pentium4m prescott nocona core2 nehalem corei7 westmere sandybridge corei7-avx ivybridge core-avx-i haswell core-avx2 broadwell skylake skylake-avx512 cannonlake icelake-client icelake-server cascadelake tigerlake bonnell atom silvermont slm goldmont goldmont-plus tremont knl knm intel geode k6 k6-2 k6-3 athlon athlon-tbird athlon-4 athlon-xp athlon-mp x86-64 eden-x2 nano nano-1000 nano-2000 nano-3000 nano-x2 eden-x4 nano-x4 k8 k8-sse3 opteron opteron-sse3 athlon64 athlon64-sse3 athlon-fx amdfam10 barcelona bdver1 bdver2 bdver3 bdver4 znver1 znver2 btver1 btver2 generic native

-msse4.2 -mavx -mavx2 -march=core-avx2
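One way to check which of these flags actually took effect is to look at the macros the compiler predefines for each instruction set. The sketch below is a minimal illustration; the function name `simd_level` is hypothetical, while the macros themselves (`__AVX2__` and so on) are the ones GCC and compatible compilers define when the matching -m or -march option is active.

```cpp
#include <string>

// Reports the widest x86 SIMD extension the compiler was allowed to use
// for this translation unit, based on predefined macros.
std::string simd_level() {
#if defined(__AVX512F__)
    return "AVX-512F";
#elif defined(__AVX2__)
    return "AVX2";
#elif defined(__AVX__)
    return "AVX";
#elif defined(__SSE4_2__)
    return "SSE4.2";
#elif defined(__SSE2__)
    return "SSE2";
#else
    return "none detected";
#endif
}
```

Built with `-march=core-avx2`, this should take the AVX2 branch; built with no -m flags on x86-64, it should fall through to SSE2.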

dynamic flags

-fPIC

position-independent code(PIC)

Further study needed

None yet

Problems encountered

None yet

Motivation, summary, reflections, rants~~

References

https://blog.csdn.net/daidodo/article/details/2185222

https://www.bu.edu/tech/support/research/software-and-programming/programming/compilers/gcc-compiler-flags/

Posted 2023-10-10 | Updated 2025-01-30 | Tutorials | 3 minutes read (About 408 words)

GCC Compiler Option 2 : Preprocessor Options

-Mxxx

The -M option generates Makefile dependency rules from a g++ command.
It implies -E, so compilation stops after the preprocessor.
It also implies -w, which suppresses all warnings.

Using a complex g++ command as an example:


g++ -Wall -Werror -Wno-unknown-pragmas -DPIN_CRT=1 -fno-stack-protector -fno-exceptions -funwind-tables -fasynchronous-unwind-tables -fno-rtti -DTARGET_IA32E -DHOST_IA32E -fPIC -DTARGET_LINUX -fabi-version=2 -faligned-new -I../../../source/include/pin -I../../../source/include/pin/gen -isystem /staff/shaojiemike/Download/pin-3.28-98749-g6643ecee5-gcc-linux/extras/cxx/include -isystem /staff/shaojiemike/Download/pin-3.28-98749-g6643ecee5-gcc-linux/extras/crt/include -isystem /staff/shaojiemike/Download/pin-3.28-98749-g6643ecee5-gcc-linux/extras/crt/include/arch-x86_64 -isystem /staff/shaojiemike/Download/pin-3.28-98749-g6643ecee5-gcc-linux/extras/crt/include/kernel/uapi -isystem /staff/shaojiemike/Download/pin-3.28-98749-g6643ecee5-gcc-linux/extras/crt/include/kernel/uapi/asm-x86 -I../../../extras/components/include -I../../../extras/xed-intel64/include/xed -I../../../source/tools/Utils -I../../../source/tools/InstLib -O3 -fomit-frame-pointer -fno-strict-aliasing -Wno-dangling-pointer \
-M inscount0.cpp -o Makefile_bk

In Makefile_bk

inscount0.o: inscount0.cpp \
 # sys header
 /usr/include/stdc-predef.h \
 /staff/shaojiemike/Download/pin-3.28-98749-g6643ecee5-gcc-linux/extras/cxx/include/iostream \
 /usr/lib/gcc/x86_64-linux-gnu/11/include/float.h
 # usr header
 ../../../source/include/pin/pin.H \
 ../../../extras/xed-intel64/include/xed/xed-interface.h \
 ... more header files

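-MM is like -M but does not list system header files.
- e.g., the first 3 headers above would disappear.
-MF filename writes the Makefile rules to the given file instead of to stdout.
-MG (used with -M or -MM) is for the case where a header file is missing: it assumes the missing header is a generated file and adds it to the dependency list without raising an error.
-MP (used with -M or -MM) adds a phony target with no prerequisites for every dependency other than the main file. These dummy rules stop make from failing when a header file is deleted without the Makefile being updated.
- e.g., an extra line header1.h: appears in the Makefile for a dependency on header1.h
-MD == -M -MF file, except that -E is not implied, so compilation continues.
- the default file has a suffix of .d, e.g., inscount0.d for -c inscount0.cpp
-MMD == -MD without the system header files.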

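Further study needed

None yet

Problems encountered

None yet

Motivation, summary, reflections, rants~~

References

Parts of the answers above came from ChatGPT-3.5 and were not cross-checked for correctness.

None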

Posted 2021-07-28 | Updated 2025-01-30 | Tutorials | a minute read (About 178 words)

AMD Epyc Compiler Options

AMD EPYC™ 7xx2-series Processors Compiler Options Quick Reference Guide

AOCC compiler (with Flang -Fortran Front-End)

Latest release: 2.1, Nov 2019

https://developer.amd.com/amd-aocc/Advanced

GNU compiler collection (gcc, g++, gfortran)

Intel compilers (icc, icpc, ifort)

amd prace guide

Further study needed

Amd uprof
PGI compiler
Numactl
OMP_PROC_BIND=TRUE; OMP_PLACES=sockets

Problems encountered

None yet

Motivation, summary, reflections, rants~~

References

https://developer.amd.com/wordpress/media/2020/04/Compiler%20Options%20Quick%20Ref%20Guide%20for%20AMD%20EPYC%207xx2%20Series%20Processors.pdf

https://prace-ri.eu/wp-content/uploads/Best-Practice-Guide_AMD.pdf#page35

Prace guide

Posted 2021-07-27 | Updated 2025-01-30 | Tutorials | 14 minutes read (About 2039 words)

Intel Compile Options

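Differences between Windows and Linux

Option syntax

For most options, the Intel compiler uses the form /Qopt on Windows and -opt on Linux. To disable an option, Windows uses /Qopt- and Linux typically uses -no-opt (for example, /Qprec-div- and -no-prec-div).

Intel compilers, linkers, and related tools

On Windows, the Intel compiler is icl.exe and its linker is xilink.exe; the Visual Studio compiler is cl.exe and its linker is link.exe.

On Linux, the C compiler is icc and the C++ compiler is icpc (icc can also compile C++ files), the linker is xild, the archiver is xiar, and the remaining tools are named similarly.

The GNU C compiler is gcc, the C++ compiler is g++, the linker is ld, and the archiver is ar.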

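Parallelization

-qopenmp

Enables OpenMP* parallelization: the compiler generates multithreaded code from the OpenMP directives in the source.

-qopenmp-simd

Enables OpenMP* SIMD compilation when option O2 or higher is in effect.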
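As a small illustration of what "OpenMP SIMD compilation" acts on, the sketch below marks a loop with `#pragma omp simd`. The function name `saxpy` is hypothetical. With -qopenmp-simd (or GCC's -fopenmp-simd) the pragma is honored without pulling in the OpenMP threading runtime; without any such flag it is ignored and the loop still computes the same result.

```cpp
#include <cstddef>
#include <vector>

// Computes y[i] += a * x[i] over the common length of x and y.
void saxpy(float a, const std::vector<float>& x, std::vector<float>& y) {
    const std::size_t n = x.size() < y.size() ? x.size() : y.size();
    const float* xp = x.data();
    float* yp = y.data();
    // Tells the compiler the iterations are independent and may be
    // executed in SIMD lanes.
#pragma omp simd
    for (std::size_t i = 0; i < n; ++i) {
        yp[i] += a * xp[i];
    }
}
```

The pragma is a promise from the programmer, not a check by the compiler: if the iterations were not actually independent, vectorizing them could change the result.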

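-parallel

Tells the auto-parallelizer to generate multithreaded code for loops that can be safely executed in parallel.

To use this option, you must also specify option O2 or O3.
If option O3 is also specified, this option sets option [q or Q]opt-matmul.

-qopt-matmul

Enables or disables compiler-generated matrix multiply (matmul) library calls.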

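Vectorization (SIMD instruction sets)

-xHost

Generates instructions for the highest instruction set available on the processor of the machine doing the compilation. It must be used with at least -O2. On Linux systems, if neither -x nor -m is specified, the default is -msse2.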

-fast

On macOS* systems: -ipo, -mdynamic-no-pic, -O3, -no-prec-div, -fp-model fast=2, and -xHost

On Windows* systems: /O3, /Qipo, /Qprec-div-, /fp:fast=2, and /QxHost

On Linux* systems: -ipo, -O3, -no-prec-div, -static, -fp-model fast=2, and -xHost

After specifying option fast, you can override the [Q]xHost setting by specifying a different processor-specific [Q]x option on the command line. Keep in mind that the last option specified on the command line takes precedence.

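-march

Must be used with at least -O2. If both -ax and -march are specified, the compiler does not generate Intel-specific instructions.

Specifying -march=pentium4 sets -mtune=pentium4.

-x

Tells the compiler which processor features it may target, including which instruction sets and optimizations it may generate. Possible values include: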

AMBERLAKE
BROADWELL
CANNONLAKE
CASCADELAKE
COFFEELAKE
GOLDMONT
GOLDMONT-PLUS
HASWELL
ICELAKE-CLIENT (or ICELAKE)
ICELAKE-SERVER
IVYBRIDGE
KABYLAKE
KNL
KNM
SANDYBRIDGE
SILVERMONT
SKYLAKE
SKYLAKE-AVX512
TREMONT
WHISKEYLAKE

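-m

Tells the compiler which features it may target, including which instruction sets it may generate.

-ax

Generates code for multiple instruction sets, with a baseline path and processor-specific paths.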

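HLO

High-level Optimizations. -O1 does not include them.

-O2

More extensive optimization. Intel recommends it for general use.

Vectorization is enabled at O2 and higher levels.

On systems using the IA-32 architecture: performs some basic loop optimizations such as distribution, predicate optimization, interchange, multi-versioning, and scalar replacement.

This option also enables:

Inlining of intrinsics
Intra-file interprocedural optimization, including:
   Inlining
   Constant propagation
   Forward substitution
   Routine attribute propagation
   Variable address-taken analysis
   Dead static function elimination
   Removal of unreferenced variables
The following performance gains:
   Constant propagation
   Copy propagation
   Dead-code elimination
   Global register allocation
   Global instruction scheduling and control speculation
   Loop unrolling
   Optimized code selection
   Partial redundancy elimination
   Strength reduction/induction variable simplification
   Variable renaming
   Exception handling optimizations
   Tail recursions
   Peephole optimizations
   Structure assignment lowering and optimizations
   Dead store elimination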

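-O3

The O3 option handles loop transformations more thoroughly in order to optimize memory access.

It is more aggressive than -O2 and takes longer to compile. It is recommended for loop-heavy code that does intensive floating-point computation.

It performs the O2 optimizations and additionally enables more aggressive loop transformations such as Fusion, Block-Unroll-and-Jam, and collapsing IF statements.

This option may set other options. Which ones is determined by the compiler, depending on the operating system and architecture in use, and the set may change from release to release.

When O3 is used together with options -ax or -x (Linux) or options /Qax or /Qx (Windows), the compiler performs more aggressive data dependency analysis than at O2, which may result in longer compilation times.

The O3 optimizations may not lead to higher performance unless loop and memory access transformations actually take place. In some cases they can slow the code down compared with O2.

The O3 option is recommended for applications whose loops make heavy use of floating-point calculations and process large data sets.

Many routines in the shared libraries are more highly optimized for Intel microprocessors than for non-Intel microprocessors.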

-Ofast

-O3 plus some extras.

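IPO

Interprocedural Optimizations.

Typical optimizations include procedure inlining and reordering, elimination of dead (unreachable) code, and basic optimizations such as constant propagation.

Interprocedural optimization is a step that examines function calls across files when the program is linked. The flag must be used both when compiling and when linking. Compile times with this flag are very long, but depending on the application there may be noticeable performance improvements when it is combined with the -O* flags.

Inlining

Inlining, or inline expansion, simply means replacing a function call with the body of the function. The main advantage is that it removes the overhead of the call and the return instruction; the main disadvantage is that it may increase code size.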
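The effect of inline expansion can be sketched by writing the "before" and "after" by hand. The function names below are hypothetical; the second function is what the optimizer would effectively produce from the first once `square` is inlined.

```cpp
// Small helper that is a natural candidate for inlining.
inline int square(int x) { return x * x; }

// As written in the source: two function calls.
int sum_of_squares(int a, int b) { return square(a) + square(b); }

// After inline expansion: the calls are replaced by the function body,
// so there is no call or return overhead, at the cost of repeating the
// body at each call site.
int sum_of_squares_inlined(int a, int b) { return a * a + b * b; }
```

For a one-line body the code-size cost is negligible; it grows when a large function is expanded at many call sites, which is why compilers apply size heuristics before inlining.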

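PGO

PGO stands for Profile-Guided Optimizations.

PGO is a dynamic optimization process carried out in three steps: build an instrumented binary, run it to collect a profile, and rebuild using that profile.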
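A rough sketch of the three-step workflow follows, written as comments around a function whose best code layout depends on the input data. The file name `app.cpp`, the binary name, and the function `count_negative` are placeholders chosen for illustration; the flags shown are the Intel classic compiler's -prof-gen and -prof-use, with GCC's -fprofile-generate and -fprofile-use playing the same roles.

```cpp
#include <vector>

// Three-step PGO workflow (names are placeholders):
//   1. icpc -O2 -prof-gen app.cpp -o app    build an instrumented binary
//   2. ./app                                run on representative input;
//                                           this writes the profile data
//   3. icpc -O2 -prof-use app.cpp -o app    rebuild using the profile

// Whether the branch below is usually taken depends entirely on the data.
// A profile tells the compiler which side is hot, so it can lay out the
// code for the common case.
int count_negative(const std::vector<int>& values) {
    int count = 0;
    for (int v : values) {
        if (v < 0) {
            ++count;
        }
    }
    return count;
}
```

The profile is only as good as the training run: if step 2 uses input unlike production data, step 3 optimizes for the wrong case.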

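Individual options in detail

-mtune=processor

This flag performs additional tuning for a specific processor type, but it does not generate extra SIMD instructions, so there are no architecture compatibility issues. Tuning involves optimizations for processor cache sizes, preferred instruction ordering, and so on.

It optimizes code for processors that support the specified Intel processor or microarchitecture code name.

-no-prec-div

Does not enable improved precision for floating-point divides.

-static

Links statically, without shared libraries.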

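-fp-model fast=2

Enables more aggressive floating-point optimizations than the default fast=1, which affects the precision used during auto-vectorization. It seemed to have compatibility problems with the OpenMP options.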

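-funroll-all-loops

Unrolls all loops, even if the number of iterations is uncertain when the loop is entered. This option may affect performance.

-unroll-aggressive / -no-unroll-aggressive

This option determines whether the compiler uses more aggressive unrolling for certain loops. The positive form of the option may improve performance.

It enables aggressive, complete unrolling for loops with small constant trip counts.

-falign-loops

Aligns loops to a power-of-two byte boundary.

In -falign-loops[=n], n is the optional number of bytes for the minimum alignment boundary. It must be a power of 2 between 1 and 4096, such as 1, 2, 4, 8, 16, 32, 64, 128, and so on. If you specify 1 for n, no alignment is performed; this is the same as specifying the negative form of the option. If you do not specify n, the default alignment is 16 bytes.

-O0 / -Od

Disables all optimizations (-O0 on Linux and macOS, /Od on Windows). Note that plain -O is the same as -O2 (Linux* and macOS*).

-O1

Optimizes without increasing code size.

Performs global optimization, including data-flow analysis, code motion, strength reduction and test replacement, split-lifetime analysis, and instruction scheduling.
Disables inlining of some intrinsics.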

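Problems encountered

To see which macros the compiler predefines, and from them which options are in effect:

icpc -dM -E -x c++ SLIC.cpp

https://stackoverflow.com/questions/34310546/how-can-i-see-which-compilation-options-are-enabled-on-intel-icc-compiler

How does -parallel differ from mpicc or mpiicc?

Motivation, summary, reflections, rants~~

Honestly, IPO and PGO already have my head spinning. I am listing them here for now and will study them later.

References

https://blog.csdn.net/gengshenghong/article/details/7034748

Alphabetical list of Intel C++ compiler options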
