GCC Compiler Option 1 : Optimization Options

Manual

The full option list; Optimize-Options is one part of it.

# Lists the available target options
g++ -march=native -m32 ... -Q --help=target
# Lists which optimizations -O3 enables and disables by default
g++ -O3 -Q --help=optimizers

When compiling, it helps to organize the options by category. For example:

g++ 
# Warning Options
-Wall -Werror -Wno-unknown-pragmas -Wno-dangling-pointer
# Program Instrumentation Options
-fno-stack-protector
# Code-Gen-Options
-fno-exceptions -funwind-tables -fasynchronous-unwind-tables
# C++ Dialect
-fabi-version=2 -faligned-new -fno-rtti
# define
-DPIN_CRT=1 -DTARGET_IA32E -DHOST_IA32E -fPIC -DTARGET_LINUX
# include
-I../../../source/include/pin
-I../../../source/include/pin/gen
-isystem /staff/shaojiemike/Download/pin-3.28-98749-g6643ecee5-gcc-linux/extras/cxx/include
-isystem /staff/shaojiemike/Download/pin-3.28-98749-g6643ecee5-gcc-linux/extras/crt/include
-isystem /staff/shaojiemike/Download/pin-3.28-98749-g6643ecee5-gcc-linux/extras/crt/include/arch-x86_64
-isystem /staff/shaojiemike/Download/pin-3.28-98749-g6643ecee5-gcc-linux/extras/crt/include/kernel/uapi
-isystem /staff/shaojiemike/Download/pin-3.28-98749-g6643ecee5-gcc-linux/extras/crt/include/kernel/uapi/asm-x86
-I../../../extras/components/include
-I../../../extras/xed-intel64/include/xed
-I../../../source/tools/Utils
-I../../../source/tools/InstLib
# Optimization Options
-O3 -fomit-frame-pointer -fno-strict-aliasing
-c -o obj-intel64/inscount0.o inscount0.cpp

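Common options

  • -Wxxx enables the warning xxx.
  • -fxxx enables the compiler feature xxx; -fno-xxx is the negative form and disables it.
  • -gxxx covers debugging-related options.
  • -mxxx covers options specific to a machine architecture.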
Option | Meaning
-Wall | Enables most of the commonly useful warnings (not literally every warning).
-Werror | Treats warnings as errors.
-std= | Selects the C or C++ language standard, e.g. c++11 (same as c++0x) or c++17 (same as c++1z); c++0x and c++1z are the names used while those standards were still in development.
-Wunknown-pragmas | Warns when an unrecognized #pragma is encountered; -Wno-unknown-pragmas turns that warning off.
-fomit-frame-pointer | Omits the frame pointer in functions that do not need one; enabled from -O1 upward.
-Wstack-protector | Warns about functions that are not protected against stack smashing; only active together with -fstack-protector (-fno-stack-protector disables the protection itself).
-MMD | Like -MD, but lists only user header files, not system header files.
-fexceptions | Enables exception handling.
-funwind-tables | Generates unwind tables, which contain the frame information needed for handling such exceptions.
-fasynchronous-unwind-tables | Generates unwind tables in DWARF format that are exact at every instruction boundary, so they can be used for stack unwinding from asynchronous events.
-fabi-version=n | Uses version n of the C++ ABI. The default is version 0. (Version 2 is the version of the C++ ABI that first appeared in G++ 3.4 and was the default through G++ 4.9.) An application binary interface (ABI) is an interface between two binary program modules; often one of them is a library or operating system facility and the other is a program run by a user.
-fno-rtti | Disables generation of information about every class with virtual functions for use by the C++ run-time type identification features (dynamic_cast and typeid). If you don't use those parts of the language, you can save some space with this flag.
-faligned-new | Enables support for C++17 new of types that require more alignment than void* ::operator new(std::size_t) provides. A numeric argument such as -faligned-new=32 specifies how much alignment (in bytes) that function provides, but few users will need to override the default of alignof(std::max_align_t). This flag is enabled by default for -std=c++17.
-Wl,xxx | Passes the option xxx to the linker, e.g. -Wl,-R/staff/shaojiemike/github/MultiPIM_icarus0/common/libconfig/lib specifies a runtime search path for shared libraries at link time.
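The prefix conventions above can be turned into a small sorting helper. This is only a sketch: `classify_flag`, `group_flags`, the category labels, and the `/some/lib` path are hypothetical names chosen for illustration, and a prefix alone cannot tell an optimization flag such as -fomit-frame-pointer apart from any other -f option.

```python
def classify_flag(flag):
    """Guess a coarse category for one GCC flag from its prefix alone."""
    if flag.startswith("-Wl,"):
        return "linker"  # -Wl,xxx is handed through to the linker
    prefixes = [
        ("-W", "warning"),
        ("-D", "define"),
        ("-isystem", "include"),
        ("-I", "include"),
        ("-O", "optimization"),
        ("-g", "debug"),
        ("-m", "machine"),
        ("-f", "feature"),
    ]
    for prefix, category in prefixes:
        if flag.startswith(prefix):
            return category
    return "other"


def group_flags(flags):
    """Group flags by category, keeping their original order inside each group."""
    groups = {}
    for flag in flags:
        groups.setdefault(classify_flag(flag), []).append(flag)
    return groups


example = ["-Wall", "-Werror", "-fno-rtti", "-fPIC", "-O3",
           "-DPIN_CRT=1", "-march=native", "-Wl,-R/some/lib"]
groups = group_flags(example)
for category, flags in groups.items():
    print(category, flags)
```

Grouping this way only helps with readability of a long command line; the finer categories used in the example command (Code-Gen, C++ Dialect, and so on) still have to be looked up in the manual.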

General Optimization Options

-O, -O2, -O3

-O3 turns on all optimizations specified by -O2 and also turns on the -finline-functions, -funswitch-loops, -fpredictive-commoning, -fgcse-after-reload, -ftree-loop-vectorize, -ftree-loop-distribute-patterns, -ftree-slp-vectorize, -fvect-cost-model, -ftree-partial-pre and -fipa-cp-clone options.

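-ffast-math

Allows floating-point optimizations that give higher performance but may slightly reduce accuracy.

-Ofast

Faster, but correctness is not guaranteed: it enables all -O3 optimizations plus others, such as -ffast-math, that are not valid for every standard-compliant program.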

-flto

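(GNU only) Link-time optimization: an extra step at link time that examines function calls across files. The flag must be used both when compiling and when linking. Compile times with this flag are long, but depending on the application there can be a noticeable performance improvement when it is combined with an -O* flag. This flag and any optimization flags must be passed to the linker, so link by invoking gcc/g++/gfortran rather than calling ld directly.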

-mtune=processor

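This flag applies extra tuning for a specific processor type, but it does not generate additional SIMD instructions, so there are no architecture compatibility issues. The tuning covers things such as the processor's cache sizes and preferred instruction ordering.

The value to use on AMD Bulldozer nodes is bdver1, and on AMD Epyc nodes znver2, which is short for Zen version 2.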

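Optimization Options: Data Prefetching

  1. -fprefetch-loop-arrays
    1. If the target machine supports it, generates instructions that prefetch memory to improve the performance of loops that access large arrays. This option may produce better or worse code; the result depends heavily on the structure of the loops in the source.
    2. Disabled at -Os.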

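Optimization Options: Memory Access

https://zhuanlan.zhihu.com/p/496435946

Unless noted otherwise, the options below are enabled by default at -O3.

Reordering data accesses

  1. -ftree-loop-distribution
    1. Splits one large, complex loop into several loops, each of which can then be parallelized or vectorized on its own.
  2. -ftree-loop-distribute-patterns
    1. Similar to the above, but targets loop patterns that can be replaced by a library call, for example an initialization loop turned into memset.
  3. -floop-interchange
    1. Swaps the order of nested loops so that memory is accessed contiguously.
  4. -floop-unroll-and-jam
    1. For nested loops, unrolls the outer loop by some factor and fuses the resulting copies of the inner loop.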
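Two of these transformations, loop distribution and loop interchange, can be written out by hand to see what the compiler is aiming for. The sketch below is in Python purely for illustration; GCC applies the transformations to its own intermediate representation, and all function names here are hypothetical.

```python
def fused(a, b):
    """One loop doing two independent jobs per iteration."""
    n = len(a)
    sums, prods = [0] * n, [0] * n
    for i in range(n):
        sums[i] = a[i] + b[i]
        prods[i] = a[i] * b[i]
    return sums, prods


def distributed(a, b):
    """After loop distribution: two simple loops with the same results."""
    n = len(a)
    sums, prods = [0] * n, [0] * n
    for i in range(n):
        sums[i] = a[i] + b[i]
    for i in range(n):
        prods[i] = a[i] * b[i]
    return sums, prods


def sum_column_order(m):
    """Inner loop walks down a column: strided access in a row-major layout."""
    total = 0
    for j in range(len(m[0])):
        for i in range(len(m)):
            total += m[i][j]
    return total


def sum_row_order(m):
    """After loop interchange: inner loop walks along a row, contiguous access."""
    total = 0
    for i in range(len(m)):
        for j in range(len(m[0])):
            total += m[i][j]
    return total


a, b = [1, 2, 3, 4], [10, 20, 30, 40]
matrix = [[1, 2, 3], [4, 5, 6]]
print(fused(a, b) == distributed(a, b))
print(sum_column_order(matrix) == sum_row_order(matrix))
```

Both pairs compute the same values; only the order of the memory accesses differs, which is exactly what these options are allowed to change.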

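Code alignment

(This concerns the code itself, not the data the computation accesses.)

  1. -falign-functions=n:m:n2:m2
    1. Enabled at levels -O2, -O3.
      There is a family of similar options, such as -falign-loops, -falign-jumps, and -falign-labels.

Basic block layout

  1. -freorder-blocks
    1. Reorders the basic blocks of a function to reduce the number of taken branches.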

Optimization Options: Unroll Flags

-funroll-loops

Unroll loops whose number of iterations can be determined at compile time or upon entry to the loop. -funroll-loops implies -frerun-cse-after-loop. This option makes code larger, and may or may not make it run faster.

-funroll-all-loops

Unroll all loops, even if their number of iterations is uncertain when the loop is entered. This usually makes programs run more slowly. -funroll-all-loops implies the same options as -funroll-loops.

max-unrolled-insns

The maximum number of instructions that a loop should have if that loop is unrolled, and if the loop is unrolled, it determines how many times the loop code is unrolled.

max-average-unrolled-insns

The maximum number of instructions biased by probabilities of their execution that a loop should have if that loop is unrolled, and if the loop is unrolled, it determines how many times the loop code is unrolled.

max-unroll-times

The maximum number of unrollings of a single loop.
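What unrolling does to a loop can be sketched by hand. The example below unrolls a summation loop by a factor of 4 and handles the leftover iterations in a remainder loop. The factor 4 and the function names are arbitrary choices for illustration; the compiler picks its own factor within the limits set by the parameters above.

```python
def sum_rolled(xs):
    """Plain loop: one addition and one loop test per element."""
    total = 0
    for x in xs:
        total += x
    return total


def sum_unrolled_by_4(xs):
    """Same sum with the loop body replicated four times per iteration."""
    total = 0
    n = len(xs)
    main = n - n % 4  # largest multiple of 4 that fits
    i = 0
    while i < main:
        total += xs[i] + xs[i + 1] + xs[i + 2] + xs[i + 3]
        i += 4
    # Remainder loop: at most three leftover elements.
    for j in range(main, n):
        total += xs[j]
    return total


data = list(range(10))
print(sum_rolled(data), sum_unrolled_by_4(data))
```

The unrolled version runs fewer loop tests but contains more code, which is the size-versus-speed trade-off the option descriptions mention.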

Optimization Options: SIMD Instructions

-march=native

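Detects the host CPU automatically, although the detection can be wrong.

-march="arch"

Generates SIMD instructions for the specified architecture and also applies the -mtune optimizations. Useful values of arch are the same as for the -mtune flag above.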

g++ -march=native -m32 ... -Q --help=target

-mtune= skylake-avx512

Known valid arguments for -march= option:
i386 i486 i586 pentium lakemont pentium-mmx winchip-c6 winchip2 c3 samuel-2 c3-2 nehemiah c7 esther i686 pentiumpro pentium2 pentium3 pentium3m pentium-m pentium4 pentium4m prescott nocona core2 nehalem corei7 westmere sandybridge corei7-avx ivybridge core-avx-i haswell core-avx2 broadwell skylake skylake-avx512 cannonlake icelake-client icelake-server cascadelake tigerlake bonnell atom silvermont slm goldmont goldmont-plus tremont knl knm intel geode k6 k6-2 k6-3 athlon athlon-tbird athlon-4 athlon-xp athlon-mp x86-64 eden-x2 nano nano-1000 nano-2000 nano-3000 nano-x2 eden-x4 nano-x4 k8 k8-sse3 opteron opteron-sse3 athlon64 athlon64-sse3 athlon-fx amdfam10 barcelona bdver1 bdver2 bdver3 bdver4 znver1 znver2 btver1 btver2 generic native

-msse4.2 -mavx -mavx2 -march=core-avx2

dynamic flags

-fPIC

Generates position-independent code (PIC), as needed for shared libraries.

Further study needed

None yet

Problems encountered

None yet

Motivation, summary, reflections, rants

References

https://blog.csdn.net/daidodo/article/details/2185222

https://www.bu.edu/tech/support/research/software-and-programming/programming/compilers/gcc-compiler-flags/

GCC Compiler Option 2 : Preprocessor Options

-Mxxx

  • The -M option generates Makefile dependency rules from a g++ command.
  • It implies -E, which stops compilation after the preprocessing stage.
  • It implies -w, which suppresses all warnings.

Using a complex g++ command as an example:

g++ -Wall -Werror -Wno-unknown-pragmas -DPIN_CRT=1 -fno-stack-protector -fno-exceptions -funwind-tables -fasynchronous-unwind-tables -fno-rtti -DTARGET_IA32E -DHOST_IA32E -fPIC -DTARGET_LINUX -fabi-version=2 -faligned-new -I../../../source/include/pin -I../../../source/include/pin/gen -isystem /staff/shaojiemike/Download/pin-3.28-98749-g6643ecee5-gcc-linux/extras/cxx/include -isystem /staff/shaojiemike/Download/pin-3.28-98749-g6643ecee5-gcc-linux/extras/crt/include -isystem /staff/shaojiemike/Download/pin-3.28-98749-g6643ecee5-gcc-linux/extras/crt/include/arch-x86_64 -isystem /staff/shaojiemike/Download/pin-3.28-98749-g6643ecee5-gcc-linux/extras/crt/include/kernel/uapi -isystem /staff/shaojiemike/Download/pin-3.28-98749-g6643ecee5-gcc-linux/extras/crt/include/kernel/uapi/asm-x86 -I../../../extras/components/include -I../../../extras/xed-intel64/include/xed -I../../../source/tools/Utils -I../../../source/tools/InstLib -O3 -fomit-frame-pointer -fno-strict-aliasing -Wno-dangling-pointer \
  -M inscount0.cpp -o Makefile_bk

Contents of Makefile_bk:

inscount0.o: inscount0.cpp \
# system headers
/usr/include/stdc-predef.h \
/staff/shaojiemike/Download/pin-3.28-98749-g6643ecee5-gcc-linux/extras/cxx/include/iostream \
/usr/lib/gcc/x86_64-linux-gnu/11/include/float.h \
# user headers
../../../source/include/pin/pin.H \
../../../extras/xed-intel64/include/xed/xed-interface.h \
... more header files
  • -MM is like -M but leaves out system header files.
    • e.g., the first 3 headers above would disappear.
  • -MF filename writes the Makefile rules to the given file instead of to stdout.
  • -M -MG generates Makefile rules even when a header file is missing: the missing header is assumed to be a generated file and is added to the dependency list instead of causing an error.
  • -M -MP adds a phony target with no prerequisites for every dependency other than the main file, so that make does not fail when a header is deleted without the Makefile being updated.
    • e.g., if inscount0.o depends on header1.h, an extra empty rule header1.h: is emitted.
  • -MD == -M -MF file, except that -E is not implied.
    • the default file has a suffix of .d, e.g., inscount0.d for -c inscount0.cpp
  • -MMD == -MD without the system header files.
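To inspect what these options produce, the generated rules can be read back with a few lines of Python. This is a minimal sketch: `parse_depfile` is a hypothetical helper, the sample text is made up to resemble the output above, and escaped spaces or other Makefile quoting are not handled.

```python
def parse_depfile(text):
    """Parse Makefile-style dependency rules into {target: [prerequisites]}."""
    # A backslash at the end of a line continues the rule on the next line.
    joined = text.replace("\\\n", " ")
    rules = {}
    for line in joined.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or ":" not in line:
            continue
        target, _, deps = line.partition(":")
        rules.setdefault(target.strip(), []).extend(deps.split())
    return rules


# Made-up sample in the shape of `g++ -M -MP` output.
sample = (
    "inscount0.o: inscount0.cpp \\\n"
    " ../../../source/include/pin/pin.H \\\n"
    " /usr/include/stdc-predef.h\n"
    "\n"
    "/usr/include/stdc-predef.h:\n"
)
rules = parse_depfile(sample)
for target, deps in rules.items():
    print(target, "<-", deps)
```

The last rule in the sample has a target but no prerequisites, which is the shape of the phony targets that -MP adds.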

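Further study needed

None yet

Problems encountered

None yet

Motivation, summary, reflections, rants

References

Parts of the answers above come from ChatGPT-3.5 and have not been cross-checked for correctness.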

Memalloc


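Buddy memory allocation

An algorithm for managing memory that allocates and frees blocks efficiently, with the aim of limiting fragmentation and improving memory utilization. It is typically used inside operating systems to manage memory for the kernel and for processes.

The basic idea is to divide physical memory into blocks whose sizes are all powers of two. Each block can be handed to a running process or to the kernel. A free block can be split into smaller blocks, and free blocks can be merged into larger ones, to fit requests of different sizes.

The name "buddy" comes from the relationship between blocks: the two equal-sized neighbouring blocks produced by one split are each other's buddies. This relationship makes it possible, when memory is freed, to merge adjacent free blocks back into a larger block and so reduce fragmentation.

The algorithm works roughly as follows:

  1. Initially, all available memory is treated as one large block whose size is a power of two.

  2. When a process requests memory, the allocator searches the free blocks for one of suitable size. If the block found is at least twice as large as needed, it is split into two equal "buddy" blocks, repeatedly if necessary, and one of them is given to the requesting process.

  3. When a process frees a block, the block is merged with its "buddy" if that buddy is also free, forming a larger block. The larger block can in turn be merged with its own buddy, and so on, until no further merge is possible.

Buddy allocation is used in some operating systems to manage physical memory for the kernel and for processes, especially in embedded and real-time operating systems, to improve memory utilization and avoid fragmentation problems.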
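The split-and-merge workflow described above can be sketched as a toy allocator. This is a minimal model under stated assumptions, not how any particular kernel implements it: `BuddyAllocator`, `alloc`, and `free_block` are hypothetical names, addresses are plain integer offsets into a pool of 1024 units, and the buddy of a block is found by XOR-ing its offset with its size.

```python
class BuddyAllocator:
    """Toy buddy allocator over offsets [0, total); total is a power of two."""

    def __init__(self, total, min_block=1):
        assert total & (total - 1) == 0, "total must be a power of two"
        self.total = total
        self.min_block = min_block
        self.free = {total: {0}}  # block size -> set of free offsets
        self.used = {}            # offset -> size of the allocated block

    def alloc(self, size):
        # Round the request up to the next power-of-two block size.
        block = self.min_block
        while block < size:
            block *= 2
        # Find the smallest free block that is large enough.
        cand = block
        while cand <= self.total and not self.free.get(cand):
            cand *= 2
        if cand > self.total:
            return None  # nothing large enough is free
        offset = self.free[cand].pop()
        # Split repeatedly: keep the lower half, release the upper buddy.
        while cand > block:
            cand //= 2
            self.free.setdefault(cand, set()).add(offset + cand)
        self.used[offset] = block
        return offset

    def free_block(self, offset):
        size = self.used.pop(offset)
        # Merge upward for as long as the buddy of the current block is free.
        while size < self.total:
            buddy = offset ^ size
            if buddy not in self.free.get(size, set()):
                break
            self.free[size].remove(buddy)
            offset = min(offset, buddy)
            size *= 2
        self.free.setdefault(size, set()).add(offset)


heap = BuddyAllocator(1024)
a = heap.alloc(100)  # rounded up to a 128-unit block
b = heap.alloc(200)  # rounded up to a 256-unit block
heap.free_block(a)
heap.free_block(b)
print(a, b, heap.free[1024])
```

The rounding in `alloc` also shows the cost of the scheme: a request for 100 units occupies 128, so space can be wasted inside a block even though free blocks merge cleanly.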

ucore(Micro-kernel Operating System for Education)

A microkernel operating system built for teaching purposes.

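A problem encountered on Linux

On Windows we can write a program that fills all 16 GB of memory,

but on Linux the program segfaults after using about 3 GB.

Guess: is there a per-process memory limit? https://www.imooc.com/wenda/detail/570992

Moreover, the space returned by malloc lives in the heap, and the heap is visibly enclosed by the stack region, so it is finite. How does Windows get around this?

  1. First, the enclosure exists only in virtual addresses; the physical addresses reached through the page table are separate.
  2. Given the first point, the upper boundary can be moved upward dynamically.

The dynamic data area generally means the stack and the heap. The stack and the heap are two different dynamic data areas: the stack is a linear structure, the heap a linked structure. Every thread in a process has its own private stack, so although the threads run the same code, their local variables do not interfere with one another. A stack can be described by its base address and its stack-top address. Global and static variables are allocated in the static data area; local variables are allocated in the dynamic data area, that is, on the stack. A program accesses local variables through the stack's base address plus an offset.

When a process is initialized, the system automatically creates a default heap for it, which occupies 1 MB by default. Heap objects are managed by the system and are kept in memory as a linked structure.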

Linux per-process memory limits

/etc/security/limits.conf

# shaojiemike @ node5 in ~ [6:35:51]
$ ulimit -a
-t: cpu time (seconds) unlimited
-f: file size (blocks) unlimited
-d: data seg size (kbytes) unlimited
-s: stack size (kbytes) 8192
-c: core file size (blocks) 0
-m: resident set size (kbytes) unlimited
-u: processes 513967
-n: file descriptors 1024
-l: locked-in-memory size (kbytes) 65536
-v: address space (kbytes) unlimited
-x: file locks unlimited
-i: pending signals 513967
-q: bytes in POSIX msg queues 819200
-e: max nice 0
-r: max rt priority 0
-N 15: unlimited

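ulimit -HSn 4096 # -H sets the hard limit, -S sets the soft limit, and -n is the maximum number of open file descriptors per process. The soft limit is the value the kernel actually enforces; the hard limit is the ceiling up to which an unprivileged process may raise its soft limit.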
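The same soft and hard limits can be read from inside a program. The sketch below uses Python's standard `resource` module, which is available on Unix only; the `LIMITS` table and the `describe` helper are hypothetical names for illustration. Note that `getrlimit` reports the stack and address-space limits in bytes, whereas `ulimit -a` above shows kbytes.

```python
import resource  # Unix-only standard-library module


def describe(limit):
    """Render a limit value the way ulimit does for 'no limit'."""
    return "unlimited" if limit == resource.RLIM_INFINITY else str(limit)


LIMITS = {
    "open files (-n)": resource.RLIMIT_NOFILE,
    "stack size (-s)": resource.RLIMIT_STACK,
    "address space (-v)": resource.RLIMIT_AS,
}

report = {}
for name, res in LIMITS.items():
    soft, hard = resource.getrlimit(res)  # (soft limit, hard limit)
    report[name] = (soft, hard)
    print(f"{name}: soft={describe(soft)} hard={describe(hard)}")
```

If the address-space entry prints unlimited, as in the output above, a crash at a few GB is unlikely to be caused by these limits.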

lsof

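File descriptors

Number of file handles

These limits generally do not restrict memory.

How task limits are implemented on supercomputer login nodes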

GNU malloc()

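When a call to malloc(size_t size) succeeds, it always allocates size bytes of virtual memory (to stress again: not RAM) and returns a pointer to the start of the allocated region. The memory stays reserved for the process until you explicitly release it with free() (and, of course, when the whole process exits, the system reclaims all statically and dynamically allocated memory).

GNU libc provides memory-allocation functions such as malloc() and calloc(). glibc's malloc() always satisfies requests through the brk() or mmap() system calls, choosing between them by request size, with MMAP_THRESHOLD = 128 KB as the default cut-off. Small requests (below the threshold) go through brk(), which pushes the top of the data segment to a higher address (the heap grows upward from the bottom). Large requests are served by an anonymous mmap() mapping (flag MAP_ANONYMOUS), which is independent of the heap and lies outside it.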
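The size-based choice described above can be written down as a one-line decision. This is only a model of the rule, not glibc code: `malloc_backend` is a hypothetical name, the threshold is fixed at the default of 128 KB, and reuse of already-freed chunks (which needs no system call at all) is ignored.

```python
MMAP_THRESHOLD = 128 * 1024  # default cut-off in bytes, as stated above


def malloc_backend(size, threshold=MMAP_THRESHOLD):
    """Return which system call a fresh request of `size` bytes would go through."""
    return "mmap" if size >= threshold else "brk"


choices = {size: malloc_backend(size)
           for size in (64, 4096, 128 * 1024, 16 * 1024 * 1024)}
for size, backend in choices.items():
    print(f"{size:>10} bytes -> {backend}")
```

A multi-gigabyte request, like the one in the problem above, falls on the mmap side of this rule and therefore does not depend on how far the heap can grow through brk().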

Does malloc not allocate physical memory right away, and only do so on first access?

https://www.zhihu.com/question/20836462

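Questions

  1. Are the heap and the stack unique per process?
    1. The heap is, but a stack belongs to a single thread; because it is small, it mostly stays in cache.
  2. Does malloc on the two operating systems hand out physical memory or virtual memory?
  3. Linux uses a copy-on-write mechanism.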

Further study needed

None yet

Problems encountered

None yet

Motivation, summary, reflections, rants



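It failed at 6008 every time: 40000*6008*3/1024/1024 = 687 MB

733448/1024 = 716 MB

I asked a senior labmate, and the problem turned out to be that the argument passed to malloc had the wrong type, int, which cannot hold 3*40*1024*40*1024. It should be size_t. (size_t is the portable unsigned integer type for sizes.)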
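The arithmetic behind this bug can be checked directly. The sketch below assumes a 32-bit int and assumes the oversized value simply wraps around, which is what typical platforms do although C does not guarantee it; `as_int32` is a hypothetical helper that mimics that wrap-around.

```python
def as_int32(value):
    """Wrap an integer to a signed 32-bit value (two's complement)."""
    value &= 0xFFFFFFFF
    return value - (1 << 32) if value >= (1 << 31) else value


requested = 3 * 40 * 1024 * 40 * 1024  # the size the program meant to allocate
truncated = as_int32(requested)        # what a 32-bit int would hold instead

print(requested, requested // (1024 * 1024), "MiB requested")
print(truncated, truncated // (1024 * 1024), "MiB after truncation")
```

Under these assumptions the intended 4800 MiB shrinks to 704 MiB, which is in the same range as the roughly 700 MB figures computed above, so the program would run off the end of a much smaller buffer than it expected.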

References

https://blog.csdn.net/shenzi/article/details/3972437?utm_medium=distribute.pc_relevant_t0.none-task-blog-2%7Edefault%7EBlogCommendFromMachineLearnPai2%7Edefault-1.base&depth_1-utm_source=distribute.pc_relevant_t0.none-task-blog-2%7Edefault%7EBlogCommendFromMachineLearnPai2%7Edefault-1.base

An in-depth look at the memory layout of a program (process)

AOCC

https://developer.amd.com/amd-aocc/

Install

cd <compdir>
tar -xvf aocc-compiler-<ver>.tar
cd aocc-compiler-<ver>
bash install.sh
# It will install the compiler and display the AOCC setup instructions.

source <compdir>/setenv_AOCC.sh
# This will set up the shell environment for using the AOCC C, C++, and Fortran compilers in the shell where the command is executed.

Using AOCC

Libraries

Further study needed

None yet

Problems encountered

None yet

Motivation, summary, reflections, rants

References

https://developer.amd.com/wp-content/resources/AOCC_57223_Install_Guide_Rev_3.1.pdf

homepage interesting upgrade options

homepage Live2d + Mouse click effects + background-music

Just edit the template HTML files under \themes\hugo-theme-minos\layouts.

https://www.python87.com/p/881.html

https://github.com/stevenjoezhang/live2d-widget

https://apps.elfsight.com/panel/applications/background-music/

live2d

live2d : https://jingzhisheng.cn/blog/detail/1406456203487350784

https://l2d.alg-wiki.com/

https://github.com/alg-wiki/AzurLaneL2DViewer/tree/gh-pages/assets

pretty blogs

anime theme

Further study needed

More on live2d

Problems encountered

None yet

References