SHAOJIE'S BOOK

Posted 2023-10-12 | Updated 2025-01-30 | Tutorials | 12 minutes read (About 1771 words)

GCC Compiler Option 1 : Optimization Options

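Manual

# List the available target-specific options
g++ -march=native -m32 ... -Q --help=target
# List which optimizations -O3 enables and disables by default
g++ -O3 -Q --help=optimizers

When compiling, it helps to organize the flags by category, as in the following annotated example: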

g++ 
# Warning Options
-Wall -Werror -Wno-unknown-pragmas -Wno-dangling-pointer 
# Program Instrumentation Options
-fno-stack-protector
# Code-Gen-Options
-fno-exceptions -funwind-tables -fasynchronous-unwind-tables -fPIC
# C++ Dialect
-fabi-version=2 -faligned-new -fno-rtti
# define
-DPIN_CRT=1 -DTARGET_IA32E -DHOST_IA32E -DTARGET_LINUX 
# include
-I../../../source/include/pin 
-I../../../source/include/pin/gen 
-isystem /staff/shaojiemike/Download/pin-3.28-98749-g6643ecee5-gcc-linux/extras/cxx/include 
-isystem /staff/shaojiemike/Download/pin-3.28-98749-g6643ecee5-gcc-linux/extras/crt/include 
-isystem /staff/shaojiemike/Download/pin-3.28-98749-g6643ecee5-gcc-linux/extras/crt/include/arch-x86_64 
-isystem /staff/shaojiemike/Download/pin-3.28-98749-g6643ecee5-gcc-linux/extras/crt/include/kernel/uapi 
-isystem /staff/shaojiemike/Download/pin-3.28-98749-g6643ecee5-gcc-linux/extras/crt/include/kernel/uapi/asm-x86 
-I../../../extras/components/include 
-I../../../extras/xed-intel64/include/xed 
-I../../../source/tools/Utils 
-I../../../source/tools/InstLib 
# Optimization Options
-O3 -fomit-frame-pointer -fno-strict-aliasing 
-c -o obj-intel64/inscount0.o inscount0.cpp

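Common options

-Wxxx enables the warning xxx.
-fxxx enables the compiler feature xxx; -fno-xxx is the negative form and disables it.
-gxxx debugging-related options.
-mxxx machine/architecture-specific options.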

Name	Meaning
-Wall	Enables the most commonly used warnings (not literally all of them)
-Werror	Treats warnings as errors
-std=	Selects the C or C++ language standard, e.g. 'c++11' (formerly 'c++0x') or 'c++17' (formerly 'c++1z'); 'c++0x' and 'c++1z' were the working names used before those standards were published
-Wunknown-pragmas	Warns when an unrecognized #pragma is encountered; -Wno-unknown-pragmas turns that warning off
-fomit-frame-pointer	Omits the frame pointer in functions that do not need one; enabled at -O1 and higher
-Wstack-protector	Warns about functions that are not protected against stack smashing; only active together with -fstack-protector (-fno-stack-protector, used above, disables the protector itself)
-MMD	Like -MD, but mentions only user header files, not system header files.
-fexceptions	Enable exception handling.
-funwind-tables	Generates unwind tables, which contain the frame information needed to unwind the stack when handling exceptions
-fasynchronous-unwind-tables	Generate unwind tables in DWARF format that are exact at each instruction boundary, so they can be used for stack unwinding from asynchronous events
-fabi-version=n	Use version n of the C++ ABI. The default is version 0. (Version 2 is the version of the C++ ABI that first appeared in G++ 3.4, and was the default through G++ 4.9.) ABI: an application binary interface (ABI) is an interface between two binary program modules. Often, one of these modules is a library or operating system facility, and the other is a program that is being run by a user.
-fno-rtti	Disable generation of information about every class with virtual functions for use by the C++ run-time type identification features (dynamic_cast and typeid). If you don't use those parts of the language, you can save some space by using this flag.
-faligned-new	Enable support for C++17 new of types that require more alignment than `void* ::operator new(std::size_t)` provides. A numeric argument such as `-faligned-new=32` can be used to specify how much alignment (in bytes) is provided by that function, but few users will need to override the default of `alignof(std::max_align_t)`. This flag is enabled by default for `-std=c++17`.
-Wl,xxx	Passes xxx as an option to the linker (no space after the comma), e.g. `-Wl,-R/staff/shaojiemike/github/MultiPIM_icarus0/common/libconfig/lib` records a run-time search path for shared libraries in the linked binary.

General Optimization Options

-O, -O2, -O3

-O3 turns on all optimizations specified by -O2 and also turns on the -finline-functions, -funswitch-loops, -fpredictive-commoning, -fgcse-after-reload, -ftree-loop-vectorize, -ftree-loop-distribute-patterns, -ftree-slp-vectorize, -fvect-cost-model, -ftree-partial-pre and -fipa-cp-clone options. The exact set varies between GCC versions; `g++ -O3 -Q --help=optimizers` shows what applies to the installed compiler.

-ffast-math

Allows floating-point optimizations that give higher performance but may slightly reduce precision.

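-Ofast

Faster, but correctness is not guaranteed: it enables everything in -O3 plus optimizations that are not valid for all standard-compliant programs, such as -ffast-math.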

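-flto

(GNU only) Link-time optimization: an extra step at link time that examines function calls across files. The flag must be used both when compiling and when linking. Compilation takes considerably longer with it, but depending on the application there can be a noticeable performance gain when it is combined with an -O* flag. This flag and the optimization flags must both be passed at link time, and linking should be done through gcc/g++/gfortran rather than by calling ld directly.

-mtune=processor

This flag applies extra tuning for a specific processor type, but it does not generate additional SIMD instructions, so there are no architecture compatibility issues. The tuning covers things such as the processor's cache sizes and preferred instruction ordering.

The value to use on AMD Bulldozer nodes is bdver1, and on AMD Epyc nodes it is znver2 (short for Zen version 2).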

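Optimization Options: data prefetching

-fprefetch-loop-arrays
1. If the target machine supports it, generates instructions to prefetch memory to improve the performance of loops that access large arrays. This option may produce better or worse code; the result depends heavily on the structure of the loops in the source code.
2. Disabled at -Os.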

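Optimization Options: memory access

https://zhuanlan.zhihu.com/p/496435946

Unless stated otherwise, the flags below are enabled by default at -O3.

Reordering data accesses

-ftree-loop-distribution
1. Allows one large, complex loop to be split into several loops, each of which can then be parallelized and vectorized separately.
-ftree-loop-distribute-patterns
1. A variant of the above: splits out loop patterns that can be replaced by a library call, e.g. an initialization loop becomes a call to memset.
-floop-interchange
1. Allows the order of nested loops to be swapped so that memory is accessed contiguously.
-floop-unroll-and-jam
1. For nested loops, unrolls the outer loop by some factor and fuses the resulting copies of the inner loop.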

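Code alignment

(This concerns the code itself, not the data being accessed.)

-falign-functions=n:m:n2:m2
1. Enabled at levels -O2, -O3.
  There is a whole family of similar flags, e.g. -falign-loops, -falign-jumps and -falign-labels.

Adjusting the layout of code blocks

-freorder-blocks
1. Reorders the basic blocks of a function to reduce the number of taken branches.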

Optimization Options: Unroll Flags

-funroll-loops

Unroll loops whose number of iterations can be determined at compile time or upon entry to the loop. -funroll-loops implies -frerun-cse-after-loop. This option makes code larger, and may or may not make it run faster.

-funroll-all-loops

Unroll all loops, even if their number of iterations is uncertain when the loop is entered. This usually makes programs run more slowly. -funroll-all-loops implies the same options as -funroll-loops.

max-unrolled-insns

The maximum number of instructions that a loop may have to be unrolled. If a loop is unrolled, this parameter also determines how many times the loop code is unrolled. (Like the two parameters below, it is set with --param, e.g. --param max-unrolled-insns=N.)

max-average-unrolled-insns

The maximum number of instructions, weighted by their execution probabilities, that a loop may have to be unrolled. If a loop is unrolled, this parameter also determines how many times the loop code is unrolled.

max-unroll-times

The maximum number of unrollings of a single loop.

Optimization Options: SIMD Instructions

-march=native

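Detects the host CPU automatically, though the detection can occasionally be wrong.

-march=arch

Generates SIMD instructions for the given architecture and also applies the -mtune optimizations. Useful values for arch are the same as for the -mtune flag above.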

g++ -march=native -m32 ... -Q --help=target

-mtune=                               skylake-avx512 

 Known valid arguments for -march= option:
    i386 i486 i586 pentium lakemont pentium-mmx winchip-c6 winchip2 c3 samuel-2 c3-2 nehemiah c7 esther i686 pentiumpro pentium2 pentium3 pentium3m pentium-m pentium4 pentium4m prescott nocona core2 nehalem corei7 westmere sandybridge corei7-avx ivybridge core-avx-i haswell core-avx2 broadwell skylake skylake-avx512 cannonlake icelake-client icelake-server cascadelake tigerlake bonnell atom silvermont slm goldmont goldmont-plus tremont knl knm intel geode k6 k6-2 k6-3 athlon athlon-tbird athlon-4 athlon-xp athlon-mp x86-64 eden-x2 nano nano-1000 nano-2000 nano-3000 nano-x2 eden-x4 nano-x4 k8 k8-sse3 opteron opteron-sse3 athlon64 athlon64-sse3 athlon-fx amdfam10 barcelona bdver1 bdver2 bdver3 bdver4 znver1 znver2 btver1 btver2 generic native

-msse4.2 -mavx -mavx2 -march=core-avx2

dynamic flags

-fPIC

position-independent code(PIC)

Further study needed

None yet

Problems encountered

None yet

Motivation, summary, reflections, rants

References

https://blog.csdn.net/daidodo/article/details/2185222

https://www.bu.edu/tech/support/research/software-and-programming/programming/compilers/gcc-compiler-flags/

Posted 2023-10-10 | Updated 2025-01-30 | Tutorials | 3 minutes read (About 408 words)

GCC Compiler Option 2 : Preprocessor Options

-Mxxx

The -M option is designed to auto-generate Makefile dependency rules from a g++ command.
It implies -E, so compilation STOPS after the preprocessor.
It also implies -w, which DISABLES/suppresses all warnings.

Using a complex g++ command as an example:

g++ -Wall -Werror -Wno-unknown-pragmas -DPIN_CRT=1 -fno-stack-protector -fno-exceptions -funwind-tables -fasynchronous-unwind-tables -fno-rtti -DTARGET_IA32E -DHOST_IA32E -fPIC -DTARGET_LINUX -fabi-version=2 -faligned-new -I../../../source/include/pin -I../../../source/include/pin/gen -isystem /staff/shaojiemike/Download/pin-3.28-98749-g6643ecee5-gcc-linux/extras/cxx/include -isystem /staff/shaojiemike/Download/pin-3.28-98749-g6643ecee5-gcc-linux/extras/crt/include -isystem /staff/shaojiemike/Download/pin-3.28-98749-g6643ecee5-gcc-linux/extras/crt/include/arch-x86_64 -isystem /staff/shaojiemike/Download/pin-3.28-98749-g6643ecee5-gcc-linux/extras/crt/include/kernel/uapi -isystem /staff/shaojiemike/Download/pin-3.28-98749-g6643ecee5-gcc-linux/extras/crt/include/kernel/uapi/asm-x86 -I../../../extras/components/include -I../../../extras/xed-intel64/include/xed -I../../../source/tools/Utils -I../../../source/tools/InstLib -O3 -fomit-frame-pointer -fno-strict-aliasing -Wno-dangling-pointer \
-M inscount0.cpp -o Makefile_bk

In Makefile_bk (the # lines are annotations, not part of the generated file):

inscount0.o: inscount0.cpp \
 # sys header
 /usr/include/stdc-predef.h \
 /staff/shaojiemike/Download/pin-3.28-98749-g6643ecee5-gcc-linux/extras/cxx/include/iostream \
 /usr/lib/gcc/x86_64-linux-gnu/11/include/float.h \
 # usr header
 ../../../source/include/pin/pin.H \
 ../../../extras/xed-intel64/include/xed/xed-interface.h \
 ... more header files

-MM is like -M but leaves out system header files.
- e.g., the first 3 headers above disappear.
-MF filename writes the Makefile rules to filename instead of to stdout.
-M -MG generates Makefile rules even when a header file is missing: the missing header is assumed to be a generated file and is listed as a dependency instead of causing an error.
-M -MP adds a phony target for every dependency other than the main file, each depending on nothing, so that make does not fail when a header is deleted without the Makefile being updated.
- e.g., if inscount0.o depends on header1.h, an extra empty rule header1.h: is emitted.
-MD == -M -MF file, except that -E is not implied, so compilation continues.
- the default file has a suffix of .d, e.g., inscount0.d for -c inscount0.cpp
-MMD == -MD, but leaves out system header files.

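Further study needed

None yet

Problems encountered

None yet

Motivation, summary, reflections, rants

References

Parts of the answers above came from ChatGPT-3.5 and were not cross-checked for correctness.

None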

Posted 2023-08-07 | Updated 2025-01-30 | operating system | 11 minutes read (About 1680 words)

Memalloc

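Buddy memory allocation

The buddy system is an algorithm for managing memory. It aims to allocate and free memory blocks efficiently, limiting fragmentation and improving memory utilization. It is commonly used inside operating systems to manage memory for the kernel and for processes.

The basic idea is to divide physical memory into blocks whose sizes are all powers of two. Each block can be handed to a running process or to the kernel. A free block can be split into smaller blocks, and free blocks can be merged back into larger ones, to fit requests of different sizes.

The name "buddy" comes from the relationship between blocks: the two equal-sized, adjacent halves produced by splitting one block are each other's buddies. Thanks to this relationship, freeing a block is an opportunity to merge it with its free buddy into a larger block, which reduces fragmentation.

The algorithm works roughly as follows:

Initially, the whole of available memory is treated as one large block whose size is a power of two.
When a process requests memory, the allocator searches the free blocks for one of suitable size. If the block found is at least twice as large as needed, it is split into two equal-sized "buddy" blocks, repeatedly if necessary, and one of them is given to the requester.
When a process frees a block and its buddy is also free, the two are merged into a larger block. That block can in turn be merged with its own buddy, and so on, until no further merge is possible.

The buddy algorithm is used in some operating systems to manage physical memory for the kernel and processes, notably in embedded and real-time systems, to improve memory utilization and avoid fragmentation.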

ucore（Micro-kernel Operating System for Education）

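ucore is a microkernel operating system intended for teaching.

Problem encountered on Linux

On Windows we can write a program that fills all 16 GB of memory.

On Linux, however, the program segfaults after using about 3 GB.

Guess: is there a per-process memory limit? https://www.imooc.com/wenda/detail/570992

Also, memory returned by malloc lives in the heap, and the heap is clearly hemmed in by the stack region, so it is finite. How does Windows deal with this?

First, this "hemmed in" layout is in virtual addresses; the physical addresses they map to through the page tables are separate.
Given the first point, the upper boundary can be moved upward dynamically.

The dynamic data area is usually what is meant by "heap and stack". The stack and the heap are two different dynamic data areas: the stack is a linear structure, while the heap is a linked structure. Every thread of a process has its own private stack, so although the threads run the same code, their local variables do not interfere with each other. A stack can be described by its base address and its top-of-stack address. Global and static variables are allocated in the static data area; local variables are allocated in the dynamic data area, i.e. on the stack. A program accesses local variables through the stack base address plus an offset.

When a process is initialized, the system automatically creates a default heap for it, which takes up 1 MB of memory by default. Heap objects are managed by the system and are organized in memory as a linked structure.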

Linux per-process memory limits

/etc/security/limits.conf

# shaojiemike @ node5 in ~ [6:35:51]
$ ulimit -a
-t: cpu time (seconds)              unlimited
-f: file size (blocks)              unlimited
-d: data seg size (kbytes)          unlimited
-s: stack size (kbytes)             8192
-c: core file size (blocks)         0
-m: resident set size (kbytes)      unlimited
-u: processes                       513967
-n: file descriptors                1024
-l: locked-in-memory size (kbytes)  65536
-v: address space (kbytes)          unlimited
-x: file locks                      unlimited
-i: pending signals                 513967
-q: bytes in POSIX msg queues       819200
-e: max nice                        0
-r: max rt priority                 0
-N 15:                              unlimited

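ulimit -HSn 4096 # -H sets the hard limit, -S sets the soft limit, and -n selects the maximum number of open file descriptors per process. The soft limit is the value the kernel actually enforces; the hard limit is the ceiling up to which an unprivileged process may raise its own soft limit.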

lsof

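File descriptors

Number of file handles

With the values shown above (data segment, resident set and address space are all unlimited), these limits generally do not restrict memory.

How task limits on supercomputer login nodes are implemented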

GNU malloc()

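When a call to malloc(size_t size) succeeds, it always allocates at least size bytes of VM (virtual memory; to stress it again, not RAM) and returns a pointer to the start of the allocated region. The memory stays reserved for the process until you explicitly call free() to release it (and of course, when the process exits, all statically and dynamically allocated memory is reclaimed by the system).

GNU libc provides memory allocation functions such as malloc() and calloc(). glibc's malloc() always satisfies requests through the brk() or mmap() system calls, choosing between them by request size; the threshold MMAP_THRESHOLD, 128 KB by default, is the cut-off. For small blocks (<= 128 KB) it calls brk(), which pushes the top of the data segment to a higher address (the heap grows upward from the bottom). For large blocks it uses mmap() to create an anonymous mapping (flag MAP_ANONYMOUS); that memory is unrelated to the heap and lies outside it.

Does malloc not allocate physical memory right away, and only do so when the memory is first accessed?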

https://www.zhihu.com/question/20836462

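Questions

Are the heap and the stack unique within a process?
1. The heap is shared by the whole process, whereas each thread has its own stack; a stack is small, which helps it stay mostly in cache.
Do the two operating systems' malloc hand out physical memory or virtual memory?
Linux uses a copy-on-write mechanism: physical pages are only assigned when the memory is first written.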

Further study needed

None yet

Problems encountered

None yet

Motivation, summary, reflections, rants

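It fails at around 6008 every time: 40000*6008*3/1024/1024 = 687 MB

733448/1024 = 716 MB

I asked a senior labmate, and the problem turned out to be that the argument passed to malloc had the wrong type: it was an int, which cannot hold 3*40*1024*40*1024. It should be size_t. (size_t is an unsigned integer type that is sized appropriately on every platform.)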

References

https://blog.csdn.net/shenzi/article/details/3972437?utm_medium=distribute.pc_relevant_t0.none-task-blog-2%7Edefault%7EBlogCommendFromMachineLearnPai2%7Edefault-1.base&depth_1-utm_source=distribute.pc_relevant_t0.none-task-blog-2%7Edefault%7EBlogCommendFromMachineLearnPai2%7Edefault-1.base

An in-depth look at the memory layout of a program (process)

Posted 2023-07-26 | Updated 2025-01-30 | Architecture | an hour read (About 6914 words)

Cache

Introduction

The purpose of a cache is to reduce latency.

Posted 2021-07-28 | Updated 2025-01-30 | Tutorials | a minute read (About 115 words)

AOCC

https://developer.amd.com/amd-aocc/

Install

cd <compdir>
tar -xvf aocc-compiler-<ver>.tar
cd aocc-compiler-<ver>
bash install.sh
# This installs the compiler and displays the AOCC setup instructions.

source <compdir>/setenv_AOCC.sh
# This sets up the shell environment for using the AOCC C, C++, and Fortran compilers in the shell where the command is executed.

Using AOCC

Libraries

Further study needed

None yet

Problems encountered

None yet

Motivation, summary, reflections, rants

References

https://developer.amd.com/wp-content/resources/AOCC_57223_Install_Guide_Rev_3.1.pdf

Posted 2021-07-13 | Updated 2025-01-30 | Tutorials | a few seconds read (About 111 words)

homepage interesting upgrade options

homepage Live2d + Mouse click effects + background-music

Just edit the template HTML files under \themes\hugo-theme-minos\layouts.

https://www.python87.com/p/881.html

https://github.com/stevenjoezhang/live2d-widget

https://apps.elfsight.com/panel/applications/background-music/

live2d

live2d : https://jingzhisheng.cn/blog/detail/1406456203487350784

https://l2d.alg-wiki.com/

https://github.com/alg-wiki/AzurLaneL2DViewer/tree/gh-pages/assets

pretty blogs

anime theme

Further study needed

live2d more

Problems encountered

None yet

References

None
