SHAOJIE'S BOOK

Posted 2023-10-16Updated 2026-03-11tips36 minutes read (About 5375 words)

Top

导言

原本是想写些网站建设的计划(但别人关心的只是他们搜索的内容)。所以就讲讲所在的团队吧（每个人都关心自己的下一站在哪里）

Posted 2023-10-16Updated 2026-03-11tipsa minute read (About 211 words)

导言

Follow学术大牛，和阅读前沿技术博客是科研探索的乐趣所在。

Posted 2023-10-16Updated 2026-03-11ProjectRecorda minute read (About 175 words)

[DevLog] PLAN

导言

项目展望与机遇小结

241016

可做的内容(NPU支持)
- Pytorch官方库：torchao、torchtune、torchtitan
- Pytorch官方特性：原生支持并行能力，Dtensor
- Pytorch三方库
困境
- 新特性/库的代码不稳定，现在支持之后变动可能会很大
- 由于人手和机器不足，余下的三方库很多，但是支持的必要性有待验证。客户的呼声不明，没有特别急用的库(之前适配过了)。
- 提前跟进库的适配，如果跟进过多，会导致后续持续看护库的人员压力会很大。

Posted 2023-10-16Updated 2026-03-11toLearn3 minutes read (About 391 words)

Mkdocs

导言

mkdocs在今年支持了blog的基本功能，而且已经有探路者实践过了[^1]。也是时候升级博客生成器了。

Posted 2023-10-15Updated 2026-03-11Valuesa minute read (About 146 words)

Research outline

李向阳讲座(研一)

en	cn
Knowledgeable	知己知彼 Be Skeptical
Independent	自己拥有研究
Smart	透过观象看本质
Soft	三人行必有我师

科研，我们需要关注什么？

热点\痛点\盲点
有用、有桃战、可为的事情

实践

做项目的时候，一定要有测试程序跑，才能正向反馈。提高积极性，明确方向。

读论文

读论文，要多篇，提炼overview 抓住立足点，创新点和展望

Posted 2023-10-15Updated 2026-03-11Algorithms4 minutes read (About 587 words)

Concurrent-AMAT, C-AMAT

并发式平均存储访问时间模型

APC

侧重于测量方法的研究，给出了并发存储的测量方法和尺度。

在 APC 中，周期是存储活动周期 (memory
active cycle)，不是通用的 CPU 周期，所以 APC 也叫 APMAC（存储活动周期平均访问数， access per
memory active cycle）。

同时 APC 采用重叠 (overlapping) 的访存时间统计方法：在有两个或多个存储访
问同时进行时，周期只增加一次。

Posted 2023-10-15Updated 2026-03-11Algorithms2 minutes read (About 341 words)

HPL.dat file detailed explanation

HPL.dat

title

HPLinpack benchmark input file
Innovative Computing Laboratory, University of Tennessee
HPL.out      output file name (if any) 输出文件名
6            device out (6=stdout,7=stderr,file)

1            # of problems sizes (N) = sqrt((Memory Size in Gbytes * 1024 * 1024 * 1024) /8) * ratio
11136         Ns 矩阵规模

1            # of NBs block sizes (64~512)In the [96,104,112,120,128, …, 256] range; the multiple of 64
96           NBs 矩阵分块方法

0            PMAP process mapping (0=Row-,1=Column-major) 选择处理器阵列是按列的排列方式还是按行的排列方式。

1            # of process grids (P x Q)
2            Ps # two-dimensional block-cyclic data distribution = amount of processes; P≤Q
2            Qs 二维处理器网格（P×Q）

16.0         threshold 阈值

1            # of panel fact # 后面是L分解的方式
2            PFACTs (0=left, 1=Crout, 2=Right)
1            # of recursive stopping criterium
4            NBMINs (>= 1)
1            # of panels in recursion
2            NDIVs
1            # of recursive panel fact.
1            RFACTs (0=left, 1=Crout, 2=Right)
1            # of broadcast
1            BCASTs (0=1rg,1=1rM,2=2rg,3=2rM,4=Lng,5=LnM)
1            # of lookahead depth
1            DEPTHs (>=0)
2            SWAP (0=bin-exch,1=long,2=mix)
64           swapping threshold
0            L1 in (0=transposed,1=no-transposed) form
0            U  in (0=transposed,1=no-transposed) form
1            Equilibration (0=no,1=yes)
8            memory alignment in double (> 0)

合适的参数参考

Posted 2023-10-15Updated 2026-03-11Tutorials15 minutes read (About 2203 words)

Python Graph & Visualization

数据科学

matplotlib

基础语法与概念

plotly

matplotlib VS plotly

matplotlib can use custom color and line style
plotly is more easily to quickly built up.

基础语法与概念

线性颜色柱的选择

https://plotly.com/python/builtin-colorscales/

same in matplotlib

plt.show 转发图像到本地

使用Dash: A web application framework for your data., 默认部署在localhost:8050端口

本地机器打通ssh隧道

1	ssh -L 8050:127.0.0.1:8050 -vN -f -l shaojiemike 202.38.72.23

科研画图

In scientific research plotting, it’s always bar charts instead of line charts.

Possible reasons:

The visual footprint of point lines is relatively small compared to bar
Line charts are aggregated together when there is a meaningful comparison of data along the y-axis.

global setting

font things, Attention, global settings have higher priority to get work


matplotlib.rcParams.update({
    "pgf.texsystem": "pdflatex",
    'font.family': 'serif',
    # 'text.usetex': True, # comment to support bold font in legend, and font will be bolder
    # 'pgf.rcfonts': False,
})

layout

picture size

figure size (adjust x,y tick distance)

# mpl
fig.set_size_inches(w= 0.5 * x_count * (group_count+1.5), h=5.75 * 0.8) #(8, 6.5)

# plotly
fig.update_layout(height=350, width=50 * graph_data.size(),
                  margin=dict(b=10, t=10, l=20, r=5),
                  bargap=0.2 # x tick distance, 1 is the normalize-distance of adjacent bars
                  )

relative position

1 2	# mpl: Adjust the left margin to make room for the legend, left & right chart vertical line position from [0,1] plt.subplots_adjust(left=0.1, right=0.8)

set x axis

# mpl:
x = np.arange(len(graph_data.x))  # the label locations, [0, 1, 2, 3, 4, 5, 6, 7]
# set the bar move to arg1 with name arg2
ax.set_xticks(x + (group_count-1)*0.5*width, graph_data.x)
plt.xticks(fontsize=graph_data.fontsize+4)
# Adjust x-axis limits to narrow the gap
plt.xlim(-(0.5+gap_count)*width, 
        x_count - 1 + (group_count-1)*width + (0.5+gap_count)*width)
    

# plotly

set y axis

vertical grid line
dick size
and range size

# mpl
ax.set_ylabel(yaxis_title, 
              fontsize=graph_data.fontsize,
              fontweight='bold')
plt.grid(True, which='major',axis='y', zorder=-1.0) # line and bar seems need to set zorder=10 to cover it
plt.yticks(np.arange(0, max_y ,10), fontsize=graph_data.fontsize) # step 10
plt.yscale('log',base=10) # or plt.yscale('linear')
ax.set_ylim(0.1, max_y)
## highlight selected y-label: https://stackoverflow.com/questions/73597796/make-one-y-axis-label-bold-in-matplotlib

# plotly
fig.update_layout(
      yaxis_range=[0,maxY],                   
      yaxis=dict(
          rangemode='tozero',  # Set the y-axis rangemode to 'tozero'
          dtick=maxY/10,
          gridcolor='rgb(196, 196, 196)', # grey
          gridwidth=1,
      ),
)

Bolden Contour Lines

Entire Figure

# mpl ?

# plotly: Add a rectangle shape to cover the entire subplot area
fig.add_shape(type="rect",
            xref="paper", yref="paper",
            x0=0, y0=0,
            x1=1, y1=1,
            line=dict(color="black", width=0.7))

and Each Bar Within

# mpl: Create a bar chart with bold outlines
plt.bar(categories, values, edgecolor='black', linewidth=2)

# plotly: ?

legend box

# mpl: 
ax.legend(loc='upper left', ncols=3, fontsize=graph_data.fontsize)
# legend out-of-figure, (1.02, 1) means anchor is upper right corner
plt.legend(loc='upper left', bbox_to_anchor=(1.02, 1), borderaxespad=0)

# Calculate the legend width based on the figure width
fig_size = plt.gcf().get_size_inches()
fig_width = fig_size[0]
# Move the legend to the center above the ceiling line
plt.legend(loc='upper center', bbox_to_anchor=(0.5, 1.1), 
  ncol=2,  # ncol=2 to have labels in one line
  frameon=False,  # frameon=False removes the legend box outline
  columnspacing=fig_width, # distance between each label
  handlelength=1.0, # label-box width (unit is text fontsize)
  handleheight=1.0, # label-box heigh (unit is text fontsize)
  prop={'size': 20, 'weight': 'bold'} # text fontsize          
)

bar

# mpl: white hugo hatch with black bar edge.
# Problem: because the bug of mpl. hatch color follow the edge color
# Solved: draw bar twice, first the white hugo hatch with white bar edge. Second empty figure with black bar edge.
# white doubel ref: https://stackoverflow.com/questions/38168948/how-to-decouple-hatch-and-edge-color-in-matplotlib
# ref2: https://stackoverflow.com/questions/71424924/how-to-change-the-edge-color-of-markers-patches-in-matplotlib-legend
color_palette = ['black', (193/255, 1/255, 1/255), 'w', (127/255, 126/255, 127/255), 'blue']
pattern_list = ["", "/", "+", "\\", "x"]
edgecolor_list = ['w', 'w', (0/255, 176/255, 80/255), 'w', 'w']
    
ax.bar(x + offset, measurement, width, label=species_name,
                       color=color_palette[idx],
                        hatch = pattern_list[idx],
                        edgecolor=edgecolor_list[idx],
                        linewidth=1,
                     )
ax.bar(x + offset, measurement, width, label=species_name,
            color = "none",
            edgecolor='black', linewidth=1,
              )
# related legend: https://stackoverflow.com/questions/71424924/how-to-change-the-edge-color-of-markers-patches-in-matplotlib-legend
handles1, labels1 = ax.get_legend_handles_labels()
plt.legend([handles1[2*idx]+handles1[2*idx+1] for idx in range(group_count)], 
            [labels1[2*idx] for idx in range(group_count)],  
            loc='upper center', bbox_to_anchor=(0.5, 1.12), 
            ncol=group_count,  # have labels in one line
            frameon=False,
            # bbox_transform=plt.gcf().transFigure,
            columnspacing=legend_width, 
            # handlelength=1.0, 
            handleheight=1.2, 
            prop={'size': graph_data.fontsize,
                'weight': 'heavy'}
) 


# plotly: find the overflow
overflow_pattern = ["/" if y > maxY  else "" for y in entry[1]]
fig.add_bar(x=x,y=yList, 
    name=barName, 
    marker=dict(
        color=color_list[i],
        pattern_shape = overflow_pattern,
        line=dict(color='black', width=2)
        ),
    textfont=dict(size=graph_data.fontsize),
)
# legend 
legend_num = len( barDict.items())
fig.update_layout(barmode="relative", 

        # legend_title="Legend Title",
        legend = dict(
          entrywidthmode='fraction', # https://plotly.com/python/legend/
          entrywidth= 0.2,
          x=0.5 - 0.5 * legend_num * 0.2,        # Set x to 0.5 for the center
          y=1.2,       # Set y to a value greater than 1 to move it above the plot
          orientation="h",  # Display legend items in a single line
        ),
)

Out-box text

To draw symmetry chart, we need to special highlight the overflow bar number.

If the ancher point locate in the plot box, it’s easy to show text above the ceil line using textposition="bottom" like option. In the opposite scenario, plotly and mathplotlib all will hide the out-box text.

# plotly
fig.add_annotation(
            x=[x[0][i],x[1][i]],  # 注释的 x 坐标为 "bc"
            y=min(maxY,entry),  # 注释的 y 坐标为该列的最大值
            text=f"{entry:.2f}",  # 注释的文本内容
            # valign = "bottom", # text position in text box(default invisible)
            yanchor = "bottom", # text box position relative to anchor
            showarrow=False,  # 显示箭头
            # bgcolor="rgba(255, 255, 255, 0.8)",  # 注释框背景颜色
            font=dict(size=graph_data.fontsize+2)  # 注释文本字体大小
        )

# mathplotlib
# Create labels for overflowed values
for i, value in enumerate(values):
    if value > maxY:
        ax.annotate(f'Overflow: {value:.2f}', (i, maxY), ha='center', va='bottom', fontsize=12)

But mlb can write text out box.

ax.text(1, -1.6, 'Increasing', ha="center")

# first parameter is text, xy is the anchor point, xytext is the text, 
# xytext 2 xy is a relative distance
ax.annotate('yahaha', xy=(0, -0.1), xycoords='axes fraction', xytext=(1, -0.1))

out-box line

mathplotlib(mpl) can achieve this using ref, but there are few blogs about plotly.

# mpl: from 1*1 size full-graph (0.5,0.2) to point (1,0.8)
# transform=gcf().transFigure : gcf() stands for "get current figure," and .transFigure indicates that the coordinates provided in [0.5, 0.5], [0, 1] are in figure-relative coordinates. This means that the line's position is defined relative to the entire figure, not just the axes
# clip_on=False : This setting means that the line is not clipped at the edges of the axes. It allows the line to extend beyond the axes' boundaries.
from pylab import *  
plot([0.5, 1], [0.2, 0.8], color='lightgreen', linestyle='--', lw=1 ,transform=gcf().transFigure, clip_on=False)

# mpl: arrow from xy 2 xytext
# xycoords='figure fraction' to Add annotation to the figure (full graph)
ax.annotate('', xy=(0, -0.1), xycoords='axes fraction', xytext=(1, -0.1),\
arrowprops=dict(arrowstyle="->", color='violet'))

in-box line

# mpl:
ax.axhline(y=12, color='red', linestyle='--', label='Horizontal Line at y=12')
ax.axvline(x=3, color='green', linestyle='-.', label='Vertical Line at x=3')

# plotly ref: https://plotly.com/python/horizontal-vertical-shapes/
fig.add_vline(x=2.5, line_width=3, line_dash="dash", line_color="green") # dot
fig.add_hline(y=0.9)

实践

气泡图

二维无向图

教程

3D图

dash + plotly

如果防火墙是关闭的，你可以直接部署在external address上。使用docker也是可行的办法

1	app.run_server(debug=True, host='202.38.72.23')

3D 散点图，拟合曲面与网格图

实际案例

折线图

import matplotlib.pyplot as plt

X = [str(i) for i in metricValue]
Y = accuracyResult

# 设置图片大小
fig, ax = plt.subplots(figsize=(10, 6))  # 指定宽度为10英寸，高度为6英寸

plt.plot(X, Y, marker='o')
plt.xlabel('Threshold Percentage(%)', fontsize=12)  # 设置x轴标签字体大小为12
plt.ylabel('Average Execution Time of Static Method', fontsize=12)  # 设置y轴标签字体大小为12
plt.title('Tuning load store pressure', fontsize=14)  # 设置标题字体大小为14

for i in range(len(X)):
    plt.text(X[i], Y[i], Y[i], 
        fontsize=10, # 设置文本字体大小为10
        ha='center', # 设置水平对齐方式
        va='bottom')  # 设置垂直对齐方式

# 保存图片
plt.savefig(glv._get("resultPath") + f"tuning/{tuningLabel}/loadStorePressure.png", dpi=300)  # 设置dpi为300，可调整保存图片的分辨率

plt.show()  # 显示图片
plt.close()

Heatmap

实例

Stacked Bar & grouped compare bar

Example

Error Bars

在柱状图中，用于表示上下浮动的元素通常被称为“误差条”（Error Bars）。误差条是用于显示数据点或柱状图中的不确定性或误差范围的线条或线段。它们在柱状图中以垂直方向延伸，可以显示上下浮动的范围，提供了一种可视化的方式来表示数据的变化或不确定性。误差条通常通过标准差、标准误差、置信区间或其他统计指标来计算和表示数据的浮动范围。

Errorbars + StackedBars stacked 的过程中由于向上的error线的会被后面的Bar遮盖，然后下面的error线由于arrayminus=[i-j for i,j in zip(sumList,down_error)]导致大部分时间说负值，也不会显示。

fig = go.Figure()

 # color from https://stackoverflow.com/questions/68596628/change-colors-in-100-stacked-barchart-plotly-python
 color_list = ['rgb(29, 105, 150)', \
             'rgb(56, 166, 165)', \
    'rgb(15, 133, 84)',\
          'rgb(95, 70, 144)']
 sumList = [0 for i in range(len(x[0]))]
 for i, entry in enumerate( barDict.items()):
  barName=entry[0]
  yList = entry[1]
  ic(sumList,yList)
  sumList = [x + y for x, y in zip(yList, sumList)]
  fig.add_bar(x=x,y=yList, 
              name=barName, 
              text =[f'{val:.2f}' for val in yList], 
              textposition='inside',
              marker=dict(color=color_list[i]),
              error_y=dict(
    type='data',
    symmetric=False,
    color='purple',
    array=[i-j for i,j in zip(up_error,sumList)],
    arrayminus=[i-j for i,j in zip(sumList,down_error)],
       thickness=2, width=10),
              textfont=dict(size=8)
  )

Candlestick

类似股票上下跳动的浮标被称为”Candlestick”（蜡烛图）或”OHLC”（开盘-最高-最低-收盘）图表。

需要进一步的研究学习

暂无

遇到的问题

暂无

开题缘由、总结、反思、吐槽~~

参考文献

上面回答部分来自ChatGPT-3.5，暂时没有校验其可靠性(看上去貌似说得通)。

[1] Saket, B., Endert, A. and Demiralp, Ç., 2018. Task-based effectiveness of basic visualizations.IEEE transactions on visualization and computer graphics,25(7), pp.2505-2512.

Posted 2023-10-14Updated 2026-03-11network36 minutes read (About 5351 words)

TCP & UDP

TCP和UDP的区别？

对比如下：

	UDP	TCP
是否连接	无连接	面向连接
是否可靠	不可靠传输，不使用流量控制和拥塞控制	可靠传输，使用流量控制和拥塞控制
是否有序	无序	有序，消息在传输过程中可能会乱序，TCP 会重新排序
传输速度	快	慢
连接对象个数	支持一对一，一对多，多对一和多对多交互通信	只能是一对一通信
传输方式	面向报文	面向字节流
首部开销	首部开销小，仅8字节	首部最小20字节，最大60字节
适用场景	适用于实时应用（IP电话、视频会议、直播等）	适用于要求可靠传输的应用，例如文件传输

总结：

TCP 用于在传输层有必要实现可靠传输的情况，
UDP 用于对高速传输和实时性有较高要求的通信。
TCP 和 UDP 应该根据应用目的按需使用。
注意虽然UDP传输时是无序的，但每个报文还是包含有报文序号，以便接收后排序

UDP 和 TCP 对应的应用场景是什么？

TCP 是面向连接，能保证数据的可靠性交付，因此经常用于：

FTP文件传输
HTTP / HTTPS

UDP 面向无连接，它可以随时发送数据，再加上UDP本身的处理既简单又高效，因此经常用于：

包总量较少的通信，如 DNS 、SNMP等
视频、音频等多媒体通信
广播通信

TCP协议如何保证可靠性？

TCP主要提供了检验和、序列号/确认应答、超时重传、滑动窗口、拥塞控制和流量控制等方法实现了可靠性传输。

检验和：通过检验和的方式，接收端可以检测出来数据是否有差错和异常，假如有差错就会直接丢弃TCP段，重新发送。
序列号/确认应答：
序列号的作用不仅仅是应答的作用，有了序列号能够将接收到的数据根据序列号排序，并且去掉重复序列号的数据。
TCP传输的过程中，每次接收方收到数据后，都会对传输方进行确认应答。也就是发送ACK报文，这个ACK报文当中带有对应的确认序列号，告诉发送方，接收到了哪些数据，下一次的数据从哪里发。
滑动窗口：滑动窗口既提高了报文传输的效率，也避免了发送方发送过多的数据而导致接收方无法正常处理的异常。
超时重传：超时重传是指发送出去的数据包到接收到确认包之间的时间，如果超过了这个时间会被认为是丢包了，需要重传。最大超时时间是动态计算的。
拥塞控制：在数据传输过程中，可能由于网络状态的问题，造成网络拥堵，此时引入拥塞控制机制，在保证TCP可靠性的同时，提高性能。
流量控制：
- 如果主机A 一直向主机B发送数据，不考虑主机B的接受能力，则可能导致主机B的接受缓冲区满了而无法再接受数据，从而会导致大量的数据丢包，引发重传机制。
- 而在重传的过程中，若主机B的接收缓冲区情况仍未好转，则会将大量的时间浪费在重传数据上，降低传送数据的效率。
- 所以引入流量控制机制，主机B通过告诉主机A自己接收缓冲区的大小，来使主机A控制发送的数据量。流量控制与TCP协议报头中的窗口大小有关。

详细讲一下拥塞控制？

TCP 一共使用了四种算法来实现拥塞控制：

**慢开始(slow-start)**：不要一开始就发送大量的数据，由小到大逐渐增加拥塞窗口的大小。
**拥塞避免(congestion avoidance)**：
- 拥塞避免算法让拥塞窗口缓慢增长，即每经过一个往返时间RTT就把发送方的拥塞窗口cwnd加1而不是加倍。
- 这样拥塞窗口按线性规律缓慢增长。
- 发送方维持一个叫做拥塞窗口cwnd（congestion window）的状态变量。当cwnd ssthresh时，改用拥塞避免算法。
**快重传(fast retransmit)**：
- 我们可以剔除一些不必要的拥塞报文，提高网络吞吐量。
- 比如接收方在收到一个失序的报文段后就立即发出重复确认，而不要等到自己发送数据时捎带确认。
- 快重传规定：发送方只要一连收到三个重复确认就应当立即重传对方尚未收到的报文段，而不必继续等待设置的重传计时器时间到期。
**快恢复(fast recovery)**：主要是配合快重传。
- 当发送方连续收到三个重复确认时，就执行“乘法减小”算法，把ssthresh门限减半（为了预防网络发生拥塞），但接下来并不执行慢开始算法，
- 因为如果网络出现拥塞的话就不会收到好几个重复的确认，收到三个重复确认说明网络状况还可以。

详细介绍一下 TCP 的三次握手机制？

图片来自：https://juejin.cn/post/6844904005315854343

三次握手机制：

第一次握手：
- 客户端请求建立连接，向服务端发送一个同步报文（SYN=1），
- 同时选择一个随机数 seq = x 作为初始序列号，
- 并进入SYN_SENT状态，等待服务器确认。
第二次握手：：
- 服务端收到连接请求报文后，如果同意建立连接，则向客户端发送同步确认报文（SYN=1，ACK=1），
- 确认号为 ack = x + 1，同时选择一个随机数 seq = y 作为初始序列号，
- 此时服务器进入SYN_RECV状态。
第三次握手：
- 客户端收到服务端的确认后，向服务端发送一个确认报文（ACK=1），
- 确认号为 ack = y + 1，序列号为 seq = x + 1，
- 客户端和服务器进入ESTABLISHED状态，完成三次握手。

理想状态下，TCP连接一旦建立，在通信双方中的任何一方主动关闭连接之前，TCP 连接都将被一直保持下去。

为什么需要三次握手，而不是两次？

主要有三个原因：

防止已过期的连接请求报文突然又传送到服务器，因而产生错误和资源浪费。

在双方两次握手即可建立连接的情况下，假设客户端发送 A 报文段请求建立连接，由于网络原因造成 A 暂时无法到达服务器，服务器接收不到请求报文段就不会返回确认报文段。

客户端在长时间得不到应答的情况下重新发送请求报文段 B，这次 B 顺利到达服务器，服务器随即返回确认报文并进入 ESTABLISHED 状态，客户端在收到确认报文后也进入 ESTABLISHED 状态，双方建立连接并传输数据，之后正常断开连接。

此时姗姗来迟的 A 报文段才到达服务器，服务器随即返回确认报文并进入 ESTABLISHED 状态，但是已经进入 CLOSED 状态的客户端无法再接受确认报文段，更无法进入 ESTABLISHED 状态，这将导致服务器长时间单方面等待，造成资源浪费。
三次握手才能让双方均确认自己和对方的发送和接收能力都正常。

第一次握手：客户端只是发送处请求报文段，什么都无法确认，而服务器可以确认自己的接收能力和对方的发送能力正常；

第二次握手：客户端可以确认自己发送能力和接收能力正常，对方发送能力和接收能力正常；

第三次握手：服务器可以确认自己发送能力和接收能力正常，对方发送能力和接收能力正常；

可见三次握手才能让双方都确认自己和对方的发送和接收能力全部正常，这样就可以愉快地进行通信了。
告知对方自己的初始序号值，并确认收到对方的初始序号值。

TCP 实现了可靠的数据传输，原因之一就是 TCP 报文段中维护了序号字段和确认序号字段，通过这两个字段双方都可以知道在自己发出的数据中，哪些是已经被对方确认接收的。这两个字段的值会在初始序号值得基础递增，如果是两次握手，只有发起方的初始序号可以得到确认，而另一方的初始序号则得不到确认。

为什么要三次握手，而不是四次？

因为三次握手已经可以确认双方的发送接收能力正常，双方都知道彼此已经准备好，而且也可以完成对双方初始序号值得确认，也就无需再第四次握手了。

第一次握手：服务端确认“自己收、客户端发”报文功能正常。
第二次握手：客户端确认“自己发、自己收、服务端收、客户端发”报文功能正常，客户端认为连接已建立。
第三次握手：服务端确认“自己发、客户端收”报文功能正常，此时双方均建立连接，可以正常通信。

什么是 SYN洪泛攻击？如何防范？

SYN洪泛攻击属于 DOS 攻击的一种，它利用 TCP 协议缺陷，通过发送大量的半连接请求，耗费 CPU 和内存资源。

原理：

在三次握手过程中，服务器发送 [SYN/ACK] 包（第二个包）之后、收到客户端的 [ACK] 包（第三个包）之前的 TCP 连接称为半连接（half-open connect），
- 此时服务器处于 SYN_RECV（等待客户端响应）状态。如果接收到客户端的 [ACK]，则 TCP 连接成功，
- 如果未接受到，则会不断重发请求直至成功。
SYN 攻击的攻击者在短时间内伪造大量不存在的 IP 地址，向服务器不断地发送 [SYN] 包，服务器回复 [SYN/ACK] 包，并等待客户的确认。由于源地址是不存在的，服务器需要不断的重发直至超时。
这些伪造的 [SYN] 包将长时间占用未连接队列，影响了正常的 SYN，导致目标系统运行缓慢、网络堵塞甚至系统瘫痪。

检测：当在服务器上看到大量的半连接状态时，特别是源 IP 地址是随机的，基本上可以断定这是一次 SYN 攻击。

防范：

通过防火墙、路由器等过滤网关防护。
通过加固 TCP/IP 协议栈防范，如增加最大半连接数，缩短超时时间。
SYN cookies技术。SYN Cookies 是对 TCP 服务器端的三次握手做一些修改，专门用来防范 SYN 洪泛攻击的一种手段。

三次握手连接阶段，最后一次ACK包丢失，会发生什么？

服务端：

第三次的ACK在网络中丢失，那么服务端该TCP连接的状态为SYN_RECV,并且会根据 TCP的超时重传机制，会等待3秒、6秒、12秒后重新发送SYN+ACK包，以便客户端重新发送ACK包。
如果重发指定次数之后，仍然未收到客户端的ACK应答，那么一段时间后，服务端自动关闭这个连接。

客户端：

客户端认为这个连接已经建立，如果客户端向服务端发送数据，服务端将以RST包（Reset，标示复位，用于异常的关闭连接）响应。此时，客户端知道第三次握手失败。

详细介绍一下 TCP 的四次挥手过程？

图片来源：https://juejin.im/post/5ddd1f30e51d4532c42c5abe

第一次挥手：客户端向服务端发送连接释放报文（FIN=1，ACK=1），主动关闭连接，同时等待服务端的确认。
- 序列号 seq = u，即客户端上次发送的报文的最后一个字节的序号 + 1
- 确认号 ack = k, 即服务端上次发送的报文的最后一个字节的序号 + 1
第二次挥手：服务端收到连接释放报文后，立即发出确认报文（ACK=1），序列号 seq = k，确认号 ack = u + 1。

这时 TCP 连接处于半关闭状态，即客户端到服务端的连接已经释放了，但是服务端到客户端的连接还未释放。这表示客户端已经没有数据发送了，但是服务端可能还要给客户端发送数据。
第三次挥手：服务端向客户端发送连接释放报文（FIN=1，ACK=1），主动关闭连接，同时等待 A 的确认。
- 序列号 seq = w，即服务端上次发送的报文的最后一个字节的序号 + 1。
- 确认号 ack = u + 1，与第二次挥手相同，因为这段时间客户端没有发送数据
第四次挥手：客户端收到服务端的连接释放报文后，立即发出确认报文（ACK=1），序列号 seq = u + 1，确认号为 ack = w + 1。

此时，客户端就进入了 TIME-WAIT 状态。注意此时客户端到 TCP 连接还没有释放，必须经过 2*MSL（最长报文段寿命）的时间后，才进入 CLOSED 状态。而服务端只要收到客户端发出的确认，就立即进入 CLOSED 状态。可以看到，服务端结束 TCP 连接的时间要比客户端早一些。

为什么连接的时候是三次握手，关闭的时候却是四次握手？

服务器在收到客户端的 FIN 报文段后，可能还有一些数据要传输，所以不能马上关闭连接，但是会做出应答，返回 ACK 报文段.

接下来可能会继续发送数据，在数据发送完后，服务器会向客户单发送 FIN 报文，表示数据已经发送完毕，请求关闭连接。服务器的ACK和FIN一般都会分开发送，从而导致多了一次，因此一共需要四次挥手。

为什么客户端的 TIME-WAIT 状态必须等待 2MSL ？

主要有两个原因：

确保 ACK 报文能够到达服务端，从而使服务端正常关闭连接。

第四次挥手时，客户端第四次挥手的 ACK 报文不一定会到达服务端。服务端会超时重传 FIN/ACK 报文，此时如果客户端已经断开了连接，那么就无法响应服务端的二次请求，这样服务端迟迟收不到 FIN/ACK 报文的确认，就无法正常断开连接。

MSL 是报文段在网络上存活的最长时间。客户端等待 2MSL 时间，即「客户端 ACK 报文 1MSL 超时 + 服务端 FIN 报文 1MSL 传输」，就能够收到服务端重传的 FIN/ACK 报文，然后客户端重传一次 ACK 报文，并重新启动 2MSL 计时器。如此保证服务端能够正常关闭。

如果服务端重发的 FIN 没有成功地在 2MSL 时间里传给客户端，服务端则会继续超时重试直到断开连接。
防止已失效的连接请求报文段出现在之后的连接中。

TCP 要求在 2MSL 内不使用相同的序列号。客户端在发送完最后一个 ACK 报文段后，再经过时间 2MSL，就可以保证本连接持续的时间内产生的所有报文段都从网络中消失。这样就可以使下一个连接中不会出现这种旧的连接请求报文段。或者即使收到这些过时的报文，也可以不处理它。

如果已经建立了连接，但是客户端出现故障了怎么办？

或者说，如果三次握手阶段、四次挥手阶段的包丢失了怎么办？如“服务端重发 FIN丢失”的问题。

简而言之，通过定时器 + 超时重试机制，尝试获取确认，直到最后会自动断开连接。

具体而言，TCP 设有一个保活计时器。服务器每收到一次客户端的数据，都会重新复位这个计时器，时间通常是设置为 2 小时。若 2 小时还没有收到客户端的任何数据，服务器就开始重试：每隔 75 分钟发送一个探测报文段，若一连发送 10 个探测报文后客户端依然没有回应，那么服务器就认为连接已经断开了。

TIME-WAIT 状态过多会产生什么后果？怎样处理？

从服务器来讲，短时间内关闭了大量的Client连接，就会造成服务器上出现大量的TIME_WAIT连接，严重消耗着服务器的资源，此时部分客户端就会显示连接不上。

从客户端来讲，客户端TIME_WAIT过多，就会导致端口资源被占用，因为端口就65536个，被占满就会导致无法创建新的连接。

解决办法：

服务器可以设置 SO_REUSEADDR 套接字选项来避免 TIME_WAIT状态，此套接字选项告诉内核，即使此端口正忙（处于
TIME_WAIT状态），也请继续并重用它。

调整系统内核参数，修改/etc/sysctl.conf文件，即修改net.ipv4.tcp_tw_reuse 和 tcp_timestamps

1
2

net.ipv4.tcp_tw_reuse = 1 表示开启重用。允许将TIME-WAIT sockets重新用于新的TCP连接，默认为0，表示关闭；
net.ipv4.tcp_tw_recycle = 1 表示开启TCP连接中TIME-WAIT sockets的快速回收，默认为0，表示关闭。

强制关闭，发送 RST 包越过TIME_WAIT状态，直接进入CLOSED状态。

TIME_WAIT 是服务器端的状态?还是客户端的状态?

TIME_WAIT 是主动断开连接的一方会进入的状态，一般情况下，都是客户端所处的状态;服务器端一般设置不主动关闭连接。

TIME_WAIT 需要等待 2MSL，在大量短连接的情况下，TIME_WAIT会太多，这也会消耗很多系统资源。对于服务器来说，在 HTTP 协议里指定 KeepAlive（浏览器重用一个 TCP 连接来处理多个 HTTP 请求），由浏览器来主动断开连接，可以一定程度上减少服务器的这个问题。

阿里云秋招笔试题的一个选择题

现在有一个数据部分长度为8192B的数据需要通过UDP在以太网上传播，经过分片化为多个IP数据报片，这些片中的数据部分长度都有哪些?(假设按照最大长度分片)，计网习题

需要进一步的研究学习

暂无

遇到的问题

暂无

开题缘由、总结、反思、吐槽~~

参考文献

无

Posted 2023-10-12Updated 2026-03-11Tutorials12 minutes read (About 1771 words)

GCC Compiler Option 1 : Optimization Options

手册

全体选项其中一部分是Optimize-Options

# 会列出可选项
g++ -march=native -m32 ... -Q --help=target 
# 会列出O3默认开启和关闭选项
g++ -O3 -Q --help=optimizers

编译时最好按照其分类有效组织, 例子如下：

g++ 
# Warning Options
-Wall -Werror -Wno-unknown-pragmas -Wno-dangling-pointer 
# Program Instrumentation Options
-fno-stack-protector
# Code-Gen-Options
-fno-exceptions -funwind-tables -fasynchronous-unwind-tables
# C++ Dialect
-fabi-version=2 -faligned-new -fno-rtti
# define
-DPIN_CRT=1 -DTARGET_IA32E -DHOST_IA32E -fPIC -DTARGET_LINUX 
# include
-I../../../source/include/pin 
-I../../../source/include/pin/gen 
-isystem /staff/shaojiemike/Download/pin-3.28-98749-g6643ecee5-gcc-linux/extras/cxx/include 
-isystem /staff/shaojiemike/Download/pin-3.28-98749-g6643ecee5-gcc-linux/extras/crt/include 
-isystem /staff/shaojiemike/Download/pin-3.28-98749-g6643ecee5-gcc-linux/extras/crt/include/arch-x86_64 
-isystem /staff/shaojiemike/Download/pin-3.28-98749-g6643ecee5-gcc-linux/extras/crt/include/kernel/uapi 
-isystem /staff/shaojiemike/Download/pin-3.28-98749-g6643ecee5-gcc-linux/extras/crt/include/kernel/uapi/asm-x86 
-I../../../extras/components/include 
-I../../../extras/xed-intel64/include/xed 
-I../../../source/tools/Utils 
-I../../../source/tools/InstLib 
# Optimization Options
-O3 -fomit-frame-pointer -fno-strict-aliasing 
-c -o obj-intel64/inscount0.o inscount0.cpp

常见选项

-Wxxx 对 xxx 启动warning，
-fxxx 启动xxx的编译器功能。-fno-xxx 关闭对应选项？？？
-gxxx debug 相关
-mxxx 特定机器架构的选项

名称	含义
-Wall	打开常见的所有warning选项
-Werror	把warning当成error
-std=	C or C++ language standard. eg ‘c++11’ == ‘c++0x’ ‘c++17’ == ‘c++1z’, which ‘c++0x’,’c++17’ is develop codename
-Wunknown-pragmas	未知的pragma会报错（-Wno-unknown-pragmas 应该是相反的）
-fomit-frame-pointer	不生成栈帧指针,属于-O1优化
-Wstack-protector	没有防止堆栈崩溃的函数时warning (-fno-stack-protector)
-MMD	only user header files, not system header files.
-fexceptions	Enable exception handling.
-funwind-tables	Unwind tables contain debug frame information which is also necessary for the handling of such exceptions
-fasynchronous-unwind-tables	Generate unwind table in DWARF format. so it can be used for stack unwinding from asynchronous events
-fabi-version=n	Use version n of the C++ ABI. The default is version 0.(Version 2 is the version of the C++ ABI that first appeared in G++ 3.4, and was the default through G++ 4.9.) ABI: an application binary interface (ABI) is an interface between two binary program modules. Often, one of these modules is a library or operating system facility, and the other is a program that is being run by a user.
-fno-rtti	Disable generation of information about every class with virtual functions for use by the C++ run-time type identification features (dynamic_cast and typeid). If you don’t use those parts of the language, you can save some space by using this flag
-faligned-new	Enable support for C++17 new of types that require more alignment than `void* ::operator new(std::size_t)` provides. A numeric argument such as `-faligned-new=32` can be used to specify how much alignment (in bytes) is provided by that function, but few users will need to override the default of `alignof(std::max_align_t)`. This flag is enabled by default for `-std=c++17`.
-Wl, xxx	pass xxx option to linker, e.g., `-Wl,-R/staff/shaojiemike/github/MultiPIM_icarus0/common/libconfig/lib` specify a runtime library search path for dynamic libraries (shared libraries) during the linking process.

General Optimization Options

-O, -O2, -O3

-O3 turns on all optimizations specified by -O2

and also turns on the -finline-functions, -funswitch-loops, -fpredictive-commoning, -fgcse-after-reload, -ftree-loop-vectorize, -ftree-loop-distribute-patterns, -ftree-slp-vectorize, -fvect-cost-model, -ftree-partial-pre and -fipa-cp-clone options

-ffastmath

允许使用浮点计算获得更高的性能，但可能会略微降低精度。

-Ofast

更快但是有保证正确

-flto

（仅限 GNU）链接时优化，当程序链接时检查文件之间的函数调用的步骤。该标志必须用于编译和链接时。使用此标志的编译时间很长，但是根据应用程序，当与 -O* 标志结合使用时，可能会有明显的性能改进。这个标志和任何优化标志都必须传递给链接器，并且应该调用 gcc/g++/gfortran 进行链接而不是直接调用 ld。

-mtune=processor

此标志对特定处理器类型进行额外调整，但它不会生成额外的 SIMD 指令，因此不存在体系结构兼容性问题。调整将涉及对处理器缓存大小、首选指令顺序等的优化。

在 AMD Bulldozer 节点上使用的值为 bdver1，在 AMD Epyc 节点上使用的值为 znver2。是zen ver2的简称。

Optimization Options: 数据预取相关

-fprefetch-loop-arrays
1. 如果目标机器支持，生成预取内存的指令，以提高访问大数组的循环的性能。这个选项可能产生更好或更差的代码；结果在很大程度上取决于源代码中的循环结构。
2. -Os禁用

Optimization Options: 访存优化相关

https://zhuanlan.zhihu.com/p/496435946

下面没有特别指明都是O3，默认开启

调整数据的访问顺序

-ftree-loop-distribution
1. 允许将一个复杂的大循环，拆开成多个循环，各自可以继续并行和向量化
-ftree-loop-distribute-patterns
1. 类似上面一种？
-floop-interchange
1. 允许交换多层循环次序来连续访存
-floop-unroll-and-jam
1. 允许多层循环，将外循环按某种系数展开，并将产生的多个内循环融合。

代码段对齐

(不是计算访问的数据)

-falign-functions=n:m:n2:m2
1. Enabled at levels -O2, -O3.
  类似有一堆

调整代码块的布局

-freorder-blocks
1. 函数基本块重排来，减少分支

Optimization Options: Unroll Flags

-funroll-loops

Unroll loops whose number of iterations can be determined at compile time or upon entry to the loop. -funroll-loops implies -frerun-cse-after-loop. This option makes code larger, and may or may not make it run faster.

-funroll-all-loops

Unroll all loops, even if their number of iterations is uncertain when the loop is entered. This usually makes programs run more slowly. -funroll-all-loops implies the same options as -funroll-loops,

max-unrolled-insns

The maximum number of instructions that a loop should have if that loop is unrolled, and if the loop is unrolled, it determines how many times the loop code is unrolled.
如果循环被展开，则循环应具有的最大指令数，如果循环被展开，则它确定循环代码被展开的次数。

max-average-unrolled-insns

The maximum number of instructions biased by probabilities of their execution that a loop should have if that loop is unrolled, and if the loop is unrolled, it determines how many times the loop code is unrolled.
如果一个循环被展开，则根据其执行概率偏置的最大指令数，如果该循环被展开，则确定循环代码被展开的次数。

max-unroll-times

The maximum number of unrollings of a single loop.
单个循环的最大展开次数。

Optimization Options: SIMD Instructions

-march=native

会自动检测，但有可能检测不对。

-march=”arch”

这将为特定架构生成 SIMD 指令并应用 -mtune 优化。 arch 的有用值与上面的 -mtune 标志相同。

g++ -march=native -m32 ... -Q --help=target

-mtune=                               skylake-avx512 

 Known valid arguments for -march= option:
    i386 i486 i586 pentium lakemont pentium-mmx winchip-c6 winchip2 c3 samuel-2 c3-2 nehemiah c7 esther i686 pentiumpro pentium2 pentium3 pentium3m pentium-m pentium4 pentium4m prescott nocona core2 nehalem corei7 westmere sandybridge corei7-avx ivybridge core-avx-i haswell core-avx2 broadwell skylake skylake-avx512 cannonlake icelake-client icelake-server cascadelake tigerlake bonnell atom silvermont slm goldmont goldmont-plus tremont knl knm intel geode k6 k6-2 k6-3 athlon athlon-tbird athlon-4 athlon-xp athlon-mp x86-64 eden-x2 nano nano-1000 nano-2000 nano-3000 nano-x2 eden-x4 nano-x4 k8 k8-sse3 opteron opteron-sse3 athlon64 athlon64-sse3 athlon-fx amdfam10 barcelona bdver1 bdver2 bdver3 bdver4 znver1 znver2 btver1 btver2 generic native