SHAOJIE'S BOOK

Posted 2022-04-27 | Updated 2025-01-30 | Thinking | 30 minutes read (About 4440 words)

Presentation & Visualization : PPT

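Introduction

Academic talks:

Goal: help the audience understand the underlying principles.

Work reports:

Goal: the listeners follow the talk, understand the background, grasp the key points and difficulties of the work, and see clearly what has been achieved at this stage.
Structure: organize the talk with the STAR method (Situation, Task, Action, Result).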

Posted 2022-04-23 | Updated 2025-01-30 | network | 10 minutes read (About 1530 words)

Tcpdump & wireshark

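Check the machine's public IP from the command line

> curl myip.ipip.net
当前 IP：117.136.101.72 来自于：中国安徽移动
# i.e. current IP 117.136.101.72, located in Anhui, China (China Mobile)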

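Check whether a port is open on a machine

# For a web service, download directly and inspect the content
wget 4.shaojiemike.top:28096
# -z tells nc to only scan for listening ports without sending any data; -v prints more detail.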
nc -z -v 4.shaojiemike.top 28096

Or scan a specific port

# IPv6 works too
$ nmap -6 -p 8096 2001:da8:d800:611:5464:f7ab:9560:a646
Starting Nmap 7.80 ( https://nmap.org ) at 2023-01-04 19:33 CST
Nmap scan report for 2001:da8:d800:611:5464:f7ab:9560:a646
Host is up (0.00099s latency).

PORT     STATE SERVICE
8096/tcp open  unknown

Nmap done: 1 IP address (1 host up) scanned in 0.05 seconds

$ nmap -p 28096 4.shaojiemike.top
Starting Nmap 7.80 ( https://nmap.org ) at 2023-01-04 19:19 CST
Nmap scan report for 4.shaojiemike.top (114.214.181.97)
Host is up (0.0011s latency).

PORT      STATE SERVICE
28096/tcp open  unknown

Nmap done: 1 IP address (1 host up) scanned in 0.05 seconds

Scanning all ports also works, but it is slow (about 50 minutes).

sudo nmap -sT -p- 4.shaojiemike.top
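The same reachability check can be scripted. The following is a minimal sketch using only Python's standard library; the function name is_port_open and the default timeout are assumptions made for illustration. It only tests whether a TCP connection can be established, much like nc -z, and says nothing about which service is listening.

```python
import socket


def is_port_open(host, port, timeout=3.0):
    """Return True if a TCP connection to (host, port) can be established."""
    try:
        # create_connection resolves the name and tries each address (IPv4/IPv6) in turn
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # connection refused, timed out, unreachable, or name resolution failed
        return False
```

For example, is_port_open("4.shaojiemike.top", 28096) would play the role of the nc command above; a False result cannot distinguish a closed port from a firewall silently dropping packets.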

wireshark

Display filters

Entered in the filter bar at the top of the window.

tcp.port==80&&(ip.dst==192.168.1.2||ip.dst==192.168.1.3)

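ip.addr == 192.168.1.1 // show all packets whose source or destination address is 192.168.1.1
eth.addr == 80:f6:2e:ce:3f:00 // filter by MAC address; see "wireshark过滤MAC地址/物理地址" (filtering by MAC/physical address in wireshark)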
tcp.port==23

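Capture filters

Set in the capture options before starting a capture, so that only matching packets are captured. This avoids large capture files and high memory usage, but the network environment during the test cannot be fully reproduced afterwards.

host 192.168.1.1 // capture all packets received and sent by 192.168.1.1
src host 192.168.1.1 // source address: all packets sent by 192.168.1.1
dst host 192.168.1.1 // destination address: all packets received by 192.168.1.1

Color meanings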

tcpdump

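The traditional command-line packet capture tool.

Common options

Note the "and" between filter rules.

-nn :
1. A single n disables hostname resolution, so IPs are shown directly;
2. two n's disable both hostname and port-name resolution.
3. This makes IPs and port numbers easy to read,
4. and skipping name resolution makes tcpdump much faster.
-i selects the network interface; -D lists the available interfaces
-v, -vv and -vvv show progressively more detail
port 80 captures traffic on port 80, usually HTTP. Prefix it with the src or dst qualifier to restrict the direction
1. tcpdump -i eth0 -n arp host 192.168.199 captures ARP packets for the 192.168.199.* subnet; arp can be replaced with tcp, udp, etc.
-A, -X and -xx show progressively more of the packet contents
-e : show link-layer information.
1. By default tcpdump does not show link-layer information; the -e option shows the source and destination MAC addresses as well as VLAN tag information.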

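Reading the output

192.168.1.106.56166 > 124.192.132.54.80

The source IP is 192.168.1.106 and the source port is 56166;
the destination address is 124.192.132.54 and the destination port is 80.
The > symbol indicates the direction of the data.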
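To make the address.port notation concrete, here is a small sketch that splits such a line into its four parts. The function names are hypothetical, and it assumes numeric IPv4 output as produced with -nn; with name resolution enabled the port may be a service name such as http, which this sketch does not handle.

```python
def split_endpoint(endpoint):
    """Split tcpdump's 'a.b.c.d.port' notation into (ip, port)."""
    # The port is whatever follows the last dot.
    ip, _, port = endpoint.rpartition(".")
    return ip, int(port)


def parse_direction(line):
    """Parse a 'SRC > DST' pair as printed by tcpdump -nn for IPv4."""
    src, dst = (part.strip() for part in line.split(">"))
    src_ip, src_port = split_endpoint(src)
    dst_ip, dst_port = split_endpoint(dst)
    return {"src_ip": src_ip, "src_port": src_port,
            "dst_ip": dst_ip, "dst_port": dst_port}


print(parse_direction("192.168.1.106.56166 > 124.192.132.54.80"))
```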

Flags

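Flags commonly seen in TCP segments, for example during the three-way handshake:

[S] : SYN (start a connection)
[.] : ACK (acknowledgement; shown alone when no other flag is set, and combined as in [S.] or [P.])
[P] : PSH (push data)
[F] : FIN (finish the connection)
[R] : RST (reset the connection)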
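As a reading aid, the flag field can be decoded mechanically. This sketch assumes the single-character codes documented in the tcpdump man page (S, F, P, R, U, W, E, and "." for ACK, with "none" when no flag is set); the function and table names are made up for illustration.

```python
TCP_FLAG_NAMES = {
    "S": "SYN", "F": "FIN", "P": "PSH", "R": "RST",
    "U": "URG", "W": "CWR", "E": "ECE", ".": "ACK",
}


def decode_flags(field):
    """Turn a tcpdump flags field such as '[S.]' into a list of flag names."""
    body = field.strip("[]")
    if body == "none":
        return []
    return [TCP_FLAG_NAMES[ch] for ch in body]


# The three packets of a handshake as they appear in a capture
for field in ("[S]", "[S.]", "[.]"):
    print(field, decode_flags(field))
```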

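Common uses

Given a destination IP, find out which network interface and port the traffic goes through
Capture packets of many protocols, such as ping (ICMP) and ssh

Case study

curl --trace-ascii - www.github.com

The IP of github is 20.205.243.166.

ifconfig shows the ibs5 interface with a 21 TB bandwidth cap, so it must be the InfiniBand (IB) card.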

sudo tcpdump -i ibs5 '((tcp) and (host 20.205.243.166))'
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on ibs5, link-type LINUX_SLL (Linux cooked v1), capture size 262144 bytes
15:53:53.848619 IP snode0.59878 > 20.205.243.166.http: Flags [S], seq 879685062, win 64128, options [mss 2004,sackOK,TS val 4096492456 ecr 0,nop,wscale 7], length 0
15:53:53.952705 IP 20.205.243.166.http > snode0.59878: Flags [S.], seq 1917452372, ack 879685063, win 65535, options [mss 1436,sackOK,TS val 1127310087 ecr 4096492456,nop,wscale 10], length 0
15:53:53.952728 IP snode0.59878 > 20.205.243.166.http: Flags [.], ack 1, win 501, options [nop,nop,TS val 4096492560 ecr 1127310087], length 0
15:53:53.953208 IP snode0.59878 > 20.205.243.166.http: Flags [P.], seq 1:79, ack 1, win 501, options [nop,nop,TS val 4096492561 ecr 1127310087], length 78: HTTP: GET / HTTP/1.1
15:53:54.058654 IP 20.205.243.166.http > snode0.59878: Flags [P.], seq 1:89, ack 79, win 64, options [nop,nop,TS val 1127310193 ecr 4096492561], length 88: HTTP: HTTP/1.1 301 Moved Permanently
15:53:54.058668 IP snode0.59878 > 20.205.243.166.http: Flags [.], ack 89, win 501, options [nop,nop,TS val 4096492666 ecr 1127310193], length 0
15:53:54.059092 IP snode0.59878 > 20.205.243.166.http: Flags [F.], seq 79, ack 89, win 501, options [nop,nop,TS val 4096492667 ecr 1127310193], length 0
15:53:54.162608 IP 20.205.243.166.http > snode0.59878: Flags [F.], seq 89, ack 80, win 64, options [nop,nop,TS val 1127310297 ecr 4096492667], length 0

$ sudo tcpdump -i ibs5 -nn -vvv -e '((port 80) and (tcp) and (host 20.205.243.166))'
tcpdump: listening on ibs5, link-type LINUX_SLL (Linux cooked v1), capture size 262144 bytes
16:09:38.743478 Out ethertype IPv4 (0x0800), length 76: (tos 0x0, ttl 64, id 15215, offset 0, flags [DF], proto TCP (6), length 60)
    10.1.13.50.38376 > 20.205.243.166.80: Flags [S], cksum 0x1fd5 (incorrect -> 0x98b6), seq 1489092902, win 64128, options [mss 2004,sackOK,TS val 4097437351 ecr 0,nop,wscale 7], length 0
16:09:38.848164  In ethertype IPv4 (0x0800), length 76: (tos 0x0, ttl 48, id 0, offset 0, flags [DF], proto TCP (6), length 60)
    20.205.243.166.80 > 10.1.13.50.38376: Flags [S.], cksum 0x69ba (correct), seq 3753100548, ack 1489092903, win 65535, options [mss 1436,sackOK,TS val 3712395681 ecr 4097437351,nop,wscale 10], length 0
16:09:38.848212 Out ethertype IPv4 (0x0800), length 68: (tos 0x0, ttl 64, id 15216, offset 0, flags [DF], proto TCP (6), length 52)
    10.1.13.50.38376 > 20.205.243.166.80: Flags [.], cksum 0x1fcd (incorrect -> 0x9613), seq 1, ack 1, win 501, options [nop,nop,TS val 4097437456 ecr 3712395681], length 0
16:09:38.848318 Out ethertype IPv4 (0x0800), length 146: (tos 0x0, ttl 64, id 15217, offset 0, flags [DF], proto TCP (6), length 130)
    10.1.13.50.38376 > 20.205.243.166.80: Flags [P.], cksum 0x201b (incorrect -> 0x9f0a), seq 1:79, ack 1, win 501, options [nop,nop,TS val 4097437456 ecr 3712395681], length 78: HTTP, length: 78
        GET / HTTP/1.1
        Host: www.github.com
        User-Agent: curl/7.68.0
        Accept: */*

16:09:38.954152  In ethertype IPv4 (0x0800), length 156: (tos 0x0, ttl 48, id 45056, offset 0, flags [DF], proto TCP (6), length 140)
    20.205.243.166.80 > 10.1.13.50.38376: Flags [P.], cksum 0x024d (correct), seq 1:89, ack 79, win 64, options [nop,nop,TS val 3712395786 ecr 4097437456], length 88: HTTP, length: 88
        HTTP/1.1 301 Moved Permanently
        Content-Length: 0
        Location: https://www.github.com/

16:09:38.954207 Out ethertype IPv4 (0x0800), length 68: (tos 0x0, ttl 64, id 15218, offset 0, flags [DF], proto TCP (6), length 52)
    10.1.13.50.38376 > 20.205.243.166.80: Flags [.], cksum 0x1fcd (incorrect -> 0x949a), seq 79, ack 89, win 501, options [nop,nop,TS val 4097437562 ecr 3712395786], length 0
16:09:38.954884 Out ethertype IPv4 (0x0800), length 68: (tos 0x0, ttl 64, id 15219, offset 0, flags [DF], proto TCP (6), length 52)
    10.1.13.50.38376 > 20.205.243.166.80: Flags [F.], cksum 0x1fcd (incorrect -> 0x9498), seq 79, ack 89, win 501, options [nop,nop,TS val 4097437563 ecr 3712395786], length 0
16:09:39.060177  In ethertype IPv4 (0x0800), length 68: (tos 0x0, ttl 48, id 45057, offset 0, flags [DF], proto TCP (6), length 52)
    20.205.243.166.80 > 10.1.13.50.38376: Flags [F.], cksum 0x95e2 (correct), seq 89, ack 80, win 64, options [nop,nop,TS val 3712395892 ecr 4097437563], length 0
16:09:39.060221 Out ethertype IPv4 (0x0800), length 68: (tos 0x0, ttl 64, id 15220, offset 0, flags [DF], proto TCP (6), length 52)
    10.1.13.50.38376 > 20.205.243.166.80: Flags [.], cksum 0x1fcd (incorrect -> 0x93c4), seq 80, ack 90, win 501, options [nop,nop,TS val 4097437668 ecr 3712395892], length 0
16:09:46.177269 Out ethertype IPv4 (0x0800), length 76: (tos 0x0, ttl 64, id 38621, offset 0, flags [DF], proto TCP (6), length 60)

The IP of snode0 is 10.1.13.50.
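The timestamps in the first capture are enough to estimate the round-trip time of the handshake: the SYN leaves at 15:53:53.848619 and the SYN-ACK arrives at 15:53:53.952705. A small sketch of that arithmetic follows; the helper name is an assumption, and it only works for two timestamps taken on the same day.

```python
from datetime import datetime


def seconds_between(t_start, t_end):
    """Difference between two tcpdump timestamps (HH:MM:SS.ffffff) in seconds."""
    fmt = "%H:%M:%S.%f"
    delta = datetime.strptime(t_end, fmt) - datetime.strptime(t_start, fmt)
    return delta.total_seconds()


# SYN sent by snode0, SYN-ACK received from 20.205.243.166
rtt = seconds_between("15:53:53.848619", "15:53:53.952705")
print(f"handshake RTT is about {rtt * 1000:.1f} ms")  # about 104.1 ms
```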

traceroute

mtr = traceroute+ping

$ traceroute www.baidu.com
traceroute to www.baidu.com (182.61.200.6), 30 hops max, 60 byte packets
1  acsa-nfs (10.1.13.1)  0.179 ms  0.180 ms  0.147 ms
2  192.168.252.1 (192.168.252.1)  2.016 ms  1.954 ms  1.956 ms
3  202.38.75.254 (202.38.75.254)  4.942 ms  3.941 ms  4.866 ms

The traceroute command shows the path that packets take to a host.

NetworkManager management

# shaojiemike @ snode0 in /etc/NetworkManager [16:49:55]
$ nmcli general status
STATE         CONNECTIVITY  WIFI-HW  WIFI     WWAN-HW  WWAN
disconnected  unknown       enabled  enabled  enabled  enabled

# shaojiemike @ snode0 in /etc/NetworkManager [16:50:40]
$ nmcli connection show
NAME                     UUID                                  TYPE        DEVICE
InfiniBand connection 1  7edf4eea-0591-48ba-868a-e66e8cb720ce  infiniband  --

It looks like it was used at some point before.

# shaojiemike @ snode0 in /etc/NetworkManager [16:56:36] C:127
$ service network-manager status
● NetworkManager.service - Network Manager
     Loaded: loaded (/lib/systemd/system/NetworkManager.service; enabled; vendor preset: enabled)
     Active: active (running) since Mon 2022-03-14 11:52:06 CST; 1 months 10 days ago
       Docs: man:NetworkManager(8)
   Main PID: 1339 (NetworkManager)
      Tasks: 3 (limit: 154500)
     Memory: 12.0M
     CGroup: /system.slice/NetworkManager.service
             └─1339 /usr/sbin/NetworkManager --no-daemon

Warning: some journal files were not opened due to insufficient permissions.

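It was probably set up following "Secure site-to-site connection with Linux IPsec VPN".

Further study needed

None yet

Problems encountered

None yet

Motivation, summary, reflections, rants~~

FJW says that all network traffic goes out together through the NFS machine.

References

None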

Posted 2022-04-23 | Updated 2025-01-30 | Tutorials | 2 minutes read (About 272 words)

Servers

Reboot and configure the machine remotely through the static IP of the IPMI chip

https://cloud.tencent.com/developer/article/1448642

Group

Current groups

shaojiemike@snode6:~$ groups shaojiemike
shaojiemike : staff sudo

All groups

cat /etc/group

User

whoami

Where regular users are stored

/etc/passwd

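LDAP tutorial

If you find that your account is not in /etc/passwd, the machine is very likely using LDAP for centralized authentication, which lets the same account log in on multiple machines.

getent passwd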
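The difference between /etc/passwd and getent can be checked from Python as well: reading the file only sees local accounts, while the standard pwd module goes through the system's name service lookup, as getent passwd does. The sketch below rests on that assumption; the function names and the labels "local", "directory" and "unknown" are invented for illustration, and "directory" only means "known to the system but not listed locally", which is typically LDAP on a setup like this one.

```python
import pwd


def is_local_account(name, passwd_path="/etc/passwd"):
    """True if `name` has an entry in the local passwd file."""
    with open(passwd_path, encoding="utf-8") as f:
        return any(line.split(":", 1)[0] == name for line in f)


def account_source(name, passwd_path="/etc/passwd"):
    """Classify an account as 'local', 'directory' or 'unknown'."""
    if is_local_account(name, passwd_path):
        return "local"
    try:
        pwd.getpwnam(name)  # resolved by the system, like `getent passwd NAME`
        return "directory"
    except KeyError:
        return "unknown"
```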

first reboot server

ctrl + alt + F3     #jump into command line
login
su - {user-name}
sudo -s
sudo -i
# If invoked without a user name, su defaults to becoming the superuser
ip a |less          # check the IP address; no longer an issue since fjw set up a static IP

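Limit the memory a user's current shell can consume

Crashes are usually caused by running out of memory; when launching processes we already take care not to exceed the number of physical cores.

Put a memory limit of 25*1024*1024 KB = 25 GB in zshrc

ulimit -v 26214400

A program started from the current shell that exceeds the limit terminates with a MemoryError.

Test: read a 200 GB file into memory
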

with open("/home/shaojiemike/test/DynamoRIO/OpenBLASRawAssembly/openblas_utest.log", 'r') as f:
    data= f.readlines()
    print(len(data))

Some articles say the limit does not take effect on certain Linux kernel versions.
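The unit conversion behind the ulimit value is easy to get wrong, so here it is spelled out, together with a sketch of applying the same cap from inside a Python process. This assumes Linux and the standard resource module, where RLIMIT_AS is given in bytes while ulimit -v takes KiB; the function name is hypothetical, and nothing here is taken from the original post.

```python
import resource

LIMIT_KIB = 25 * 1024 * 1024      # the value passed to `ulimit -v` (KiB)
LIMIT_BYTES = LIMIT_KIB * 1024    # RLIMIT_AS is expressed in bytes


def limit_address_space(limit_bytes=LIMIT_BYTES):
    """Cap this process's virtual address space, similar to `ulimit -v`."""
    _, hard = resource.getrlimit(resource.RLIMIT_AS)
    # An unprivileged process cannot set the soft limit above the hard limit.
    if hard != resource.RLIM_INFINITY:
        limit_bytes = min(limit_bytes, hard)
    resource.setrlimit(resource.RLIMIT_AS, (limit_bytes, hard))


print(LIMIT_KIB)  # 26214400
```

Calling limit_address_space() at the top of a script would restrict only that process and its children, which can be handy when the shell-wide limit is not wanted.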

Posted 2022-04-13 | Updated 2025-01-30 | Programming | 22 minutes read (About 3279 words)

PyTorchGeometric

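PyTorch Geometric Library

PyG is a PyTorch-based library for handling irregular data such as graphs; put differently, it is a framework for quickly implementing representation learning on graphs and similar data. It runs fast: model training can be up to 40 times faster than DGL (Deep Graph Library) v0.2 (figure taken from the paper). Besides speed, PyG also ships many methods proposed in papers (GCN, SGC, GAT, SAGE, and so on) along with commonly used datasets, which makes reproducing papers quite convenient.

Only classic models have ready-made library functions; a custom model has to be implemented by hand on top of automatic differentiation, and the GPU parallelism has to be written by hand as well.

MessagePassing is the core of how nodes in the network interact

Data

How data is stored

torch_geometric.data.Data (Data for short below) is used to build a graph

Node features x
1. Shape [num_nodes, num_node_features].
Edges between nodes edge_index
1. Shape [2, num_edges]
Node labels y
1. If present. Shape [num_nodes, *]
Edge features edge_attr
1. Shape [num_edges, num_edge_features]

Custom attributes are supported

Data can be extended with extra attributes, for example data.face

Getting data

In PyG we do not write it this way; instead, the get() function returns an object of type torch_geometric.data.Data for a given index, and that Data object contains both the data and the label.

Data processing example

Since the graph is undirected, there are 4 edges: (0 -> 1), (1 -> 0), (1 -> 2), (2 -> 1). Each node has its own feature. The graph above can be represented with torch_geometric.data.Data as follows: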

import torch
from torch_geometric.data import Data
# The graph is undirected, so there are 4 edges: (0 -> 1), (1 -> 0), (1 -> 2), (2 -> 1)
edge_index = torch.tensor([[0, 1, 1, 2],
                           [1, 0, 2, 1]], dtype=torch.long)
# Node features
x = torch.tensor([[-1], [0], [1]], dtype=torch.float)

data = Data(x=x, edge_index=edge_index)

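Note how edge_index stores the edges: it consists of two lists, the first holding the source node of each edge and the second the target node. Compare this with the storage layout below.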

import torch
from torch_geometric.data import Data

edge_index = torch.tensor([[0, 1],
                           [1, 0],
                           [1, 2],
                           [2, 1]], dtype=torch.long)
x = torch.tensor([[-1], [0], [1]], dtype=torch.float)

data = Data(x=x, edge_index=edge_index.t().contiguous())

In this case edge_index must first be transposed and then made contiguous with contiguous(). For what contiguous() does, see "contiguous in PyTorch".
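The two layouts differ only by a transpose. The following plain-Python sketch (no PyTorch involved; the helper names are made up) builds the list of (source, target) pairs for an undirected graph and converts it to the [2, num_edges] layout that edge_index expects.

```python
def undirected(edge_pairs):
    """Store every undirected edge twice, once per direction."""
    both = []
    for s, t in edge_pairs:
        both.append((s, t))
        both.append((t, s))
    return both


def to_coo(edge_pairs):
    """Convert [(src, dst), ...] into [[src, ...], [dst, ...]], i.e. shape [2, E]."""
    sources = [s for s, _ in edge_pairs]
    targets = [t for _, t in edge_pairs]
    return [sources, targets]


pairs = undirected([(0, 1), (1, 2)])
print(pairs)          # [(0, 1), (1, 0), (1, 2), (2, 1)]
print(to_coo(pairs))  # [[0, 1, 1, 2], [1, 0, 2, 1]]
```

The first printed form corresponds to the second listing (one row per edge), the second to the first listing (one row of sources, one row of targets).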

Datasets

Dataset

import torch
from torch_geometric.data import InMemoryDataset


class MyOwnDataset(InMemoryDataset): # or (Dataset)
    def __init__(self, root, transform=None, pre_transform=None):
        super(MyOwnDataset, self).__init__(root, transform, pre_transform)
        self.data, self.slices = torch.load(self.processed_paths[0])

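    # Returns a list with the names of the raw (unprocessed) files. If you only have one file, the list contains a single element. In fact, you can return an empty list and locate your files later in process().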
    @property
    def raw_file_names(self):
        return ['some_file_1', 'some_file_2', ...]

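    # Much like the previous property, it returns a list with the names of all processed files. After process() has been called, the list usually has a single element: the name of the file holding the processed data.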
    @property
    def processed_file_names(self):
        return ['data.pt']

    def download(self):
        pass
        # Download to `self.raw_dir`. or just pass

    # Gather your data into a list of Data objects, then call self.collate() to compute the slices that the DataLoader will use.
    def process(self):
        # Read data into huge `Data` list.
        data_list = [...]

        if self.pre_filter is not None:
            data_list = [data for data in data_list if self.pre_filter(data)]

        if self.pre_transform is not None:
            data_list = [self.pre_transform(data) for data in data_list]

        data, slices = self.collate(data_list)
        torch.save((data, slices), self.processed_paths[0])

DataLoader

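The DataLoader class lets you feed data in batches. To create a DataLoader instance, simply specify the dataset and the batch size you want.

loader = DataLoader(dataset, batch_size=512, shuffle=True)

Each iteration of the DataLoader yields a Batch object. It is very similar to a Data object but carries an additional batch attribute, which records the graph that each node belongs to. Because the DataLoader merges x, y and edge_index from different graphs into one batch, the GNN model needs this batch information to know which node belongs to which graph.
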

for batch in loader:
    batch
    >>> Batch(x=[1024, 21], edge_index=[2, 1568], y=[512], batch=[1024])

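MessagePassing (core)

$$
\mathbf{x}_i^{(k)} = \gamma^{(k)} \left( \mathbf{x}_i^{(k-1)}, \square_{j \in \mathcal{N}(i)} \, \phi^{(k)}\left( \mathbf{x}_i^{(k-1)}, \mathbf{x}_j^{(k-1)}, \mathbf{e}_{j,i} \right) \right)
$$

Here x denotes the node embedding, e the edge features, ϕ the message function, □ the aggregation function, and γ the update function. The superscript is the layer index; for example, when k = 1, x^{(k-1)} is the graph-structured data fed into the network.

To implement this, we need to define:

message
1. Defines how a message is generated for each node pair (xi, xj).
update
aggregation scheme
propagate(edge_index, size=None, **kwargs)
1. This function ends up calling message, aggregate and update in that order.
update(aggr_out, **kwargs)
1. This function uses the aggregated messages to update the embedding of each node.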

propagate(edge_index: Union[torch.Tensor, torch_sparse.tensor.SparseTensor], size: Optional[Tuple[int, int]] = None, **kwargs)

edge_index (Tensor or SparseTensor)
1. The input edge information, which defines the underlying graph connectivity / message-passing flow.
2. As a torch.LongTensor
  1. its shape must be defined as [2, num_messages], where messages from nodes in edge_index[0] are sent to nodes in edge_index[1]
3. As a torch_sparse.SparseTensor
  1. its sparse indices (row, col) should relate to row = edge_index[1] and col = edge_index[0].
The node matrix does not have to be square either: for a bipartite graph, pass x=(x_N, x_M).

MessagePassing.message(…)

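Which end of each edge is processed is determined by flow="source_to_target" or flow="target_to_source", and by the _i / _j suffixes (x_i, x_j).

x_j denotes a lifted tensor that contains the source node features of every edge, i.e. the neighbors of each node. Node features are lifted automatically by appending _i or _j to the variable name. In fact, any tensor can be converted this way, as long as it holds source or target node features.

_j denotes the start (source) of each edge and _i its end (target). x_j is therefore the x value (the feature) at the source of every edge. If you pass in other tensors yourself, their _j and _i variants are handled automatically as well; single-stepping through the code once makes this clear.

When implementing message, node features are automatically mapped to their respective source and target nodes.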

aggregate(inputs: torch.Tensor, index: torch.Tensor, ptr: Optional[torch.Tensor] = None, dim_size: Optional[int] = None, aggr: Optional[str] = None) → torch.Tensor

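The aggregation scheme only needs to be chosen through a parameter: the "add", "mean", "min", "max" and "mul" operations are available.

MessagePassing.update(aggr_out, …)

The aggregation output is passed as the first argument; the remaining arguments are those that were passed to propagate().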
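The order message, aggregate, update can be illustrated without any library. The sketch below is a plain-Python toy with scalar node features, not PyG's implementation: the propagate function and its arguments are hypothetical stand-ins that only mirror the source_to_target flow described above.

```python
def propagate(edge_index, x, message, aggregate, update):
    """One message-passing step over a [2, E] edge list (source_to_target flow)."""
    sources, targets = edge_index
    inbox = {i: [] for i in range(len(x))}
    for j, i in zip(sources, targets):
        # x[j] is the source feature (x_j), x[i] the target feature (x_i) of edge j -> i
        inbox[i].append(message(x[i], x[j]))
    # aggregate the messages per target node, then update that node
    return [update(x[i], aggregate(inbox[i])) for i in range(len(x))]


edge_index = [[0, 1, 1, 2], [1, 0, 2, 1]]
x = [1.0, 2.0, 3.0]

out = propagate(
    edge_index, x,
    message=lambda x_i, x_j: x_j,           # send the neighbor's feature unchanged
    aggregate=sum,                          # "add" aggregation
    update=lambda x_i, aggr_out: aggr_out,  # keep the aggregated value
)
print(out)  # [2.0, 4.0, 2.0]
```

Node 1 has two neighbors (0 and 2), so it receives 1.0 + 3.0; the two end nodes each receive only the feature of node 1.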

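Example: implementing GCN

$$
\mathbf{x}_i^{(k)} = \sum_{j \in \mathcal{N}(i) \cup \{ i \}} \frac{1}{\sqrt{\deg(i)} \cdot \sqrt{\deg(j)}} \cdot \left( \mathbf{\Theta}^{\top} \cdot \mathbf{x}_j^{(k-1)} \right)
$$

This formula first multiplies the features of the neighboring nodes by the weight matrix \Theta, then normalizes them by the node degrees, and finally sums them up.

The steps can be broken down as follows

1. Add self-loops to the adjacency matrix.
2. Linearly transform the node features.
3. Compute the normalization coefficients.
4. Normalize the node features.
5. Sum the features of neighboring nodes ("add" aggregation).

Steps 1 and 2 need to be computed before message passing. Steps 3 - 5 can be handled with the torch_geometric.nn.MessagePassing class.

The purpose of adding self-loops is to include the node's own feature in the aggregation; without self-loops, only the information of neighboring nodes is aggregated.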

import torch
from torch_geometric.nn import MessagePassing
from torch_geometric.utils import add_self_loops, degree

class GCNConv(MessagePassing):
    def __init__(self, in_channels, out_channels):
        super().__init__(aggr='add')  # "Add" aggregation (Step 5).
        self.lin = torch.nn.Linear(in_channels, out_channels)

    def forward(self, x, edge_index):
        # x has shape [N, in_channels]
        # edge_index has shape [2, E]

        # Step 1: Add self-loops to the adjacency matrix.
        edge_index, _ = add_self_loops(edge_index, num_nodes=x.size(0))

        # Step 2: Linearly transform node feature matrix.
        x = self.lin(x)

        # Step 3: Compute normalization.
        row, col = edge_index
        deg = degree(col, x.size(0), dtype=x.dtype)
        deg_inv_sqrt = deg.pow(-0.5)
        deg_inv_sqrt[deg_inv_sqrt == float('inf')] = 0
        norm = deg_inv_sqrt[row] * deg_inv_sqrt[col]

        # Step 4-5: Start propagating messages.
        return self.propagate(edge_index, x=x, norm=norm)

    def message(self, x_j, norm):
        # x_j has shape [E, out_channels]

        # Step 4: Normalize node features.
        return norm.view(-1, 1) * x_j

All the logic lives in forward(); once we call propagate(), it internally calls message() and update().
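To see what the normalization in Step 3 produces, the coefficients can be computed by hand for the 3-node path graph used earlier (edges 0-1 and 1-2). This is a plain-Python sketch of the same arithmetic, with a hypothetical helper name; it is not the PyG code path.

```python
import math


def gcn_norm(edge_index, num_nodes):
    """Add self-loops and return 1 / sqrt(deg(i) * deg(j)) for every edge."""
    row = list(edge_index[0]) + list(range(num_nodes))  # Step 1: self-loops
    col = list(edge_index[1]) + list(range(num_nodes))
    deg = [0] * num_nodes
    for c in col:                                       # in-degree, self-loop included
        deg[c] += 1
    norm = [1.0 / math.sqrt(deg[r] * deg[c]) for r, c in zip(row, col)]
    return (row, col), norm


(row, col), norm = gcn_norm([[0, 1, 1, 2], [1, 0, 2, 1]], num_nodes=3)
for r, c, w in zip(row, col, norm):
    print(f"{r} -> {c}: {w:.4f}")
```

With self-loops the degrees are 2, 3 and 2, so every edge between node 1 and an end node gets 1/sqrt(6), the self-loops of the end nodes get 1/2, and the self-loop of node 1 gets 1/3.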

Example: using GCN

conv = GCNConv(16, 32)
x = conv(x, edge_index)

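SAGE example

For the aggregation function we use max pooling, so AGGREGATE in the formula above can be written as:

In the formula above, each neighbor node is multiplied by a weight matrix, a bias is added, and the result is passed to an activation function. The corresponding code (matching the second figure) is: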

class SAGEConv(MessagePassing):
    def __init__(self, in_channels, out_channels):
        super(SAGEConv, self).__init__(aggr='max')
        self.lin = torch.nn.Linear(in_channels, out_channels)
        self.act = torch.nn.ReLU()
      
    def message(self, x_j):
        # x_j has shape [E, in_channels]
 
        x_j = self.lin(x_j)
        x_j = self.act(x_j)
    
        return x_j

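In the update method we need to update each node's embedding from the aggregated result, then apply the weight matrix and bias (matching the second line of the first figure):

class SAGEConv(MessagePassing):
    def __init__(self, in_channels, out_channels):
        super(SAGEConv, self).__init__(aggr='max')
        # Maps the concatenation [aggr_out, x] back to in_channels features.
        self.update_lin = torch.nn.Linear(in_channels + out_channels, in_channels, bias=False)
        self.update_act = torch.nn.ReLU()
      
    def update(self, aggr_out, x):
        # aggr_out has shape [N, out_channels]
      
        new_embedding = torch.cat([aggr_out, x], dim=1)
        new_embedding = self.update_lin(new_embedding)
        new_embedding = self.update_act(new_embedding)
      
        return new_embedding

Putting it all together, the SAGEConv layer is defined as follows: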

import torch
from torch.nn import Sequential as Seq, Linear, ReLU
from torch_geometric.nn import MessagePassing
from torch_geometric.utils import remove_self_loops, add_self_loops
class SAGEConv(MessagePassing):
    def __init__(self, in_channels, out_channels):
        super(SAGEConv, self).__init__(aggr='max') #  "Max" aggregation.
        self.lin = torch.nn.Linear(in_channels, out_channels)
        self.act = torch.nn.ReLU()
        self.update_lin = torch.nn.Linear(in_channels + out_channels, in_channels, bias=False)
        self.update_act = torch.nn.ReLU()
      
    def forward(self, x, edge_index):
        # x has shape [N, in_channels]
        # edge_index has shape [2, E]
      
        # Removes every self-loop in the graph given by edge_index, so that (i,i)∉E for every i ∈ V.
        edge_index, _ = remove_self_loops(edge_index)
        # Adds a self-loop (i,i)∈ E to every node i ∈ V in the graph given by edge_index
        edge_index, _ = add_self_loops(edge_index, num_nodes=x.size(0))
      
      
        return self.propagate(edge_index, size=(x.size(0), x.size(0)), x=x)
 
    def message(self, x_j):
        # x_j has shape [E, in_channels]
 
        x_j = self.lin(x_j)
        x_j = self.act(x_j)
      
        return x_j
 
    def update(self, aggr_out, x):
        # aggr_out has shape [N, out_channels]
 
 
        new_embedding = torch.cat([aggr_out, x], dim=1)
      
        new_embedding = self.update_lin(new_embedding)
        new_embedding = self.update_act(new_embedding)
      
        return new_embedding

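Implementing batching

Batching in GNNs differs from the traditional approach.

zzq's view

Replicate the network batch times, so that batchSize samples produce batchSize losses. Combine the losses with Sum or Max and update all network parameters together in one step. As for the recurrent input and output H^(t-1) and H^t inside the network, simply averaging them feels sufficient.

A few possible problems

If the parameters are not in linear layers, as in a CNN-style network, will PyTorch parallelize automatically, or does it still have to be done by hand?
Another problem: if you still want to use PyG's x and edge_index, you cannot add an extra dimension to them.

The traditional basic idea in image and language processing:

Use rescaling or padding to bring the examples to the same size, then group them along a newly added dimension. The size of that new dimension is the batch_size.

However, because of the special nature of graph neural networks (the way edges and nodes are represented), the traditional approach is either infeasible or causes heavy memory consumption through redundant data representation.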

ADVANCED MINI-BATCHING in PyG

ADVANCED MINI-BATCHING was introduced for this purpose, to process large amounts of data in parallel.

https://pytorch-geometric.readthedocs.io/en/latest/notes/batching.html

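Implementation:

Adjacency matrices are stacked diagonally (creating one giant graph that contains multiple isolated subgraphs)
Node and target features are simply concatenated along the node dimension???

Advantages

GNN operators that rely on a message passing scheme need no modification, because messages still cannot be exchanged between two nodes that belong to different graphs.
There is no computational or memory overhead. For example, this batching procedure works entirely without padding node or edge features. Note that the adjacency matrices incur no extra memory overhead either, because they are stored in sparse form, holding only the non-zero entries, i.e. the edges.

torch_geometric.loader.DataLoader

It batches multiple graphs into one large graph. This is done by overriding collate(), and it inherits all arguments of the PyTorch DataLoader, such as num_workers.

When merging, edge_index of shape [2, num_edges] grows along its second dimension; everything else (node-level tensors) grows along the first dimension.

The most important effect

# Each original edge_index has shape [2, 4]
# A naive implementation would simply concatenate them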
 >>> tensor([[0, 0, 1, 1, 0, 0, 1, 1],
             [0, 1, 1, 2, 0, 1, 1, 2]])
# The DataLoader instead rewrites them into new, offset edges
 print(batch.edge_index)
 >>> tensor([[0, 0, 1, 1, 2, 2, 3, 3],
             [0, 1, 1, 2, 3, 4, 4, 5]])
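The offsets in the second output can be reproduced by hand. The numbers shown are consistent with a graph that has 2 source nodes and 3 target nodes (as in the bipartite example of the PyG batching notes); that reading is an assumption, since the post does not state the node counts. The sketch below is plain Python with a hypothetical function name and only mimics the index shifting, not the real collate().

```python
def batch_edge_index(graphs):
    """Stack graphs block-diagonally by shifting node indices.

    graphs: list of (edge_index, num_source_nodes, num_target_nodes).
    Returns the merged [2, E_total] edge list.
    """
    rows, cols = [], []
    src_offset = dst_offset = 0
    for (row, col), n_src, n_dst in graphs:
        rows += [r + src_offset for r in row]
        cols += [c + dst_offset for c in col]
        src_offset += n_src   # later graphs start after all earlier source nodes
        dst_offset += n_dst   # and after all earlier target nodes
    return [rows, cols]


g = ([[0, 0, 1, 1], [0, 1, 1, 2]], 2, 3)  # assumed: 2 source nodes, 3 target nodes
print(batch_edge_index([g, g]))
# [[0, 0, 1, 1, 2, 2, 3, 3], [0, 1, 1, 2, 3, 4, 4, 5]]
```

For an ordinary (non-bipartite) graph the two offsets coincide, and both rows are shifted by the number of nodes of the preceding graphs.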

torch_geometric.loader.DataLoader example 1

from torch_geometric.data import Data
from torch_geometric.loader import DataLoader

data_list = [Data(...), ..., Data(...)]
loader = DataLoader(data_list, batch_size=32)

torch_geometric.loader.DataLoader example 2

from torch_geometric.datasets import TUDataset
from torch_geometric.loader import DataLoader

dataset = TUDataset(root='/tmp/ENZYMES', name='ENZYMES', use_node_attr=True)
loader = DataLoader(dataset, batch_size=32, shuffle=True)

for batch in loader:
    batch
    >>> DataBatch(batch=[1082], edge_index=[2, 4066], x=[1082, 21], y=[32])

    batch.num_graphs
    >>> 32

Further study needed

None yet

Problems encountered

None yet

Motivation, summary, reflections, rants~~

References

None

Posted 2022-04-13 | Updated 2025-01-30 | Artificial Intelligence | 10 minutes read (About 1448 words)

GNN

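Graph Neural Networks (GNN) and their characteristics

GNNs can analyze the relationships between objects to make accurate recommendations
Challenges
1. Graphs are irregular: each graph has a variable number of unordered nodes, and each node has a different number of neighbors, so operations such as convolution do not apply directly to graphs.
2. A core assumption of existing deep learning algorithms is that data samples are independent of each other. In a graph, every data sample (node) is connected by edges to other data samples (nodes), and this information can be used to capture the interdependence between instances.

Graph embedding & network embedding

Research on graph neural networks is closely related to graph embedding (readers unfamiliar with graph embedding can refer to my article "A Survey of Graph Embedding") or network embedding.

Real-world graphs (networks) are often high-dimensional and hard to handle; the goal of graph embedding is to find low-dimensional vector representations of high-dimensional graphs.

Graph analysis tasks

Node classification,
link prediction,
clustering,
visualization

Categories of graph neural networks

Graph Convolution Networks (GCN)
Graph Attention Networks
1. A graph attention network (GAT) is a spatial-based graph convolutional network; its attention mechanism is used to determine the weights of a node's neighbors when aggregating feature information.
Graph Autoencoders
Graph Generative Networks
Graph Spatial-temporal Networks.

Graph Convolution Networks (GCN)

GCN can be called the "founding work" of graph neural networks: it was the first to apply the convolution operation from image processing to graph-structured data in a simple way, and it gives a concrete derivation that involves a fair amount of spectral graph theory. The derivation is fairly involved, but the final result is very simple.

Is it just aggregating the features of neighboring nodes and then applying a linear transformation? Yes, that is exactly it. To let GCN capture information from K-hop neighbors, the authors also stack multiple GCN layers; stacking K layers gives:

A few classic, simple categories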

Semi-supervised learning for node-level classification：

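Given a network in which some nodes are labeled and the others are not, ConvGNNs can learn a robust model that effectively identifies the class labels of the unlabeled nodes. To this end, an end-to-end framework can be built by stacking a couple of graph convolutional layers followed by a softmax layer for multi-class classification. See figure (a).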

Supervised learning for graph-level classification：

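Graph-level classification aims to predict the class label of an entire graph. End-to-end learning for this task can be achieved by combining graph convolutional layers, graph pooling layers, and/or readout layers. Graph convolutional layers are responsible for exact high-level node representations, while graph pooling layers play the role of downsampling, coarsening each graph into a sub-structure each time. A readout layer collapses the node representations of each graph into a graph representation. By applying a multi-layer perceptron and a softmax layer to the graph representations, we can build an end-to-end framework for graph classification. See figure (b).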

Unsupervised learning for graph embedding：

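When no class labels are available in the graph, we can learn graph embeddings in a purely unsupervised way within an end-to-end framework. These algorithms exploit edge-level information in two ways. One simple way is to adopt an autoencoder framework, where the encoder uses graph convolutional layers to embed the graph into a latent representation and a decoder reconstructs the graph structure from it. Another popular way is negative sampling: a portion of the node pairs that have no link in the graph are sampled as negative pairs, while node pairs that do have a link are the positive pairs. A logistic regression layer is then applied to distinguish positive from negative pairs. See figure (c).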
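Negative sampling as described above can be sketched in a few lines of plain Python. The function name, the fixed seed and the rejection-sampling strategy are assumptions made for illustration; the sketch treats the graph as simple and undirected, and real libraries use more efficient schemes.

```python
import random


def sample_negative_pairs(num_nodes, edges, num_samples, seed=0):
    """Draw node pairs that are NOT linked, to serve as negative examples."""
    rng = random.Random(seed)
    linked = {frozenset(e) for e in edges}   # undirected: (u, v) equals (v, u)
    # Never ask for more negatives than there are unlinked pairs.
    max_pairs = num_nodes * (num_nodes - 1) // 2 - len(linked)
    num_samples = min(num_samples, max_pairs)
    negatives = set()
    while len(negatives) < num_samples:
        u, v = rng.sample(range(num_nodes), 2)  # two distinct nodes
        pair = frozenset((u, v))
        if pair not in linked:                  # reject pairs that are real edges
            negatives.add(pair)
    return [tuple(sorted(p)) for p in negatives]


edges = [(0, 1), (1, 2)]                        # positive pairs: the existing links
print(sample_negative_pairs(4, edges, 3))
```

The positive pairs are simply the edge list itself; a classifier is then trained to score positives above the sampled negatives.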

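Graph autoencoders (GAEs) are an unsupervised learning framework that encodes nodes or graphs into a latent vector space and reconstructs the graph data from the encoded information. They are used to learn network embeddings and graph generative distributions. For network embedding, GAEs learn latent node representations by reconstructing graph structural information such as the adjacency matrix. For graph generation, some methods generate the nodes and edges of a graph step by step, while others output the whole graph at once.

Spatial-temporal graph neural networks (STGNNs)

They aim to learn hidden patterns from spatial-temporal graphs and play an increasingly important role in applications such as traffic speed forecasting, driver maneuver anticipation, and human action recognition. The core idea of STGNNs is to consider spatial dependency and temporal dependency at the same time. Many current approaches use graph convolutions to capture the spatial dependency and combine them with RNNs or CNNs to model the temporal dependency. The figure below shows the pipeline of an STGNN model.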

Further study needed

None yet

Problems encountered

None yet

Motivation, summary, reflections, rants~~

References

https://zhuanlan.zhihu.com/p/136521625

https://zhuanlan.zhihu.com/p/75307407

https://mp.weixin.qq.com/s/PSrgm7frsXIobSrlcoCWxw

https://zhuanlan.zhihu.com/p/142948273

https://developer.huaweicloud.com/hero/forum.php?mod=viewthread&tid=109580

Posted 2022-03-31 | Updated 2025-01-30 | Tutorials | 2 minutes read (About 301 words)

OpenLDAP

A distributed, multi-platform integrated authentication system

ibug's setup on the lab machines works quite well

https://ibug.io/blog/2022/03/linux-openldap-server/

https://harrychen.xyz/2021/01/17/openldap-linux-auth/

https://www.cnblogs.com/dufeixiang/p/11624210.html

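Changing the shell

It is complicated and buggy; I will just change my profile instead

https://ibug.io/blog/2022/03/linux-openldap-server/#user-chsh

Mounts

The home directories are mounted from the same place, so they are of course identical on every machine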

# shaojiemike @ snode2 in ~ [20:18:20]
$ df -h .
Filesystem       Size  Used Avail Use% Mounted on
10.1.13.1:/home   15T   11T  3.1T  78% /staff

# shaojiemike @ snode0 in ~ [20:25:51]
$ mount|grep staff
10.1.13.1:/home on /staff type nfs4 (rw,relatime,vers=4.2,rsize=1048576,wsize=1048576,namlen=255,soft,proto=tcp,timeo=600,retrans=2,sec=sys,clientaddr=10.1.13.50,local_lock=none,addr=10.1.13.1)

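tmpfs is a temporary file system kept in virtual memory (RAM and swap) rather than on disk.

Configuration

The actual configuration requires logging in to the central machine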

# shaojiemike @ hades1 in ~ [20:41:06]
$ cat /etc/hosts
127.0.0.1 localhost
127.0.1.1 hades1
# 222.195.72.30 hades0
# 202.38.72.64 hades1
# The following lines are desirable for IPv6 capable hosts
::1     ip6-localhost ip6-loopback
fe00::0 ip6-localnet
ff00::0 ip6-mcastprefix
ff02::1 ip6-allnodes
ff02::2 ip6-allrouters

114.214.198.26  synology
10.1.13.1       acsa-nfs
10.1.13.6       discovery
10.1.13.50      snode0
10.1.13.51      snode1
10.1.13.52      snode2
10.1.13.53      snode3
10.1.13.54      snode4
10.1.13.55      snode5
10.1.13.56      snode6
10.1.13.114     swabl
10.1.13.119     node19
10.1.13.102     node2
10.1.13.58      hades0
10.1.13.57      hades1

# shaojiemike @ snode0 in ~ [20:36:26]
$ sudo cat /etc/nslcd.conf
# /etc/nslcd.conf
# nslcd configuration file. See nslcd.conf(5)
# for details.

# The user and group nslcd should run as.
uid nslcd
gid nslcd

# The location at which the LDAP server(s) should be reachable.
uri ldaps://ldap.swangeese.fun

Further study needed

How many machines are involved in total?

Problems encountered

None yet

Motivation, summary, reflections, rants~~

References

None

Posted 2022-03-30 | Updated 2025-01-30 | Tutorials | a few seconds read (About 41 words)

How To Read Code

How to read code

Further study needed

None yet

Problems encountered

None yet

Motivation, summary, reflections, rants~~

References

None

Posted 2022-03-29 | Updated 2025-01-30 | Tutorials | 21 minutes read (About 3177 words)

Python MPI

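Global Interpreter Lock (GIL)

Execution of Python code is controlled by the Python virtual machine (the interpreter).

Access to the Python virtual machine is controlled by the global interpreter lock (GIL), which guarantees that only one thread runs at a time. As a result, even if you set up a multi-threaded job, only one thread actually executes at any moment.

I/O-bound programs (such as web crawlers) fare somewhat better: I/O operations call into built-in operating-system C code, and the GIL is released while they wait, which gives partial multi-threading.

The interpreter we normally use is the official CPython implementation; to truly exploit multiple cores with threads, one would have to rewrite an interpreter without the GIL.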
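The effect on I/O-bound work is easy to observe. In the sketch below, time.sleep stands in for a blocking I/O call (an assumption made to keep the example self-contained); since the GIL is released while a thread waits, four waits of 0.2 s overlap instead of adding up to 0.8 s. The function names are hypothetical.

```python
import threading
import time


def io_task(seconds):
    time.sleep(seconds)  # blocking wait; CPython releases the GIL meanwhile


def run_threads(n_threads, seconds):
    """Start n_threads waiting tasks and return the total wall-clock time."""
    threads = [threading.Thread(target=io_task, args=(seconds,))
               for _ in range(n_threads)]
    start = time.perf_counter()
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return time.perf_counter() - start


elapsed = run_threads(4, 0.2)
print(f"4 threads x 0.2 s of waiting took {elapsed:.2f} s")
```

Replacing the sleep with a pure-Python CPU loop would show the opposite behavior: the threads would take turns holding the GIL and the total time would be close to the sum.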

Posted 2022-03-18 | Updated 2025-01-30 | tips | 10 minutes read (About 1484 words)

Conference

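Paper indexing sites

Mendeley

dblp computer science bibliography: search by conference. DBLP sorts by year and conference name by default.

Google Scholar

Microsoft Academic. Microsoft recently announced that it would shut down Microsoft Academic, the second-largest academic search engine after Google Scholar.

Finding a paper's venue

Enter the paper title in Google Scholar, IEEE Xplore, or dblp, and pick the latest among all versions (usually the conference or journal where the paper was published)
Once you have the venue name,
look it up in the CCF recommended list (search by keywords of the full name, not the abbreviation)

Journal lookup

LetPub

Journal lookup assistant

Translation

DeepL full-text translation

Impact factor

The impact factor (IF) is a figure published in ISI's Journal Citation Reports (JCR); since 1975, JCR has published the previous year's data once a year. The impact factor of a journal is the total number of citations received in the reporting year by the papers the journal published in the preceding two years, divided by the total number of papers the journal published in those two years. It is an internationally used metric for evaluating journals, and it is calculated on a yearly basis.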
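A worked example of the definition, with made-up numbers for a hypothetical journal (none of these figures come from JCR):

```python
def impact_factor(citations_this_year, papers_prev_two_years):
    """Citations received this year by papers from the previous two years, per paper."""
    if papers_prev_two_years <= 0:
        raise ValueError("the journal must have published at least one paper")
    return citations_this_year / papers_prev_two_years


# Hypothetical journal: 200 papers over the previous two years, cited 500 times this year
print(impact_factor(500, 200))  # 2.5
```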

Journal impact factors

https://academic-accelerator.com/

https://www.scimagojr.com/journalsearch.php?q=20571&tip=sid

General

Download of the CCF recommended venue list: https://www.ccf.org.cn/Focus/2019-04-25/663625.shtml

https://blog.csdn.net/tmb8z9vdm66wh68vx1/article/details/100571714

https://github.com/bugaosuni59/TH-CPL

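HPC journals and conference dates

Visualization of CCF conference deadlines

https://ccfddl.github.io/

call4papers

Class A journals

Abbreviation	Full name	Submission deadline	Notification date	Length	Website
Class A journals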
TOCS	ACM Transactions on Computer Systems	-	-	-	https://dl.acm.org/journal/tocs
TPDS	IEEE Transactions on Parallel and Distributed Systems	-	-	-	-
TC	IEEE Transactions on Computers	-	-	-	-
TCAD	IEEE Transactions On Computer-Aided Design Of Integrated Circuits And Systems	-	-	-	-
TOS	ACM Transactions on Storage	-	-	-	-
General Class A journals
JACM	Journal of the ACM	-	-	-	-
Proc. IEEE	Proceedings of the IEEE	-	-	-	-
-	Science China	-	-	-	-
-	中国科学	-	-	-	-

Class A conferences

Abbreviation	Full name	Previous edition	Next edition	Length	Website
FAST	USENIX Conference on File and Storage Technologies	2022-2-22～24	2023-2-20～23	long papers 11 pages, short papers 6 pages	https://www.usenix.org/conference/fast22/technical-sessions
FPGA	ACM/SIGDA International Symposium on Field-Programmable Gate Arrays	2022-2-27~3-1 online	-	-	https://www.isfpga.org
ASPLOS	International Conference on Architectural Support for Programming Languages and Operating Systems	2022-02-28～3-4	2023-2	-	https://asplos-conference.org/2022/
PPoPP	ACM SIGPLAN Symposium on Principles & Practice Of Parallel Programming		22-4-2~6 online		https://ppopp22.sigplan.org
HPCA	International Symposium on High-Performance Computer Architecture		2022-4-2～6 online	-	https://hpca-conf.org/2022/
EuroSys	European Conference on Computer Systems		2022-4-5～8 France	12 pages of main text	https://2022.eurosys.org
SIGMETRICS	International Conference on Measurement and Modeling Of Computer Systems (a top conference on performance modeling, analysis, and optimization of computer systems)		2022-6-6～10 India	12 pages of main text	https://www.sigmetrics.org/index.shtml
ISCA	International Symposium on Computer Architecture	21-6-14～19	22-6-11～15	-	https://www.iscaconf.org/isca2021/program/
DAC	Design Automation Conference		22-7-10~14 USA	-	https://www.dac.com
USENIX ATC	USENIX Annual Technical Conference		2022-7-11~13 USA	long papers 11 pages, short papers 5 pages	https://www.usenix.org/conference/atc22
MICRO	IEEE/ACM International Symposium on Microarchitecture	2021-10-18~22 online	2022-10 USA	-	https://www.microarch.org/micro55/
SC	International Conference for High Performance Computing, Networking, Storage, and Analysis		2022-11-12~13 USA	-	https://sc22.supercomputing.org
General or interdisciplinary Class A conferences
RECOMB	International Conference on Research in Computational Molecular Biology	2019-11-01	-	-	-
ISMB	International conference on Intelligent Systems for Molecular Biology	2020-01-30	-	-	-
WWW	International World Wide Web Conferences	2019-10-14	2020-1-10	long papers 12 pages, short papers 6 pages	https://www2020.thewebconf.org/
EC	ACM Conference on Economics and Computation	-	-	-	-

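ASPLOS - a top conference in computer systems

Architectural Support for Programming Languages and Operating Systems (ASPLOS)

ASPLOS is a multidisciplinary conference run by ACM with computer architecture at its core. Its research areas span hardware, architecture, compilers, programming languages, operating systems, networking, and applications, with particular attention to research topics that cut across these disciplines.

ASPLOS has been held in rather odd years: 82, 87, 89, 91, 92, 94, 96, 98, 00, 02, 04, 06, 08, 09. It is neither biennial nor annual, and one cannot even say whether it is held in odd or even years; truly a conference that "takes the road less traveled". But ASPLOS is definitely a premium conference: it accepts only 20-odd papers a year, and almost every one of them is heavily cited in the computing field.

Since its founding, ASPLOS has driven the development of technologies such as RISC, RAID, and large-scale multiprocessors, and it is highly influential.

SC

The annual supercomputing conference (International Conference for High Performance Computing, Networking, Storage and Analysis, SC for short) is where the Top500 list is released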

IISWC

IEEE International Symposium on Workload Characterization (IISWC)

This conference mainly studies how to design, analyze, and evaluate workloads more scientifically; many well-known benchmarks are released at it.

PMBS

IEEE/ACM Performance Modeling, Benchmarking and Simulation of High Performance Computer Systems (PMBS)

Further study needed

What is https://researchain.net/ ?

OSDI: USENIX Operating Systems Design and Implementation (26/2=13)

SOSP: ACM SIGOPS Symp on OS Principles (25/2=13)

ASPLOS: Architectural Support for Prog Lang and OS (31)

FAST: USENIX Conference on File and Storage Technologies (23)

Usenix: Annual Usenix Technical Conference (34)

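OSDI is a conference with quite a broad scope. Mentioning OSDI means also mentioning another conference, ranked 11th: SOSP. These two are the best conferences in OS; each is held every two years and they alternate, e.g. if this year is OSDI, then next year is SOSP. Because both cover a wide range of topics, their influence is large.

In the Architecture field, the best conferences are ISCA, HPCA, and MICRO.

Problems encountered

None yet

Motivation, summary, reflections, rants~~

References

Journal lookup: https://www.letpub.com.cn/index.php?page=journalapp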

http://blog.sina.com.cn/s/blog_556a37e10100mdnc.html

https://www.zhihu.com/question/26583423

https://blog.csdn.net/chen_shiqiang/article/details/76167981

Posted 2022-03-12 | Updated 2025-01-30 | Architecture | 16 minutes read (About 2451 words)

AMD CPU

AMD history

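Advanced Micro Devices, Inc. (AMD), founded in 1969, is a multinational company focused on the design of microprocessors and related technologies. It is headquartered in Sunnyvale, California, in Silicon Valley in the San Francisco Bay Area.

AMD EPYC 7452 32-Core Processor

Designed and launched by AMD in mid-2019. It is a multi-chip processor based on the Zen 2 microarchitecture.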

> cat lscpu.txt              
Architecture:          x86_64
CPU op-mode(s):        32-bit, 64-bit
Byte Order:            Little Endian
CPU(s):                64
On-line CPU(s) list:   0-63
Thread(s) per core:    1
Core(s) per socket:    32
Socket(s):             2
NUMA node(s):          2
Vendor ID:             AuthenticAMD
CPU family:            23
Model:                 49
Model name:            AMD EPYC 7452 32-Core Processor
Stepping:              0
CPU MHz:               2345.724
BogoMIPS:              4691.44
Virtualization:        AMD-V
L1d cache:             32K
L1i cache:             32K
L2 cache:              512K
L3 cache:              16384K
NUMA node0 CPU(s):     0-31
NUMA node1 CPU(s):     32-63
Flags:               
(Intel) fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht 

(AMD)   syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm 

        constant_tsc art rep_good nopl nonstop_tsc extd_apicid aperfmperf eagerfpu 

(intel) pni pclmulqdq monitor ssse3 fma cx16 sse4_1 sse4_2 x2apic movbe popcnt aes xsave avx f16c rdrand 

(AMD)   lahf_lm cmp_legacy svm extapic cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw ibs skinit wdt tce topoext perfctr_core perfctr_nb bpext perfctr_l2 

        cpb cat_l3 cdp_l3 hw_pstate sme retpoline_amd 

        ssbd ibrs ibpb stibp 

        vmmcall 

(intel) fsgsbase bmi1 avx2 smep bmi2 cqm rdt_a rdseed adx smap clflushopt clwb sha_ni 

        xsaveopt xsavec xgetbv1 

(intel) cqm_llc cqm_occup_llc cqm_mbm_total cqm_mbm_local 

(AMD)   clzero irperf xsaveerptr 

        arat 

(AMD)   npt lbrv svm_lock nrip_save tsc_scale vmcb_clean flushbyasid decodeassists pausefilter pfthreshold avic v_vmsave_vmload vgif 

(intel) umip 

(AMD)   overflow_recov succor smca
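A quick way to check whether a machine exposes particular flags is to split the flags field into a set and test membership. The following is a minimal sketch, assuming the Linux x86 `/proc/cpuinfo` layout where each processor entry has a `flags` line; the helper name `cpu_flags` and the sample text are made up for illustration.

```python
def cpu_flags(cpuinfo_text):
    """Return the set of flags from the first 'flags' line of /proc/cpuinfo-style text."""
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "flags":
            return set(value.split())
    return set()


# Hypothetical, shortened sample; on a real Linux machine you could pass
# open("/proc/cpuinfo").read() instead.
SAMPLE = "processor\t: 0\nflags\t\t: fpu vme sse sse2 avx avx2 fma sha_ni clzero\n"

flags = cpu_flags(SAMPLE)
wanted = {"avx2", "avx512f"}
missing = wanted - flags
print(sorted(missing))  # ['avx512f']
```

A set makes both "is this flag present" and "which of these flags are missing" one-line operations.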

CPU\Thread\Socket

CPU(s): 64 = the number of logical CPUs = “Thread(s) per core” × “Core(s) per socket” × “Socket(s)” = 1 × 32 × 2
One socket is one physical CPU package (which occupies one socket on the motherboard);
each socket hosts a number of physical cores, and each core can run one or more threads.
In this case, there are two sockets, each containing a 32-core AMD EPYC 7452 CPU. The EPYC 7452 does support SMT (AMD's counterpart to Hyper-Threading), but lscpu reports "Thread(s) per core: 1", so SMT is not enabled on this machine and each core runs a single thread.
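The relationship above can be checked directly against lscpu output. This is a minimal sketch: the function names are illustrative, and the sample text is a shortened copy of the lscpu output shown earlier.

```python
def parse_lscpu(text):
    """Parse 'Key: value' lines of lscpu output into a dict of strings."""
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.strip()
    return fields


def logical_cpus(fields):
    """threads per core x cores per socket x sockets."""
    return (
        int(fields["Thread(s) per core"])
        * int(fields["Core(s) per socket"])
        * int(fields["Socket(s)"])
    )


SAMPLE = """\
CPU(s):                64
Thread(s) per core:    1
Core(s) per socket:    32
Socket(s):             2
"""

fields = parse_lscpu(SAMPLE)
print(logical_cpus(fields))  # 64
print(logical_cpus(fields) == int(fields["CPU(s)"]))  # True
```

With SMT enabled, "Thread(s) per core" would be 2 and the same product would give 128.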

CPU flags

Intel-defined CPU features, CPUID level 0x00000001 (edx)

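fpu: Onboard FPU (floating-point support)
vme: Virtual 8086 mode enhancements
de: Debugging Extensions (CR4.DE)
pse: Page Size Extensions (4MB memory pages)
tsc: Time Stamp Counter (RDTSC)
msr: Model-Specific Registers (RDMSR, WRMSR)
pae: Physical Address Extensions (support for more than 4GB of RAM)
mce: Machine Check Exception
cx8: CMPXCHG8B instruction (64-bit compare-and-swap)
apic: Onboard APIC (Advanced Programmable Interrupt Controller)
sep: SYSENTER/SYSEXIT
mtrr: Memory Type Range Registers
pge: Page Global Enable (global bit in PDEs and PTEs)
mca: Machine Check Architecture
cmov: CMOV instructions (conditional move) (also FCMOV)
pat: Page Attribute Table
pse36: 36-bit PSEs (huge pages)
pn: Processor serial number
clflush: Cache Line Flush instruction
mmx: Multimedia Extensions
fxsr: FXSAVE/FXRSTOR, CR4.OSFXSR # enables Streaming SIMD Extensions (SSE) instructions and fast FPU save & restore.
sse: Intel SSE vector instructions
sse2: SSE2
ht: Hyper-Threading and/or multi-core

Not present on this CPU

ss: CPU self snoop
tm: Automatic clock control (Thermal Monitor)
ia64: Intel Itanium Architecture 64-bit (not to be confused with Intel's 64-bit x86 architecture, which is indicated by the flag lm and known as x86-64 or AMD64)
pbe: Pending Break Enable (PBE# pin) wakeup support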

AMD-defined CPU features, CPUID level 0x80000001

syscall: SYSCALL (Fast System Call) and SYSRET (Return From Fast System Call)
nx: Execute Disable # The NX (no-execute) bit is a CPU feature that marks memory regions as holding either processor instructions (code) or data
mmxext: AMD MMX extensions
fxsr_opt: FXSAVE/FXRSTOR optimizations
pdpe1gb: One GB pages (allows hugepagesz=1G)
rdtscp: Read Time-Stamp Counter and Processor ID
lm: Long Mode (x86-64: amd64, also known as Intel 64, i.e. 64-bit capable)

Not present on this CPU

mp: Multiprocessing Capable.
3dnowext: AMD 3DNow! extensions
3dnow: 3DNow! (AMD vector instructions, competing with Intel's SSE1)

Other features, Linux-defined mapping

constant_tsc: TSC (Time Stamp Counter) ticks at a constant rate
art: Always-Running Timer
rep_good: rep microcode works well
nopl: The NOPL (0F 1F) instructions # NOPL is a multi-byte (long) "do nothing" operation
nonstop_tsc: TSC does not stop in C states
extd_apicid: has extended APICID (8 bits) (Advanced Programmable Interrupt Controller)
aperfmperf: APERFMPERF # On x86 hardware, APERF and MPERF are MSR registers that can provide feedback on current CPU frequency.
eagerfpu: Non lazy FPU restore
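As a rough illustration of how APERF and MPERF give frequency feedback: MPERF advances at a fixed reference frequency while the core is active, and APERF advances at the actual running frequency, so the ratio of their deltas over an interval scales the reference frequency. The sketch below uses made-up counter deltas and a hypothetical helper name; reading the real MSRs requires privileged access and is not shown.

```python
def effective_mhz(delta_aperf, delta_mperf, reference_mhz):
    """Average frequency over an interval: reference * (dAPERF / dMPERF)."""
    if delta_mperf == 0:
        raise ValueError("MPERF did not advance; the core was idle for the whole interval")
    return reference_mhz * delta_aperf / delta_mperf


# Hypothetical deltas: APERF advanced 1.5x as fast as MPERF, so the core
# ran on average 1.5x above the assumed 2000 MHz reference clock.
print(effective_mhz(3_000_000, 2_000_000, 2000))  # 3000.0
```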

Intel-defined CPU features, CPUID level 0x00000001 (ecx)

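pni: SSE-3 ("Prescott New Instructions", introduced with the Prescott core in 2004)
pclmulqdq: Perform a Carry-Less Multiplication of Quadword instruction (accelerator for GCM)
monitor: Monitor/Mwait support (Intel SSE3 supplements)
ssse3: Supplemental SSE-3
fma: Fused multiply-add
cx16: CMPXCHG16B # double-width compare-and-swap (DWCAS) implemented by instructions such as x86 CMPXCHG16B
sse4_1: SSE-4.1
sse4_2: SSE-4.2
x2apic: x2APIC
movbe: Move Data After Swapping Bytes instruction
popcnt: Return the Count of Number of Bits Set to 1 instruction (Hamming weight, i.e. bit count)
aes/aes-ni: Advanced Encryption Standard (New Instructions)
xsave: Save Processor Extended States: also provides XGETBV, XRSTOR, XSETBV
avx: Advanced Vector Extensions
f16c: 16-bit FP conversions (CVT16)
rdrand: Read Random Number from hardware random number generator instruction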
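To make two of the entries above concrete, here is a small Python model of what popcnt and movbe compute. It only imitates the results in software (the function names are illustrative) and says nothing about how the hardware implements them.

```python
def popcount(x):
    """Number of bits set to 1 (Hamming weight), as popcnt computes."""
    return bin(x).count("1")


def byteswap32(x):
    """Reverse the byte order of a 32-bit value, the swap movbe applies when moving data."""
    return int.from_bytes(x.to_bytes(4, "little"), "big")


print(popcount(0b10110010))          # 4
print(hex(byteswap32(0x11223344)))   # 0x44332211
```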

More extended AMD flags: CPUID level 0x80000001, ecx

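lahf_lm: Load AH from Flags (LAHF) and Store AH into Flags (SAHF) in long mode
cmp_legacy: If yes, Hyper-Threading not valid
svm: "Secure Virtual Machine": AMD-V
extapic: Extended APIC space
cr8_legacy: CR8 in 32-bit mode
abm: Advanced Bit Manipulation
sse4a: SSE-4A
misalignsse: Indicates whether a general-protection exception (#GP) is generated when some legacy SSE instructions operate on unaligned data. Also depends on CR0 and the Alignment Checking bit
3dnowprefetch: 3DNow prefetch instructions
osvw: OS Visible Workaround, which allows the OS to work around processor errata.
ibs: Instruction Based Sampling
xop: Extended AVX instructions
skinit: SKINIT/STGI instructions # part of the x86 virtualization instruction family
wdt: Watchdog timer
tce: Translation Cache Extension
topoext: Topology Extensions CPUID leafs
perfctr_core: Core Performance Counter Extensions
perfctr_nb: NB Performance Counter Extensions
bpext: Data breakpoint extension
perfctr_l2: L2 Performance Counter Extensions

Auxiliary flags: Linux defined - for features scattered in various CPUID levels

cpb: AMD Core Performance Boost
cat_l3: Cache Allocation Technology L3
cdp_l3: Code and Data Prioritization L3
hw_pstate: AMD HW-PState (hardware P-state)
sme: AMD Secure Memory Encryption
retpoline_amd: AMD Retpoline mitigation # a security mitigation against attacks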

Virtualization flags: Linux defined

vmmcall: Prefer VMMCALL to VMCALL

Intel-defined CPU features, CPUID level 0x00000007:0 (ebx)

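fsgsbase: {RD/WR}{FS/GS}BASE instructions
bmi1: 1st group bit manipulation extensions
avx2: AVX2 instructions
smep: Supervisor Mode Execution Protection
bmi2: 2nd group bit manipulation extensions
cqm: Cache QoS Monitoring (Quality of Service)
rdt_a: Resource Director Technology Allocation
rdseed: The RDSEED instruction, intended for seeding software random number generators; RDRAND is for applications that only need high-quality random numbers
adx: The ADCX and ADOX instructions
smap: Supervisor Mode Access Prevention
clflushopt: CLFLUSHOPT instruction (Optimized CLFLUSH). It evicts the specified cache line from every cache level; if the data in that line has been modified, it is written to main memory. Support status: current mainstream processors all support this instruction.
clwb: CLWB instruction (Cache Line Write Back). It works like CLFLUSHOPT, but after the data is written back the cache line remains in the cache in an unmodified state.
sha_ni: SHA1/SHA256 Instruction Extensions

Extended state features, CPUID level 0x0000000d:1 (eax)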

xsaveopt: Optimized XSAVE
xsavec: XSAVEC, save processor extended states with compaction
xgetbv1: XGETBV with ECX = 1

Intel-defined CPU QoS sub-leaf, CPUID level 0x0000000F:0 (edx)

cqm_llc: LLC QoS # LLC = last-level cache
cqm_occup_llc: LLC occupancy monitoring
cqm_mbm_total: LLC total MBM monitoring # MBM = Memory Bandwidth Monitoring
cqm_mbm_local: LLC local MBM monitoring

AMD-defined CPU features, CPUID level 0x80000008 (ebx)

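clzero: The CLZERO instruction, an AMD vendor-specific x86 instruction introduced with the Zen microarchitecture. CLZERO clears the cache line specified by the logical address in the RAX register by writing zeros to every byte in the line.
irperf: Instructions Retired performance counter
xsaveerptr: Always save/restore FP error pointers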

Thermal and Power Management leaf, CPUID level 0x00000006 (eax)

arat: Always Running APIC Timer

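AMD SVM Feature Identification, CPUID level 0x8000000a (edx)

npt: AMD Nested Page Table support
lbrv: AMD LBR Virtualization support
svm_lock: AMD SVM locking MSR
nrip_save: AMD SVM next_rip save
tsc_scale: AMD TSC scaling support
vmcb_clean: AMD VMCB clean bits support
flushbyasid: AMD flush-by-ASID support
decodeassists: AMD Decode Assists support
pausefilter: AMD filtered pause intercept
pfthreshold: AMD pause filter threshold
avic: Virtual Interrupt Controller
v_vmsave_vmload: Virtual VMSAVE VMLOAD
vgif: Virtual GIF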

Intel-defined CPU features, CPUID level 0x00000007:0 (ecx)

umip: User Mode Instruction Protection

AMD-defined CPU features, CPUID level 0x80000007 (ebx)

overflow_recov: MCA overflow recovery support # Machine Check Architecture (MCA)
succor: Uncorrectable error containment and recovery
smca: Scalable MCA

Flags I don't know

ssbd ibrs ibpb stibp

Processor P-states and C-states

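Intel processors support several technologies for optimizing power consumption. Two of them are P-states (optimization of voltage and CPU frequency during operation) and C-states (optimization of power consumption when a core does not have to execute any instructions).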

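ADCX and ADOX

ADCX
Adds two unsigned integers plus a carry, reading the carry from the carry flag and, if necessary, setting it there. Does not affect any flag other than the carry flag.
ADOX
Adds two unsigned integers plus a carry, reading the carry from the overflow flag and, if necessary, setting it there. Does not affect any flag other than the overflow flag.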
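Because ADCX touches only the carry flag and ADOX touches only the overflow flag, the two instructions can maintain two independent carry chains side by side. The following is a small software model of that behavior for 64-bit operands; the dict-based flag register and the function names are assumptions made for illustration, not how the hardware is programmed.

```python
MASK64 = (1 << 64) - 1


def add_with_carry(a, b, carry_in):
    """Return (64-bit sum, carry out) of a + b + carry_in."""
    total = a + b + carry_in
    return total & MASK64, total >> 64


def adcx(flags, dest, src):
    """Model of ADCX: the carry is read from and written to CF only."""
    result, flags["CF"] = add_with_carry(dest, src, flags["CF"])
    return result


def adox(flags, dest, src):
    """Model of ADOX: the carry is read from and written to OF only."""
    result, flags["OF"] = add_with_carry(dest, src, flags["OF"])
    return result


flags = {"CF": 0, "OF": 0}
r1 = adcx(flags, MASK64, 1)   # wraps to 0 and sets CF
r2 = adox(flags, 5, 7)        # uses OF (still 0); CF is left alone
r3 = adcx(flags, 10, 20)      # consumes the CF set by the first adcx
print(r1, r2, r3, flags)      # 0 12 31 {'CF': 0, 'OF': 0}
```

The middle `adox` call does not disturb the carry produced by the first `adcx`, which is what lets the two chains be interleaved.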

Topics for further study

None yet.

Problems encountered

None yet.

References

https://unix.stackexchange.com/questions/43539/what-do-the-flags-in-proc-cpuinfo-mean