Experiments For PIM Motivation
Address Interleaving
Observation on PIM/NDP
Page walks
- First, the large number of memory chips and the arbitrary distribution of page table entries make page walks involve expensive cross-chip traffic. [^1]
- Second, the lack of deep cache hierarchies limits the caching of page table entries close to the MPUs.
- Third, the lean nature of the MPU cores (due to the tight power and area constraints) precludes integrating expensive hardware to overlap page walks with useful work.
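To make the first point concrete, here is a minimal sketch of how often a page walk crosses chips when page-table nodes are scattered arbitrarily. The 4-level table, the uniform random placement, and all function names are assumptions for illustration; none of them come from the paper.

```python
import random

LEVELS = 4  # assumption: a 4-level radix page table, as on x86-64


def place_page_table_nodes(num_chips, levels=LEVELS, seed=0):
    """Pick an arbitrary chip for the page-table node touched at each level."""
    rng = random.Random(seed)
    return [rng.randrange(num_chips) for _ in range(levels)]


def cross_chip_steps(requesting_chip, node_chips):
    """Count the walk steps whose page-table node lives on another chip."""
    return sum(1 for chip in node_chips if chip != requesting_chip)


def expected_remote_fraction(num_chips):
    """Under uniform placement, a step is remote with probability (N-1)/N."""
    return (num_chips - 1) / num_chips


if __name__ == "__main__":
    chips = 16  # hypothetical system size
    nodes = place_page_table_nodes(chips)
    print("node placement per level:", nodes)
    print("cross-chip steps for an MPU on chip 0:", cross_chip_steps(0, nodes))
    print("expected remote fraction:", expected_remote_fraction(chips))
```

Under this toy model, the more chips there are, the closer the remote fraction gets to 1, so nearly every step of a walk pays the cross-chip cost.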
Local and remote access latency
In these systems, a memory access from an MPU to its own memory partition is much cheaper than an access to a remote partition, as the latter involves traversing the expensive NoC and cross-chip interconnects. [^1]
Fig. 4.2 compares the average end-to-end memory access latency depending on the target data’s location:
- same partition (local access within the same vault)
- different partition, same chip (remote access within the same HMC chip)
- any chip in the network (remote access to a different HMC chip)
The three cases are labeled as Partition, Chip, and Network respectively.
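The three cases can be sketched as a small classifier plus a latency lookup. The `Location` type, the function names, and the latency values are hypothetical placeholders, not the numbers measured in Fig. 4.2. The sketch also follows the note above in treating Network as "a different chip".

```python
from collections import namedtuple

# Hypothetical coordinates of an MPU or a piece of data: (chip id, partition id).
Location = namedtuple("Location", ["chip", "partition"])

# Assumed latencies in nanoseconds, for illustration only (not from Fig. 4.2).
LATENCY_NS = {"Partition": 50, "Chip": 100, "Network": 300}


def classify(mpu, target):
    """Label an access as Partition, Chip, or Network."""
    if mpu.chip != target.chip:
        return "Network"    # crosses the chip-to-chip interconnect
    if mpu.partition != target.partition:
        return "Chip"       # crosses the on-chip NoC only
    return "Partition"      # stays inside the MPU's own partition


def average_latency(mpu, targets):
    """Average end-to-end latency over a list of target locations."""
    return sum(LATENCY_NS[classify(mpu, t)] for t in targets) / len(targets)


if __name__ == "__main__":
    mpu = Location(chip=0, partition=0)
    targets = [Location(0, 0), Location(0, 3), Location(2, 1)]
    for t in targets:
        print(t, "->", classify(mpu, t))
    print("average latency (ns):", average_latency(mpu, targets))
```

With real measurements in place of the placeholder table, the same lookup shows how quickly the average degrades as the share of Chip and Network accesses grows.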
References
[^1]: PACT’17 Near-Memory Address Translation
[^2]: All in-memory computing is a scam <(`^´)>
http://icarus.shaojiemike.top/2023/11/14/Work/Architecture/PIM/experimentsForPIMMotivation/