An article for non-Japanese readers.... Recently I have been working on a feature that loads data blocks from an NVMe SSD to the GPU using peer-to-peer DMA. It bypasses the CPU and host RAM throughout the data loading process, and so also reduces the number of expensive data copies. Once the data blocks are loaded into the GPU's RAM, we can process individual records within the blocks, by thousands process