MAPS: GPU Memory Abstraction and Optimization Framework


MAPS Framework Performance



Single GPU Matrix Multiplication
The following graph depicts the performance of single-precision matrix multiplication using MAPS, measured on a Tesla K40 GPU:



Multi-GPU Scaling
The incremental speedup of three multi-GPU applications built on MAPS (Game of Life, Histogram, and CUBLAS Matrix Multiplication), measured on several GPU models (GTX 780, Titan Black, Tesla K40m, GTX 980), is shown below:

The maximal speedup is 3.94x on 4 GPUs.


Instruction-Level Parallelism (ILP)
The figure below shows the performance of automatic ILP optimizations, i.e., computing multiple elements per thread, on various GPU architectures using the Game of Life code sample:
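
The idea of computing multiple elements per thread can be illustrated with a CPU-side emulation of a Game of Life step. This is a minimal sketch, not the MAPS API or its sample code: `kElemsPerThread`, `life_thread`, and `life_step` are hypothetical names, the wrap-around boundary handling is an assumption, and the loops over logical threads stand in for what would be a CUDA kernel launch. Each logical thread keeps one independent neighbor sum per cell, which is the property that lets a GPU overlap the instructions of different cells.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical tuning parameter: cells computed by each logical thread.
constexpr int kElemsPerThread = 4;

// Reads a cell with wrap-around (toroidal) boundaries (an assumption).
inline int cell(const std::vector<std::uint8_t>& grid, int w, int h, int x, int y) {
    return grid[((y + h) % h) * w + ((x + w) % w)];
}

// One logical thread: computes kElemsPerThread horizontally adjacent cells.
void life_thread(const std::vector<std::uint8_t>& in, std::vector<std::uint8_t>& out,
                 int w, int h, int thread_x, int y) {
    int sums[kElemsPerThread] = {0};
    const int x0 = thread_x * kElemsPerThread;
    for (int dy = -1; dy <= 1; ++dy) {
        for (int dx = -1; dx <= 1; ++dx) {
            if (dx == 0 && dy == 0) continue;  // skip the cell itself
            // Independent accumulators: no sum depends on another.
            for (int e = 0; e < kElemsPerThread; ++e)
                sums[e] += cell(in, w, h, x0 + e + dx, y + dy);
        }
    }
    for (int e = 0; e < kElemsPerThread; ++e) {
        const int x = x0 + e;
        if (x >= w) break;  // guard when w is not a multiple of kElemsPerThread
        const bool alive = in[y * w + x] != 0;
        out[y * w + x] = (sums[e] == 3 || (alive && sums[e] == 2)) ? 1 : 0;
    }
}

// Emulates the kernel launch: a grid of logical threads covers the board.
void life_step(const std::vector<std::uint8_t>& in, std::vector<std::uint8_t>& out,
               int w, int h) {
    const int threads_x = (w + kElemsPerThread - 1) / kElemsPerThread;
    for (int y = 0; y < h; ++y)
        for (int tx = 0; tx < threads_x; ++tx)
            life_thread(in, out, w, h, tx, y);
}
```

With `kElemsPerThread` set to 1 the sketch degenerates to the usual one-cell-per-thread mapping, which makes it easy to compare the two work assignments on the same input.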



Kernel Fusion
Kernel fusion combines two operations, here convolving an image and computing its histogram, into a single kernel. The graph below compares the performance of the fused kernel with that of other libraries (NPP and CUB). The kernel was tested on both the Kepler architecture (NVIDIA Tesla K40c) and the Maxwell architecture (NVIDIA GeForce GTX 750 Ti).
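
The benefit of fusion is that the convolved image never has to be written to memory and read back: each filtered pixel is binned as soon as it is computed. The following CPU-side sketch shows that structure only. It is not the MAPS kernel; the function name `convolve_and_histogram`, the 3x3 box filter, and the clamped borders are all assumptions chosen for illustration.

```cpp
#include <algorithm>
#include <array>
#include <cstdint>
#include <vector>

// Fused pass: applies a 3x3 box filter (an assumed convolution kernel) and
// bins each filtered pixel immediately, so no intermediate image is stored.
std::array<unsigned, 256> convolve_and_histogram(
        const std::vector<std::uint8_t>& image, int width, int height) {
    std::array<unsigned, 256> hist{};  // zero-initialized bins
    for (int y = 0; y < height; ++y) {
        for (int x = 0; x < width; ++x) {
            int sum = 0;
            for (int dy = -1; dy <= 1; ++dy) {
                for (int dx = -1; dx <= 1; ++dx) {
                    // Clamp coordinates at the image border (an assumption).
                    const int sx = std::min(std::max(x + dx, 0), width - 1);
                    const int sy = std::min(std::max(y + dy, 0), height - 1);
                    sum += image[sy * width + sx];
                }
            }
            // The filtered value is consumed right away instead of being stored.
            ++hist[sum / 9];
        }
    }
    return hist;
}
```

An unfused version would run the filter over the whole image, store the result, and then make a second pass to build the histogram; the fused form trades that extra memory traffic for a slightly larger loop body.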