Performance Optimization on GPGPU & Multicore CPU using Roofline Model: A Recent Study
Novel Research Aspects in Mathematical and Computer Science Vol. 4,
28 May 2022, Pages 8-16
https://doi.org/10.9734/bpi/nramcs/v4/16040D
Abstract
In this chapter, the roofline model is used to determine the best-suited platform for training a neural network that recognizes handwritten digits on multicore CPU and general-purpose GPU (GPGPU) hardware. The network is trained on the MNIST dataset using the pattern-parallel training technique, and training is demonstrated with several data layouts on both the multicore CPU and the GPGPU. The roofline model is used to explain the resulting performance bottlenecks. Because the model is simple, it can be applied quickly. The best platform is chosen according to the data layout and to whether each configuration is memory-bound or compute-bound. Optimization shifts the computational intensity to the right on every roofline, which in turn improves performance. The most appropriate hardware platform is then selected based on these optimizations and on the available data size, number of cores, and operational intensity.
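The roofline model bounds attainable performance by the smaller of peak compute throughput and operational intensity (FLOPs per byte of memory traffic) times peak memory bandwidth. The Python sketch below illustrates that bound and one way intensity can shift right. It assumes that pattern-parallel training pushes a batch of patterns through a shared weight matrix; the chapter may organize the work differently. The machine figures (1000 GFLOP/s, 100 GB/s), the 100-unit hidden layer, the ideal-caching traffic model, and the helper names `attainable_gflops`, `ridge_point`, and `dense_layer_intensity` are all hypothetical and are not taken from the chapter.

```python
# Minimal roofline-model sketch. All machine numbers and layer sizes are
# hypothetical placeholders, not measurements from the chapter.


def attainable_gflops(intensity, peak_gflops, bandwidth_gbs):
    """Roofline bound: min(peak compute, intensity * peak memory bandwidth).

    intensity is operational intensity in FLOPs per byte of DRAM traffic.
    """
    return min(peak_gflops, intensity * bandwidth_gbs)


def ridge_point(peak_gflops, bandwidth_gbs):
    """Intensity (FLOPs/byte) above which a kernel is compute-bound."""
    return peak_gflops / bandwidth_gbs


def dense_layer_intensity(n_in, n_out, batch, bytes_per_value=4):
    """Operational intensity of one fully connected forward pass Y = X @ W.

    Hypothetical traffic model: each value of X, W and Y crosses DRAM exactly
    once (ideal caching), with 4-byte single-precision values by default.
    """
    flops = 2 * batch * n_in * n_out                      # multiply + add per MAC
    values = batch * n_in + n_in * n_out + batch * n_out  # read X and W, write Y
    return flops / (values * bytes_per_value)


# Hypothetical machine: 1000 GFLOP/s peak compute, 100 GB/s memory bandwidth.
PEAK, BW = 1000.0, 100.0

for batch in (1, 32, 256):
    # 784 inputs = one flattened 28x28 MNIST image; 100 hidden units is assumed.
    ai = dense_layer_intensity(784, 100, batch)
    perf = attainable_gflops(ai, PEAK, BW)
    bound = "memory" if ai < ridge_point(PEAK, BW) else "compute"
    print(f"batch={batch:4d}  AI={ai:6.2f} FLOP/B  "
          f"bound={perf:7.1f} GFLOP/s ({bound}-bound)")
```

Under these assumptions, a single pattern per pass sits left of the ridge point (memory-bound). Larger batches reuse the weight matrix, move the kernel right along the roofline, and eventually make it compute-bound.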
- Roofline model
- multicore CPU
- GPU
- parallel work