A method for prediction execution time of GPU programs

Andrey Anatolievich Kleimenov; Nina Nikolaevna Popova

doi:10.33693/2313-223X-2021-8-1-38-45

Authors: Kleimenov A.A.¹, Popova N.N.¹
Affiliations:
1. Lomonosov Moscow State University
Issue: Vol 8, No 1 (2021)
Pages: 38-45
Section: Articles
URL: https://journals.eco-vector.com/2313-223X/article/view/529812
DOI: https://doi.org/10.33693/2313-223X-2021-8-1-38-45
ID: 529812

Abstract

The use of coprocessors such as GPUs and FPGAs is a leading trend in HPC, and many applications from a wide variety of domains have been adapted to GPUs and used successfully. In this paper, we propose an approach for predicting the execution time of CUDA kernels based on static analysis of the program source code. The approach builds two models: a model of the CUDA kernel and a model of the graphics accelerator. The resulting method for estimating kernel execution time is applied to implementations of matrix multiplication, the Fourier transform, and the backpropagation method for training neural networks. In verification, the approach showed good prediction accuracy, especially at low GPU loads.
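To make the idea of combining a kernel model with an accelerator model concrete, here is a minimal sketch in Python. It is not the authors' method: the abstract does not specify their models, so every class, field, and formula below (`KernelModel`, `GpuModel`, `estimate_time_seconds`, the latency figures) is a hypothetical illustration of how statically derived instruction counts and a handful of hardware parameters could be turned into a rough time estimate.

```python
import math
from dataclasses import dataclass


@dataclass
class KernelModel:
    """Hypothetical kernel description, as a static pass over the source might report it."""
    arithmetic_ops: int      # arithmetic instructions executed by one thread
    global_memory_ops: int   # global-memory accesses executed by one thread
    blocks: int              # number of thread blocks in the launch grid


@dataclass
class GpuModel:
    """Hypothetical accelerator description; values are placeholders, not measurements."""
    clock_hz: float
    sm_count: int                   # number of streaming multiprocessors
    max_blocks_per_sm: int          # blocks resident on one SM at a time
    arithmetic_latency_cycles: int
    memory_latency_cycles: int


def estimate_time_seconds(kernel: KernelModel, gpu: GpuModel) -> float:
    """Pessimistic estimate: no latency hiding, blocks execute in full waves."""
    # Cycles for one block if each instruction waits out its full latency.
    cycles_per_block = (
        kernel.arithmetic_ops * gpu.arithmetic_latency_cycles
        + kernel.global_memory_ops * gpu.memory_latency_cycles
    )
    # Only sm_count * max_blocks_per_sm blocks can be resident at once,
    # so the grid is processed in ceil(blocks / resident) waves.
    resident_blocks = gpu.sm_count * gpu.max_blocks_per_sm
    waves = math.ceil(kernel.blocks / resident_blocks)
    return waves * cycles_per_block / gpu.clock_hz


# Example with made-up numbers: 20 blocks on a GPU that holds 10 at a time.
kernel = KernelModel(arithmetic_ops=100, global_memory_ops=10, blocks=20)
gpu = GpuModel(clock_hz=1e9, sm_count=5, max_blocks_per_sm=2,
               arithmetic_latency_cycles=4, memory_latency_cycles=400)
estimate = estimate_time_seconds(kernel, gpu)
```

A model this simple ignores warp scheduling, caching, and overlap between memory and arithmetic work, which is why practical predictors calibrate their parameters against microbenchmarks of the target device.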

Keywords

performance analysis, CUDA-kernel, static analysis, GPU model

About the authors

Andrey Anatolievich Kleimenov

Lomonosov Moscow State University

Email: andreykleimenov@mail.ru
PhD student at the Department of Computational Mathematics and Cybernetics. Moscow, Russian Federation

Nina Nikolaevna Popova

Lomonosov Moscow State University

Email: popova@cs.msu.ru
Cand. Sci. (Eng.); associate professor at the Department of Computational Mathematics and Cybernetics. Moscow, Russian Federation
