A method for predicting the execution time of GPU programs



The use of coprocessors such as GPUs and FPGAs is a leading trend in HPC, and many applications from a wide variety of domains have been successfully ported to GPUs. In this paper, we propose an approach for predicting the execution time of CUDA kernels based on static analysis of program source code. The approach rests on building a model of the CUDA kernel together with a model of the graphics accelerator. The developed method for estimating the execution time of CUDA kernels is applied to implementations of matrix multiplication, the Fourier transform, and the backpropagation method for training neural networks. Verification showed that the approach achieves good prediction accuracy, especially at low GPU loads.
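The abstract describes the approach only at a high level (a kernel model combined with an accelerator model). As a rough illustration of the general idea of analytical GPU time prediction, and not of the authors' actual model, the following sketch uses a simple roofline-style bound: the kernel is characterized statically by its floating-point operation and memory-traffic counts, the accelerator by its peak throughput and bandwidth, and the predicted time is the larger of the compute-bound and memory-bound estimates. The GTX 1050 figures below are approximate published specifications used purely for the example.

```python
def estimate_kernel_time(n_flops: float, n_bytes: float,
                         peak_gflops: float, peak_bw_gbs: float) -> float:
    """Roofline-style lower-bound estimate of kernel execution time (seconds).

    n_flops     -- floating-point operations the kernel performs
    n_bytes     -- bytes moved between the GPU and device memory
    peak_gflops -- accelerator peak compute throughput, GFLOP/s
    peak_bw_gbs -- accelerator peak memory bandwidth, GB/s
    """
    compute_time = n_flops / (peak_gflops * 1e9)   # time if compute-bound
    memory_time = n_bytes / (peak_bw_gbs * 1e9)    # time if memory-bound
    return max(compute_time, memory_time)


# Example: naive 1024x1024 single-precision matrix multiplication
# on a GeForce GTX 1050 (~1800 GFLOP/s FP32, ~112 GB/s, approximate specs).
n = 1024
flops = 2 * n**3              # one multiply and one add per inner-loop step
traffic = 3 * n * n * 4       # read A, read B, write C once (ideal caching)
t = estimate_kernel_time(flops, traffic, 1800, 112)
```

Real static models, including the one the paper describes, go well beyond this bound: they account for instruction latencies, occupancy, and memory-hierarchy behavior, which is why such a crude estimate degrades at high GPU loads.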


About the authors

Andrey Kleimenov

Lomonosov Moscow State University

Email: andreykleimenov@mail.ru
PhD student at the Department of Computational Mathematics and Cybernetics, Moscow, Russian Federation

Nina Popova

Lomonosov Moscow State University

Email: popova@cs.msu.ru
Cand. Sci. (Eng.); associate professor at the Department of Computational Mathematics and Cybernetics, Moscow, Russian Federation

References

  1. Alavani G., Varma K., Sarkar S. Predicting Execution Time of CUDA Kernel Using Static Analysis. IEEE Intl. Conf. on Parallel & Distributed Processing with Applications, Ubiquitous Computing & Communications, Big Data & Cloud Computing, Social Computing & Networking, Sustainable Computing & Communications (ISPA/IUCC/BDCloud/SocialCom/SustainCom). 2018. Pp. 948-955.
  2. Arafa Y. et al. Low Overhead Instruction Latency Characterization for NVIDIA GPGPUs. IEEE High Performance Extreme Computing Conference (HPEC). 2019. Pp. 1-8.
  3. Baghsorkhi S.S. et al. An adaptive performance modeling tool for GPU architectures. ACM SIGPLAN Not. 2010. Vol. 45. No. 5. Pp. 105-114.
  4. Bakhoda A. et al. Analyzing CUDA workloads using a detailed GPU simulator. IEEE International Symposium on Performance Analysis of Systems and Software. 2009. Pp. 163-174.
  5. Che S. et al. Rodinia: A benchmark suite for heterogeneous computing. IEEE International Symposium on Workload Characterization (IISWC). 2009. Pp. 44-54.
  6. Collange S. et al. Barra: A Parallel Functional Simulator for GPGPU. IEEE International Symposium on Modeling, Analysis and Simulation of Computer and Telecommunication Systems. 2010. Pp. 351-360.
  7. Hlavac M. FFT-cuda [Electronic resource]. URL: https://github.com/mmajko/FFT-cuda (access date: 12.01.2021).
  8. Hong S., Kim H. An analytical model for a GPU architecture with memory-level and thread-level parallelism awareness. ACM SIGARCH Comput. Archit. News. 2009. Vol. 37. No. 3. Pp. 152-163.
  9. Jia W., Shaw K.A., Martonosi M. Stargazer: Automated regression-based GPU design space exploration. IEEE International Symposium on Performance Analysis of Systems & Software. 2012. Pp. 2-13.
  10. Jia Z. et al. Dissecting the NVIDIA Volta GPU Architecture via Microbenchmarking. arXiv. 2018.
  11. Konstantinidis E., Cotronis Y. A quantitative roofline model for GPU kernel performance estimation using micro-benchmarks and hardware metric profiling. J. Parallel Distrib. Comput. 2017. Vol. 107. Pp. 37-56.
  12. Lattner C., Adve V. LLVM: A compilation framework for lifelong program analysis & transformation. International Symposium on Code Generation and Optimization, 2004. 2004. Pp. 75-86.
  13. Malhotra G., Goel S., Sarangi S.R. GpuTejas: A parallel simulator for GPU architectures. 21st International Conference on High Performance Computing (HiPC). 2014.
  14. Mei X., Chu X. Dissecting GPU Memory Hierarchy Through Microbenchmarking. IEEE Trans. Parallel Distrib. Syst. 2017. Vol. 28. No. 1. Pp. 72-86.
  15. Sim J. et al. A performance analysis framework for identifying potential benefits in GPGPU applications. In: Proceedings of the 17th ACM SIGPLAN symposium on Principles and Practice of Parallel Programming - PPoPP’12. New York, USA: ACM Press, 2012. P. 11.
  16. Wu G. et al. GPGPU performance and power estimation using machine learning. IEEE 21st International Symposium on High Performance Computer Architecture (HPCA). 2015. Pp. 564-576.
  17. Zhang Y., Owens J.D. A quantitative performance analysis model for GPU architectures. IEEE 17th International Symposium on High Performance Computer Architecture. 2011. Pp. 382-393.
  18. Клейменов А.А., Попова Н.Н. Статически-детерминированный метод прогнозирования динамических характеристик параллельных программ // Вестн. ЮУрГУ. Сер.: Выч. матем. информ. 2021. Т. 10. № 1. С. 20-31. [Kleymenov A.A., Popova N.N. A method for prediction dynamic characteristics of parallel programs based on static analysis. Bulletin of the South Ural State University. Series: Computational Mathematics and Software Engineering. 2021. Vol. 10. No. 1. Pp. 20-31. (In Russ.)]
  19. Nvidia GeForce GTX 1050 [Electronic resource]. URL: https://www.nvidia.com/en-in/geforce/products/10series/geforce-gtx-1050/ (access date: 12.01.2021).
  20. CUDA C++ Programming Guide [Electronic resource]. URL: https://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html (access date: 12.01.2021).
