A method for predicting the execution time of GPU programs



The use of coprocessors such as GPUs and FPGAs is a leading trend in HPC, and many applications from a wide variety of domains have been successfully ported to GPUs. In this paper, we propose an approach for predicting the execution time of CUDA kernels based on static analysis of program source code. The approach rests on building a model of the CUDA kernel together with a model of the graphics accelerator. The developed method for estimating the execution time of CUDA kernels is applied to implementations of matrix multiplication, the Fourier transform, and the backpropagation method for training neural networks. Verification showed that the approach achieves good prediction accuracy, especially at low GPU loads.
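The abstract describes the approach only at a high level (a kernel model combined with an accelerator model). As a rough illustration of the general idea of analytical GPU time prediction, and not of the authors' actual model, the following sketch uses a simple roofline-style bound: the kernel is characterized statically by its floating-point operation and memory-traffic counts, the accelerator by its peak throughput and bandwidth, and the predicted time is the larger of the compute-bound and memory-bound estimates. The GTX 1050 figures below are approximate published specifications used purely for the example.

```python
def estimate_kernel_time(n_flops: float, n_bytes: float,
                         peak_gflops: float, peak_bw_gbs: float) -> float:
    """Roofline-style lower-bound estimate of kernel execution time (seconds).

    n_flops     -- floating-point operations the kernel performs
    n_bytes     -- bytes moved between the GPU and device memory
    peak_gflops -- accelerator peak compute throughput, GFLOP/s
    peak_bw_gbs -- accelerator peak memory bandwidth, GB/s
    """
    compute_time = n_flops / (peak_gflops * 1e9)   # time if compute-bound
    memory_time = n_bytes / (peak_bw_gbs * 1e9)    # time if memory-bound
    return max(compute_time, memory_time)


# Example: naive 1024x1024 single-precision matrix multiplication
# on a GeForce GTX 1050 (~1800 GFLOP/s FP32, ~112 GB/s, approximate specs).
n = 1024
flops = 2 * n**3              # one multiply and one add per inner-loop step
traffic = 3 * n * n * 4       # read A, read B, write C once (ideal caching)
t = estimate_kernel_time(flops, traffic, 1800, 112)
```

Real static models, including the one the paper describes, go well beyond this bound: they account for instruction latencies, occupancy, and memory-hierarchy behavior, which is why such a crude estimate degrades at high GPU loads.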


About the authors

Andrey Kleimenov

Lomonosov Moscow State University

Email: andreykleimenov@mail.ru
PhD student at the Department of Computational Mathematics and Cybernetics, Moscow, Russian Federation

Nina Popova

Lomonosov Moscow State University

Email: popova@cs.msu.ru
Cand. Sci. (Eng.); associate professor at the Department of Computational Mathematics and Cybernetics, Moscow, Russian Federation

References

  1. Alavani G., Varma K., Sarkar S. Predicting Execution Time of CUDA Kernel Using Static Analysis. IEEE Intl. Conf. on Parallel & Distributed Processing with Applications, Ubiquitous Computing & Communications, Big Data & Cloud Computing, Social Computing & Networking, Sustainable Computing & Communications (ISPA/IUCC/BDCloud/SocialCom/SustainCom). 2018. Pp. 948-955.
  2. Arafa Y. et al. Low Overhead Instruction Latency Characterization for NVIDIA GPGPUs. IEEE High Performance Extreme Computing Conference (HPEC). 2019. Pp. 1-8.
  3. Baghsorkhi S.S. et al. An adaptive performance modeling tool for GPU architectures. ACM SIGPLAN Not. 2010. Vol. 45. No. 5. Pp. 105-114.
  4. Bakhoda A. et al. Analyzing CUDA workloads using a detailed GPU simulator. IEEE International Symposium on Performance Analysis of Systems and Software. 2009. Pp. 163-174.
  5. Che S. et al. Rodinia: A benchmark suite for heterogeneous computing. IEEE International Symposium on Workload Characterization (IISWC). 2009. Pp. 44-54.
  6. Collange S. et al. Barra: A Parallel Functional Simulator for GPGPU. IEEE International Symposium on Modeling, Analysis and Simulation of Computer and Telecommunication Systems. 2010. Pp. 351-360.
  7. Hlavac M. FFT-cuda [Electronic resource]. URL: https://github.com/mmajko/FFT-cuda (access date: 12.01.2021).
  8. Hong S., Kim H. An analytical model for a GPU architecture with memory-level and thread-level parallelism awareness. ACM SIGARCH Comput. Archit. News. 2009. Vol. 37. No. 3. Pp. 152-163.
  9. Jia W., Shaw K.A., Martonosi M. Stargazer: Automated regression-based GPU design space exploration. IEEE International Symposium on Performance Analysis of Systems & Software. 2012. Pp. 2-13.
  10. Jia Z. et al. Dissecting the NVIDIA Volta GPU Architecture via Microbenchmarking. arXiv. 2018.
  11. Konstantinidis E., Cotronis Y. A quantitative roofline model for GPU kernel performance estimation using micro-benchmarks and hardware metric profiling. J. Parallel Distrib. Comput. 2017. Vol. 107. Pp. 37-56.
  12. Lattner C., Adve V. LLVM: A compilation framework for lifelong program analysis & transformation. International Symposium on Code Generation and Optimization, 2004. 2004. Pp. 75-86.
  13. Malhotra G., Goel S., Sarangi S.R. GpuTejas: A parallel simulator for GPU architectures. 21st International Conference on High Performance Computing (HiPC). 2014.
  14. Mei X., Chu X. Dissecting GPU Memory Hierarchy Through Microbenchmarking. IEEE Trans. Parallel Distrib. Syst. 2017. Vol. 28. No. 1. Pp. 72-86.
  15. Sim J. et al. A performance analysis framework for identifying potential benefits in GPGPU applications. In: Proceedings of the 17th ACM SIGPLAN symposium on Principles and Practice of Parallel Programming - PPoPP’12. New York, USA: ACM Press, 2012. P. 11.
  16. Wu G. et al. GPGPU performance and power estimation using machine learning. IEEE 21st International Symposium on High Performance Computer Architecture (HPCA). 2015. Pp. 564-576.
  17. Zhang Y., Owens J.D. A quantitative performance analysis model for GPU architectures. IEEE 17th International Symposium on High Performance Computer Architecture. 2011. Pp. 382-393.
  18. Клейменов А.А., Попова Н.Н. Статически-детерминированный метод прогнозирования динамических характеристик параллельных программ // Вестн. ЮУрГУ. Сер.: Выч. матем. информ. 2021. Т. 10. № 1. С. 20-31. [Kleymenov A.A., Popova N.N. A method for prediction dynamic characteristics of parallel programs based on static analysis. Bulletin of the South Ural State University. Series: Computational Mathematics and Software Engineering. 2021. Vol. 10. No. 1. Pp. 20-31. (In Russ.)]
  19. Nvidia GeForce GTX 1050 [Electronic resource]. URL: https://www.nvidia.com/en-in/geforce/products/10series/geforce-gtx-1050/ (access date: 12.01.2021).
  20. CUDA C++ Programming Guide [Electronic resource]. URL: https://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html (access date: 12.01.2021).
