
GPGPU-Sim Tutorial

Zhen Lin
North Carolina State University
Based on GPGPU-Sim Tutorial and Manual by UBC

Outline
GPGPU-Sim Overview
Demo1: Setup & Configuration
GPGPU-Sim Internals
Demo2: Scheduling Study

Outline
GPGPU-Sim Overview
Demo1: Setup & Configuration
GPGPU-Sim Internals
Demo2: Scheduling Study

GPGPU-Sim in a Nutshell
Microarchitecture timing model of contemporary GPUs
Runs unmodified CUDA/OpenCL applications

What GPGPU-Sim Simulates


Functional model
PTX
SASS

Timing model for the compute part of a GPU


Not the CPU or the PCIe bus
Only models microarchitecture timing relevant to compute

Functional model
PTX
A low-level, data-parallel virtual machine and instruction set architecture (ISA)
Between CUDA and hardware ISA (SASS)
Stable ISA that spans multiple GPU generations

SASS / PTXPlus
The hardware's native ISA
PTX -> Translate + Optimize -> SASS
More accurate, but not as well supported

CUDA tool chain

Functional Model (PTX)

Scalar ISA
SSA-like representation with unlimited virtual registers: register allocation is not done in PTX

Timing Model for GPU Micro-Architecture


GPGPU-Sim simulates the timing model of a GPU running each launched CUDA kernel
Reports stats (e.g. # cycles) for each kernel
Excludes any time spent on data transfer over the PCIe bus
The CPU is assumed to be idle while the GPU is working

Compilation Path

Outline
GPGPU-Sim Overview
Demo1: Setup & Configuration
GPGPU-Sim Internals
Demo2: Scheduling Study

Demo1
Setup
Stats
Configuration

Outline
GPGPU-Sim Overview
Demo1: Setup & Configuration
GPGPU-Sim Internals
Demo2: Scheduling Study

Overview of the Architecture

Inside a SIMT Core


Pipeline stages

Fetch
Decode
Issue
Read operand
Execution
Writeback

Fetch + Decode
Arbitrates the I-cache among warps
A cache miss is handled by fetching again later

The fetched instruction is decoded and then stored in the I-Buffer
1 or more entries per warp
Only warps with vacant entries are considered for fetch

Issue
Selects a warp with a ready instruction
Acquires the active mask from the TOS of the SIMT stack
Invalidates the corresponding I-Buffer entry

Scoreboard
Checks for RAW and WAW dependency hazards
Flags instructions with hazards as not ready in the I-Buffer
(masking them out from the scheduler)

An instruction reserves its destination registers at issue

Releases them at writeback
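The reserve-at-issue / release-at-writeback protocol above can be sketched as follows. This is an illustrative Python sketch, not GPGPU-Sim's actual (C++) implementation; the class and register names are hypothetical.

```python
# Sketch of a per-warp scoreboard: destination registers are reserved at
# issue and released at writeback; an instruction with a RAW or WAW hazard
# against a reserved register is flagged as not ready.

class Scoreboard:
    def __init__(self):
        self.reserved = set()  # destination registers of in-flight instructions

    def has_hazard(self, srcs, dst):
        # RAW: a source is still pending; WAW: the destination is pending
        return any(r in self.reserved for r in srcs) or dst in self.reserved

    def issue(self, dst):
        self.reserved.add(dst)      # reserve destination at issue

    def writeback(self, dst):
        self.reserved.discard(dst)  # release at writeback

sb = Scoreboard()
sb.issue("r1")                      # in-flight: r1 = ...
assert sb.has_hazard(["r1"], "r2")  # RAW on r1
assert sb.has_hazard(["r3"], "r1")  # WAW on r1
sb.writeback("r1")
assert not sb.has_hazard(["r1"], "r2")
```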

December 2012

GPGPU-Sim Tutorial (MICRO 2012) 4: Microarchitecture Model

4.17

Read Operand
The register file is split into banks; registers are striped across them:

Bank 0: R0, R4, R8
Bank 1: R1, R5, R9
Bank 2: R2, R6, R10
Bank 3: R3, R7, R11

add.s32 R3, R1, R2;  -> no conflict (R1 and R2 are in different banks)

mul.s32 R3, R0, R4;  -> conflict at bank 0 (R0 and R4 are both in bank 0)
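The bank-conflict check in the two examples above can be sketched as follows. A minimal Python sketch, assuming the striping shown on the slide (bank = register number mod 4); function names are hypothetical.

```python
# Registers are striped across NUM_BANKS banks; an instruction conflicts
# when two of its source operands map to the same bank.

NUM_BANKS = 4

def bank(reg):
    return reg % NUM_BANKS

def conflicting_banks(regs):
    seen, conflicts = set(), set()
    for r in regs:
        b = bank(r)
        if b in seen:
            conflicts.add(b)
        seen.add(b)
    return conflicts

# add.s32 R3, R1, R2 -> sources R1 (bank 1), R2 (bank 2): no conflict
assert conflicting_banks([1, 2]) == set()
# mul.s32 R3, R0, R4 -> R0 and R4 both map to bank 0: conflict
assert conflicting_banks([0, 4]) == {0}
```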

Operand Collector Architecture (US Patent: 7834881)


Interleaves operand fetches from different threads to achieve full utilization of the register-file banks


Operand Collector
Collector units receive instructions from the issue stage, gather their source operands from the register-file banks, and dispatch them to the execution units


Execution
ALU
Stream processor (SP)
Special function unit (SFU)

MEM

Shared memory
Local memory
Global memory
Texture memory
Constant memory

ALU Pipelines
SIMD Execution Unit
Fully Pipelined
Each pipe may execute a subset of instructions
Configurable bandwidth and latency (depending on the instruction)
Default: SP + SFU pipes


Memory Unit

Models timing for memory instructions
Supports half-warps (16 threads)
The unit is double clocked: each cycle services half the warp
Pipeline: address generation unit (AGU) -> shared-memory bank-conflict check / memory access coalescing -> data cache (with shared MSHRs), constant cache, and texture cache -> memory port
Has a private writeback path
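The coalescing step in the memory pipeline above can be sketched as follows. An illustrative Python sketch, assuming one transaction per aligned 128-byte segment; the real coalescing rules vary by GPU architecture, and the segment size here is an assumption.

```python
# Memory access coalescing: merge the per-thread addresses of a half-warp
# into the minimal set of aligned 128-byte memory transactions.

SEGMENT = 128  # bytes per memory transaction (assumed)

def coalesce(addresses):
    # one transaction per distinct 128-byte segment touched
    return sorted({addr // SEGMENT * SEGMENT for addr in addresses})

# 16 consecutive 4-byte accesses fall in one segment -> one transaction
half_warp = [4 * t for t in range(16)]
assert coalesce(half_warp) == [0]

# a stride of 128 bytes touches 16 segments -> 16 transactions
strided = [128 * t for t in range(16)]
assert len(coalesce(strided)) == 16
```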

Writeback
Writes the result to the register file
The scoreboard releases the reserved destination registers

Stack-Based Branch Divergence Hardware

When a branch diverges:

New entries are pushed onto the SIMT stack

The reconvergence PC (RPC) is set to the immediate post-dominator
The active mask indicates which threads are active
The PC is sent to the fetch unit

When the RPC is reached:

Pop the TOS
The PC of the new TOS is sent to the fetch unit
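The push/pop behavior above can be sketched as follows. An illustrative Python sketch of stack-based reconvergence, not GPGPU-Sim's actual implementation; the stack-entry layout (PC, active mask, RPC) and PCs are assumptions.

```python
# On a divergent branch: pop the TOS, push a reconvergence entry with the
# RPC (immediate post-dominator), then one entry per taken path with its
# active mask. Pop an entry when its threads reach the RPC.

def run_divergent_branch(stack, taken_mask, taken_pc, fallthru_pc, rpc):
    pc, mask, _ = stack.pop()            # current TOS entry diverges
    stack.append((rpc, mask, None))      # reconvergence entry: full mask
    fallthru_mask = [a and not t for a, t in zip(mask, taken_mask)]
    if any(fallthru_mask):
        stack.append((fallthru_pc, fallthru_mask, rpc))
    if any(taken_mask):
        stack.append((taken_pc, taken_mask, rpc))

# 4-thread warp, all active at PC 0x10; threads 0 and 1 take the branch
stack = [(0x10, [True] * 4, None)]
run_divergent_branch(stack, [True, True, False, False],
                     taken_pc=0x20, fallthru_pc=0x14, rpc=0x30)
pc, mask, rpc = stack[-1]
assert (pc, mask) == (0x20, [True, True, False, False])  # taken path on TOS
stack.pop()                              # taken path reaches the RPC
stack.pop()                              # fall-through path reaches the RPC
assert stack[-1][0] == 0x30              # reconverged: full mask at the RPC
assert stack[-1][1] == [True] * 4
```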

Outline
GPGPU-Sim Overview
Demo1: Setup & Configuration
GPGPU-Sim Internals
Demo2: Scheduling Study

Demo2
Software framework overview
Monitor the warp scheduling order
Compare different scheduling policies
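To make the comparison concrete, here is an illustrative Python sketch of two warp-scheduling policies commonly studied with GPGPU-Sim: loose round-robin (LRR) and greedy-then-oldest (GTO). The function names and 4-warp setup are assumptions, not the simulator's code.

```python
# LRR rotates to the next ready warp after the last one that issued;
# GTO keeps issuing from the same warp until it stalls, then falls back
# to the oldest (lowest-id) ready warp.

NUM_WARPS = 4  # warps per scheduler (assumed)

def lrr_pick(ready, last):
    for i in range(1, NUM_WARPS + 1):
        w = (last + i) % NUM_WARPS
        if w in ready:
            return w

def gto_pick(ready, greedy):
    if greedy in ready:
        return greedy        # stay greedy while the warp is ready
    return min(ready)        # otherwise pick the oldest ready warp

assert lrr_pick({0, 1, 2, 3}, last=0) == 1   # rotates to the next warp
assert gto_pick({0, 1, 2, 3}, greedy=0) == 0 # sticks with warp 0
assert gto_pick({1, 2, 3}, greedy=0) == 1    # warp 0 stalled: oldest ready
```

Monitoring which warp issues each cycle (as in the demo) makes the difference visible: LRR spreads issue slots evenly, while GTO concentrates them on one warp at a time.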

For More Information


http://www.gpgpu-sim.org/

Thanks & questions?
