Christopher A. Wood (caw4567@rit.edu)

The document discusses various techniques for optimizing code performance. It begins by explaining the need to measure performance through profiling to identify hotspots in the code. Various levels for optimization are described, from high-level design changes to low-level tweaks of compiler settings and assembly code. Specific optimization strategies discussed include improving parallelism, data access patterns, control flow, and memory usage. The document also provides an overview of a RISC CPU architecture and its performance characteristics. Common misconceptions about optimization are debunked, and additional resources are referenced.

Software Optimization

Copyright: Attribution Non-Commercial (BY-NC)

Levels of optimization:
- Code architecture and design
- High-level source code changes
- Compiler settings
- Assembly tweaks

1. Measure performance: dynamic program analysis using a software profiler.
2. Identify hotspots: the portions of the code that consume the most CPU cycles and computation time.
3. Identify the cause of hotspots: I/O overhead, an inefficient algorithm, or poor design?
4. Change the program: source code tweaks or design changes?
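The measurement step can be approximated by hand when no profiler is available on the target. The following is a minimal sketch, assuming a hosted C environment with `clock()`; `sum_of_squares` is a hypothetical workload standing in for a suspected hotspot, and `time_workload` is an illustrative helper, not something taken from the slides. A real profiler gives far more detail than this.

```c
#include <time.h>

/* Hypothetical workload standing in for a suspected hotspot. */
unsigned long sum_of_squares(unsigned long n)
{
    unsigned long total = 0;
    for (unsigned long i = 0; i < n; i++) {
        total += i * i;
    }
    return total;
}

/* Runs the workload `runs` times and returns the CPU time consumed,
   in seconds. The last result is stored through `result` so the
   calls have a visible effect. */
double time_workload(unsigned long n, int runs, unsigned long *result)
{
    unsigned long r = 0;
    clock_t start = clock();
    for (int k = 0; k < runs; k++) {
        r = sum_of_squares(n);
    }
    clock_t end = clock();
    *result = r;
    return (double)(end - start) / CLOCKS_PER_SEC;
}
```

Timing the same call before and after a change, with the same inputs and compiler settings, gives a rough indication of whether the change helped. An optimizing compiler may simplify a workload this small, so timings of toy code should be read with caution.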

-Donald Knuth

Design changes tend to have the biggest impact on code performance.
Analysis of the code architecture is the best starting point:
- Mathematical analysis
- Understanding technological considerations
- Parallelism
Change the scope of analysis (module- and global-based).

Data bandwidth performance
- Keep data in devices that can be accessed faster.

Arithmetic operation performance
- Know your order of operations and the performance of mathematical functions.
- Think at the bit-level.

Control flow
- Software control flow structures (e.g. indirect function calls, switch statements, branches) perform differently.
- Be conscious of processor pipeline predictions.

Memory usage
- Especially important with embedded devices.
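One common illustration of the data bandwidth point is traversal order over a two-dimensional array stored in row-major order. The sketch below is an assumption-laden example, not from the slides: both functions compute the same sum, but the first touches adjacent memory on consecutive iterations while the second strides by `cols` elements. On a target with a data cache the first form is often faster, though this should be measured rather than assumed.

```c
#include <stddef.h>

/* Row-major traversal: the inner loop walks adjacent elements. */
long sum_row_major(const int *m, size_t rows, size_t cols)
{
    long total = 0;
    for (size_t r = 0; r < rows; r++) {
        for (size_t c = 0; c < cols; c++) {
            total += m[r * cols + c];
        }
    }
    return total;
}

/* Column-major traversal of the same row-major data: the inner loop
   jumps `cols` elements at a time, which tends to use caches poorly. */
long sum_col_major(const int *m, size_t rows, size_t cols)
{
    long total = 0;
    for (size_t c = 0; c < cols; c++) {
        for (size_t r = 0; r < rows; r++) {
            total += m[r * cols + c];
        }
    }
    return total;
}
```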

- High-performance, dual-issue, superscalar 32-bit RISC CPU
- Seven-stage, highly pipelined microarchitecture
- Dual instruction fetch, decode, and out-of-order issue
- Separate instruction and data cache arrays
- Memory Management Unit (MMU) with separate instruction and data shadow TLBs

- Soft processor core designed specifically for Xilinx FPGAs
- Implemented using the general-purpose memory and logic fabric of the FPGA
- Versatile interconnect system to support embedded applications, connected to the PLB, its primary I/O bus
- User-configured memory aspects (cache size, pipeline depth, embedded peripherals, MMU, etc.)
- Capable of hosting operating systems that require hardware support (e.g. page tables and address space protection in Linux)

Parallelism:
- Is it an option on the target platform?
- Can portions of your algorithm be performed in parallel? E.g. if your algorithm operates on bytes, you may be able to operate on 2, 4, or 8 of them simultaneously using word-based instructions provided by the CPU.
- Can other hardware components perform computations in parallel with the processor?
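The byte-versus-word idea can be sketched in portable C. In this illustrative example (the function names and the XOR task are assumptions chosen for simplicity, not taken from the slides), a buffer is XORed with a one-byte key either one byte at a time or eight bytes at a time through a 64-bit word. `memcpy` is used for the word loads and stores so the code does not rely on alignment; compilers typically turn it into a single load or store.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Reference version: one byte per iteration. */
void xor_bytes(uint8_t *buf, size_t len, uint8_t key)
{
    for (size_t i = 0; i < len; i++) {
        buf[i] ^= key;
    }
}

/* Word-based version: eight bytes per iteration. */
void xor_words(uint8_t *buf, size_t len, uint8_t key)
{
    /* Replicate the key byte into every byte lane of a 64-bit word. */
    const uint64_t key64 = 0x0101010101010101ULL * key;
    size_t i = 0;

    for (; i + 8 <= len; i += 8) {
        uint64_t w;
        memcpy(&w, buf + i, sizeof w);  /* alignment-safe load */
        w ^= key64;
        memcpy(buf + i, &w, sizeof w);  /* alignment-safe store */
    }
    for (; i < len; i++) {              /* leftover tail bytes */
        buf[i] ^= key;
    }
}
```

XOR works lane by lane with no carries between bytes, which is why it maps so directly onto word operations. Operations such as addition need extra masking to stop carries from crossing byte boundaries.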

- Look at the software from both a source code and design perspective
- Analyze the flow of data in your algorithm
- High-level API usage
- Code size!

Common misconceptions about optimization:
- Improved hardware makes software optimization unimportant
- Using tables always beats recalculating
- Using C compilers makes it impossible to optimize code for performance
- Globals are faster than locals
- Using smaller data types is faster than larger ones

- Powers of 2
- Optimize loop overhead
- Loop manipulation (rolling/unrolling/jamming)
- Declare local functions as static
- Pass by value and pass by reference
- Unsigned vs. signed
- Leverage early termination of if statements
- Register usage (global variables aren't placed there)
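A few of these source-level techniques can be sketched in C. The names below (`RING_SIZE`, `ring_next`, `sum_unrolled`, `is_valid_sample`) are hypothetical and chosen only for illustration. Modern compilers often apply the first two transformations on their own at higher optimization levels, so hand-written versions are worth keeping only if measurement shows a benefit.

```c
#include <stddef.h>
#include <stdint.h>

#define RING_SIZE 16u  /* hypothetical buffer size; must be a power of two */

/* Powers of 2 and unsigned arithmetic: with an unsigned index and a
   power-of-two size, the wrap-around (index + 1) % RING_SIZE reduces
   to a single AND. */
uint32_t ring_next(uint32_t index)
{
    return (index + 1u) & (RING_SIZE - 1u);
}

/* Loop unrolling: four elements per iteration reduce the number of
   loop-counter updates and branches; a second loop handles the tail. */
uint32_t sum_unrolled(const uint32_t *data, size_t n)
{
    uint32_t total = 0;
    size_t i = 0;

    for (; i + 4 <= n; i += 4) {
        total += data[i] + data[i + 1] + data[i + 2] + data[i + 3];
    }
    for (; i < n; i++) {
        total += data[i];
    }
    return total;
}

/* Early termination: && stops at the first false operand, so the
   cheap bounds check runs first and the memory access is skipped
   whenever the index is out of range. */
int is_valid_sample(const uint32_t *data, size_t n, size_t i)
{
    return i < n && data[i] != 0u;
}
```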

http://www.azillionmonkeys.com/qed/optimize.html
http://www.cs.ucsb.edu/~nagy/docs/MAEMostafa.pdf
http://www.codeproject.com/KB/cpp/C___Code_Optimization.aspx
http://developer.amd.com/documentation/articles/pages/6212004126.aspx
https://www01.ibm.com/chips/techlib/techlib.nsf/techdocs/2D417029AE3F3089872570F8006D4E99/$file/PowerPC440x6_um_29Sept10_pub.pdf
