64-bit Insider Newsletter, Volume 1, Issue 14
clock cycle to execute due to something called memory latency. For example, a “load” instruction must read data from memory, which can take many clock cycles to arrive at the processor. The result is that a lot of the time the processor is idle. This means that instructions are executed, and the results of the execution are committed to memory, at a rate of much less than one instruction per clock cycle.

[Fig. 1. Operation of a single core processor with no hyper-threading]
presents itself as two logical processors to allow the operating system to schedule separate “threads” of instructions for execution on each logical processor. Now when one of the instructions for the first logical processor pauses while it waits for data to arrive from main memory, the execution core can execute instructions that were scheduled for the other logical processor.

[Fig. 2. Operation of a single core processor with hyper-threading]
execute separate “threads” of instructions in a truly simultaneous fashion, but they are located on a single processor die and therefore use up less space. Unlike a situation where there are two processors in a system, the two execution cores share some of the same cache, which can be very helpful if the two threads of execution are executing the same instructions and working on the same set of data.

[Fig. 3. Operation of a multi-core processor with hyper-threading]
A key term that appears several times in the descriptions of the different types of processors above is “threads”. Creating several threads of execution means writing a program that at some point splits into two separate sequences of code, in such a way that both pieces of code continue to run at the same time. A program written in this way is called a multi-threaded program, and it explicitly demarcates independent sequences of instructions that the operating system can run on different processors.
Multi-threaded programming has been around for a long time. It involves creating several “threads” of execution inside your application that all run at the same time. Most modern applications use threads today even if you are not aware of it. For example, ASP.NET web applications normally use a single thread of execution for each HTTP request that is received; multiple simultaneous requests mean multiple simultaneous threads of execution.
Threads also allow you to create highly responsive GUI applications that perform long, compute-intensive processes while still responding promptly to user input. For example, threads let you create a progress bar that is painted correctly on the desktop while the application simultaneously scans the hard disk for viruses.
There are two ways to add threads to your program. One is to use a specialized API like
the Windows API or the System.Threading namespace of the .NET framework to create
and control threads manually. The other is to use compiler directives as defined by the
OpenMP standard to have the threads created for you automatically.
Threading in .NET
The .NET Framework provides a set of classes to enable multithreaded programming. In its most basic form, starting a thread is just a matter of creating a Thread object, passing it the name of a method that will do the work of the thread, and then calling its Start() method. When the Start() method returns, there are two threads of execution. Have a look at the following example:
using System.Threading;

class ThreadingExample
{
    static void Main(string[] args)
    {
        ThreadStart work = new ThreadStart(printEvens);
        Thread thread = new Thread(work);
        thread.Start();
        printOdds();
        System.Console.WriteLine("Done.");
    }

    // Runs on the new thread: the even numbers from 0 to 10.
    static void printEvens()
    {
        for (int i = 0; i <= 10; i += 2) System.Console.WriteLine(i);
    }

    // Runs on the main thread: the odd numbers from 1 to 9.
    static void printOdds()
    {
        for (int i = 1; i < 10; i += 2) System.Console.WriteLine(i);
    }
}
All .NET applications start running in what’s called the “main” thread, which continues to run normally until the Main() method finishes. This simple program creates a new thread from within Main() and has it display the even numbers from 0 to 10. While that new thread is running, the main thread is still running too. The main thread displays all the odd numbers from 0 to 10 and then finishes. When both threads have finished their work, the application terminates.

Below is an example of what might be printed by this program:

1
0
3
2
5
4
7
6
8
9
10
Done.

This is just a sample output because, in fact, each time the program runs it may print something else. Sometimes the message “Done.” is printed before the last of the numbers appears.
This situation is so common, in fact, that a standard called OpenMP was developed for parallelizing pieces of code without writing complex thread-management logic. OpenMP consists of a set of compiler directives called pragmas, together with specialized functions, and it is most often used to split up the work done in for loops in C++.
In this example, we have some C++ code that multiplies a matrix by a vector. If these objects are very large, then it might make sense to perform some of the matrix multiplication in different threads. The great thing about OpenMP is that we can test this theory with just a single line of code!
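A minimal sketch of such a loop, assuming a square matrix stored row-major in a single array (the function and parameter names here are illustrative, not taken from the original listing):

```cpp
#include <vector>

// Multiply an n-by-n row-major matrix by a vector of length n.
// The pragma asks the compiler to run the iterations of the outer
// loop on a team of threads; each iteration writes only to its own
// result[i], so the threads never touch the same element.
std::vector<double> matVec(const std::vector<double>& matrix,
                           const std::vector<double>& vec)
{
    const int n = static_cast<int>(vec.size());
    std::vector<double> result(n, 0.0);
    #pragma omp parallel for
    for (int i = 0; i < n; ++i)
    {
        double sum = 0.0;
        for (int j = 0; j < n; ++j)
            sum += matrix[i * n + j] * vec[j];
        result[i] = sum;
    }
    return result;
}
```

Compiled without OpenMP support, or with the pragma removed, the same function simply runs serially and produces the same results.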
The pragma in this sample code tells the compiler to create a set of threads before the for
loop begins and to distribute the iterations of the loop evenly among all the threads.
Just like in the API example we have potential problems when data is shared between the
OpenMP threads. In this example, there is no problem because there are no dependencies
between the iterations of the loops and every iteration writes to a different part of the
result array.
There are many options in OpenMP to configure some aspects of the parallelism. For
example, you can specify things like how many threads are created, how many iterations
are given to each thread, and how to share data between the threads so as to avoid
conflicts. However, flexibility is currently limited. For example, you would find it
difficult to use OpenMP to manage elements of your user interface.
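As a sketch of those options, the clauses below pick the thread count and iteration schedule and deal with a genuine sharing conflict: every iteration of this dot product adds to the same variable, so a plain parallel for would race, while reduction(+:sum) gives each thread a private copy that is combined at the end (the function name is illustrative):

```cpp
#include <vector>

// Dot product of two equal-length vectors. num_threads fixes how
// many threads are created, schedule(static, 64) hands out the
// iterations in chunks of 64, and reduction(+:sum) makes the
// shared accumulation safe.
double dotProduct(const std::vector<double>& a,
                  const std::vector<double>& b)
{
    double sum = 0.0;
    #pragma omp parallel for num_threads(4) schedule(static, 64) reduction(+:sum)
    for (int i = 0; i < static_cast<int>(a.size()); ++i)
        sum += a[i] * b[i];
    return sum;
}
```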
OpenMP is supported by Microsoft’s Visual C++ 2005 compiler and by the Intel C++ compiler. Another advantage of OpenMP is that it is portable: it is a standard that is understood by many different compilers on different platforms, and compilers that do not understand OpenMP can simply ignore the pragmas.
Summary
To take advantage of the multiple physical and logical processors available in 64-bit
systems and in some 32-bit systems you need to understand the concept of threads and
you need to implement threads in your own application.
APIs exist for most languages on Windows that allow you to create threads in your own programs; two common ones are the .NET System.Threading classes and the standard Windows API. In addition, the OpenMP standard defines pragmas that can be used to create threads in a more declarative fashion.
In a future newsletter we will look at the issues that surround synchronizing multiple
threads and how to identify and resolve those issues.
URLs
What is hyperthreading?
http://en.wikipedia.org/wiki/Hyper-threading