Hyperthreading

1.
HYPERTHREADING
DEFINITION:
THREAD: It is a program fragment which is assigned for execution by the multitask
operating system to one of processors of the multiprocessor hardware system. They are
sequences of related instructions or tasks running independently which make up a program.
MULTI THREADING: It is a computation architecture which runs multiple processes

simultaneously and aims to increase utilization of a single core by leveraging thread-level as
well as instruction-level parallelism. Applied for multitasking between multiple related
threads of programs.
ADVANTAGES:
 If a thread gets a lot of cache misses, the other thread(s) can continue, taking
advantage of the unused computing resources, which thus can lead to faster overall
execution, as these resources would have been idle if only a single thread was
executed.
 If a thread cannot use all the computing resources of the CPU (because instructions
depend on each other's result), running another thread permits to not leave these idle.
 If several threads work on the same set of data, they can actually share its caching,
leading to better cache usage or synchronization on its values.
DISADVANTAGES:
 Multiple threads can interfere with each other when sharing hardware resources such
as caches or translation look aside buffers(TLBs).
 Execution times of a single-thread are not improved but can be degraded, even when
only one thread is executing. This is due to slower frequencies and/or additional
pipeline stages that are necessary to accommodate thread-switching hardware.
 Hardware support for Multithreading is more visible to software, thus requiring more
changes to both application programs and operating systems than Multiprocessing.
HYPER THREADING: A high-performance computing architecture that simulates

some degree of overlap in executing two or more independent sets of instructions. It is an
Intel's term for its simultaneous multithreading implementation in their Pentium 4 and Core
i7 CPUs.
11070220
2
HT is a microprocessor simultaneous multithreading technology (SMT) that supports the

concurrent execution of multiple separate instruction streams, referred to as threads of
execution, on a single physical processor. When HT is used with the Intel Xeon processors
that support it, there are two threads of execution per physical processor.
A feature of certain Pentium 4 chips that makes one physical CPU appear as two logical
CPUs. It uses additional registers to overlap two instruction streams in order to achieve an
approximate 30% gain in performance. Multithreaded applications take advantage of the
Hyper-Threaded hardware as they would on any dual-processor system; however, the
performance gain cannot equal that of true dual-processor CPUs.
11070220
3
AN OVERVIEW:
Hyper-threading (officially termed Hyper-Threading Technology or HTT) is an Intel-

proprietary technology used to improve parallelization of computations (doing multiple tasks
at once) performed on PC microprocessors. A processor with hyper-threading enabled is
treated by the operating system as two processors instead of one. This means that only one
processor is physically present but the operating system sees two virtual processors,
and shares the workload between them. Each logical processor that is contained within an
HT processor appears to the operating system as an individual processor. This means that
tools or services within Windows that display information about processors, such as the
Windows Task Manager or Windows Performance Monitor, will display processor
information for every logical processor that Windows is utilizing.
Hyper-threading requires both operating system and CPU support; conventional

multiprocessor support is not enough.
The first Intel processors to support Hyper-Threading Technology (HT) are the IA-32 Xeon
family of processors that were released in the first half of calendar year 2002. Although
Xeon processors are supported in workstations, HT technology is initially targeted at dual-
processor and multiprocessor server configurations.
The HT in the Xeon processors makes two architectural states available on the same physical
processor. Each architectural state can execute an instruction stream, which means that two
concurrent threads of execution can occur on a single physical processor. Each thread of
execution can be independently halted or interrupted. These architectural states are referred
to as logical processors in this report.
The main difference between the execution environment provided by the Xeon HT
processor, compared with that provided by two traditional single-threaded processors, is that
HT shares certain processor resources: there is only one execution engine, one on-
board cache set, and one system bus interface. This means that the logical processors on
an HT processor must compete for use of these shared resources. As a result, an HT
processor will not provide the same performance capability as two similarly equipped single-
threaded processors.
The two logical processors on an HT processor are treated equally with respect to access to
the shared resources. We are to refer to the logical processors on an HT processor, in order
of use, as the first and second logical processors.
Windows XP and Windows .NET Server include generic identification and support for IA-
32 processors that implement HT using the Intel-defined CPUID instruction identification
mechanism. However, this support is not guaranteed for processors that have not been tested
with these operating systems.
11070220
4
SMT processors may support more than two logical processors in the future. However, the
discussions and examples in this white paper assume the use of two logical processors, as
used in the Xeon family of processors.
FIGURE 1: Intel Pentium 4 @ 3.80 GHz with Hyper-Threading Technology.
_______________________________________________
BASIC WORKING OF HYPERTHREADING:
1. This technology is meant to increase efficiency of operation of a processor. The matter

is that, according to Intel, only 30% of all execution units in the processor work the
most part of time. And the idea to load other 70% looks logical (the Pentium 4
processor, by the way, which has incorporated this technology, doesn't suffer from
superfluous performance per megahertz). The main point of the Hyper Threading
technology is that during implementation of one thread of a program, idle execution
units can work with another thread of the program (or a thread of another program).
Or, for example, while executing one sequence of instructions they may wait for data
from memory for execution of another sequence.
2. When executing different threads, the processor must "know" which instructions refer
to which threads. That is why there is some mechanism which helps the processor do
it.
3. It is also clear that taking into account a small number of general-purpose registers in
the x86 architecture (8 in all) each thread has its own set of registers. However, this
limitation is evaded by renaming the registers. That is, there are much more physical
11070220
5
registers than logical ones. The Pentium III has 40. The Pentium 4 has obviously
more. According to the unconfirmed information, they are 128.
4. It's also known that when several threads need the same resources or one of the
threads waits for data the "pause" instruction must be applied to avoid a performance
drop. Certainly, this requires recompilation of the programs.
5. Sometimes execution of several threads can worsen the performance. For example,
because the L2 cache is not extendable, and when active threads will try to load the
cache it's possible that the struggle for the cache will result in constant clearing and
reload of data in the L2 cache.
6. Intel states that the gain can reach 30% in case of optimization of programs for this
technology. (Or, rather, Intel states that on the today's server programs and
applications the measured gain is up to 30%) It's a decent reason for the optimization.
11070220
6
FIGURE 2: Represents threads of program1

Represents threads of program 2
Represents idle CPU processes
 Hyper-threading works by duplicating certain sections of the processor—those that

store the architectural state—but not duplicating the main execution resources. This
allows a hyper-threading processor to appear as two "logical" processors to the host
operating system, allowing the operating system to schedule two threads or processes
simultaneously.
 When execution resources would not be used by the current task in a processor without
hyper-threading, and especially when the processor is stalled, a hyper-threading
equipped processor can use those execution resources to execute another scheduled
task. (The processor may stall due to a cache miss, branch misprediction, or data
dependency.)
11070220
7
 This technology is transparent to operating systems and programs. All that is required
to take advantage of hyper-threading is symmetric multiprocessing (SMP) support in
the operating system, as the logical processors appear as standard separate processors.
 Programs are made up of execution threads. These threads are sequences of related
instructions. Earlier, most programs consisted of a single thread. The operating systems
in those days were capable of running only one such program at a time. The result was
that your PC would freeze while it printed a document or a spreadsheet. The system
was incapable of doing two things simultaneously. Innovations in the operating system
introduced multitasking in which one program could be briefly suspended and another
one run. By quickly swapping programs in and out in this manner, the system gave the
appearance of running the programs simultaneously. However, the underlying
processor was, in fact, at all times running just a single thread.
 It is possible to optimize operating system behaviour on multi-processor hyper-

threading capable systems, such as the Linux techniques discussed in Kernel Traffic.
For example, consider an SMP system with two physical processors that are both
hyper-threaded (for a total of four logical processors). If the operating system's process
scheduler is unaware of hyper-threading it will treat all four processors as being the
same. If only two processes are eligible to run it might choose to schedule those
processes on the two logical processors that happen to belong to one of the physical
processors; that processor would become extremely busy while the other would be idle,
leading to poorer performance than is possible with better scheduling. This problem
can be avoided by improving the scheduler to treat logical processors differently from
physical processors; in a sense, this is a limited form of the scheduler changes that are
required for NUMA systems.
 By the beginning of this decade, processor design had gained additional execution
resources (such as logic dedicated to floating-point and integer math) to support
executing multiple instructions in parallel. Intel saw an opportunity in these extra
facilities. The company reasoned it could make better use of these resources by
employing them to execute two separate threads simultaneously on the same processor
core. Intel named this simultaneous processing Hyper-Threading Technology and
released it on the Intel Xeon processors in 2003. According to Intel benchmarks,
applications that were written using multiple threads could see improvements of up to
30% by running on processors with HT Technology. More important, however, two
programs could now run simultaneously on a processor without having to be swapped
in and out (See Figure 3.) To induce the operating system to recognize one processor as
two possible execution pipelines, the new chips were made to appear as two logical
processors to the operating system.
 The performance boost of HT Technology was limited by the availability of shared

resources to the two executing threads. As a result, HT Technology cannot approach
the processing throughput of two distinct processors because of the contention for these
11070220
8
shared resources. To achieve greater performance gains on a single chip, a processor

would require two separate cores, such that each thread would have its own complete
set of execution resources. Enter multi-core.
FIGURE 3: Threads being executed by early,

recent and next generation processors.
Above diagram portrays difference between an early processor and a not so recent processor.
First consisted of single core which used to handle one program at one time. Multitasking
was not possible.
Second diagram gives an insight into the recent processors which create a
virtual second processor to handle many threads or programs simultaneously.
A third diagram depicts new generation multicore processor in action, executing 4 threads
simultaneously . It provides speed coupled with multitasking abilities at the extreme.
_________________________________________________________________________
COMPONENTS OF HYPERTHREADING:
11070220
9
1. REGISTER ALIAS TABLES:-

• Map the architectural registers to physical rename registers.
• Each logical processor needs its own set of architectural registers because they have to
be tracked independently.
2. ADVANCED PROGRAMMABLE INTERRUPT CONTROLLER:-

 APIC is duplicated so that interrupts for each logical processor can be handled
independently
3. RETURN STACK PREDICTOR:-
11070220
10
 The return stack predictor is duplicated for accurate tracking of call/return pairs
4. INSTRUCTION TRANSLATION LOOK-ASIDE BUFFER:-

• The instruction translation look-aside buffer is duplicated because of its small size and
replication is simpler than sharing.
5. NEXT-INSTRUCTION POINTER:-
• The next-instruction pointer and other control logic permits program progress to be
tracked independently .
6. TRACE-CACHE NEXT-INSTRUCTION POINTER:-
• The trace-cache next-instruction pointer stores decoded instructions and serves as the
first-level instruction cache .
THREADING ALGORITHMS:
 Time-slicing: Time slicing is a technique used for achieving high power-saving effect
on terminal devices. It is based on the time-multiplexed transmission of different
services
 A processor switches between threads in fixed time intervals.
 High expenses, especially if one of the processes is in the wait state.
 Switch-on-event
 Task switching in case of long pauses
 Waiting for data coming from a relatively slow source, CPU resources are given
to other processes
 Multiprocessing
 Distribute the load over many processors
 Adds extra cost
 Simultaneous multi-threading
 Multiple threads execute on a single processor without switching.
Basis of Intel’s Hyper-Threading technology
COMPARING PAST AND HYPERTHREADED

11070220
11
PROCESSORS:
Here are some multitasking workloads that are just too much for a single logical
processor
11070220
12
These benchmarks were taken using popular software packages that are
already multithreaded and shows the percent increase in performance
PROCESSORS COMPATIBLE WITH HYPERTHREADING:

INTEL INTEL SERIES NAME
PROCESSOR:
DESKTOP:
Intel® Pentium® 4 processor Extreme Edition

supporting Intel® Hyper-Threading Technology
NOTEBOOKS:
Mobile Intel® Pentium® 4 processors supporting
11070220
13
Intel® Hyper-Threading Technology
Intel® Xeon® processor
Intel® chipsets:
Intel Desktop Chipsets (DESKTOPS)

CHIPSET
Intel Mobile Chipsets (NOTEBOOKS)
Intel Server Chipsets (WORKSTATIONS)
MULTI CORE PROCESSORS:-

Multi-core processors: Multi-core processors, as the name implies, contain two or more
distinct cores in the same physical package. Figure 2 shows how this appears in relation to
previous technologies.
In this design, each core has its own execution pipeline. And each core has the resources
required to run without blocking resources needed by the other software threads.
While the example in Figure 2 shows a two-core design, there is no inherent limitation in the
number of cores that can be placed on a single chip. Intel has committed to shipping dual-
core processors in 2005, but it will add additional cores in the future. Mainframe processors
today use more than two cores, so there is precedent for this kind of development.
11070220
14
FIGURE 5:
Figure 2. Multi-Core processors have multiple execution cores on a single chip.
The multi-core design enables two or more cores to run at somewhat slower speeds and at
much lower temperatures. The combined throughput of these cores delivers processing
power greater than the maximum available today on single-core processors and at a much
lower level of power consumption. In this way, Intel increases the capabilities of server
platforms as predicted by Moore's Law while the technology no longer pushes the outer
limits of physical constraints
11070220
15
TABLE DEPICTING NOMENCLATURE OF INTEL MULTICORE PROCESSORS:
Intel Core 2 processor family
Desktop Laptop
Logo *
Code-named Core Date released Code-named Core Date released
Conroe dual (65 nm) Aug 2006

Merom dual (65 nm) Jul 2006
Allendale dual (65 nm) Jan 2007
Penryn dual (45 nm) Jan 2008
Wolfdale dual (45 nm) Jan 2008
Conroe XE dual (65 nm) Jul 2006 Merom XE dual (65 nm) Jul 2007
Kentsfield XE quad (65 nm) Nov 2006 Penryn XE dual (45 nm) Jan 2008
Yorkfield XE quad (45 nm) Nov 2007 Penryn XE quad (45 nm) Aug 2008
Kentsfield quad (65 nm) Jan 2007

Penryn quad (45 nm) Aug 2008
Yorkfield quad (45 nm) Mar 2008
Merom solo (65 nm) Sep 2007

Desktop version not available
Penryn solo (45 nm) May 2008
MERITS AND DEMERITS OF USAGE OF hyper THREADING:
11070220
16
ADVANTAGES:
The advantages of hyper-threading are listed as:
 improved support for multi-threaded code,
 allowing multiple threads to run simultaneously. No performance loss if only one

thread is active. Increased performance with multiple threads.
 improved reaction and response time. Faster speed and higher efficiency.
 Better resource utilization.
 Extra architecture only adds about 5% to the total die area.
 The largest boost in performance will likely be noticed while running CPU-intensive
processes, like antivirus scans, playing high end games, ripping/burning media
(requiring file conversion), or searching for folders
DISADVANTAGES:
 To take advantage of hyper-threading performance, serial execution can not be used.
 Threads are non-deterministic and involve extra design

 Threads have increased overhead
 Shared resource conflicts
 In addition to operating system (OS) support, adjustments to existing software are

required to maximize utilization of the computing resources provided by multi-core
processors
 Integration of a multi-core chip drives production yields down and they are more
difficult to manage thermally than lower-density single-chip designs. Intel has
partially countered this first problem by creating its quad-core designs by combining
two dual-core on a single die with a unified cache, hence any two working dual-core
dies can be used, as opposed to producing four cores on a single die and requiring all
four to work to produce a quad-core
11070220
17
 It suffers from a serious security flaw which permits local information disclosure,
including allowing an unprivileged user to steal an RSA private key being used on the
same machine. Administrators of multi-user systems are strongly advised to take
action to disable Hyper-Threading immediately; single-user systems (i.e., desktop
computers) are not affected. Earlier this year, Intel hyper threading was revealed to
have a security flaw where threads could find information from each other through the
shared cache despite having no access to each other's memory space.
 Two processing cores sharing the same system bus and memory bandwidth limits the
real-world performance advantage.
SYMMETRIC MULTIPROCESSING:
Short for Symmetric Multiprocessing, a computer architecture that provides fast performance
by making multiple CPUs available to complete individual processes simultaneously
(multiprocessing). Unlike asymmetrical processing, any idle processor can be assigned any
task, and additional CPUs can be added to improve performance and handle increased loads.
A variety of specialized operating systems and hardware arrangements are available to
support SMP. Specific applications can benefit from SMP if the code allows multithreading.
In computing, symmetric multiprocessing or SMP involves a multiprocessor computer-

architecture where two or more identical processors can connect to a single shared main
memory. Most common multiprocessor systems today use an SMP architecture. In the case
of multi-core processors, the SMP architecture applies to the cores, treating them as separate
processors.
SMP systems allow any processor to work on any task no matter where the data for that task
are located in memory; with proper operating system support, SMP systems can easily move
tasks between processors to balance the workload efficiently.
____________________________________________________________________________________
SIMULTANEOUS MULTITHREADING (SMT):

11070220
18
Simultaneous multithreading (SMT), is a technique for improving the overall efficiency of

superscalar ( executing multiple instructions at the same) CPUs with hardware
multithreading. SMT permits multiple independent threads of execution to better utilize the
resources provided by modern processor architectures.
 SMT is one of the two main implementations of multithreading, the other form being
temporal multithreading. In temporal multithreading, only one thread of instructions
can execute in any given pipeline stage at a time. In simultaneous multithreading,
instructions from more than one thread can be executing in any given pipeline stage at
a time. This is done without great changes to the basic processor architecture: the main
additions needed are the ability to fetch instructions from multiple threads in a cycle,
and a larger register file to hold data from multiple threads. The number of concurrent
threads can be decided by the chip designers, but practical restrictions on chip
complexity have limited the number to two for most SMT implementations.
 Because the technique is really an efficiency solution and there is inevitable increased
conflict on shared resources, measuring or agreeing on the effectiveness of the
solution can be difficult. Some researchers have shown that the extra threads can be
used to proactively seed a shared resource like a cache, to improve the performance of
another single thread, and claim this shows that SMT is not just an efficiency solution.
Others use SMT to provide redundant computation, for some level of error detection
and recovery.
 However, in most current cases, SMT is about hiding memory latency, efficiency and
increased throughput of computations per amount of hardware used.
____________________________________________________________________
Steps To Develop Parallel Programming For A Hyper Threading

Technology:
11070220
19
 Partitioning : The partitioning stage of a design is intended to expose opportunities

for parallel execution. Hence, the focus is on defining a large number of small tasks in
order to yield what is termed a fine-grained decomposition of a problem.
 Communication : The tasks generated by a partition are intended to execute

concurrently but cannot, in general, execute independently. The computation to be
performed in one task will typically require data associated with another task. Data
must then be transferred between tasks so as to allow computation to proceed. This
information flow is specified in the communication phase of a design.
 Agglomeration : In the third stage, we move from the abstract toward the concrete.
We revisit decisions made in the partitioning and communication phases with a view
to obtaining an algorithm that will execute efficiently on some class of parallel
computer. In particular, we consider whether it is useful to combine, or agglomerate,
tasks identified by the partitioning phase, so as to provide a smaller number of tasks,
each of greater size. We also determine whether it is worthwhile to replicate data
and/or computation.
 Mapping : In the fourth and final stage of the parallel algorithm design process, we
specify where each task is to execute. This mapping problem does not arise on
uniprocessors or on shared-memory computers that provide automatic task scheduling.
On the other hand, on the server side, multicore processors are ideal because they allow
many users to connect to a site simultaneously and have independent threads of execution.
This allows for Web servers and application servers that have much better throughput.
FIGURE 5: Depicts effect of HTT on processor speed and performance using Windows Task
11070220
20
manager
FUTURE:
 Older Pentium 4 based CPUs use hyper-threading, but the newer Pentium M based
cores Merom, Conroe, and Woodcrest do not. Hyper-threading is a specialized form of
simultaneous multithreading (SMT).
 The Intel Atom is an in-order single-core processor with hyper-threading, for low
power mobile PCs and low-price desktop PCs.
11070220
21
Diagram of a generic dual core processor, with CPU-local level 1 caches, and a shared,
on-die level 2 cache.
 HT Technology enables gaming enthusiasts to play the latest titles and experience
ultra-realistic effects and game play. And multimedia enthusiasts can create, edit, and
encode graphically intensive files while running background applications such as virus
scan in the background–all without slowing down. Intel released the Nehalem (Core
i7) in November 2008 in which hyper-threading makes a return. Nehalem contains 4
cores and effectively scales 8 threads and has a speed of 3.4 GHz thus providing an
out of the world performance.
 It may drastically improve computer performance in multicore processors by

duplicating the number of virtual processors, thus dividing the workload between the
physical as well as virtual processors. According to Intel, it boosts system
performance by almost 30%.
 It will improve CPU intensive processes, virus scans, ripping and burning CDs and
DVDs, multimedia applications, video quality, improve connectivity between
processor itself .
 As it consumes less power at higher efficiency, it may be embedded in future

multicore processors.
 Improve business productivity by doing more at once without slowing down
 Provide faster response times for Internet and e-Business applications, enhancing
customer experiences
 Increase the number of transactions that can be processed simultaneously
 Utilize existing technologies while maintaining future readiness with compatibility

for existing 32-bit applications and OSs while being prepared for the future of 64-bit
11070220
22
11070220

Hyperthreading

Transféré par

Informations du document

Description originale:

Copyright

Formats disponibles

Partager ce document

Partager ou intégrer le document

Options de partage

Avez-vous trouvé ce document utile ?

Ce contenu est-il inapproprié ?

Droits d'auteur :

Formats disponibles

Hyperthreading

Transféré par

Droits d'auteur :

Formats disponibles

1.

MULTI THREADING: It is a computation architecture which runs multiple processes

HYPER THREADING: A high-performance computing architecture that simulates

HT is a microprocessor simultaneous multithreading technology (SMT) that supports the

Hyper-threading (officially termed Hyper-Threading Technology or HTT) is an Intel-

Hyper-threading requires both operating system and CPU support; conventional

FIGURE 1: Intel Pentium 4 @ 3.80 GHz with Hyper-Threading Technology.

1. This technology is meant to increase efficiency of operation of a processor. The matter

FIGURE 2: Represents threads of program1

Represents idle CPU processes

 Hyper-threading works by duplicating certain sections of the processor—those that

 It is possible to optimize operating system behaviour on multi-processor hyper-

 The performance boost of HT Technology was limited by the availability of shared

shared resources. To achieve greater performance gains on a single chip, a processor

FIGURE 3: Threads being executed by early,

1. REGISTER ALIAS TABLES:-

2. ADVANCED PROGRAMMABLE INTERRUPT CONTROLLER:-

3. RETURN STACK PREDICTOR:-

4. INSTRUCTION TRANSLATION LOOK-ASIDE BUFFER:-

COMPARING PAST AND HYPERTHREADED

PROCESSORS COMPATIBLE WITH HYPERTHREADING:

Intel® Pentium® 4 processor Extreme Edition

Mobile Intel® Pentium® 4 processors supporting

Intel® Hyper-Threading Technology

Intel® Xeon® processor

Intel Desktop Chipsets (DESKTOPS)

MULTI CORE PROCESSORS:-

Figure 2. Multi-Core processors have multiple execution cores on a single chip.

TABLE DEPICTING NOMENCLATURE OF INTEL MULTICORE PROCESSORS:

Intel Core 2 processor family

Conroe dual (65 nm) Aug 2006

Kentsfield quad (65 nm) Jan 2007

Merom solo (65 nm) Sep 2007

MERITS AND DEMERITS OF USAGE OF hyper THREADING:

 improved support for multi-threaded code,

 allowing multiple threads to run simultaneously. No performance loss if only one

 Better resource utilization.

 Extra architecture only adds about 5% to the total die area.

 Threads are non-deterministic and involve extra design

 Shared resource conflicts

 In addition to operating system (OS) support, adjustments to existing software are

In computing, symmetric multiprocessing or SMP involves a multiprocessor computer-

SIMULTANEOUS MULTITHREADING (SMT):

Simultaneous multithreading (SMT), is a technique for improving the overall efficiency of

Steps To Develop Parallel Programming For A Hyper Threading

 Partitioning : The partitioning stage of a design is intended to expose opportunities

 Communication : The tasks generated by a partition are intended to execute

 It may drastically improve computer performance in multicore processors by

 As it consumes less power at higher efficiency, it may be embedded in future

 Improve business productivity by doing more at once without slowing down

 Increase the number of transactions that can be processed simultaneously

 Utilize existing technologies while maintaining future readiness with compatibility

Vous aimerez peut-être aussi