Code Optimization for the Current Generation of Desktop Processors
Foreword
Historically, Moore’s Law has guided the semiconductor industry, with the focus
on the overall processing power of the processor. Earlier, the emphasis was on
raising the clock speed, which increased the rate at which the processor could
compute. As clock speeds approached physical limits, the focus shifted to doing
more work at the same clock speed. Now that this approach is also nearing its
limits, the focus is shifting towards multiprocessing, which was previously the
realm of supercomputers and specialized applications.
The current generation of desktop processors boasts technologies such as 64-bit
processing, Hyper-Threading and multi-core processing. The current generation
of operating systems uses them to the fullest. Do the programs you write make
full use of them?
This paper describes an experiment in optimizing the programs we write so that
they make the most of the current generation of processors.
Introduction
A single-core processor (such as the Pentium 4) has a single processing unit,
and all processing is done through this one unit.
A Hyper-Threading processor works by duplicating certain sections of the
processor. It appears as two "logical" processors to the host operating system.
A multi-core processor is a processing system composed of two or more
independent cores or CPUs. Examples are Intel Dual Core, Intel Core 2 Duo, Intel
Core 2 Quad, Intel Xeon dual-, quad- and hexa-core processors, AMD Phenom X4
(quad-core) and AMD Phenom X3 (triple-core). Some multi-core processors
additionally implement Hyper-Threading. Intel Atom is the brand name for a line
of x86 and x86-64 CPUs from Intel, used mainly in low-power computers such as
notebooks. It is a single-core processor and supports Hyper-Threading.
Premise
As processor architecture moves from single core to multi-core, programming
techniques need to improve to use the multi-core architecture efficiently and
thereby increase software performance. Traditionally, on a single core, the
operating system simulates multithreading by giving each thread or process a
slice of processing time, so no real parallel processing is achieved.
On a modern multi-core processor, however, multithreading can increase
performance because the threads run on separate cores in real parallel.
Experiment
We take a traditional single-threaded program, break it into multiple threads
(one for each core) and compare the execution times.
The experiment is done in native Windows C++ code and in Java code on Windows.
Test Setup
Machine Setup
The experiment was run on several machines with Core 2 Duo, Core 2 Quad and
Hyper-Threading processors, running Windows, including Windows Vista.
Program
The experiment was run with both a C++ program and a Java program on Windows. A
typical for loop is divided into several for loops over separate sub-ranges,
each executed by a different thread. For example, a loop over 0-120 is divided
into 0-60 and 60-120, and the two halves are executed by two different threads.
Each loop iteration performs a complex calculation involving multiplication of
long values, and the tick count needed to execute each segment of code is
measured.
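The split described above can be sketched in Java as follows. This is a minimal illustration and not the program used in the experiment: the class name SplitLoopDemo and the work() method are hypothetical stand-ins for the "complex calculation", and the lambda syntax assumes Java 8 or later.

```java
public class SplitLoopDemo {
    // Hypothetical stand-in for the per-iteration calculation:
    // repeated multiplication of long values (overflow simply wraps).
    static long work(int i) {
        long acc = 1L;
        for (int k = 1; k <= 100_000; k++) {
            acc = acc * (i + k) + k;
        }
        return acc;
    }

    // Runs work(i) for every i in [from, to) on the calling thread.
    static long runRange(int from, int to) {
        long sum = 0L;
        for (int i = from; i < to; i++) {
            sum += work(i);
        }
        return sum;
    }

    // Runs 0-60 and 60-120 on two threads and combines the partial sums.
    static long runSplit() {
        long[] partial = new long[2];
        Thread first = new Thread(() -> partial[0] = runRange(0, 60));
        Thread second = new Thread(() -> partial[1] = runRange(60, 120));
        first.start();
        second.start();
        try {
            // join() makes the partial results visible to this thread.
            first.join();
            second.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting", e);
        }
        return partial[0] + partial[1];
    }

    public static void main(String[] args) {
        long t0 = System.nanoTime();
        long single = runRange(0, 120);
        long t1 = System.nanoTime();
        long split = runSplit();
        long t2 = System.nanoTime();

        System.out.println("results match: " + (single == split));
        System.out.printf("single thread: %d ms%n", (t1 - t0) / 1_000_000);
        System.out.printf("two threads:   %d ms%n", (t2 - t1) / 1_000_000);
        System.out.printf("relative time: %.0f%%%n", 100.0 * (t2 - t1) / (t1 - t0));
    }
}
```

The two halves share no state except their own slot in the partial array, which is what makes the loop safe to divide. The relative time printed at the end corresponds to the "time taken as compared to the single thread program" figure, though the values will differ from machine to machine.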
Note: Not all parts of the code can make efficient use of the cores present in
the system. Only those parts that can be divided into independent pieces, each
run on its own thread, can use the cores efficiently.
Observation
Multi-Core Processor
No. of cores on the PC | No. of threads in the program | Time taken compared to the single-thread program [C++] (lower is better) | Time taken compared to the single-thread program [Java] (lower is better)
4 | 4 | 25% | 27%
2 | 4 | 67% | 83%
1 | 4 | 90% | 86%
2 | 2 | 64% | 53%
1 | 2 | 98% | 106%
Hyper-Threading Processor (Single-Core Processor)
No. of threads in the processor | No. of threads in the program | Time taken compared to the single-thread program [C++] (lower is better) | Time taken compared to the single-thread program [Java] (lower is better)
2 | 4 | 65% | 49%
2 | 2 | 91% | 49%
For more detail, see Threading.xls.
Conclusion
After going through the various test cases, we found that:
- Threading generally improves performance, but with some restrictions: only
  independent parts of the code benefit, and on a single core without
  Hyper-Threading the gain is small or absent (98% for C++ and 106% for Java
  with two threads).
- It is better to use the same number of threads as the number of cores or
  hyper-threads available.
- Assigning a thread to a particular processor is not a better way to enhance
  performance. It is preferable to use threading and leave CPU scheduling
  entirely to the operating system.
How to optimize
To get the core/thread count:
- Windows API: GetSystemInfo, which reports the number of processors in the
  dwNumberOfProcessors field of SYSTEM_INFO
- Java: java.lang.Runtime.availableProcessors()
- .NET: the Parallel class is available in .NET 4 with Visual Studio 2010 (not
  tested)
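For the Java case, the call can be sketched as below. The class name CoreCount and the helper logicalProcessors() are hypothetical names used only for illustration. availableProcessors() is an instance method, so it is reached through Runtime.getRuntime().

```java
public class CoreCount {
    // Returns the number of logical processors the JVM can use.
    // On a Hyper-Threading processor this counts hyper-threads, not physical cores.
    static int logicalProcessors() {
        return Runtime.getRuntime().availableProcessors();
    }

    public static void main(String[] args) {
        System.out.println("Logical processors: " + logicalProcessors());
    }
}
```

The value is always at least 1 and can change during the lifetime of the JVM, so a long-running program may want to read it again rather than cache it.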
Dividing the application into threads:
After getting the number of processors, divide the relevant part of the code
into threads according to the processor count.
Note: Not all segments of code can be divided into multiple threads, and
dividing the work programmatically on the fly, depending on the core count, is
even more difficult. Only a segment whose pieces are independent of each other
after the division can be divided.
We can determine the total core/hyper-thread count and create an optimal number
of threads accordingly to increase the performance of the program.
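One possible way to divide a loop according to the processor count is sketched below. It is an assumption-laden illustration and not the code used in the experiment: the class ChunkedLoop, the chunks() helper and the work() method are hypothetical, and a fixed thread pool from java.util.concurrent is used here in place of hand-created threads.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class ChunkedLoop {
    // Splits [0, total) into `parts` contiguous half-open ranges of near-equal size.
    static int[][] chunks(int total, int parts) {
        int[][] ranges = new int[parts][2];
        int base = total / parts;
        int extra = total % parts; // the first `extra` ranges get one more element
        int start = 0;
        for (int p = 0; p < parts; p++) {
            int size = base + (p < extra ? 1 : 0);
            ranges[p][0] = start;
            ranges[p][1] = start + size;
            start += size;
        }
        return ranges;
    }

    // Hypothetical independent per-iteration work.
    static long work(int i) {
        return (long) i * i;
    }

    // Sums work(i) over [0, total) using one task per thread.
    static long parallelSum(int total, int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<Long>> futures = new ArrayList<>();
            for (int[] range : chunks(total, threads)) {
                final int from = range[0];
                final int to = range[1];
                Callable<Long> task = () -> {
                    long sum = 0L;
                    for (int i = from; i < to; i++) {
                        sum += work(i);
                    }
                    return sum;
                };
                futures.add(pool.submit(task));
            }
            long result = 0L;
            for (Future<Long> f : futures) {
                result += f.get(); // blocks until that chunk is finished
            }
            return result;
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException("parallel loop failed", e);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        int cores = Runtime.getRuntime().availableProcessors();
        System.out.println("threads used: " + cores);
        System.out.println("sum: " + parallelSum(120, cores));
    }
}
```

With two threads, chunks(120, 2) reproduces the 0-60 and 60-120 split used in the experiment. The sketch only works because each iteration is independent of the others, which is the restriction stated in the note above.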
Author: Premkant
8 September 2009