



Code Optimization for the Current Generation of Desktop Processors

 


 

Foreword

Historically, Moore’s Law has set the pace for the semiconductor industry, with the focus on the overall processing power of the processor. Earlier, the emphasis was on raising the clock speed, which increased the rate at which the processor could compute. As clock speeds approached the limits of physics, the focus shifted to doing more work at the same clock speed. Now that this approach is also nearing its limits, the focus is shifting towards multiprocessing, which was previously the realm of supercomputers and specialized applications.

 

The current generation of desktop processors offers technologies such as 64-bit processing, Hyper-Threading and multi-core processing. Current operating systems use them to the fullest. Do the programs you write make full use of them?

 

This paper describes an experiment in optimizing the programs we write so that they make the most of the current generation of processors.

 

Introduction

A single-core processor (such as the Pentium 4) has a single processing unit, and all processing is done by that one unit.

 

A Hyper-Threading processor works by duplicating certain sections of the processor. It appears to the host operating system as two "logical" processors.

 

A multi-core processor is a processing system composed of two or more independent cores (CPUs). Examples are the Intel Dual Core, Intel Core 2 Duo and Intel Core 2 Quad, the dual-, quad- and hexa-core Intel Xeon processors, the AMD Phenom X4 (quad-core) and the AMD Phenom X3 (triple-core). Some multi-core processors additionally implement Hyper-Threading. Intel Atom is the brand name for a line of x86 and x86-64 CPUs from Intel, used mainly in low-power computers such as netbooks. It is a single-core processor that supports Hyper-Threading.

 

Premise

As processor architecture moves from single core to multi-core, programming techniques need to improve as well in order to use the multi-core architecture efficiently and so increase software performance. Traditionally, on a single core, multithreading is simulated by the operating system, which gives each thread or process a slice of processor time; no real parallel processing takes place.

 

On a modern multi-core processor, however, multithreading increases performance because the threads genuinely run in parallel on separate cores.

 


Experiment

 

We take a traditional single-threaded program and, in this experiment, break it into multiple threads (one for each core) and compare the execution times.

 

The experiment will be done in native Windows C++ code and Java code on Windows.

 

 

Test Setup

 

Machine Setup
This experiment was run on various machines with Core 2 Duo, Core 2 Quad and Hyper-Threading processors, under Windows, including Windows Vista.

Program
This experiment was run with both a C++ program and a Java program on Windows. A typical for loop is divided into several for loops over different ranges, and each is executed by a different thread. For this experiment, the loop over 0-120 was divided into 0-60 and 60-120 and executed by two different threads. Each loop iteration performs a heavy calculation based on multiplication of long values, and the tick count taken to execute each segment of code is measured.
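The split described above can be sketched in Java roughly as follows. This is only an illustration, not the program used for the measurements: the class name SplitLoopSketch, the work() function and its constants are hypothetical stand-ins for the calculation on long values, and System.nanoTime() is assumed in place of the tick counter.

```java
final class SplitLoopSketch {
    // Hypothetical stand-in for the per-iteration calculation:
    // repeated multiplication of long values, reduced modulo a constant.
    static long work(int i) {
        long acc = 1;
        for (int k = 1; k <= 100_000; k++) {
            acc = acc * (i + k) % 1_000_003L;
        }
        return acc;
    }

    // Sums work(i) for i in the half-open range [from, to).
    static long runRange(int from, int to) {
        long sum = 0;
        for (int i = from; i < to; i++) {
            sum += work(i);
        }
        return sum;
    }

    // The original single-threaded loop over 0-120.
    static long runSingle() {
        return runRange(0, 120);
    }

    // The same loop split into 0-60 and 60-120, one thread per half.
    static long runTwoThreads() {
        long[] partial = new long[2];
        Thread first = new Thread(() -> partial[0] = runRange(0, 60));
        Thread second = new Thread(() -> partial[1] = runRange(60, 120));
        first.start();
        second.start();
        try {
            // join() also makes the writes to partial[] visible here.
            first.join();
            second.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return partial[0] + partial[1];
    }

    public static void main(String[] args) {
        long t0 = System.nanoTime();
        long single = runSingle();
        long t1 = System.nanoTime();
        long split = runTwoThreads();
        long t2 = System.nanoTime();
        System.out.println("results equal: " + (single == split));
        System.out.println("single thread (ns): " + (t1 - t0));
        System.out.println("two threads (ns):   " + (t2 - t1));
    }
}
```

Each half writes only to its own slot of the partial array, so the two threads share no mutable state while they run. That independence is what makes the loop safe to split.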

 

NOTE
Not all parts of the code can make efficient use of the cores present in the system. Only those parts that can be divided into independent sub-threads can use the cores efficiently.

 


Observation

 

Multi-Core Processor

Time taken is relative to the single-thread program (lower is better).

Cores on the PC    Threads in the program    Time taken [C++]    Time taken [Java]
---------------    ----------------------    ----------------    -----------------
4                  4                         25 %                27 %
2                  4                         67 %                83 %
1                  4                         90 %                86 %
2                  2                         64 %                53 %
1                  2                         98 %                106 %

 

Hyper-Threading Processor (Single-Core Processor)

Time taken is relative to the single-thread program (lower is better).

Hardware threads in the processor    Threads in the program    Time taken [C++]    Time taken [Java]
---------------------------------    ----------------------    ----------------    -----------------
2                                    4                         65 %                49 %
2                                    2                         91 %                49 %
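The percentages in the tables above can also be read as speedup factors: a run that takes 25 % of the single-thread time is 4 times faster. The small sketch below does that conversion. The names SpeedupSketch, speedup and efficiency are illustrative only and are not part of the original test programs.

```java
final class SpeedupSketch {
    // Speedup factor for a run whose time is given as a percentage
    // of the single-thread time (100 / percentage).
    static double speedup(double relativeTimePercent) {
        return 100.0 / relativeTimePercent;
    }

    // Parallel efficiency: the speedup divided by the number of
    // cores or hardware threads (1.0 means perfect scaling).
    static double efficiency(double relativeTimePercent, int processors) {
        return speedup(relativeTimePercent) / processors;
    }

    public static void main(String[] args) {
        // Inputs are the C++ figures for 4 cores / 4 threads
        // and 2 cores / 2 threads from the multi-core table.
        System.out.println("4 cores, 4 threads: speedup " + speedup(25)
                + ", efficiency " + efficiency(25, 4));
        System.out.println("2 cores, 2 threads: speedup " + speedup(64)
                + ", efficiency " + efficiency(64, 2));
    }
}
```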

 

 

For more detail see Threading.xls.

 

Conclusion

 

After going through the various cases, we found that:

- Threading improved performance in nearly every configuration tested, but it has some restrictions.

- It is best to use the same number of threads as there are cores/hyper-threads.

- Assigning a thread to a particular processor was not found to be a better way to enhance performance; it is preferable to use threading and leave CPU scheduling entirely to the OS.

 

 

 

How to optimize

To get the core/thread count:

- Windows API: call GetSystemInfo to get the number of processors.

- Java: call java.lang.Runtime.availableProcessors().

- .NET: the Parallel class is available in .NET 4 with Visual Studio 2010 (not tested).
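For the Java case, a minimal sketch of the call looks like this. The class name CoreCountSketch and the helper logicalProcessors are made up for illustration. Note that availableProcessors() reports the logical processors available to the JVM, so hyper-threads are typically included in the count.

```java
final class CoreCountSketch {
    // Number of logical processors available to the JVM (always at least 1).
    static int logicalProcessors() {
        return Runtime.getRuntime().availableProcessors();
    }

    public static void main(String[] args) {
        System.out.println("Logical processors: " + logicalProcessors());
    }
}
```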

 

Dividing the application into threads:

After getting the number of processors, divide the relevant part of the code into threads according to the processor count.



Note: Not every segment of code can be divided into multiple threads, and dividing it programmatically on the fly according to the core count is even more difficult. Only segments whose parts are independent of each other after division can be split.
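One possible way to divide a loop according to the processor count is sketched below, assuming the iterations are independent of each other. The class PartitionSketch, the partition helper and the placeholder work() function are hypothetical. The thread pool comes from the standard java.util.concurrent package and is one option among several, not necessarily what the test programs used.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

final class PartitionSketch {
    // Splits [0, total) into `parts` contiguous half-open ranges
    // whose sizes differ by at most one.
    static int[][] partition(int total, int parts) {
        int[][] ranges = new int[parts][2];
        int base = total / parts;
        int extra = total % parts;
        int start = 0;
        for (int p = 0; p < parts; p++) {
            int size = base + (p < extra ? 1 : 0);
            ranges[p][0] = start;
            ranges[p][1] = start + size;
            start += size;
        }
        return ranges;
    }

    // Placeholder for independent per-iteration work (hypothetical).
    static long work(int i) {
        return (long) i * i;
    }

    // Sums work(i) over [0, total) using one task per range.
    static long parallelSum(int total, int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<Long>> futures = new ArrayList<>();
            for (int[] range : partition(total, threads)) {
                final int from = range[0];
                final int to = range[1];
                futures.add(pool.submit(() -> {
                    long sum = 0;
                    for (int i = from; i < to; i++) {
                        sum += work(i);
                    }
                    return sum;
                }));
            }
            long result = 0;
            for (Future<Long> f : futures) {
                result += f.get(); // waits for that range to finish
            }
            return result;
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        int threads = Runtime.getRuntime().availableProcessors();
        System.out.println("threads: " + threads);
        System.out.println("sum: " + parallelSum(120, threads));
    }
}
```

With two threads, partition(120, 2) yields the same 0-60 and 60-120 split used in the experiment. When the thread count does not divide the range evenly, the first few ranges each get one extra iteration.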

 

 

We can determine the total core/hyper-thread count and create an optimal number of threads accordingly to increase the performance of the program.



Author: Premkant

8 September 2009

 



Copyright © 2006 - 2009 akhilesh.in