Wednesday, April 13, 2016

Java’s utilization of Multiple CPU Cores for Parallelism or Concurrency

 

While checking how Java uses multiple CPU cores for parallel, concurrent, or multithreaded programming, I came across some interesting numbers. I wrote a simple program that computes 40,000,000 random integers, first using a single thread and then again using the maximum number of threads, one per available CPU core.
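A minimal sketch of what such a program could look like is shown here. It is not the original code from the repository: the class name `RandomBenchmark`, the helpers `splitCounts`, `generate`, and `timeMillis`, the use of `ThreadLocalRandom`, and the fixed thread pool are all assumptions made for illustration. The checksum exists only so that the JIT compiler cannot discard the loop.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.ThreadLocalRandom;

// Hypothetical benchmark class; names are illustrative, not from the original code.
class RandomBenchmark {
    static final int TOTAL = 40_000_000; // count taken from the post

    // Splits `total` into `threads` chunks; the last chunk absorbs the remainder.
    static int[] splitCounts(int total, int threads) {
        int[] counts = new int[threads];
        Arrays.fill(counts, total / threads);
        counts[threads - 1] += total % threads;
        return counts;
    }

    // Generates `count` random ints on the calling thread and returns a checksum.
    static long generate(int count) {
        ThreadLocalRandom rnd = ThreadLocalRandom.current();
        long checksum = 0;
        for (int i = 0; i < count; i++) {
            checksum += rnd.nextInt();
        }
        return checksum;
    }

    // Runs the workload on `threads` worker threads and returns elapsed milliseconds.
    static long timeMillis(int total, int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            long start = System.nanoTime();
            List<Future<Long>> parts = new ArrayList<>();
            for (int count : splitCounts(total, threads)) {
                Callable<Long> task = () -> generate(count);
                parts.add(pool.submit(task));
            }
            for (Future<Long> part : parts) {
                part.get(); // wait for every worker to finish
            }
            return (System.nanoTime() - start) / 1_000_000;
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        int cores = Runtime.getRuntime().availableProcessors();
        System.out.println("1 thread: " + timeMillis(TOTAL, 1) + " ms");
        System.out.println(cores + " threads: " + timeMillis(TOTAL, cores) + " ms");
    }
}
```

The timings from a single run like this are noisy (JIT warm-up, other processes), so averaging several runs gives a fairer comparison.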

To find the number of CPU cores available on a system, Java exposes a method in java.lang.Runtime:

public int availableProcessors()

Returns the number of processors available to the Java virtual machine.

This value may change during a particular invocation of the virtual machine. Applications that are sensitive to the number of available processors should therefore occasionally poll this property and adjust their resource usage appropriately.

Returns:
the maximum number of processors available to the virtual machine; never smaller than one
Since:
1.4
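Calling this method takes a single line. The sketch below wraps it in a small class; the class name `CoreCount` and the helper `logicalProcessors` are made up for this example, while `Runtime.getRuntime().availableProcessors()` is the standard API quoted above.

```java
// Hypothetical wrapper class; only Runtime.availableProcessors() is the real API.
class CoreCount {
    // Returns the number of processors currently available to the JVM.
    static int logicalProcessors() {
        return Runtime.getRuntime().availableProcessors();
    }

    public static void main(String[] args) {
        System.out.println("Available processors: " + logicalProcessors());
    }
}
```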

When I ran the program to print the number of available CPU cores, I was surprised that it printed “4” instead of “2”, because I have a dual-core laptop:

[Image: duo core]

To verify that further, I opened the Task Manager and found this:

[Image: task manager]

It turns out that it’s “4” because of Hyper-Threading:

“For each processor core that is physically present, the operating system addresses two virtual or logical cores, and shares the workload between them when possible.”

So, finally, I ran my single-threaded program and observed this:

[Image: why all processors were busy for a single thread]

Why were all four logical processors busy running a single-threaded program? Shouldn’t it be just one of them?

To dig deeper, I changed my program to run in 4 parallel threads, and the result was:

[Image: multi threading execution on 4 logical processors]

That didn’t make any sense: both the single-threaded and the multithreaded versions of the program were clearly using all the available logical processors. Searching the internet for clarification revealed that:

“The OS is responsible for scheduling. It is free to stop a thread and start it again on another CPU. It will do this even if there is nothing else the machine is doing.

The process is moved around the CPUs because the OS doesn't assume there is any reason to continue running the thread on the same CPU each time.”

And here comes the concept of CPU or processor affinity:

The processor affinity is simply a number that every process is associated with. It serves as a bit array that determines on which CPUs in a system the threads of a particular process are allowed to run. For instance a processor affinity of 2 means that the process can only run on CPU 1, because only the bit at index 1 is set (if the processor affinity is regarded as a bit array with indexing starting at the rightmost bit with zero). A processor affinity of 1 means, that the process, or better yet, the threads of that process, can only run on CPU 0. A processor affinity of 3 means that the process may run on both CPUs 0 and 1. A processor affinity of 0 means that there is no CPU that this process may run on, and is therefore not possible. The processor affinity is normally inherited from the parent process that starts a particular process, but it can also be changed at runtime from another process.
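The bit-array reading of an affinity value described above can be sketched in a few lines of Java. This only decodes a mask into CPU indices; it does not set or query the affinity of a real process, and the class name `AffinityMask` and method `allowedCpus` are hypothetical names chosen for this illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper that decodes an affinity mask; it does not touch the OS.
class AffinityMask {
    // Returns the indices of the CPUs whose bit is set (bit 0, the rightmost, is CPU 0).
    static List<Integer> allowedCpus(long mask) {
        List<Integer> cpus = new ArrayList<>();
        for (int cpu = 0; cpu < Long.SIZE; cpu++) {
            if ((mask & (1L << cpu)) != 0) {
                cpus.add(cpu);
            }
        }
        return cpus;
    }

    public static void main(String[] args) {
        for (long mask : new long[] {1, 2, 3}) {
            System.out.println("affinity " + mask + " -> CPUs " + allowedCpus(mask));
        }
        // prints:
        // affinity 1 -> CPUs [0]
        // affinity 2 -> CPUs [1]
        // affinity 3 -> CPUs [0, 1]
    }
}
```

On the 4-logical-processor laptop from this post, a mask of 15 (binary 1111) would allow all four processors, and a mask of 1 would pin the process to the first one.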

While there are several ways to test processor affinity, the one that I found quick and easy to use was ProcAff. I ran the same single-threaded version of the program with procaffx64.exe:

[Image: procaffx64 command]

and observed this:

[Image: single thread with processor affinity]

That’s how the execution of a single-threaded program should look: it uses only one logical processor.

Furthermore, it is quite interesting that the following execution times match (please refer to the Microsoft Excel sheet “analysis.xlsx” uploaded to the GitHub repository along with the code):

Average time to run a single thread with no CPU/processor affinity == Average time to run a single thread with CPU/processor affinity

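The Task Manager shows that the former case uses all 4 logical processors while the latter uses only one, yet both finish their task at almost exactly the same time. This is consistent with the scheduling explanation quoted earlier: a single thread executes on only one logical processor at any instant, so moving it between processors spreads the load across the graphs without adding any computing power.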