Performance & Architecture Types: Lecture Notes

Parallel Computing: Performance

Let's consider an example: suppose that a Java or C++ program you write contains a loop that performs $40$ computations, all independent of one another. For example, your program has an array $A$ of $40$ integers, and you want to create an array $B$ whose integers are twice as large as those of $A$. To do so, you multiply each of $A$'s integers by $2$ and store the result in the corresponding position of $B$.
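For reference, here is a minimal sequential sketch of the loop described above, written in C++. It assumes the $40$ integers are held in a fixed-size `std::array`, and the function name `doubleSequential` is hypothetical. Every multiplication runs one after another on a single core.

```cpp
#include <array>
#include <cstddef>

// Size of the example array from the text (40 integers).
constexpr std::size_t kSize = 40;

// Hypothetical helper: builds B by doubling each element of A, one element
// at a time, on whichever single core runs this function.
std::array<int, kSize> doubleSequential(const std::array<int, kSize>& A) {
    std::array<int, kSize> B{};
    for (std::size_t i = 0; i < kSize; ++i) {
        // Iteration i reads only A[i] and writes only B[i], so no iteration
        // depends on the result of any other iteration.
        B[i] = 2 * A[i];
    }
    return B;
}
```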

Because doubling one integer does not depend on doubling any other (that is, the multiplications can be carried out separately), we could run these multiplications on separate CPU cores in parallel!

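If our CPU has only $1$ core, we have no choice but to run all of those computations on that single core. But if we have $2$ cores, for instance, we could run $20$ multiplications on one core and the other $20$ on the second core. Similarly, if our CPU is a quad-core one, we can run $10$ multiplications on each of the $4$ cores. In other words, the more cores the CPU has, the more multiplications can run at the same time, so the program can finish sooner. Ideally, with $p$ cores each core handles about $40/p$ multiplications, so the loop takes about $1/p$ of its single-core time; in practice, the cost of starting and coordinating the parallel work keeps the speedup somewhat below this ideal.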
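The work split described above can be sketched in C++ with `std::thread`. This is a minimal illustration under stated assumptions, not a definitive implementation: the function `doubleParallel` and its `numCores` parameter are hypothetical, the data is held in a `std::vector`, and each thread receives one contiguous chunk of indices. Creating one thread per chunk lets the operating system schedule the chunks on different cores, but it does not guarantee that each thread runs on its own core.

```cpp
#include <algorithm>
#include <cstddef>
#include <thread>
#include <vector>

// Hypothetical helper: splits the indices of A into `numCores` contiguous
// chunks and lets one std::thread double each chunk.
std::vector<int> doubleParallel(const std::vector<int>& A, unsigned numCores) {
    std::vector<int> B(A.size());
    if (numCores == 0) numCores = 1;  // treat "unknown" as a single core

    const std::size_t n = A.size();
    const std::size_t chunk = (n + numCores - 1) / numCores;  // ceiling division

    std::vector<std::thread> workers;
    for (unsigned t = 0; t < numCores; ++t) {
        const std::size_t begin = t * chunk;
        const std::size_t end = std::min(n, begin + chunk);
        if (begin >= end) break;  // more threads requested than elements

        // Each thread writes a disjoint range of B, so no locking is needed.
        workers.emplace_back([&A, &B, begin, end] {
            for (std::size_t i = begin; i < end; ++i) {
                B[i] = 2 * A[i];
            }
        });
    }

    // Wait for every chunk to finish before returning the result.
    for (std::thread& w : workers) w.join();
    return B;
}
```

For example, `doubleParallel(A, 4)` mirrors the quad-core case, with each thread doubling $10$ of the $40$ integers. Passing `std::thread::hardware_concurrency()` instead would use the number of hardware threads the system reports; that call may return `0`, which the sketch treats as $1$. On some toolchains, programs that use `std::thread` must be compiled with `-pthread`.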