An interesting fact that we learn from this formula is that the number of CPU cores in the device is a limiting factor: creating more threads than there are cores won't make the code run any faster. To reach the maximum speedup, the program must run that section of code on exactly as many threads as there are CPU cores. More threads cannot run at the same time, and fewer threads leave some cores idle.
Example: if a computer has 4 CPU cores and 30% of the program can run in parallel, then the overall speedup of the whole program is:

$$\frac{1}{(1 - 0.3) + \frac{0.3}{4}} = \frac{1}{0.775} \approx 1.29$$

so the program will run about 1.3 times faster than if it ran entirely on a single core.
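The arithmetic above, and the claim about thread counts, can be checked with a small sketch. The function names `amdahl_speedup` and `speedup_with_threads` are hypothetical, chosen for illustration. The second function rests on a simplifying assumption: at most `cores` threads can make progress at once, so the effective parallelism is `min(threads, cores)`. Thread-creation and scheduling overhead are ignored, which is optimistic for real programs.

```python
def amdahl_speedup(parallel_fraction: float, workers: int) -> float:
    """Overall speedup when `parallel_fraction` of the run time is split evenly over `workers` cores."""
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be between 0 and 1")
    if workers < 1:
        raise ValueError("workers must be at least 1")
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / workers)


def speedup_with_threads(parallel_fraction: float, threads: int, cores: int) -> float:
    """Hypothetical model: only `cores` threads can run simultaneously, so threads beyond
    that add no parallelism. Thread-management overhead is ignored."""
    return amdahl_speedup(parallel_fraction, min(threads, cores))


if __name__ == "__main__":
    # The worked example: 30% of the program runs in parallel on 4 cores.
    print(f"{amdahl_speedup(0.3, 4):.2f}")  # 1.29
    # On 4 cores, going past 4 threads leaves the speedup unchanged,
    # while fewer than 4 threads leave cores idle and lower it.
    for threads in (1, 2, 4, 8, 16):
        print(threads, f"{speedup_with_threads(0.3, threads, cores=4):.3f}")
```

Under this model the speedup stops growing once the thread count reaches the core count. Even with unlimited cores it could never exceed 1 / (1 − 0.3) ≈ 1.43, because the serial 70% of the program is unaffected.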