Let's do some math now to find out how much faster pipelining is compared to sequential execution.
Suppose that some code has $n$ instructions, and that the CPU can break the fetch-decode-execute cycle into $k$ small steps (the previous slide introduced 6 steps, for example). Also, suppose that each small step needs only one clock cycle to complete.
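Sequential execution needs $n \cdot k$ clock cycles, since each of the $n$ instructions goes through all $k$ steps before the next one starts. With pipelining, the first instruction finishes after $k$ cycles and each of the remaining $n - 1$ instructions finishes one cycle after the previous one, for a total of $k + n - 1$ cycles. Note that $n \cdot k - (k + n - 1) = (n - 1)(k - 1)$, so if $k \ge 3$ and $n \ge 3$, the quantity $k + n - 1$ is smaller than $n \cdot k$, and the gap widens as $n$ and $k$ grow. This means that pipelining is indeed faster than sequential execution! (Remember: fewer clock cycles needed = faster execution.)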
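To make the comparison concrete, here is a minimal Python sketch that computes both cycle counts. It assumes an ideal pipeline: every step takes exactly one clock cycle (as supposed above) and there are no stalls or hazards. The function names are made up for illustration only.

```python
def sequential_cycles(n: int, k: int) -> int:
    """Cycles needed when each of the n instructions runs all k steps alone."""
    return n * k


def pipelined_cycles(n: int, k: int) -> int:
    """Cycles needed in an ideal k-step pipeline (assumption: no stalls).

    The first instruction takes k cycles; each of the remaining n - 1
    instructions completes one cycle after the previous one.
    """
    return k + (n - 1)


def speedup(n: int, k: int) -> float:
    """Ratio of sequential cycles to pipelined cycles."""
    return sequential_cycles(n, k) / pipelined_cycles(n, k)


if __name__ == "__main__":
    k = 6  # the 6 steps from the previous slide
    for n in (1, 3, 10, 1000):
        print(n, sequential_cycles(n, k), pipelined_cycles(n, k),
              round(speedup(n, k), 2))
```

For example, with $k = 6$ and $n = 1000$, sequential execution needs 6000 cycles while the ideal pipeline needs 1005, a speedup of about 5.97. As $n$ grows, the speedup approaches $k$ but never exceeds it.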