To get the best performance, the graphics stack aims to process multiple render passes in parallel: one being built on the CPU, one being vertex shaded, and one being fragment shaded. The rendering pipeline is therefore very deep (many milliseconds in length) and can even overlap render passes belonging to two neighboring frames. This overlapping of workloads keeps the available processing units busy all of the time.
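The overlap can be illustrated with a small scheduling model. This is a minimal sketch under simplifying assumptions, not a description of any real driver: the function names `schedule` and `utilization` are hypothetical, each stage (CPU, vertex, fragment) is assumed to handle one render pass at a time in submission order, a pass's fragment work is assumed to wait for all of that pass's vertex work, and the CPU is assumed to run ahead freely with no frame pacing or queue-depth limit.

```python
# Hypothetical three-stage render-pass pipeline model (illustration only).
STAGES = ("cpu", "vertex", "fragment")


def schedule(pass_costs):
    """pass_costs: list of (cpu_ms, vertex_ms, fragment_ms), one per render pass.

    Returns, for each pass, a list of (start, end) times, one per stage.
    A stage starts a pass once the previous stage has finished that pass
    AND the same stage has finished the previous pass.
    """
    stage_free = [0.0] * len(STAGES)  # time at which each stage becomes idle
    timeline = []
    for costs in pass_costs:
        ready = 0.0  # assumption: the CPU may start every pass immediately
        spans = []
        for s, cost in enumerate(costs):
            start = max(ready, stage_free[s])
            end = start + cost
            stage_free[s] = end
            ready = end  # the next stage can consume this pass's output
            spans.append((start, end))
        timeline.append(spans)
    return timeline


def utilization(timeline):
    """Fraction of the total elapsed time that each stage spends busy."""
    total = max(spans[-1][1] for spans in timeline)
    busy = [
        sum(spans[s][1] - spans[s][0] for spans in timeline)
        for s in range(len(STAGES))
    ]
    return {name: b / total for name, b in zip(STAGES, busy)}


if __name__ == "__main__":
    timeline = schedule([(2, 3, 6)] * 4)  # four identical passes, in ms
    print(timeline[-1][2])  # (23.0, 29.0)
    print(utilization(timeline))
```

With four identical passes costing 2 ms, 3 ms, and 6 ms per stage, the fragment stage runs back to back from 5 ms until the end at 29 ms, while the CPU and vertex stages finish early and then sit idle. Adding a limit on how far the CPU may run ahead would make the model closer to a real frame-paced application.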
The processing times of the three component workloads of a render pass are not usually identical. In a well-scheduled, processing-limited workload, the most heavily loaded pipeline stage runs all of the time, while the other two periodically go idle, waiting for the slowest stage to catch up. The swim lane diagram below shows the typical pipelining for two frames of content that is bottlenecked by fragment processing performance.