Suppose we need several additions whose results are stored in a register. We could share a single adder between them, saving an adder at the cost of an extra MUX. However, a MUX is often bigger than an adder, so sharing adders may not save any area.
Sharing is useful, though, for larger components such as dividers and multipliers. However, tools will often not share these blocks because FPGAs have many built-in multiplier and divider blocks.
Deciding which operations should share hardware is an optimization problem.
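The adder/MUX trade-off above can be sketched in plain C. This is only a software model of the two datapaths, not output from any synthesis tool; the function names (add_unshared, add_shared, check_equivalence) and the select input sel are hypothetical, chosen for illustration. The comments mark which hardware component each line would roughly correspond to, assuming the two additions are never needed in the same cycle.

```c
#include <assert.h>
#include <stdint.h>

/* Unshared datapath: two adders, then one MUX selecting the result. */
uint32_t add_unshared(int sel, uint32_t a, uint32_t b, uint32_t c, uint32_t d) {
    uint32_t sum_ab = a + b;        /* adder 1 */
    uint32_t sum_cd = c + d;        /* adder 2 */
    return sel ? sum_ab : sum_cd;   /* MUX on the result */
}

/* Shared datapath: two MUXes selecting the operands, then one adder.
 * Compared with the version above: one adder fewer, one MUX more. */
uint32_t add_shared(int sel, uint32_t a, uint32_t b, uint32_t c, uint32_t d) {
    uint32_t lhs = sel ? a : c;     /* MUX on the left operand */
    uint32_t rhs = sel ? b : d;     /* MUX on the right operand */
    return lhs + rhs;               /* the single shared adder */
}

/* Both datapaths compute the same value for either setting of sel. */
void check_equivalence(void) {
    for (int sel = 0; sel < 2; ++sel) {
        assert(add_unshared(sel, 3, 4, 10, 20) == add_shared(sel, 3, 4, 10, 20));
    }
}
```

Both versions are functionally identical; only the component count differs. Whether the shared form is actually smaller depends on the relative size of the MUX and the operator, which is why sharing tends to pay off for multipliers and dividers rather than adders.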
OpenCL has a specific syntax to describe parallel operations.
Pragma: an annotation inserted in the C code that tells the tool what the hardware should look like. Some pragma examples: