It seems like the TF32 format is similar to BF16 but with 3 more mantissa bits (10 instead of 7). In other words, it is FP32 with the 13 low-order mantissa bits dropped instead of 16.
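To make the bit layout concrete, here is a rough Python sketch that emulates both formats by masking off low-order bits of an FP32 value. The helper names (`truncate_mantissa`, `to_tf32`, `to_bf16`) are made up for illustration, and plain truncation is an assumption: real hardware may round instead.

```python
import struct

def fp32_bits(x: float) -> int:
    """Return the raw 32-bit pattern of x stored as an IEEE 754 single."""
    return struct.unpack("<I", struct.pack("<f", x))[0]

def bits_to_fp32(b: int) -> float:
    """Interpret a 32-bit pattern as an IEEE 754 single."""
    return struct.unpack("<f", struct.pack("<I", b))[0]

def truncate_mantissa(x: float, dropped_bits: int) -> float:
    """Zero the lowest `dropped_bits` mantissa bits (truncation, not rounding)."""
    mask = 0xFFFFFFFF & ~((1 << dropped_bits) - 1)
    return bits_to_fp32(fp32_bits(x) & mask)

def to_tf32(x: float) -> float:
    # FP32 has 23 mantissa bits; dropping 13 leaves TF32's 10.
    return truncate_mantissa(x, 13)

def to_bf16(x: float) -> float:
    # Dropping 16 leaves BF16's 7 mantissa bits.
    return truncate_mantissa(x, 16)

if __name__ == "__main__":
    x = 1.0 + 2.0 ** -10  # needs exactly 10 mantissa bits
    print(to_tf32(x), to_bf16(x))
```

With this input, the TF32 emulation keeps the value unchanged, while the BF16 emulation falls back to 1.0 because the 2^-10 term sits below its 7 mantissa bits.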

The number of full adders in an FP multiplier scales roughly with the square of the mantissa width. TF32 has the same 10 mantissa bits as FP16, so if the number of mantissa bits stays the same, they can reuse the existing units.
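A back-of-the-envelope sketch of that scaling, assuming a simple array multiplier that needs about n² cells for an n-bit significand (mantissa bits plus the implicit leading 1). The `multiplier_cells` function is a hypothetical first-order model, not a description of Nvidia's actual design, and it ignores optimizations such as Booth encoding.

```python
# Explicit mantissa bits stored by each format.
MANTISSA_BITS = {"FP32": 23, "TF32": 10, "FP16": 10, "BF16": 7}

def multiplier_cells(mantissa_bits: int) -> int:
    """Approximate cell count of an array multiplier for one significand product."""
    significand = mantissa_bits + 1  # include the implicit leading 1
    return significand * significand

if __name__ == "__main__":
    for name, bits in MANTISSA_BITS.items():
        print(f"{name}: {multiplier_cells(bits)} cells")
```

Under this model FP32 comes out at 576 cells, TF32 and FP16 at 121 each, and BF16 at 64, which is why a TF32 multiply fits in an FP16-sized multiplier but an FP32 multiply does not.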

Since Nvidia already has FP16 units on die, reusing them for TF32 calculations probably doesn't cost much additional die area.