Mastering Clang's SIMD Optimization for Floating-Point Multiplication