Optimizing Data Copying and Locking in C++23: Bytewise Atomic Memcpy Explained