Draft: Fix alignment of Tpool user data to 32 bytes
The thread pool lets callers store user data with each job by (pre-)allocating the necessary memory. A previous attempt was made to align this user data on a 32-byte boundary (e.g., to prevent segfaults with SIMD instructions that require aligned operands, or to avoid inefficient unaligned data access); the current implementation, however, still leads to unaligned memory access.
This patch makes another attempt at aligning the user data to 32 bytes. First, use memalign() to allocate the actual job on a 32-byte boundary. Second, declare the pointer to the user data with alignas(32). This raises the alignment of the whole struct to 32 bytes, so the compiler pads the struct's size to a multiple of 32. Since this pointer is the last member of the struct, the user data, which is allocated right behind it (i.e., right behind the struct), therefore starts on a 32-byte boundary as well.
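The layout argument above can be sketched in C11 as follows. This is only an illustration under stated assumptions: the struct `job`, its members `fn` and `next`, the constant `USER_DATA_ALIGN` and the helper `job_alloc()` are hypothetical and not the actual Tpool code. The sketch also swaps in C11 `aligned_alloc()` for `memalign()`, which is obsolete and glibc/`<malloc.h>`-specific; both return a block aligned to the requested boundary. The `_Static_assert`s check at compile time that `alignas(32)` on the last member raises the struct's alignment to 32 and pads its size to a multiple of 32.

```c
#include <assert.h>
#include <stdalign.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define USER_DATA_ALIGN 32 /* hypothetical name for the 32-byte boundary */

/* Hypothetical job layout; the real Tpool struct may differ. */
struct job {
    void (*fn)(void *);  /* hypothetical work callback */
    struct job *next;    /* hypothetical queue link */
    /* Last member: alignas raises the alignment of the whole struct to 32,
       so sizeof(struct job) is padded up to a multiple of 32. */
    alignas(USER_DATA_ALIGN) void *user_data;
};

_Static_assert(alignof(struct job) == USER_DATA_ALIGN,
               "struct job must be 32-byte aligned");
_Static_assert(sizeof(struct job) % USER_DATA_ALIGN == 0,
               "sizeof(struct job) must be a multiple of 32");

/* Hypothetical helper: allocate a job plus user_size bytes of user data
   stored directly behind the struct, in a single 32-byte-aligned block. */
static struct job *job_alloc(size_t user_size)
{
    size_t total = sizeof(struct job) + user_size;

    /* C11 aligned_alloc() expects the size to be a multiple of the alignment. */
    total = (total + USER_DATA_ALIGN - 1) / USER_DATA_ALIGN * USER_DATA_ALIGN;

    struct job *j = aligned_alloc(USER_DATA_ALIGN, total);
    if (j == NULL)
        return NULL;

    memset(j, 0, sizeof *j);
    /* Aligned base + padded struct size => aligned user data. */
    j->user_data = (char *)j + sizeof(struct job);
    return j;
}
```

Note that the user data must be placed at `sizeof(struct job)`, not directly after the 8-byte pointer member itself: in this layout the pointer sits at offset 32, and it is the padding at the end of the struct that brings the user data back onto a 32-byte boundary.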