feat(vrtsim): CUDA channel simulation pipeline

This merge request adds a CUDA pipeline that accelerates vrtsim's real-time channel simulation, so that performance remains stable with complex channel models. The pipeline is fully asynchronous and uses pinned memory and bulk API calls to minimize CPU and driver overhead. Data preparation, including padding and type conversion, is offloaded to a dedicated CUDA kernel. The C-API was also refined so that vrtsim consumes results directly from a pinned GPU output buffer, which eliminates a redundant host-side memory copy. The whole pipeline is integrated into the existing actor model to keep execution non-blocking.
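To illustrate the data preparation step described above, here is a minimal host-side reference in C of what a padding and type-conversion pass could compute. It is a sketch under stated assumptions, not the kernel from this merge request: the type names `iq16_t` and `iqf_t`, the function names, the choice of 16-bit fixed-point input, float output, and zero padding at the front of the buffer are all hypothetical and chosen only for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical sample types, named for this sketch only. */
typedef struct { int16_t re, im; } iq16_t; /* assumed fixed-point I/Q input */
typedef struct { float re, im; } iqf_t;    /* assumed float I/Q output */

/* Compute one element of the padded float buffer.
 * Indices below `pad` are zero-filled (assumed front padding);
 * the remaining indices hold the converted input samples. */
static iqf_t prepare_element(const iq16_t *in, size_t pad, size_t idx)
{
  iqf_t v = {0.0f, 0.0f};
  if (idx >= pad) {
    v.re = (float)in[idx - pad].re;
    v.im = (float)in[idx - pad].im;
  }
  return v;
}

/* Host-side loop over all n + pad output elements.
 * `out` must have room for n + pad elements. */
static void prepare_samples(const iq16_t *in, size_t n, size_t pad, iqf_t *out)
{
  assert(in != NULL && out != NULL);
  for (size_t idx = 0; idx < n + pad; idx++)
    out[idx] = prepare_element(in, pad, idx);
}
```

Each output element depends only on its own index, so the loop body maps naturally onto one GPU thread per element. A reference like this can also serve as a CPU baseline when checking a kernel's output.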
For validation and profiling, a new benchmark, test_vrtsim_wrapper, has been added.
The feature can be enabled at runtime using the flags: --vrtsim.chanmod 1 --vrtsim.use_gpu 1.