Optimizations of PDSCH Resource Mapping in nr_dlsch.c/nr_modulation.c
These changes add SIMD optimizations (Neon/AVX2/AVX512) to the PDSCH transmit path. The timings below were measured with the
nr_dlsim -e25 -R273 -b273 -s30 -x "layers" -y 4 -z 4 -P
benchmark, with "layers" set to 2, 3 and 4, and compare the "PHY proc tx" time of this branch against develop (lower is better):
273 PRBs, MCS 25, 64QAM
peafowl (gcc11,AMD EPYC 9374F)
- 2-layer, 4 TX : 431 us (develop 565 us)
- 3-layer, 4 TX : 692 us (develop 849 us)
- 4-layer, 4 TX : 963 us (develop 1172 us)
stupix (gcc10, Xeon Gold 6354)
- 2-layer, 4 TX : 568 us (develop 652 us)
- 3-layer, 4 TX : 901 us (develop 1030 us)
- 4-layer, 4 TX : 1250 us (develop 1396 us)
matix (gcc14, Ryzen 9 PRO 7945)
- 2-layer, 4 TX : 317 us (develop 505 us)
- 3-layer, 4 TX : 538 us (develop 779 us)
- 4-layer, 4 TX : 767 us (develop 1233 us)
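
To read the timings above in relative terms, the following minimal sketch computes the percentage reduction in "PHY proc tx" time against develop for each measurement. The helper names (reduction_percent, print_reductions) and the struct layout are illustrative assumptions and not part of the OAI code base; the numbers are copied from the lists above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* One benchmark row: "PHY proc tx" time in microseconds. */
struct tx_timing {
  const char *host;  /* machine name as listed above */
  int layers;        /* value passed as -x to nr_dlsim */
  double branch_us;  /* time with these optimizations */
  double develop_us; /* time on develop */
};

/* Values copied from the measurement lists above. */
const struct tx_timing timings[] = {
  {"peafowl", 2, 431, 565},  {"peafowl", 3, 692, 849},  {"peafowl", 4, 963, 1172},
  {"stupix", 2, 568, 652},   {"stupix", 3, 901, 1030},  {"stupix", 4, 1250, 1396},
  {"matix", 2, 317, 505},    {"matix", 3, 538, 779},    {"matix", 4, 767, 1233},
};
const size_t num_timings = sizeof(timings) / sizeof(timings[0]);

/* Relative reduction in processing time versus develop, in percent. */
double reduction_percent(double develop_us, double branch_us)
{
  assert(develop_us > 0.0);
  return 100.0 * (develop_us - branch_us) / develop_us;
}

/* Print one line per measurement with the relative reduction. */
void print_reductions(FILE *out)
{
  for (size_t i = 0; i < num_timings; i++) {
    const struct tx_timing *t = &timings[i];
    fprintf(out, "%-8s %d-layer: %4.0f us vs %4.0f us (%.1f%% less)\n",
            t->host, t->layers, t->branch_us, t->develop_us,
            reduction_percent(t->develop_us, t->branch_us));
  }
}
```

Calling print_reductions(stdout) from a small main() prints the summary; for example, the peafowl 2-layer case (431 us against 565 us) is a reduction of roughly 24%.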
Edited by knopp