PDSCH precoding optimization
Based on !2395 (merged): PDSCH clean up.
This MR does:
- Enable PDSCH precoding (`pmi` is no longer hardcoded to 0).
- SIMD matrix multiplication: improves PDSCH layer precoding by 320%. The new function `nr_layer_precoder_1RB()` processes 4 REs per iteration with 128-bit SIMDe and takes 3 iterations for 1 RB. The original non-SIMD precoder `nr_layer_precoder_cm()` is called if the subcarrier offset crosses the OFDM symbol size.
- Not yet in this MR:
  - 256-bit SIMD (maybe 2 RBs in a row?)
  - Prefetch
  - Cache-oblivious (we are computing Y = W*X; since W is not big, cache-oblivious techniques might not bring much improvement)
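The blocking described above (4 REs per iteration, 3 iterations per RB, per-RE fallback when an RB crosses the end of the OFDM symbol buffer) can be illustrated with a minimal scalar sketch. This is not the code of `nr_layer_precoder_1RB()` or `nr_layer_precoder_cm()`: the names `cplx16`, `precode_re` and `precode_symbol`, the flat input layout and the Q15 weight format are all assumptions made for illustration. The figure of 4 REs per vector assumes 16-bit I and 16-bit Q per RE, so that 128 / 32 = 4. The inner 4-RE loop stands in for what one 128-bit vector operation would cover.

```c
#include <assert.h>
#include <stdint.h>

#define RE_PER_RB 12 /* subcarriers in one resource block */
#define RE_PER_VEC 4 /* 128 bits / (16-bit I + 16-bit Q), an assumption */

/* Hypothetical complex sample type: 16-bit real and imaginary parts. */
typedef struct { int16_t r, i; } cplx16;

/* One output RE: y = sum over layers of w[l] * x[l][re].
 * x is a flat array, layer-major: x[l * n_re + re]. Weights are assumed Q15
 * with magnitude at most 1, so the 32-bit accumulators do not overflow. */
cplx16 precode_re(int n_layers, const cplx16 *w, const cplx16 *x, int n_re, int re)
{
  int32_t acc_r = 0, acc_i = 0;
  for (int l = 0; l < n_layers; l++) {
    const cplx16 s = x[l * n_re + re];
    acc_r += w[l].r * s.r - w[l].i * s.i;
    acc_i += w[l].r * s.i + w[l].i * s.r;
  }
  /* Q15 rescale; relies on arithmetic right shift for negative values (gcc, clang). */
  return (cplx16){(int16_t)(acc_r >> 15), (int16_t)(acc_i >> 15)};
}

/* Precode n_rb RBs into out[], a frequency-domain buffer of symbol_size
 * subcarriers. The allocation starts at sc_offset and wraps modulo symbol_size.
 * Returns how many RBs took the blocked path. */
int precode_symbol(int n_layers, const cplx16 *w, const cplx16 *x, cplx16 *out,
                   int n_rb, int sc_offset, int symbol_size)
{
  const int n_re = n_rb * RE_PER_RB;
  assert(symbol_size >= n_re);
  int fast_rbs = 0;
  for (int rb = 0; rb < n_rb; rb++) {
    const int re0 = rb * RE_PER_RB;
    const int sc0 = (sc_offset + re0) % symbol_size;
    if (sc0 + RE_PER_RB <= symbol_size) {
      /* RB is contiguous in out[]: 3 iterations of 4 REs each. */
      for (int it = 0; it < RE_PER_RB / RE_PER_VEC; it++)
        for (int k = 0; k < RE_PER_VEC; k++) {
          const int off = it * RE_PER_VEC + k;
          out[sc0 + off] = precode_re(n_layers, w, x, n_re, re0 + off);
        }
      fast_rbs++;
    } else {
      /* RB crosses the end of the symbol buffer: per-RE path with wrap-around. */
      for (int k = 0; k < RE_PER_RB; k++)
        out[(sc0 + k) % symbol_size] = precode_re(n_layers, w, x, n_re, re0 + k);
    }
  }
  return fast_rbs;
}
```

With 2 RBs, `symbol_size = 32` and `sc_offset = 14`, the first RB occupies subcarriers 14 to 25 and takes the blocked path, while the second starts at 26, wraps to 0, and takes the fallback. Keeping the wrap check per RB means the blocked loop never needs a modulo on its indices.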
Measurements
On sithonia (x86, Ubuntu):
sudo ./nr_dlsim -n200 -s35.8 -S40 -x2 -y4 -z4 -e18 -q1 -R106 -b106 -P -p1
| | 2023.w40 | | 128 SIMD, 1 RB | 256 SIMD, 2 RB |
|---|---|---|---|---|
| Layer Precoding time | 848.95 us | 605.74 us | 144.34 us | 81.49 us |
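The 320% figure appears consistent with the second and third values in the row above (605.74 us and 144.34 us) if "improvement" is read as (old - new) / new. That reading is an assumption, not something the MR states. The helper names below are hypothetical and only make the arithmetic explicit.

```c
/* Relative improvement in percent: how much longer the old time is than the new one. */
double improvement_pct(double t_old_us, double t_new_us)
{
  return (t_old_us - t_new_us) / t_new_us * 100.0;
}

/* Plain speedup factor: old time divided by new time. */
double speedup(double t_old_us, double t_new_us)
{
  return t_old_us / t_new_us;
}
```

`improvement_pct(605.74, 144.34)` evaluates to roughly 319.7, which corresponds to a speedup factor of about 4.2.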
Edited by Quency Lin