Improvements for LDPC encoding

knopp requested to merge ldpc_enc_avx512 into develop

AVX512 modifications for LDPC encoding: interleaving, ZC384 BG1, output formatting for the encoder, and segmentation (memcpy instead of a loop). There are also some TX improvements for aarch64 in the same places where AVX512 support was added. In addition, rate matching and interleaving are now done on bytes that pack 8 segments, and the output is reformatted at the end of segment processing instead of right after LDPC encoding.
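To illustrate the idea of working on bytes that pack 8 segments, here is a minimal scalar sketch. It is not the code in this branch. It assumes a layout where bit j of packed byte k holds bit k of segment j, and the helper names (`pack_segments`, `permute_packed`, `unpack_segment`) are hypothetical. The point it shows is that any per-bit permutation (as in interleaving) applied to the packed bytes acts on all 8 segments at once, so the per-segment reformatting can be deferred to the end.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: pack up to 8 segments into one byte stream.
 * Assumed layout: seg[j][k] is bit k of segment j (stored as 0/1 in a byte),
 * and bit j of out[k] receives that value. */
static void pack_segments(const uint8_t *seg[], unsigned n_seg, size_t n_bits, uint8_t *out)
{
  assert(n_seg >= 1 && n_seg <= 8);
  memset(out, 0, n_bits);
  for (unsigned j = 0; j < n_seg; j++)
    for (size_t k = 0; k < n_bits; k++)
      out[k] |= (uint8_t)((seg[j][k] & 1u) << j);
}

/* Hypothetical helper: apply a bit permutation to the packed stream.
 * One byte move permutes the same bit position of all packed segments. */
static void permute_packed(const uint8_t *in, const size_t *perm, size_t n_bits, uint8_t *out)
{
  for (size_t k = 0; k < n_bits; k++)
    out[k] = in[perm[k]];
}

/* Hypothetical helper: extract segment j again, one 0/1 value per byte.
 * This is the "reformatting" step, done once after all packed processing. */
static void unpack_segment(const uint8_t *packed, unsigned j, size_t n_bits, uint8_t *seg_out)
{
  assert(j < 8);
  for (size_t k = 0; k < n_bits; k++)
    seg_out[k] = (uint8_t)((packed[k] >> j) & 1u);
}
```

A SIMD version would process 64 packed bytes per AVX512 register in the same layout; the scalar loops above are only meant to make the data layout explicit.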

This improves the overall performance of the NR DL transmitter in the gNB. Here is a summary of timings on some machines at EURECOM: matix = 5.9 GHz Ryzen Gen4, peafowl = 4.1 GHz EPYC 9374F, stupix = 3.6 GHz Xeon Gold 6354, broadbill = 3.0 GHz EPYC 8534P, falcon-gh200 = 3.6 GHz Nvidia GH200.

sudo ./nr_dlsim -n100 -P -x2 -y4 -z4 -R273 -b273 -e 25 -s30

ldpc_enc_avx512 (matix) 252 us
develop (matix) 328 us
ldpc_enc_avx512 (peafowl) 337 us
ldpc_enc_avx512 (peafowl with T2) 209 us
develop (peafowl) 434 us
ldpc_enc_avx512 (falcon-gh200) 462 us
develop (falcon-gh200) 707 us
ldpc_enc_avx512 (broadbill) 473 us
ldpc_enc_avx512 (broadbill with T2) 284 us
develop (broadbill) 594 us

sudo ./nr_dlsim -n100 -P -x2 -y4 -z4 -R273 -b273 -e 25 -s30 -X 8,9,10,11,12

ldpc_enc_avx512 (matix) 187 us (-1,-1,-1,-1,-1)
develop (matix) 253 us (-1,-1,-1,-1,-1)
ldpc_enc_avx512 (peafowl) 280 us
ldpc_enc_avx512 (peafowl with T2) 209 us
develop (peafowl) 317 us
ldpc_enc_avx512 (falcon-gh200) 332 us (4,5,6,7,8)
develop (falcon-gh200) 419 us (4,5,6,7,8)
ldpc_enc_avx512 (broadbill) 389 us (42,43,44,45,46)
develop (broadbill) 426 us (42,43,44,45,46)

sudo ./nr_dlsim -n100 -P -x2 -y4 -z4 -R273 -b273 -e 25 -s30 -q1

ldpc_enc_avx512 (matix) 314 us
develop (matix) 466 us
ldpc_enc_avx512 (peafowl) 472 us
ldpc_enc_avx512 (peafowl with T2) 241 us
develop (peafowl) 605 us
ldpc_enc_avx512 (falcon-gh200) 566 us 
develop (falcon-gh200) 990 us
ldpc_enc_avx512 (broadbill) 601 us
ldpc_enc_avx512 (broadbill with T2) 328 us
develop (broadbill) 833 us

sudo ./nr_dlsim -n100 -P -x2 -y4 -z4 -R273 -b273 -e 25 -s30 -q1 -X 8,9,10,11,12

ldpc_enc_avx512 (matix) 217 us (-1,-1,-1,-1,-1)
develop (matix) 330 us (-1,-1,-1,-1,-1)
ldpc_enc_avx512 (peafowl) 315 us (8,9,10,11,12)
ldpc_enc_avx512 (peafowl with T2) 241 us
develop (peafowl) 402 us
ldpc_enc_avx512 (falcon-gh200) 364 us (4,5,6,7,8)
develop (falcon-gh200) 496 us (4,5,6,7,8)
ldpc_enc_avx512 (broadbill) 413 us (42,43,44,45,46)
develop (broadbill) 481 us (42,43,44,45,46)
Edited by Jaroslava Fiedlerova