
AVX512 modifications for LDPC encoding: interleaving, ZC384 BG1, output...

knopp requested to merge ldpc_enc_avx512 into develop

AVX512 modifications for LDPC encoding: interleaving, lifting size Zc = 384 for base graph 1 (BG1), output formatting for the encoder, and segmentation (memcpy instead of a loop). There are also some TX improvements for aarch64 in the same places where AVX512 support was added. In addition, rate matching and interleaving are now performed on bytes that each pack 8 segments. The output is reformatted at the end of segment processing instead of right after LDPC encoding.

This improves the overall performance of the NR DL transmitter in the gNB. Below is a summary of timings on some machines at EURECOM (for runs with -X, the values in parentheses are the -X arguments used on that machine):
matix: 5.9 GHz Ryzen Gen4
peafowl: 4.1 GHz EPYC 9374F
stupix: 3.6 GHz Xeon Gold 6354
broadbill: 3.0 GHz EPYC 8534P
falcon-gh200: 3.6 GHz NVIDIA GH200

sudo ./nr_dlsim -n100 -P -x2 -y4 -z4 -R273 -b273 -e 25 -s30

ldpc_enc_avx512 (matix) 280 us
develop (matix) 328 us
ldpc_enc_avx512 (peafowl) 355 us
ldpc_enc_avx512 (peafowl with T2) 238 us
develop (peafowl) 434 us
ldpc_enc_avx512 (falcon-gh200) 477 us
develop (falcon-gh200) 707 us
ldpc_enc_avx512 (broadbill) 516 us
develop (broadbill) 594 us

sudo ./nr_dlsim -n100 -P -x2 -y4 -z4 -R273 -b273 -e 25 -s30 -X 8,9,10,11,12

ldpc_enc_avx512 (matix) 197 us (-1,-1,-1,-1,-1)
develop (matix) 253 us (-1,-1,-1,-1,-1)
ldpc_enc_avx512 (peafowl) 280 us
ldpc_enc_avx512 (peafowl with T2) 238 us
develop (peafowl) 317 us
ldpc_enc_avx512 (falcon-gh200) 332 us (4,5,6,7,8)
develop (falcon-gh200) 419 us (4,5,6,7,8)
ldpc_enc_avx512 (broadbill) 407 us (55,56,57,58,59)
develop (broadbill) 426 us (55,56,57,58,59)

sudo ./nr_dlsim -n100 -P -x2 -y4 -z4 -R273 -b273 -e 25 -s30 -q1

ldpc_enc_avx512 (matix) 348 us
develop (matix) 466 us
ldpc_enc_avx512 (peafowl) 472 us
ldpc_enc_avx512 (peafowl with T2) 272 us
develop (peafowl) 605 us
ldpc_enc_avx512 (falcon-gh200) 595 us 
develop (falcon-gh200) 990 us
ldpc_enc_avx512 (broadbill) 665 us
develop (broadbill) 833 us

sudo ./nr_dlsim -n100 -P -x2 -y4 -z4 -R273 -b273 -e 25 -s30 -q1 -X 8,9,10,11,12

ldpc_enc_avx512 (matix) 296 us (-1,-1,-1,-1,-1)
develop (matix) 330 us (-1,-1,-1,-1,-1)
ldpc_enc_avx512 (peafowl) 315 us (8,9,10,11,12)
ldpc_enc_avx512 (peafowl with T2) 272 us
develop (peafowl) 402 us
ldpc_enc_avx512 (falcon-gh200) 364 us (4,5,6,7,8)
develop (falcon-gh200) 496 us (4,5,6,7,8)
ldpc_enc_avx512 (broadbill) 431 us (54,55,56,57,58)
develop (broadbill) 481 us (54,55,56,57,58)
Edited by Jaroslava Fiedlerova
