AVX512 modifications for LDPC encoding: interleaving, ZC384 BG1, output...
AVX512 modifications for LDPC encoding: interleaving, ZC384 BG1, output formatting for the encoder, and segmentation (memcpy instead of a loop). There is also some improvement in TX for aarch64 in the same places where AVX512 support was added. In addition, rate matching and interleaving are now done on bytes containing 8 segments, and the output is reformatted at the end of segment processing instead of right after LDPC encoding.
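To illustrate the idea of working on bytes containing 8 segments, here is a minimal sketch in plain C. It assumes that bit s of each byte carries the bit of segment s at that position, which is an assumption about the layout and not taken from the branch. The function names (`pack_segments`, `permute_packed`, `unpack_segment`) are hypothetical and do not exist in the code base. The point is only that a per-position permutation, such as interleaving, applied once to the packed bytes moves all 8 segments together, so the per-segment reformatting can wait until the end.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: seg[s][i] holds bit i of segment s as 0 or 1.
 * After packing, bit s of out[i] is bit i of segment s (assumed layout). */
void pack_segments(const uint8_t *const seg[], int nseg, size_t nbits, uint8_t *out)
{
  assert(nseg >= 1 && nseg <= 8);
  memset(out, 0, nbits);
  for (int s = 0; s < nseg; s++)
    for (size_t i = 0; i < nbits; i++)
      out[i] |= (uint8_t)((seg[s][i] & 1u) << s);
}

/* Hypothetical helper: apply a position permutation to the packed bytes.
 * One byte move relocates the bits of all packed segments at once. */
void permute_packed(const uint8_t *in, const size_t *perm, size_t nbits, uint8_t *out)
{
  for (size_t i = 0; i < nbits; i++)
    out[i] = in[perm[i]];
}

/* Hypothetical helper: final reformatting, extract segment s as one bit per byte. */
void unpack_segment(const uint8_t *packed, int s, size_t nbits, uint8_t *out)
{
  assert(s >= 0 && s < 8);
  for (size_t i = 0; i < nbits; i++)
    out[i] = (uint8_t)((packed[i] >> s) & 1u);
}
```

In this sketch the permutation cost is paid once per byte position rather than once per segment, which is the motivation for deferring the output reformatting.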
This improves the overall performance of the NR DL transmitter in the gNB. Here is a summary of timings on some machines at EURECOM: matix = 5.9 GHz Ryzen Gen4, peafowl = 4.1 GHz EPYC 9374F, stupix = 3.6 GHz Xeon Gold 6354, broadbill = 3.0 GHz EPYC 8534P, falcon-gh200 = 3.6 GHz Nvidia GH200.
sudo ./nr_dlsim -n100 -P -x2 -y4 -z4 -R273 -b273 -e 25 -s30
ldpc_enc_avx512 (matix) 280 us
develop (matix) 328 us
ldpc_enc_avx512 (peafowl) 355 us
ldpc_enc_avx512 (peafowl with T2) 238 us
develop (peafowl) 434 us
ldpc_enc_avx512 (falcon-gh200) 477 us
develop (falcon-gh200) 707 us
ldpc_enc_avx512 (broadbill) 516 us
develop (broadbill) 594 us
sudo ./nr_dlsim -n100 -P -x2 -y4 -z4 -R273 -b273 -e 25 -s30 -X 8,9,10,11,12
ldpc_enc_avx512 (matix) 197 us (-1,-1,-1,-1,-1)
develop (matix) 253 us (-1,-1,-1,-1,-1)
ldpc_enc_avx512 (peafowl) 280 us
ldpc_enc_avx512 (peafowl with T2) 238 us
develop (peafowl) 317 us
ldpc_enc_avx512 (falcon-gh200) 332 us (4,5,6,7,8)
develop (falcon-gh200) 419 us (4,5,6,7,8)
ldpc_enc_avx512 (broadbill) 407 us (55,56,57,58,59)
develop (broadbill) 426 us (55,56,57,58,59)
sudo ./nr_dlsim -n100 -P -x2 -y4 -z4 -R273 -b273 -e 25 -s30 -q1
ldpc_enc_avx512 (matix) 348 us
develop (matix) 466 us
ldpc_enc_avx512 (peafowl) 472 us
ldpc_enc_avx512 (peafowl with T2) 272 us
develop (peafowl) 605 us
ldpc_enc_avx512 (falcon-gh200) 595 us
develop (falcon-gh200) 990 us
ldpc_enc_avx512 (broadbill) 665 us
develop (broadbill) 833 us
sudo ./nr_dlsim -n100 -P -x2 -y4 -z4 -R273 -b273 -e 25 -s30 -q1 -X 8,9,10,11,12
ldpc_enc_avx512 (matix) 296 us (-1,-1,-1,-1,-1)
develop (matix) 330 us (-1,-1,-1,-1,-1)
ldpc_enc_avx512 (peafowl) 315 us (8,9,10,11,12)
ldpc_enc_avx512 (peafowl with T2) 272 us
develop (peafowl) 402 us
ldpc_enc_avx512 (falcon-gh200) 364 us (4,5,6,7,8)
develop (falcon-gh200) 496 us (4,5,6,7,8)
ldpc_enc_avx512 (broadbill) 431 us (54,55,56,57,58)
develop (broadbill) 481 us (54,55,56,57,58)
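To compare the two branches, the relative reduction in time can be computed from any develop/ldpc_enc_avx512 pair above. The helper below is a hypothetical convenience, not part of the branch, and only restates the arithmetic.

```c
#include <assert.h>

/* Hypothetical helper: percentage reduction of the branch time relative
 * to the develop time, both in microseconds. */
double pct_reduction(double develop_us, double branch_us)
{
  assert(develop_us > 0.0);
  return 100.0 * (develop_us - branch_us) / develop_us;
}
```

For example, the first matix pair (328 us on develop, 280 us on ldpc_enc_avx512) gives a reduction of about 14.6%, and the falcon-gh200 pair with -q1 (990 us and 595 us) gives about 39.9%.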
Edited by Jaroslava Fiedlerova