The same head PDCP SDU is popped twice by two threads in TDD, causing a memory leak and eventually crashing the UE on a memory allocation failure.
**symptom:**
The UE asserts in get_free_mem_block() because the MEM_BLOCK allocation fails. This happened after running an iperf downlink test for a few tens of minutes.
**cause:**
The two UE_thread_rxn_txnp4() threads access the same head SDU of pdcp_sdu_list when both are executing pdcp_fifo_flush_sdus() at the same time. The log below captures the issue.
[PDCP][I][pdcp_data_ind] inst=62004 size=1460
[PDCP][I][pdcp_data_ind] inst=62005 size=58
[PDCP][I][pdcp_data_ind] inst=62006 size=1460
[PDCP][I][pdcp_fifo_flush_sdus] [rxn_txnp4_odd] inst=62004 size=1460
[PDCP][I][pdcp_fifo_flush_sdus] [rxn_txnp4_even] inst=62004 size=1460
[PDCP][I][pdcp_fifo_flush_sdus] [rxn_txnp4_even] inst=62005 size=58
[PDCP][I][pdcp_fifo_flush_sdus] [rxn_txnp4_even] inst=62006 size=1460
[PDCP][I][pdcp_fifo_flush_sdus] 1 skip free_mem_block: pdcp_output_sdu_bytes_to_write = -58
[PDCP][I][pdcp_fifo_flush_sdus] 6 skip free_mem_block: bytes_wrote = -58
It happens when the "even" UE_thread_rxn_txnp4() is processing a downlink subframe and the "odd" UE_thread_rxn_txnp4() is processing a special subframe. The "even" thread reaches pdcp_fifo_flush_sdus() late in the TTI because it first has to finish phy_procedures_UE_RX(), whereas the "odd" thread goes straight to pdcp_fifo_flush_sdus() at the beginning of its TTI because it skips all the phy_procedures_UE_*() calls. The collision occurs only by chance, but once it happens the memory leak begins and does not stop until memory allocation fails in get_free_mem_block().
When the collision occurs, the global variable pdcp_output_sdu_bytes_to_write is changed unexpectedly by the other thread. pdcp_fifo_flush_sdus() then never reaches the part that calls free_mem_block(), which causes the memory leak.
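
The mechanism can be illustrated with a deterministic, single-threaded replay of one unlucky interleaving. This is only a simplified model under stated assumptions, not the actual pdcp_fifo_flush_sdus() code: the types, the `model_*` names, and the rule that a new SDU is only started when the shared counter is zero are all hypothetical. It shows how a shared byte counter that is decremented twice for the same SDU can stay non-zero, so the branch that frees the SDU is never taken again.

```c
#include <stdio.h>

/* Hypothetical stand-ins for illustration; not the OAI data structures. */
typedef struct { int inst; int size; int freed; } model_sdu_t;

static model_sdu_t model_fifo[] = {
    {62004, 1460, 0}, {62005, 58, 0}, {62006, 1460, 0}
};
enum { MODEL_FIFO_LEN = sizeof model_fifo / sizeof model_fifo[0] };

static int model_head;            /* index of the head SDU, shared by both threads */
static int model_bytes_to_write;  /* shared counter, in the role of
                                     pdcp_output_sdu_bytes_to_write */

/* Step 1: read the current head without removing it. */
static int model_peek(void) { return model_head; }

/* Step 2 (assumed rule): start a new SDU only when no write is pending. */
static void model_arm(int idx) {
    if (model_bytes_to_write == 0) model_bytes_to_write = model_fifo[idx].size;
}

/* Step 3: account for the written bytes; pop and free the SDU only when
 * the shared counter reaches exactly zero. */
static int model_write(int idx) {
    model_bytes_to_write -= model_fifo[idx].size;
    if (model_bytes_to_write != 0) return 0;   /* the free is skipped */
    model_fifo[idx].freed = 1;
    model_head = idx + 1;
    return 1;
}

/* Replays one interleaving and returns how many SDUs are never freed. */
int model_replay_collision(void) {
    int odd = model_peek();
    int even = model_peek();      /* both "threads" now hold the same head */
    model_arm(odd);
    model_arm(even);
    model_write(even);            /* counter reaches 0: head SDU freed once */
    model_write(odd);             /* same SDU again: counter goes negative */

    /* Later flushes: the counter never returns to zero, so nothing is freed. */
    while (model_head < MODEL_FIFO_LEN) {
        int idx = model_peek();
        model_arm(idx);
        if (!model_write(idx)) break;
    }

    int leaked = 0;
    for (int i = 0; i < MODEL_FIFO_LEN; i++) leaked += !model_fifo[i].freed;
    printf("leaked=%d bytes_to_write=%d\n", leaked, model_bytes_to_write);
    return leaked;
}
```

In this model, only the first SDU is released; the two SDUs behind it remain allocated because the counter is stuck at a negative value. The exact values differ from the captured log, since the real interleaving and code paths are not reproduced here.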
**fix:**
Add mutex protection to pdcp_fifo_flush_sdus(). The protection can be disabled by commenting out the macro PDCP_SDU_FLUSH_LOCK.
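
A minimal sketch of this kind of protection is shown below, assuming a POSIX mutex and a simplified singly linked list. Only the macro name PDCP_SDU_FLUSH_LOCK comes from this change; the `sketch_*` names, the list type, and the test driver are hypothetical and do not mirror the actual patch. Two threads flush the same list concurrently, and the driver reports the highest number of times any SDU was popped.

```c
#include <pthread.h>
#include <stdlib.h>

#define PDCP_SDU_FLUSH_LOCK   /* comment out to disable the protection */

/* Hypothetical simplified SDU node; pops counts how often it was removed. */
typedef struct sketch_sdu { struct sketch_sdu *next; int pops; } sketch_sdu_t;

static sketch_sdu_t *sketch_head;   /* shared FIFO head */

#ifdef PDCP_SDU_FLUSH_LOCK
static pthread_mutex_t sketch_flush_mutex = PTHREAD_MUTEX_INITIALIZER;
#endif

/* Drains the shared list; the whole flush is one critical section. */
static void *sketch_flush(void *arg) {
    (void)arg;
#ifdef PDCP_SDU_FLUSH_LOCK
    pthread_mutex_lock(&sketch_flush_mutex);
#endif
    while (sketch_head != NULL) {
        sketch_sdu_t *sdu = sketch_head;
        sketch_head = sdu->next;    /* pop the head */
        sdu->pops++;                /* stands in for write + free */
    }
#ifdef PDCP_SDU_FLUSH_LOCK
    pthread_mutex_unlock(&sketch_flush_mutex);
#endif
    return NULL;
}

/* Runs two concurrent flushers over n SDUs and returns the largest pop
 * count seen on any SDU (1 means no SDU was popped twice), or -1 on error. */
int sketch_max_pops(int n) {
    sketch_sdu_t *sdus = calloc((size_t)n, sizeof *sdus);
    if (sdus == NULL) return -1;
    for (int i = 0; i + 1 < n; i++) sdus[i].next = &sdus[i + 1];
    sketch_head = n > 0 ? sdus : NULL;

    pthread_t odd, even;
    if (pthread_create(&odd, NULL, sketch_flush, NULL) != 0) {
        free(sdus);
        return -1;
    }
    if (pthread_create(&even, NULL, sketch_flush, NULL) != 0) {
        pthread_join(odd, NULL);
        free(sdus);
        return -1;
    }
    pthread_join(odd, NULL);
    pthread_join(even, NULL);

    int max = 0;
    for (int i = 0; i < n; i++) if (sdus[i].pops > max) max = sdus[i].pops;
    free(sdus);
    return max;
}
```

With the macro defined, whichever thread takes the mutex first drains the list and the other finds it empty, so every SDU is popped exactly once. Holding the lock for the whole flush, rather than per SDU, also keeps any shared progress counters consistent across the loop.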
A log message is also added for when such a collision occurs, as follows.
[PDCP][I][pdcp_data_ind] inst=698474 size=1460
[PDCP][I][pdcp_data_ind] inst=698475 size=58
[PDCP][I][pdcp_data_ind] inst=698476 size=1460
[PDCP][I][pdcp_data_ind] inst=698477 size=58
[PDCP][I][pdcp_fifo_flush_sdus] [rxn_txnp4_even] SFN/SF=77808/0 inst=698474 size=1460
[PDCP][W][pdcp_fifo_flush_sdus] [rxn_txnp4_odd] at SFN/SF=77808/1 wait for PDCP FIFO to be unlocked
[PDCP][I][pdcp_fifo_flush_sdus] [rxn_txnp4_even] SFN/SF=77808/0 inst=698475 size=58
[PDCP][I][pdcp_fifo_flush_sdus] [rxn_txnp4_even] SFN/SF=77808/0 inst=698476 size=1460
[PDCP][I][pdcp_fifo_flush_sdus] [rxn_txnp4_even] SFN/SF=77808/0 inst=698477 size=58
[PDCP][I][pdcp_fifo_flush_sdus] [rxn_txnp4_odd] at SFN/SF=77808/1 PDCP FIFO is unlocked
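
One way to produce "wait" and "unlocked" messages like the ones in this log is to try the lock first and fall back to a blocking lock only on contention. The sketch below assumes POSIX threads and C11 atomics; the `demo_*` names, the collision counter, and the use of stderr instead of the project's logging macros are hypothetical, and the actual patch may detect the collision differently.

```c
#include <pthread.h>
#include <sched.h>
#include <stdatomic.h>
#include <stdio.h>

static pthread_mutex_t demo_fifo_mutex = PTHREAD_MUTEX_INITIALIZER;
static atomic_int demo_collisions;   /* times a thread had to wait */

/* Takes the FIFO lock; logs and counts the case where it is already held. */
void demo_lock_fifo(const char *thread_name, int frame, int subframe) {
    if (pthread_mutex_trylock(&demo_fifo_mutex) == 0) return;  /* uncontended */
    atomic_fetch_add(&demo_collisions, 1);
    fprintf(stderr, "[%s] at SFN/SF=%d/%d wait for PDCP FIFO to be unlocked\n",
            thread_name, frame, subframe);
    pthread_mutex_lock(&demo_fifo_mutex);   /* block until the holder is done */
    fprintf(stderr, "[%s] at SFN/SF=%d/%d PDCP FIFO is unlocked\n",
            thread_name, frame, subframe);
}

void demo_unlock_fifo(void) { pthread_mutex_unlock(&demo_fifo_mutex); }

/* Plays the "odd" thread arriving while the FIFO is locked. */
static void *demo_odd_thread(void *arg) {
    (void)arg;
    demo_lock_fifo("rxn_txnp4_odd", 77808, 1);
    demo_unlock_fifo();
    return NULL;
}

/* Forces one collision: the caller plays the "even" thread holding the lock.
 * Returns the number of collisions recorded so far, or -1 on error. */
int demo_force_collision(void) {
    pthread_t odd;
    demo_lock_fifo("rxn_txnp4_even", 77808, 0);
    if (pthread_create(&odd, NULL, demo_odd_thread, NULL) != 0) {
        demo_unlock_fifo();
        return -1;
    }
    /* Keep the lock until the other thread has hit the contended path. */
    while (atomic_load(&demo_collisions) == 0) sched_yield();
    demo_unlock_fifo();
    pthread_join(odd, NULL);
    return atomic_load(&demo_collisions);
}
```

The try-then-block pattern keeps the common, uncontended path free of logging, and the counter gives a cheap way to see how often the two threads actually collide during a long run.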
**other:**
This issue is expected to be unlikely in FDD. In TDD, it is expected to be prominent only when PDSCH is disabled in the special subframe, i.e. when special-subframe-conf is 0 or 5.