gold_cache: replace linear search with open-addressing hash table
When many UEs are active, the Gold sequence cache is looked up frequently, because each UE's scrambling needs its own sequence. The current linear scan gets slower as the table grows, and its periodic table reorder causes unpredictable latency spikes. This MR replaces the linear scan with an open-addressing hash table.
Here is a benchmark with 128 UEs, run with the tool from !4028 adapted to work with !3902 (merged).

Before:
=== Results: 128 UEs, 273 RBs, MCS 28, 10000 slots (warmup 5) ===
Avg PDSCH/slot: 0.3 / 128 UEs
Slot budget: 500 us (mu=1)
Phase           mean     p50     p90     p99     max
                (us)    (us)    (us)    (us)    (us)
------------------------------------------------------------------
Scheduler      137.0   103.3   356.7   458.0   873.3
PHY TX          30.9    13.4    55.6   546.4  1053.6
Total          167.9   116.0   421.4   689.1  1126.1
------------------------------------------------------------------
Max total 1126.1 us at slot 3 (iter 115)
WARNING: max total (1126 us) exceeds slot budget (500 us)!
Breakdown                        /slot   /call     max   calls
                                  (us)    (us)    (us)
---------------------------------------------------------------------------
Scheduler:
  Total                          136.9   136.8   873.0   10000
  RA scheduling                      -       -       -       0
  UL scheduling                    0.0     0.0     0.1   10000
  DL scheduling (PDCCH+PDSCH)     79.3    79.3   537.9   10000
  RLC data req                     0.3     1.1     7.4    2357
PHY TX:
  Total                              -       -       -       0
  DCI generation                     -       -       -       0
  DLSCH encoding                   7.3    25.6    90.2    2852
  segmentation                     0.1     0.2     1.0    2967
  rate matching                    3.0     9.9    22.0    2967
  scrambling                       6.1    20.7   689.4    2967
  DLSCH modulation                 0.9     3.2     9.7    2967
  layer mapping                    1.7     5.8     8.5    2967
  precoding                        1.7     0.5     1.1   30827
  resource mapping                 3.5     1.1     1.4   30827
  phase compensation               3.4     3.4     4.9   10000
---------------------------------------------------------------------------
Done.

After:
=== Results: 128 UEs, 273 RBs, MCS 28, 10000 slots (warmup 5) ===
Avg PDSCH/slot: 1.1 / 128 UEs
Slot budget: 500 us (mu=1)
Phase           mean     p50     p90     p99     max
                (us)    (us)    (us)    (us)    (us)
------------------------------------------------------------------
Scheduler       56.4    47.3    97.9   133.1   274.7
PHY TX          53.3    47.9    72.9    87.6   320.3
Total          109.7    97.1   142.6   187.3   435.8
------------------------------------------------------------------
Max total 435.8 us at iter 8528
Breakdown                        /slot   /call     max   calls
                                  (us)    (us)    (us)
---------------------------------------------------------------------------
Scheduler:
  Total                           60.6    48.5   223.5   12498
  RA scheduling                      -       -       -       0
  UL scheduling                    0.0     0.0     0.1   12498
  DL scheduling (PDCCH+PDSCH)     41.4    33.1   111.4   12498
  RLC data req                     0.8     0.9     6.9    9150
PHY TX:
  Total                              -       -       -       0
  DCI generation                     -       -       -       0
  DLSCH encoding                  24.2    24.2   289.2   10000
  segmentation                     0.2     0.2     0.8   11440
  rate matching                   10.5     9.2   275.5   11440
  scrambling                       0.8     0.7    16.5   11440
  DLSCH modulation                 2.9     2.6     9.0   11440
  layer mapping                    6.3     5.5     8.2   11440
  precoding                        6.6     0.5     1.0  138760
  resource mapping                 2.2     0.2     3.0  138760
  phase compensation               3.3     3.3     3.5   10000
---------------------------------------------------------------------------
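For reading the numbers: the 500 us budget is the slot duration for numerology mu = 1. The /slot and /call columns appear to be the same accumulated time divided by the number of slots and by the number of calls respectively. That is an inference from the printed values, not from the tool's source. Applied to the scrambling row, where each UE's Gold sequence is looked up:

```latex
\begin{aligned}
T_{\text{slot}} &= \frac{1\,\text{ms}}{2^{\mu}} = \frac{1000\,\text{us}}{2^{1}} = 500\,\text{us} \\
\text{max total: } & 1126.1\,\text{us} > 500\,\text{us}\ \text{(before)}, \quad 435.8\,\text{us} < 500\,\text{us}\ \text{(after)} \\
\text{scrambling, before: } & \frac{20.7\,\text{us/call} \times 2967\,\text{calls}}{10000\,\text{slots}} \approx 6.1\,\text{us/slot} \\
\text{scrambling, after: } & \frac{0.7\,\text{us/call} \times 11440\,\text{calls}}{10000\,\text{slots}} \approx 0.8\,\text{us/slot}
\end{aligned}
```

Per call, scrambling drops from 20.7 us to 0.7 us on average and from 689.4 us to 16.5 us in the worst case.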