nr-softmodem: reproducible segfault in libuhd with Ettus N310 SDR (including proposed fix)
I am trying to run OAI's nr-softmodem with an Ettus USRP N310, using targets/PROJECTS/GENERIC-NR-5GC/CONF/gnb.band78.sa.fr1.106PRB.2x2.usrpn310.conf with only the IP addresses of the core and the device adapted. With every UHD version from 4.1 to 4.4, and with every OAI tag I tried (latest: 2023.w36), nr-softmodem reproducibly segfaults after a few seconds inside libuhd:
(gdb) bt full
#0 __memmove_avx_unaligned_erms () at ../sysdeps/x86_64/multiarch/memmove-vec-unaligned-erms.S:540
No locals.
#1 0x00007fffe1ced9bf in uhd::transport::rx_streamer_impl<uhd::rfnoc::chdr_rx_data_xport, false>::recv(uhd::ref_vector<void*> const&, unsigned long, uhd::rx_metadata_t&, double, bool) () from /usr/local/lib/libuhd.so.4.4.0
No symbol table info available.
#2 0x00007fffe29c762b in trx_usrp_read (device=0x55555a1d9700, ptimestamp=0x7fffef7fd2c8, buff=0x7fffef7fd1d0, nsamps=30720, cc=2) at /home/nornetpp/src/openairinterface5g/radio/USRP/usrp_lib.cpp:757
buff_ptrs = std::vector of length 2, capacity 2 = {0x7ffff000f860, 0x7ffff002d860}
s = 0x7fffdc094aa0
samples_received = 272233
nsamps2 = 3840
buff_tmp = <error reading variable buff_tmp (value requires 245760 bytes, which is more than max-value-size)>
read_count = 0
rxshift = 2
__FUNCTION__ = "trx_usrp_read"
recPlay = 0x7fffdc184b30
#3 0x0000555555f8a235 in rx_rf (ru=0x55555a1d9280, frame=0x7fffef7fd7e8, slot=0x7fffef7fd7e4) at /home/nornetpp/src/openairinterface5g/executables/nr-ru.c:655
proc = 0x55555a1daeb0
fp = 0x7ffff782d010
cfg = 0x55555a1d9db8
rxp = {0x7fffee0f2040, 0x7fffede99040}
rxs = 4146746579
i = 2
samples_per_slot = 30720
samples_per_slot_prev = 32767
ts = 140737211524672
old_ts = 0
__FUNCTION__ = "rx_rf"
gps_sec = 6.9533421303430063e-310
#4 0x0000555555f95434 in ru_thread (param=0x55555a1d9280) at /home/nornetpp/src/openairinterface5g/executables/nr-ru.c:1257
sl = 0
absslot_rx = 0
rt_prof_idx = 0
slot_type = 0
ru_thread_status = 0
ru = 0x55555a1d9280
proc = 0x55555a1daeb0
fp = 0x7ffff782d010
gNB = 0x7ffff7469010
ret = 0
slot = 0
frame = 0
threadname = "ru_thread 0", '\000' <repeats 28 times>
initial_wait = 0
opp_enabled0 = 0
cfg = 0x55555a1d9378
__FUNCTION__ = "ru_thread"
syncMsg = 0x0
res = 0x0
slot_start = {tv_sec = 1182185, tv_nsec = 141695065}
slot_duration = {tv_sec = 0, tv_nsec = 500000}
#5 0x00007ffff7294b43 in start_thread (arg=<optimized out>) at ./nptl/pthread_create.c:442
ret = <optimized out>
pd = <optimized out>
out = <optimized out>
unwind_buf = {cancel_jmp_buf = {{jmp_buf = {4294967295, -2513490550454601873, 140737211524672, 0, 140737340065872, 140737488343888, -2513490550927509649, -2513438483471124625}, mask_was_saved = 0}}, priv = {pad = {0x0, 0x0,
0x0, 0x0}, data = {prev = 0x0, cleanup = 0x0, canceltype = 0}}}
not_first_call = <optimized out>
#6 0x00007ffff7326a00 in clone3 () at ../sysdeps/unix/sysv/linux/x86_64/clone3.S:81
No locals.
(gdb)
The crash originates in the following code (radio/USRP/usrp_lib.cpp in OAI), at line 757:
samples_received=0;
while (samples_received != nsamps) {
  if (cc>1) {
    // receive multiple channels (e.g. RF A and RF B)
    std::vector<void *> buff_ptrs;
    for (int i=0; i<cc; i++) buff_ptrs.push_back(buff_tmp[i]+samples_received);
    samples_received += s->rx_stream->recv(buff_ptrs, nsamps, s->rx_md);
  } else {
    // receive a single channel (e.g. from connector RF A)
    samples_received += s->rx_stream->recv((void*)((int32_t*)buff_tmp[0]+samples_received),
                                           nsamps-samples_received, s->rx_md);
  }
  if ((s->wait_for_first_pps == 0) && (s->rx_md.error_code!=uhd::rx_metadata_t::ERROR_CODE_NONE))
    break;
  if ((s->wait_for_first_pps == 1) && (samples_received != nsamps)) {
    printf("sleep...\n"); //usleep(100);
  }
}
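The code has two branches: one for receiving multiple channels and one for receiving a single channel. In the single-channel branch, the length passed to s->rx_stream->recv is nsamps-samples_received. In the multi-channel branch, however, the length is always nsamps. Since the buffer pointers have already been advanced by samples_received, any call after a partial read lets recv write up to samples_received samples past the end of each buffer, which overwrites the stack. The backtrace is consistent with this: samples_received is 272233 although nsamps is only 30720. So, although the crash surfaces inside libuhd, this is very likely a bug in OAI's usrp_lib.cpp rather than in libuhd itself.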
Changing the code as follows:
for (int i=0; i<cc; i++) buff_ptrs.push_back(buff_tmp[i]+samples_received);
- samples_received += s->rx_stream->recv(buff_ptrs, nsamps, s->rx_md);
+ samples_received += s->rx_stream->recv(buff_ptrs, nsamps-samples_received, s->rx_md);
} else {
lets nr-softmodem run without the segfault.
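To illustrate why the remaining count matters, here is a minimal, self-contained sketch of the same receive loop outside of OAI. It does not use UHD: MockStream and read_exact are hypothetical names invented for this example, and MockStream::recv only imitates the one property of rx_streamer::recv that is relevant here, namely that a call may return fewer samples than requested. Treat it as a sketch under those assumptions, not as the actual OAI or UHD implementation.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for a receive stream (NOT the UHD API).
// Each call delivers at most max_per_call samples per channel and
// never more than the caller asked for.
struct MockStream {
    std::size_t max_per_call;

    std::size_t recv(const std::vector<int32_t*>& bufs, std::size_t nsamps_per_buff) {
        const std::size_t n = std::min(nsamps_per_buff, max_per_call);
        for (int32_t* b : bufs) {
            std::fill(b, b + n, 1);  // mark the samples that were written
        }
        return n;
    }
};

// Reads nsamps samples into every channel buffer and returns the number
// of samples received per channel. Each channel must hold at least
// nsamps elements.
std::size_t read_exact(MockStream& stream,
                       std::vector<std::vector<int32_t>>& channels,
                       std::size_t nsamps) {
    std::size_t received = 0;
    while (received < nsamps) {
        // Advance every channel pointer past the samples already stored.
        std::vector<int32_t*> ptrs;
        for (auto& ch : channels) {
            ptrs.push_back(ch.data() + received);
        }
        // Request only what still fits. Requesting nsamps here instead
        // would allow writes beyond the end of each buffer after a
        // partial read.
        const std::size_t got = stream.recv(ptrs, nsamps - received);
        if (got == 0) {
            break;  // nothing delivered: stop instead of spinning forever
        }
        received += got;
        assert(received <= nsamps);  // holds because the request is bounded
    }
    return received;
}
```

The loop condition uses < rather than !=, so it would still terminate if the count ever overshot nsamps, which is what the backtrace above shows happening in the unpatched code.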