Only in the non-critical region, nuttx can the respond to the irq and not hold the lock
When returning from the irq, there is no need to check whether the lock needs to be restored
test:
We can use qemu for testing.
compiling
make distclean -j20; ./tools/configure.sh -l qemu-armv8a:nsh_smp ;make -j20
running
qemu-system-aarch64 -cpu cortex-a53 -smp 4 -nographic -machine virt,virtualization=on,gic-version=3 -net none -chardev stdio,id=con,mux=on -serial chardev:con -mon chardev=con,mode=readline -kernel ./nuttx
Signed-off-by: hujun5 <hujun5@xiaomi.com>
cpu0 cpu1:
user_main
signest_test
sched_unlock
nxsched_merge_pending
nxsched_add_readytorun
up_cpu_pause
arm_sigdeliver
enter_critical_section
Reason:
In the SMP, cpu0 is already in the critical section and waiting for cpu1 to enter the suspended state.
However, when cpu1 executes arm_sigdeliver, it is in the irq-disabled state but not in the critical section.
At this point, cpu1 is unable to respond to interrupts and
is continuously attempting to enter the critical section, resulting in a deadlock.
Resolve:
adjust the logic, do not entering the critical section when interrupt-disabled.
test:
We can use qemu for testing.
compiling
make distclean -j20; ./tools/configure.sh -l qemu-armv8a:nsh_smp ;make -j20
running
qemu-system-aarch64 -cpu cortex-a53 -smp 4 -nographic -machine virt,virtualization=on,gic-version=3 -net none -chardev stdio,id=con,mux=on -serial chardev:con -mon chardev=con,mode=readline -kernel ./nuttx
Signed-off-by: hujun5 <hujun5@xiaomi.com>
chip/s698pm_cpustart.c: In function 's698pm_cpu_boot':
Error: chip/s698pm_cpustart.c:74:17: error: unused variable 'tcb' [-Werror=unused-variable]
struct tcb_s *tcb = this_task();
^~~
cc1: all warnings being treated as errors
make[1]: *** [Makefile:99: s698pm_cpustart.o] Error 1
make[1]: Target 'libarch.a' not remade because of errors.
make: *** [tools/LibTargets.mk:164: arch/sparc/src/libarch.a] Error 2
make: Target 'all' not remade because of errors.
/github/workspace/sources/nuttx/tools/testbuild.sh: line 370: /github/workspace/sources/nuttx/../nuttx/nuttx.manifest: No such file or directory
Normalize s698pm-dkit/smp
Signed-off-by: hujun5 <hujun5@xiaomi.com>
cpu0 thread0: cpu1:
sched_yield()
nxsched_set_priority()
nxsched_running_setpriority()
nxsched_reprioritize_rtr()
nxsched_add_readytorun()
up_cpu_pause()
IRQ enter
arm64_pause_handler()
enter_critical_section() begin
up_cpu_paused() pick thread0
arm64_restorestate() set thread0 tcb->xcp.regs to CURRENT_REGS
up_switch_context()
thread0 -> thread1
arm64_syscall()
case SYS_switch_context
change thread0 tcb->xcp.regs
restore_critical_section()
enter_critical_section() done
leave_critical_section()
IRQ leave with restore CURRENT_REGS
ERROR !!!
Reason:
As descript above, cpu0 swith task: thread0 -> thread1, and the
syscall() execute slowly, this time cpu1 pick thread0 to run at
up_cpu_paused(). Then cpu0 syscall execute, cpu1 IRQ leave error.
Resolve:
Move arm64_restorestate() after enter_critical_section() done
This is a continued fix with:
https://github.com/apache/nuttx/pull/6833
Signed-off-by: ligd <liguiding1@xiaomi.com>
Newly added logging in `sched/task_exit.c` obsoletes the existing
ones in `arch/up_exit()`, thus remove the latter to reduce duplications.
Signed-off-by: Yanfeng Liu <yfliu2008@qq.com>
When supporting high-priority interrupts, updating the
g_running_tasks within a high-priority interrupt may be
cause problems. The g_running_tasks should only be updated
when it is determined that a task context switch has occurred.
Signed-off-by: zhangyuan21 <zhangyuan21@xiaomi.com>
Remove TABs
Fix indentation
Fix Multi-line comments
Fix Comments to the Right of Statements.
Fix nuttx coding style
Fix Comments to the Right of Statements.
to avoid the infinite recusive dispatch:
*0 myhandler (signo=27, info=0xf3e38b9c, context=0x0) at ltp/testcases/open_posix_testsuite/conformance/interfaces/sigqueue/7-1.c:39
*1 0x58f1c39e in nxsig_deliver (stcb=0xf4e20f40) at signal/sig_deliver.c:167
*2 0x58fa0664 in up_schedule_sigaction (tcb=0xf4e20f40, sigdeliver=0x58f1bab5 <nxsig_deliver>) at sim/sim_schedulesigaction.c:88
*3 0x58f19907 in nxsig_queue_action (stcb=0xf4e20f40, info=0xf4049334) at signal/sig_dispatch.c:115
*4 0x58f1b089 in nxsig_tcbdispatch (stcb=0xf4e20f40, info=0xf4049334) at signal/sig_dispatch.c:435
*5 0x58f31853 in nxsig_unmask_pendingsignal () at signal/sig_unmaskpendingsignal.c:104
*6 0x58f1ca09 in nxsig_deliver (stcb=0xf4e20f40) at signal/sig_deliver.c:199
*7 0x58fa0664 in up_schedule_sigaction (tcb=0xf4e20f40, sigdeliver=0x58f1bab5 <nxsig_deliver>) at sim/sim_schedulesigaction.c:88
*8 0x58f19907 in nxsig_queue_action (stcb=0xf4e20f40, info=0xf4049304) at signal/sig_dispatch.c:115
*9 0x58f1b089 in nxsig_tcbdispatch (stcb=0xf4e20f40, info=0xf4049304) at signal/sig_dispatch.c:435
*10 0x58f31853 in nxsig_unmask_pendingsignal () at signal/sig_unmaskpendingsignal.c:104
*11 0x58f1ca09 in nxsig_deliver (stcb=0xf4e20f40) at signal/sig_deliver.c:199
*12 0x58fa0664 in up_schedule_sigaction (tcb=0xf4e20f40, sigdeliver=0x58f1bab5 <nxsig_deliver>) at sim/sim_schedulesigaction.c:88
*13 0x58f19907 in nxsig_queue_action (stcb=0xf4e20f40, info=0xf40492d4) at signal/sig_dispatch.c:115
*14 0x58f1b089 in nxsig_tcbdispatch (stcb=0xf4e20f40, info=0xf40492d4) at signal/sig_dispatch.c:435
*15 0x58f31853 in nxsig_unmask_pendingsignal () at signal/sig_unmaskpendingsignal.c:104
*16 0x58f1ca09 in nxsig_deliver (stcb=0xf4e20f40) at signal/sig_deliver.c:199
*17 0x58fa0664 in up_schedule_sigaction (tcb=0xf4e20f40, sigdeliver=0x58f1bab5 <nxsig_deliver>) at sim/sim_schedulesigaction.c:88
*18 0x58f19907 in nxsig_queue_action (stcb=0xf4e20f40, info=0xf40492a4) at signal/sig_dispatch.c:115
*19 0x58f1b089 in nxsig_tcbdispatch (stcb=0xf4e20f40, info=0xf40492a4) at signal/sig_dispatch.c:435
*20 0x58f31853 in nxsig_unmask_pendingsignal () at signal/sig_unmaskpendingsignal.c:104
*21 0x58f1ca09 in nxsig_deliver (stcb=0xf4e20f40) at signal/sig_deliver.c:199
*22 0x58fa0664 in up_schedule_sigaction (tcb=0xf4e20f40, sigdeliver=0x58f1bab5 <nxsig_deliver>) at sim/sim_schedulesigaction.c:88
*23 0x58f19907 in nxsig_queue_action (stcb=0xf4e20f40, info=0xf4049274) at signal/sig_dispatch.c:115
*24 0x58f1b089 in nxsig_tcbdispatch (stcb=0xf4e20f40, info=0xf4049274) at signal/sig_dispatch.c:435
*25 0x58f31853 in nxsig_unmask_pendingsignal () at signal/sig_unmaskpendingsignal.c:104
*26 0x58f1ca09 in nxsig_deliver (stcb=0xf4e20f40) at signal/sig_deliver.c:199
*27 0x58fa0664 in up_schedule_sigaction (tcb=0xf4e20f40, sigdeliver=0x58f1bab5 <nxsig_deliver>) at sim/sim_schedulesigaction.c:88
*28 0x58f19907 in nxsig_queue_action (stcb=0xf4e20f40, info=0xf4049244) at signal/sig_dispatch.c:115
*29 0x58f1b089 in nxsig_tcbdispatch (stcb=0xf4e20f40, info=0xf4049244) at signal/sig_dispatch.c:435
*30 0x58f31853 in nxsig_unmask_pendingsignal () at signal/sig_unmaskpendingsignal.c:104
*31 0x58f1ca09 in nxsig_deliver (stcb=0xf4e20f40) at signal/sig_deliver.c:199
Signed-off-by: Xiang Xiao <xiaoxiang@xiaomi.com>
in SMP, signal processing cannot be nested, we use xcp.sigdeliver to identify whether there is currently a signal being processed, but this state does not match the actual situation
One possible scenario is that signal processing has already been completed, but an interrupt occurs, resulting in xcp.sigdeliver not being correctly set to NULL,
At this point, a new signal arrives, which can only be placed in the queue and cannot be processed immediately
Our solution is that signal processing and signal complete status are set in the same critical section, which can ensure status synchronization
Signed-off-by: hujun5 <hujun5@xiaomi.com>
1. Get the value of sp from dump regs when an exception occurs,
to avoid getting the value of fp from up_getsp and causing
incomplete stack printing.
2. Determine which stack the value belongs to based on the value
of SP to avoid false reports of stack overflow
Signed-off-by: zhangyuan21 <zhangyuan21@xiaomi.com>