The comment about the CPU index remaining stable is incorrect. There is no
guarantee the task does not yield during the exit process, meaning the CPU
can most definitely change. Also, there is no reason why it should not be
allowed to change.
This fixes a full system crash during process exit when the CPU changes
and we query the current task from the old CPU.
reason:
In the kernel, we are planning to remove all occurrences of up_cpu_pause as one of the steps to
simplify the implementation of critical sections. The goal is to enable spin_lock_irqsave to encapsulate critical sections,
thereby facilitating the replacement of critical sections(big lock) with smaller spin_lock_irqsave(small lock)
Signed-off-by: hujun5 <hujun5@xiaomi.com>
reason:
1 To improve efficiency, we mimic Linux's behavior where preemption disabling is only applicable to the current CPU and does not affect other CPUs.
2 In the future, we will implement "spinlock+sched_lock", and use it extensively. Under such circumstances, if preemption is still globally disabled, it will seriously impact the scheduling efficiency.
3 We have removed g_cpu_lockset and used irqcount in order to eliminate the dependency of schedlock on critical sections in the future, simplify the logic, and further enhance the performance of sched_lock.
4 We set lockcount to 1 in order to lock scheduling on all CPUs during startup, without the need to provide additional functions to disable scheduling on other CPUs.
5 Cpu1~n must wait for cpu0 to enter the idle state before enabling scheduling because it prevents CPUs1~n from competing with cpu0 for the memory manager mutex, which could cause the cpu0 idle task to enter a wait state and trigger an assert.
size nuttx
before:
text data bss dec hex filename
265396 51057 63646 380099 5ccc3 nuttx
after:
text data bss dec hex filename
265184 51057 63642 379883 5cbeb nuttx
size -216
Configuring NuttX and compile:
$ ./tools/configure.sh -l qemu-armv8a:nsh_smp
$ make
Running with qemu
$ qemu-system-aarch64 -cpu cortex-a53 -smp 4 -nographic \
-machine virt,virtualization=on,gic-version=3 \
-net none -chardev stdio,id=con,mux=on -serial chardev:con \
-mon chardev=con,mode=readline -kernel ./nuttx
Signed-off-by: hujun5 <hujun5@xiaomi.com>
reason:
1In the scenario of active waiting, context switching is inevitable, and we can eliminate redundant judgments.
code size
before
hujun5@hujun5-OptiPlex-7070:~/downloads1/vela_sim/nuttx$ size nuttx
text data bss dec hex filename
262848 49985 63893 376726 5bf96 nuttx
after
hujun5@hujun5-OptiPlex-7070:~/downloads1/vela_sim/nuttx$ size nuttx
text data bss dec hex filename
263324 49985 63893 377202 5c172 nuttx
reduce code size by -476
Configuring NuttX and compile:
$ ./tools/configure.sh -l qemu-armv8a:nsh_smp
$ make
Running with qemu
$ qemu-system-aarch64 -cpu cortex-a53 -smp 4 -nographic \
-machine virt,virtualization=on,gic-version=3 \
-net none -chardev stdio,id=con,mux=on -serial chardev:con \
-mon chardev=con,mode=readline -kernel ./nuttx
Signed-off-by: hujun5 <hujun5@xiaomi.com>
Most tools used for compliance and SBOM generation use SPDX identifiers
This change brings us a step closer to an easy SBOM generation.
Signed-off-by: Alin Jerpelea <alin.jerpelea@sony.com>
we can use g_cpu_lockset to determine whether we are currently in the scheduling lock,
and all accesses and modifications to g_cpu_lockset, g_cpu_irqlock, g_cpu_irqset
are in the critical section, so we can directly operate on it.
test:
We can use qemu for testing.
compiling
make distclean -j20; ./tools/configure.sh -l qemu-armv8a:nsh_smp ;make -j20
running
qemu-system-aarch64 -cpu cortex-a53 -smp 4 -nographic -machine virt,virtualization=on,gic-version=3 -net none -chardev stdio,id=con,mux=on -serial chardev:con -mon chardev=con,mode=readline -kernel ./nuttx
Signed-off-by: hujun5 <hujun5@xiaomi.com>
Task activation/exit logs are helpful for device bringup,
especially when user space nsh prompt doesn't show up.
Putting them here will allow cleaning of logs in multiple
up_exit() functions later.
Signed-off-by: Yanfeng Liu <yfliu2008@qq.com>
It takes about 10 cycles to obtain the task list according to the task
status. In most cases, we know the task status, so we can directly
add the task from the specified task list to reduce time consuming.
Summary:
- During repeating ostest with sabre-6quad:smp (QEMU),
I noticed that pthread_rwlock_test sometimes stops
- Finally, I found that nxtask_exit() released a critical
section too early before context switching which resulted in
selecting inappropriate TCB
- This commit fixes this issue by moving nxsched_resume_scheduler()
from nxtask_exit() to up_exit() and also removing
spin_setbit() and spin_clrbit() from nxtask_exit()
because the caller holds a critical section
- To be consistent with non-SMP cases, the above changes
were done for all CPU architectures
Impact:
- This commit affects all CPU architectures regardless of SMP
Testing:
- Tested with ostest with the following configs
- sabre-6quad:smp (QEMU, dev board), sabre-6quad:nsh (QEMU)
- spresense:wifi_smp
- sim:smp, sim:ostest
- maix-bit:smp (QEMU)
- esp32-devkitc:smp (QEMU)
- lc823450-xgevk:rndis
Signed-off-by: Masayuki Ishikawa <Masayuki.Ishikawa@jp.sony.com>
Summary:
- This commit fixes global IRQ control logic
- In previous implementation, g_cpu_irqset for a remote CPU was
set in sched_add_readytorun(), sched_remove_readytorun() and
up_schedule_sigaction()
- In this implementation, they are removed.
- Instead, in the pause handler, call enter_critical_setion()
which will call up_cpu_paused() then acquire g_cpu_irqlock
- So if a new task with irqcount > 1 restarts on the remote CPU,
the CPU will only hold a critical section. Thus, the issue such as
'POSSIBLE FOR TWO CPUs TO HOLD A CRITICAL SECTION' could be resolved.
- Fix nxsched_resume_scheduler() so that it does not call spin_clrbit()
if a CPU does not hold a g_cpu_irqset
- Fix nxtask_exit() so that it acquires g_cpu_irqlock
- Update TODO
Impact:
- All SMP implementations
Testing:
- Tested with smp, ostest with the following configurations
- Tested with spresense:wifi_smp (NCPUS=2,4)
- Tested with sabre-6quad:smp (QEMU, dev board)
- Tested with maix-bit:smp (QEMU)
- Tested with esp32-core:smp (QEMU)
- Tested with lc823450-xgevk:rndis
Signed-off-by: Masayuki Ishikawa <Masayuki.Ishikawa@jp.sony.com>
Summary:
- I noticed that nxsched_merge_pending() is called outside a critical section
- The issue happens if a new rtcb does not hold a critical section
- Actually, global IRQ control is done in nxsched_resume_scheduler() in nxtask_exit()
- However, nxsched_merge_pending() was called after calling nxsched_resume_scheduler()
- This commit fixes the issue by moving nxsched_merge_pending() before the function
- NOTE: the sequence was changed for SMP but works for non-SMP as well
Impact:
- This commit affects both SMP and non-SMP
Testing:
- Tested with ostest with the following configurations
- spresense:wifi_smp (NCPUS=2 and 4)
- spresense:wifi (non SMP)
- sabre-6quad:smp (QEMU)
- esp32-core:smp (QEMU)
- maix-bit:smp (QEMU)
- sim:smp
- lc823450-xgevk:rndis
Signed-off-by: Masayuki Ishikawa <Masayuki.Ishikawa@jp.sony.com>
Summary:
- During Wi-Fi audio streaming test, I found a deadlock in nxtask_exit()
- Actually, nxtask_exit() was called and tried to enter critical section
- In enter_critical_section(), there is a deadlock avoidance logic
- However, if switched to a new rtcb with irqcount=0, the logic did not work
- Because the 2nd critical section was treated as if it were the 1st one
- Actually, it tried to run the deadlock avoidance logic
- But nxtask_exit() was called with critical section (i.e. IRQ already disabled)
- So the logic did not work as expected because up_irq_restore() did not enable the IRQ.
- This commit fixes this issue by incrementing irqcount before calling nxtask_terminate()
- Also it adjusts g_cpu_irqlock and g_cpu_lockset
Impact:
- Affects SMP only
Testing:
- Tested with spresense:wifi_smp (smp, ostest, nxplayer, telnetd)
- Tested with sabre-6quad:smp with QEMU (smp, ostest)
- Tested with maix-bit:smp with QEMU (smp, ostest)
- Tested with esp32-core:smp with QEMU (smp, ostest)
Signed-off-by: Masayuki Ishikawa <Masayuki.Ishikawa@jp.sony.com>
utilize the call inside nxtask_exit instead, also move
nxsched_suspend_scheduler to nxtask_exit for symmetry
Signed-off-by: Xiang Xiao <xiaoxiang@xiaomi.com>
Change-Id: I219fc15faf0026e452b0db3906aa40b40ac677f3
* Simplify EINTR/ECANCEL error handling
1. Add semaphore uninterruptible wait function
2 .Replace semaphore wait loop with a single uninterruptible wait
3. Replace all sem_xxx to nxsem_xxx
* Unify the void cast usage
1. Remove void cast for function because many place ignore the returned value witout cast
2. Replace void cast for variable with UNUSED macro
libs/: Remove references to CONFIG_DISABLE_SIGNALS. Signals can no longer be disabled.
syscall/: Remove references to CONFIG_DISABLE_SIGNALS. Signals can no longer be disabled.
wireless/: Remove references to CONFIG_DISABLE_SIGNALS. Signals can no longer be disabled.
Documentation/: Remove references to CONFIG_DISABLE_SIGNALS. Signals can no longer be disabled.
include/: Remove references to CONFIG_DISABLE_SIGNALS. Signals can no longer be disabled.
drivers/: Remove references to CONFIG_DISABLE_SIGNALS. Signals can no longer be disabled.
sched/: Remove references to CONFIG_DISABLE_SIGNALS. Signals can no longer be disabled.
configs: Remove references to CONFIG_DISABLE_SIGNALS. Signals can no longer be disabled.
arch/xtensa: Remove references to CONFIG_DISABLE_SIGNALS. Signals can no longer be disabled.
arch/z80: Remove references to CONFIG_DISABLE_SIGNALS. Signals can no longer be disabled.
arch/x86: Remove references to CONFIG_DISABLE_SIGNALS. Signals can no longer be disabled.
arch/renesas and arch/risc-v: Remove references to CONFIG_DISABLE_SIGNALS. Signals can no longer be disabled.
arch/or1k: Remove all references to CONFIG_DISABLE_SIGNALS. Signals are always enabled.
arch/misoc: Remove all references to CONFIG_DISABLE_SIGNALS. Signals are always enabled.
arch/mips: Remove all references to CONFIG_DISABLE_SIGNALS. Signals are always enabled.
arch/avr: Remove all references to CONFIG_DISABLE_SIGNALS. Signals are always enabled.
arch/arm: Remove all references to CONFIG_DISABLE_SIGNALS. Signals are always enabled.
Squashed commit of the following:
Trivial, cosmetic
sched/, arch/, and include: Rename task_vforkstart() as nxtask_vforkstart()
sched/, arch/, and include: Rename task_vforkabort() as nxtask_vforkabort()
sched/, arch/, and include: Rename task_vforksetup() as nxtask_vfork_setup()
sched/: Rename notify_cancellation() as nxnotify_cancellation()
sched/: Rename task_recover() to nxtask_recover()
sched/task, sched/pthread/, Documentation/: Rename task_argsetup() and task_terminate() to nxtask_argsetup() and nxtask_terminate(), respectively.
sched/task: Rename task_schedsetup() to nxtask_schedsetup()
sched/ (plus some binfmt/, include/, and arch/): Rename task_start() and task_starthook() to nxtask_start() and nxtask_starthook().
arch/ and sched/: Rename task_exit() and task_exithook() to nxtask_exit() and nxtask_exithook(), respectively.
sched/task: Rename all internal, static, functions to begin with the nx prefix.
The previous implementation of clearing global IRQ in sched_addreadytorun()
and sched_removereadytorun() was done too early. As a result, nxsem_post()
would have a chance to enter the critical section even nxsem_wait() is
still not in blocked state. This patch moves clearing global IRQ controls
from sched_addreadytorun() and sched_removereadytorun() to sched_resumescheduler()
to ensure that nxsem_post() can enter the critical section correctly.
For this change, sched_resumescheduler.c is always necessary for SMP configuration.
In addition, by this change, task_exit() had to be modified so that it calls
sched_resumescheduler() because it calls sched_removescheduler() inside the
function, otherwise it will cause a deadlock.
However, I encountered another DEBUGASSERT() in sched_cpu_select() during
HTTP streaming aging test on lc823450-xgevk. Actually sched_cpu_select()
accesses the g_assignedtasks which might be changed by another CPU. Similarly,
other tasklists might be modified simultaneously if both CPUs are executing
scheduling logic. To avoid this, I introduced tasklist protetion APIs.
With these changes, SMP kernel stability has been much improved.
Signed-off-by: Masayuki Ishikawa <Masayuki.Ishikawa@jp.sony.com>