It takes about 10 cycles to obtain the task list according to the task
status. In most cases, we know the task status, so we can directly
add the task from the specified task list to reduce time consuming.
Summary:
- During repeating ostest with sabre-6quad:smp (QEMU),
I noticed that pthread_rwlock_test sometimes stops
- Finally, I found that nxtask_exit() released a critical
section too early before context switching which resulted in
selecting inappropriate TCB
- This commit fixes this issue by moving nxsched_resume_scheduler()
from nxtask_exit() to up_exit() and also removing
spin_setbit() and spin_clrbit() from nxtask_exit()
because the caller holds a critical section
- To be consistent with non-SMP cases, the above changes
were done for all CPU architectures
Impact:
- This commit affects all CPU architectures regardless of SMP
Testing:
- Tested with ostest with the following configs
- sabre-6quad:smp (QEMU, dev board), sabre-6quad:nsh (QEMU)
- spresense:wifi_smp
- sim:smp, sim:ostest
- maix-bit:smp (QEMU)
- esp32-devkitc:smp (QEMU)
- lc823450-xgevk:rndis
Signed-off-by: Masayuki Ishikawa <Masayuki.Ishikawa@jp.sony.com>
Summary:
- This commit fixes global IRQ control logic
- In previous implementation, g_cpu_irqset for a remote CPU was
set in sched_add_readytorun(), sched_remove_readytorun() and
up_schedule_sigaction()
- In this implementation, they are removed.
- Instead, in the pause handler, call enter_critical_setion()
which will call up_cpu_paused() then acquire g_cpu_irqlock
- So if a new task with irqcount > 1 restarts on the remote CPU,
the CPU will only hold a critical section. Thus, the issue such as
'POSSIBLE FOR TWO CPUs TO HOLD A CRITICAL SECTION' could be resolved.
- Fix nxsched_resume_scheduler() so that it does not call spin_clrbit()
if a CPU does not hold a g_cpu_irqset
- Fix nxtask_exit() so that it acquires g_cpu_irqlock
- Update TODO
Impact:
- All SMP implementations
Testing:
- Tested with smp, ostest with the following configurations
- Tested with spresense:wifi_smp (NCPUS=2,4)
- Tested with sabre-6quad:smp (QEMU, dev board)
- Tested with maix-bit:smp (QEMU)
- Tested with esp32-core:smp (QEMU)
- Tested with lc823450-xgevk:rndis
Signed-off-by: Masayuki Ishikawa <Masayuki.Ishikawa@jp.sony.com>
Summary:
- I noticed that nxsched_merge_pending() is called outside a critical section
- The issue happens if a new rtcb does not hold a critical section
- Actually, global IRQ control is done in nxsched_resume_scheduler() in nxtask_exit()
- However, nxsched_merge_pending() was called after calling nxsched_resume_scheduler()
- This commit fixes the issue by moving nxsched_merge_pending() before the function
- NOTE: the sequence was changed for SMP but works for non-SMP as well
Impact:
- This commit affects both SMP and non-SMP
Testing:
- Tested with ostest with the following configurations
- spresense:wifi_smp (NCPUS=2 and 4)
- spresense:wifi (non SMP)
- sabre-6quad:smp (QEMU)
- esp32-core:smp (QEMU)
- maix-bit:smp (QEMU)
- sim:smp
- lc823450-xgevk:rndis
Signed-off-by: Masayuki Ishikawa <Masayuki.Ishikawa@jp.sony.com>
Summary:
- During Wi-Fi audio streaming test, I found a deadlock in nxtask_exit()
- Actually, nxtask_exit() was called and tried to enter critical section
- In enter_critical_section(), there is a deadlock avoidance logic
- However, if switched to a new rtcb with irqcount=0, the logic did not work
- Because the 2nd critical section was treated as if it were the 1st one
- Actually, it tried to run the deadlock avoidance logic
- But nxtask_exit() was called with critical section (i.e. IRQ already disabled)
- So the logic did not work as expected because up_irq_restore() did not enable the IRQ.
- This commit fixes this issue by incrementing irqcount before calling nxtask_terminate()
- Also it adjusts g_cpu_irqlock and g_cpu_lockset
Impact:
- Affects SMP only
Testing:
- Tested with spresense:wifi_smp (smp, ostest, nxplayer, telnetd)
- Tested with sabre-6quad:smp with QEMU (smp, ostest)
- Tested with maix-bit:smp with QEMU (smp, ostest)
- Tested with esp32-core:smp with QEMU (smp, ostest)
Signed-off-by: Masayuki Ishikawa <Masayuki.Ishikawa@jp.sony.com>
utilize the call inside nxtask_exit instead, also move
nxsched_suspend_scheduler to nxtask_exit for symmetry
Signed-off-by: Xiang Xiao <xiaoxiang@xiaomi.com>
Change-Id: I219fc15faf0026e452b0db3906aa40b40ac677f3
* Simplify EINTR/ECANCEL error handling
1. Add semaphore uninterruptible wait function
2 .Replace semaphore wait loop with a single uninterruptible wait
3. Replace all sem_xxx to nxsem_xxx
* Unify the void cast usage
1. Remove void cast for function because many place ignore the returned value witout cast
2. Replace void cast for variable with UNUSED macro
libs/: Remove references to CONFIG_DISABLE_SIGNALS. Signals can no longer be disabled.
syscall/: Remove references to CONFIG_DISABLE_SIGNALS. Signals can no longer be disabled.
wireless/: Remove references to CONFIG_DISABLE_SIGNALS. Signals can no longer be disabled.
Documentation/: Remove references to CONFIG_DISABLE_SIGNALS. Signals can no longer be disabled.
include/: Remove references to CONFIG_DISABLE_SIGNALS. Signals can no longer be disabled.
drivers/: Remove references to CONFIG_DISABLE_SIGNALS. Signals can no longer be disabled.
sched/: Remove references to CONFIG_DISABLE_SIGNALS. Signals can no longer be disabled.
configs: Remove references to CONFIG_DISABLE_SIGNALS. Signals can no longer be disabled.
arch/xtensa: Remove references to CONFIG_DISABLE_SIGNALS. Signals can no longer be disabled.
arch/z80: Remove references to CONFIG_DISABLE_SIGNALS. Signals can no longer be disabled.
arch/x86: Remove references to CONFIG_DISABLE_SIGNALS. Signals can no longer be disabled.
arch/renesas and arch/risc-v: Remove references to CONFIG_DISABLE_SIGNALS. Signals can no longer be disabled.
arch/or1k: Remove all references to CONFIG_DISABLE_SIGNALS. Signals are always enabled.
arch/misoc: Remove all references to CONFIG_DISABLE_SIGNALS. Signals are always enabled.
arch/mips: Remove all references to CONFIG_DISABLE_SIGNALS. Signals are always enabled.
arch/avr: Remove all references to CONFIG_DISABLE_SIGNALS. Signals are always enabled.
arch/arm: Remove all references to CONFIG_DISABLE_SIGNALS. Signals are always enabled.
Squashed commit of the following:
Trivial, cosmetic
sched/, arch/, and include: Rename task_vforkstart() as nxtask_vforkstart()
sched/, arch/, and include: Rename task_vforkabort() as nxtask_vforkabort()
sched/, arch/, and include: Rename task_vforksetup() as nxtask_vfork_setup()
sched/: Rename notify_cancellation() as nxnotify_cancellation()
sched/: Rename task_recover() to nxtask_recover()
sched/task, sched/pthread/, Documentation/: Rename task_argsetup() and task_terminate() to nxtask_argsetup() and nxtask_terminate(), respectively.
sched/task: Rename task_schedsetup() to nxtask_schedsetup()
sched/ (plus some binfmt/, include/, and arch/): Rename task_start() and task_starthook() to nxtask_start() and nxtask_starthook().
arch/ and sched/: Rename task_exit() and task_exithook() to nxtask_exit() and nxtask_exithook(), respectively.
sched/task: Rename all internal, static, functions to begin with the nx prefix.
The previous implementation of clearing global IRQ in sched_addreadytorun()
and sched_removereadytorun() was done too early. As a result, nxsem_post()
would have a chance to enter the critical section even nxsem_wait() is
still not in blocked state. This patch moves clearing global IRQ controls
from sched_addreadytorun() and sched_removereadytorun() to sched_resumescheduler()
to ensure that nxsem_post() can enter the critical section correctly.
For this change, sched_resumescheduler.c is always necessary for SMP configuration.
In addition, by this change, task_exit() had to be modified so that it calls
sched_resumescheduler() because it calls sched_removescheduler() inside the
function, otherwise it will cause a deadlock.
However, I encountered another DEBUGASSERT() in sched_cpu_select() during
HTTP streaming aging test on lc823450-xgevk. Actually sched_cpu_select()
accesses the g_assignedtasks which might be changed by another CPU. Similarly,
other tasklists might be modified simultaneously if both CPUs are executing
scheduling logic. To avoid this, I introduced tasklist protetion APIs.
With these changes, SMP kernel stability has been much improved.
Signed-off-by: Masayuki Ishikawa <Masayuki.Ishikawa@jp.sony.com>