2018-01-30 01:23:49 +08:00
|
|
|
/*
|
|
|
|
* Copyright (c) 2018 Intel corporation
|
|
|
|
*
|
|
|
|
* SPDX-License-Identifier: Apache-2.0
|
|
|
|
*/
|
|
|
|
|
|
|
|
#include <kernel.h>
|
|
|
|
#include <kernel_structs.h>
|
|
|
|
#include <spinlock.h>
|
2018-01-18 03:34:50 +08:00
|
|
|
#include <kswap.h>
|
2018-02-09 01:10:46 +08:00
|
|
|
#include <kernel_internal.h>
|
2018-01-30 01:23:49 +08:00
|
|
|
|
kernel: Rework SMP irq_lock() compatibility layer
This was wrong in two ways, one subtle and one awful.
The subtle problem was that the IRQ lock isn't actually globally
recursive, it gets reset when you context switch (i.e. a _Swap()
implicitly releases and reacquires it). So the recursive count I was
keeping needs to be per-thread or else we risk deadlock any time we
swap away from a thread holding the lock.
And because part of my brain apparently knew this, there was an
"optimization" in the code that tested the current count vs. zero
outside the lock, on the argument that if it was non-zero we must
already hold the lock. Which would be true of a per-thread counter,
but NOT a global one: the other CPU may be holding that lock, and this
test will tell you *you* do. The upshot is that a recursive
irq_lock() would almost always SUCCEED INCORRECTLY when there was lock
contention. That this didn't break more things is amazing to me.
The rework is actually simpler than the original, thankfully. Though
there are some further subtleties:
* The lock state implied by irq_lock() allows the lock to be
implicitly released on context switch (i.e. you can _Swap() with the
lock held at a recursion level higher than 1, which needs to allow
other processes to run). So return paths into threads from _Swap()
and interrupt/exception exit need to check and restore the global
lock state, spinning as needed.
* The idle loop design specifies a k_cpu_idle() function that is on
common architectures expected to enable interrupts (for obvious
reasons), but there is no place to put non-arch code to wire it into
the global lock accounting. So on SMP, even CPU0 needs to use the
"dumb" spinning idle loop.
Finally this patch contains a simple bugfix too, found by inspection:
the interrupt return code used when CONFIG_SWITCH is enabled wasn't
correctly setting the active flag on the threads, opening up the
potential for a race that might result in a thread being scheduled on
two CPUs simultaneously.
Signed-off-by: Andy Ross <andrew.j.ross@intel.com>
2018-04-13 03:50:05 +08:00
|
|
|
#ifdef CONFIG_SMP
|
|
|
|
static atomic_t global_lock;
|
2018-01-30 01:23:49 +08:00
|
|
|
|
|
|
|
unsigned int _smp_global_lock(void)
|
|
|
|
{
|
kernel: Rework SMP irq_lock() compatibility layer
This was wrong in two ways, one subtle and one awful.
The subtle problem was that the IRQ lock isn't actually globally
recursive, it gets reset when you context switch (i.e. a _Swap()
implicitly releases and reacquires it). So the recursive count I was
keeping needs to be per-thread or else we risk deadlock any time we
swap away from a thread holding the lock.
And because part of my brain apparently knew this, there was an
"optimization" in the code that tested the current count vs. zero
outside the lock, on the argument that if it was non-zero we must
already hold the lock. Which would be true of a per-thread counter,
but NOT a global one: the other CPU may be holding that lock, and this
test will tell you *you* do. The upshot is that a recursive
irq_lock() would almost always SUCCEED INCORRECTLY when there was lock
contention. That this didn't break more things is amazing to me.
The rework is actually simpler than the original, thankfully. Though
there are some further subtleties:
* The lock state implied by irq_lock() allows the lock to be
implicitly released on context switch (i.e. you can _Swap() with the
lock held at a recursion level higher than 1, which needs to allow
other processes to run). So return paths into threads from _Swap()
and interrupt/exception exit need to check and restore the global
lock state, spinning as needed.
* The idle loop design specifies a k_cpu_idle() function that is on
common architectures expected to enable interrupts (for obvious
reasons), but there is no place to put non-arch code to wire it into
the global lock accounting. So on SMP, even CPU0 needs to use the
"dumb" spinning idle loop.
Finally this patch contains a simple bugfix too, found by inspection:
the interrupt return code used when CONFIG_SWITCH is enabled wasn't
correctly setting the active flag on the threads, opening up the
potential for a race that might result in a thread being scheduled on
two CPUs simultaneously.
Signed-off-by: Andy Ross <andrew.j.ross@intel.com>
2018-04-13 03:50:05 +08:00
|
|
|
int key = _arch_irq_lock();
|
2018-01-30 01:23:49 +08:00
|
|
|
|
kernel: Rework SMP irq_lock() compatibility layer
This was wrong in two ways, one subtle and one awful.
The subtle problem was that the IRQ lock isn't actually globally
recursive, it gets reset when you context switch (i.e. a _Swap()
implicitly releases and reacquires it). So the recursive count I was
keeping needs to be per-thread or else we risk deadlock any time we
swap away from a thread holding the lock.
And because part of my brain apparently knew this, there was an
"optimization" in the code that tested the current count vs. zero
outside the lock, on the argument that if it was non-zero we must
already hold the lock. Which would be true of a per-thread counter,
but NOT a global one: the other CPU may be holding that lock, and this
test will tell you *you* do. The upshot is that a recursive
irq_lock() would almost always SUCCEED INCORRECTLY when there was lock
contention. That this didn't break more things is amazing to me.
The rework is actually simpler than the original, thankfully. Though
there are some further subtleties:
* The lock state implied by irq_lock() allows the lock to be
implicitly released on context switch (i.e. you can _Swap() with the
lock held at a recursion level higher than 1, which needs to allow
other processes to run). So return paths into threads from _Swap()
and interrupt/exception exit need to check and restore the global
lock state, spinning as needed.
* The idle loop design specifies a k_cpu_idle() function that is on
common architectures expected to enable interrupts (for obvious
reasons), but there is no place to put non-arch code to wire it into
the global lock accounting. So on SMP, even CPU0 needs to use the
"dumb" spinning idle loop.
Finally this patch contains a simple bugfix too, found by inspection:
the interrupt return code used when CONFIG_SWITCH is enabled wasn't
correctly setting the active flag on the threads, opening up the
potential for a race that might result in a thread being scheduled on
two CPUs simultaneously.
Signed-off-by: Andy Ross <andrew.j.ross@intel.com>
2018-04-13 03:50:05 +08:00
|
|
|
if (!_current->base.global_lock_count) {
|
|
|
|
while (!atomic_cas(&global_lock, 0, 1)) {
|
|
|
|
}
|
2018-01-30 01:23:49 +08:00
|
|
|
}
|
|
|
|
|
kernel: Rework SMP irq_lock() compatibility layer
This was wrong in two ways, one subtle and one awful.
The subtle problem was that the IRQ lock isn't actually globally
recursive, it gets reset when you context switch (i.e. a _Swap()
implicitly releases and reacquires it). So the recursive count I was
keeping needs to be per-thread or else we risk deadlock any time we
swap away from a thread holding the lock.
And because part of my brain apparently knew this, there was an
"optimization" in the code that tested the current count vs. zero
outside the lock, on the argument that if it was non-zero we must
already hold the lock. Which would be true of a per-thread counter,
but NOT a global one: the other CPU may be holding that lock, and this
test will tell you *you* do. The upshot is that a recursive
irq_lock() would almost always SUCCEED INCORRECTLY when there was lock
contention. That this didn't break more things is amazing to me.
The rework is actually simpler than the original, thankfully. Though
there are some further subtleties:
* The lock state implied by irq_lock() allows the lock to be
implicitly released on context switch (i.e. you can _Swap() with the
lock held at a recursion level higher than 1, which needs to allow
other processes to run). So return paths into threads from _Swap()
and interrupt/exception exit need to check and restore the global
lock state, spinning as needed.
* The idle loop design specifies a k_cpu_idle() function that is on
common architectures expected to enable interrupts (for obvious
reasons), but there is no place to put non-arch code to wire it into
the global lock accounting. So on SMP, even CPU0 needs to use the
"dumb" spinning idle loop.
Finally this patch contains a simple bugfix too, found by inspection:
the interrupt return code used when CONFIG_SWITCH is enabled wasn't
correctly setting the active flag on the threads, opening up the
potential for a race that might result in a thread being scheduled on
two CPUs simultaneously.
Signed-off-by: Andy Ross <andrew.j.ross@intel.com>
2018-04-13 03:50:05 +08:00
|
|
|
_current->base.global_lock_count++;
|
2018-01-30 01:23:49 +08:00
|
|
|
|
kernel: Rework SMP irq_lock() compatibility layer
This was wrong in two ways, one subtle and one awful.
The subtle problem was that the IRQ lock isn't actually globally
recursive, it gets reset when you context switch (i.e. a _Swap()
implicitly releases and reacquires it). So the recursive count I was
keeping needs to be per-thread or else we risk deadlock any time we
swap away from a thread holding the lock.
And because part of my brain apparently knew this, there was an
"optimization" in the code that tested the current count vs. zero
outside the lock, on the argument that if it was non-zero we must
already hold the lock. Which would be true of a per-thread counter,
but NOT a global one: the other CPU may be holding that lock, and this
test will tell you *you* do. The upshot is that a recursive
irq_lock() would almost always SUCCEED INCORRECTLY when there was lock
contention. That this didn't break more things is amazing to me.
The rework is actually simpler than the original, thankfully. Though
there are some further subtleties:
* The lock state implied by irq_lock() allows the lock to be
implicitly released on context switch (i.e. you can _Swap() with the
lock held at a recursion level higher than 1, which needs to allow
other processes to run). So return paths into threads from _Swap()
and interrupt/exception exit need to check and restore the global
lock state, spinning as needed.
* The idle loop design specifies a k_cpu_idle() function that is on
common architectures expected to enable interrupts (for obvious
reasons), but there is no place to put non-arch code to wire it into
the global lock accounting. So on SMP, even CPU0 needs to use the
"dumb" spinning idle loop.
Finally this patch contains a simple bugfix too, found by inspection:
the interrupt return code used when CONFIG_SWITCH is enabled wasn't
correctly setting the active flag on the threads, opening up the
potential for a race that might result in a thread being scheduled on
two CPUs simultaneously.
Signed-off-by: Andy Ross <andrew.j.ross@intel.com>
2018-04-13 03:50:05 +08:00
|
|
|
return key;
|
2018-01-30 01:23:49 +08:00
|
|
|
}
|
|
|
|
|
|
|
|
void _smp_global_unlock(unsigned int key)
|
|
|
|
{
|
kernel: Rework SMP irq_lock() compatibility layer
This was wrong in two ways, one subtle and one awful.
The subtle problem was that the IRQ lock isn't actually globally
recursive, it gets reset when you context switch (i.e. a _Swap()
implicitly releases and reacquires it). So the recursive count I was
keeping needs to be per-thread or else we risk deadlock any time we
swap away from a thread holding the lock.
And because part of my brain apparently knew this, there was an
"optimization" in the code that tested the current count vs. zero
outside the lock, on the argument that if it was non-zero we must
already hold the lock. Which would be true of a per-thread counter,
but NOT a global one: the other CPU may be holding that lock, and this
test will tell you *you* do. The upshot is that a recursive
irq_lock() would almost always SUCCEED INCORRECTLY when there was lock
contention. That this didn't break more things is amazing to me.
The rework is actually simpler than the original, thankfully. Though
there are some further subtleties:
* The lock state implied by irq_lock() allows the lock to be
implicitly released on context switch (i.e. you can _Swap() with the
lock held at a recursion level higher than 1, which needs to allow
other processes to run). So return paths into threads from _Swap()
and interrupt/exception exit need to check and restore the global
lock state, spinning as needed.
* The idle loop design specifies a k_cpu_idle() function that is on
common architectures expected to enable interrupts (for obvious
reasons), but there is no place to put non-arch code to wire it into
the global lock accounting. So on SMP, even CPU0 needs to use the
"dumb" spinning idle loop.
Finally this patch contains a simple bugfix too, found by inspection:
the interrupt return code used when CONFIG_SWITCH is enabled wasn't
correctly setting the active flag on the threads, opening up the
potential for a race that might result in a thread being scheduled on
two CPUs simultaneously.
Signed-off-by: Andy Ross <andrew.j.ross@intel.com>
2018-04-13 03:50:05 +08:00
|
|
|
if (_current->base.global_lock_count) {
|
|
|
|
_current->base.global_lock_count--;
|
|
|
|
|
|
|
|
if (!_current->base.global_lock_count) {
|
|
|
|
atomic_clear(&global_lock);
|
|
|
|
}
|
2018-01-30 01:23:49 +08:00
|
|
|
}
|
|
|
|
|
kernel: Rework SMP irq_lock() compatibility layer
This was wrong in two ways, one subtle and one awful.
The subtle problem was that the IRQ lock isn't actually globally
recursive, it gets reset when you context switch (i.e. a _Swap()
implicitly releases and reacquires it). So the recursive count I was
keeping needs to be per-thread or else we risk deadlock any time we
swap away from a thread holding the lock.
And because part of my brain apparently knew this, there was an
"optimization" in the code that tested the current count vs. zero
outside the lock, on the argument that if it was non-zero we must
already hold the lock. Which would be true of a per-thread counter,
but NOT a global one: the other CPU may be holding that lock, and this
test will tell you *you* do. The upshot is that a recursive
irq_lock() would almost always SUCCEED INCORRECTLY when there was lock
contention. That this didn't break more things is amazing to me.
The rework is actually simpler than the original, thankfully. Though
there are some further subtleties:
* The lock state implied by irq_lock() allows the lock to be
implicitly released on context switch (i.e. you can _Swap() with the
lock held at a recursion level higher than 1, which needs to allow
other processes to run). So return paths into threads from _Swap()
and interrupt/exception exit need to check and restore the global
lock state, spinning as needed.
* The idle loop design specifies a k_cpu_idle() function that is on
common architectures expected to enable interrupts (for obvious
reasons), but there is no place to put non-arch code to wire it into
the global lock accounting. So on SMP, even CPU0 needs to use the
"dumb" spinning idle loop.
Finally this patch contains a simple bugfix too, found by inspection:
the interrupt return code used when CONFIG_SWITCH is enabled wasn't
correctly setting the active flag on the threads, opening up the
potential for a race that might result in a thread being scheduled on
two CPUs simultaneously.
Signed-off-by: Andy Ross <andrew.j.ross@intel.com>
2018-04-13 03:50:05 +08:00
|
|
|
_arch_irq_unlock(key);
|
|
|
|
}
|
2018-01-30 01:23:49 +08:00
|
|
|
|
kernel: Rework SMP irq_lock() compatibility layer
This was wrong in two ways, one subtle and one awful.
The subtle problem was that the IRQ lock isn't actually globally
recursive, it gets reset when you context switch (i.e. a _Swap()
implicitly releases and reacquires it). So the recursive count I was
keeping needs to be per-thread or else we risk deadlock any time we
swap away from a thread holding the lock.
And because part of my brain apparently knew this, there was an
"optimization" in the code that tested the current count vs. zero
outside the lock, on the argument that if it was non-zero we must
already hold the lock. Which would be true of a per-thread counter,
but NOT a global one: the other CPU may be holding that lock, and this
test will tell you *you* do. The upshot is that a recursive
irq_lock() would almost always SUCCEED INCORRECTLY when there was lock
contention. That this didn't break more things is amazing to me.
The rework is actually simpler than the original, thankfully. Though
there are some further subtleties:
* The lock state implied by irq_lock() allows the lock to be
implicitly released on context switch (i.e. you can _Swap() with the
lock held at a recursion level higher than 1, which needs to allow
other processes to run). So return paths into threads from _Swap()
and interrupt/exception exit need to check and restore the global
lock state, spinning as needed.
* The idle loop design specifies a k_cpu_idle() function that is on
common architectures expected to enable interrupts (for obvious
reasons), but there is no place to put non-arch code to wire it into
the global lock accounting. So on SMP, even CPU0 needs to use the
"dumb" spinning idle loop.
Finally this patch contains a simple bugfix too, found by inspection:
the interrupt return code used when CONFIG_SWITCH is enabled wasn't
correctly setting the active flag on the threads, opening up the
potential for a race that might result in a thread being scheduled on
two CPUs simultaneously.
Signed-off-by: Andy Ross <andrew.j.ross@intel.com>
2018-04-13 03:50:05 +08:00
|
|
|
void _smp_reacquire_global_lock(struct k_thread *thread)
|
|
|
|
{
|
|
|
|
if (thread->base.global_lock_count) {
|
|
|
|
_arch_irq_lock();
|
|
|
|
|
|
|
|
while (!atomic_cas(&global_lock, 0, 1)) {
|
|
|
|
}
|
|
|
|
}
|
2018-01-30 01:23:49 +08:00
|
|
|
}
|
2018-01-18 03:34:50 +08:00
|
|
|
|
kernel: Rework SMP irq_lock() compatibility layer
This was wrong in two ways, one subtle and one awful.
The subtle problem was that the IRQ lock isn't actually globally
recursive, it gets reset when you context switch (i.e. a _Swap()
implicitly releases and reacquires it). So the recursive count I was
keeping needs to be per-thread or else we risk deadlock any time we
swap away from a thread holding the lock.
And because part of my brain apparently knew this, there was an
"optimization" in the code that tested the current count vs. zero
outside the lock, on the argument that if it was non-zero we must
already hold the lock. Which would be true of a per-thread counter,
but NOT a global one: the other CPU may be holding that lock, and this
test will tell you *you* do. The upshot is that a recursive
irq_lock() would almost always SUCCEED INCORRECTLY when there was lock
contention. That this didn't break more things is amazing to me.
The rework is actually simpler than the original, thankfully. Though
there are some further subtleties:
* The lock state implied by irq_lock() allows the lock to be
implicitly released on context switch (i.e. you can _Swap() with the
lock held at a recursion level higher than 1, which needs to allow
other processes to run). So return paths into threads from _Swap()
and interrupt/exception exit need to check and restore the global
lock state, spinning as needed.
* The idle loop design specifies a k_cpu_idle() function that is on
common architectures expected to enable interrupts (for obvious
reasons), but there is no place to put non-arch code to wire it into
the global lock accounting. So on SMP, even CPU0 needs to use the
"dumb" spinning idle loop.
Finally this patch contains a simple bugfix too, found by inspection:
the interrupt return code used when CONFIG_SWITCH is enabled wasn't
correctly setting the active flag on the threads, opening up the
potential for a race that might result in a thread being scheduled on
two CPUs simultaneously.
Signed-off-by: Andy Ross <andrew.j.ross@intel.com>
2018-04-13 03:50:05 +08:00
|
|
|
|
|
|
|
/* Called from within _Swap(), so assumes lock already held */
|
|
|
|
void _smp_release_global_lock(struct k_thread *thread)
|
|
|
|
{
|
|
|
|
if (!thread->base.global_lock_count) {
|
|
|
|
atomic_clear(&global_lock);
|
|
|
|
}
|
|
|
|
}
|
|
|
|
|
|
|
|
#endif
|
|
|
|
|
2018-01-18 03:34:50 +08:00
|
|
|
extern k_thread_stack_t _interrupt_stack1[];
|
|
|
|
extern k_thread_stack_t _interrupt_stack2[];
|
|
|
|
extern k_thread_stack_t _interrupt_stack3[];
|
|
|
|
|
|
|
|
#ifdef CONFIG_SMP
|
2018-03-07 07:08:55 +08:00
|
|
|
static void smp_init_top(int key, void *arg)
|
2018-01-18 03:34:50 +08:00
|
|
|
{
|
|
|
|
atomic_t *start_flag = arg;
|
|
|
|
|
|
|
|
/* Wait for the signal to begin scheduling */
|
|
|
|
do {
|
|
|
|
k_busy_wait(100);
|
|
|
|
} while (!atomic_get(start_flag));
|
|
|
|
|
|
|
|
/* Switch out of a dummy thread. Trick cribbed from the main
|
|
|
|
* thread init. Should probably unify implementations.
|
|
|
|
*/
|
|
|
|
struct k_thread dummy_thread = {
|
|
|
|
.base.user_options = K_ESSENTIAL,
|
|
|
|
.base.thread_state = _THREAD_DUMMY,
|
|
|
|
};
|
|
|
|
|
|
|
|
_arch_curr_cpu()->current = &dummy_thread;
|
kernel: Rework SMP irq_lock() compatibility layer
This was wrong in two ways, one subtle and one awful.
The subtle problem was that the IRQ lock isn't actually globally
recursive, it gets reset when you context switch (i.e. a _Swap()
implicitly releases and reacquires it). So the recursive count I was
keeping needs to be per-thread or else we risk deadlock any time we
swap away from a thread holding the lock.
And because part of my brain apparently knew this, there was an
"optimization" in the code that tested the current count vs. zero
outside the lock, on the argument that if it was non-zero we must
already hold the lock. Which would be true of a per-thread counter,
but NOT a global one: the other CPU may be holding that lock, and this
test will tell you *you* do. The upshot is that a recursive
irq_lock() would almost always SUCCEED INCORRECTLY when there was lock
contention. That this didn't break more things is amazing to me.
The rework is actually simpler than the original, thankfully. Though
there are some further subtleties:
* The lock state implied by irq_lock() allows the lock to be
implicitly released on context switch (i.e. you can _Swap() with the
lock held at a recursion level higher than 1, which needs to allow
other processes to run). So return paths into threads from _Swap()
and interrupt/exception exit need to check and restore the global
lock state, spinning as needed.
* The idle loop design specifies a k_cpu_idle() function that is on
common architectures expected to enable interrupts (for obvious
reasons), but there is no place to put non-arch code to wire it into
the global lock accounting. So on SMP, even CPU0 needs to use the
"dumb" spinning idle loop.
Finally this patch contains a simple bugfix too, found by inspection:
the interrupt return code used when CONFIG_SWITCH is enabled wasn't
correctly setting the active flag on the threads, opening up the
potential for a race that might result in a thread being scheduled on
two CPUs simultaneously.
Signed-off-by: Andy Ross <andrew.j.ross@intel.com>
2018-04-13 03:50:05 +08:00
|
|
|
int k = irq_lock();
|
2018-01-27 04:30:21 +08:00
|
|
|
smp_timer_init();
|
|
|
|
_Swap(k);
|
2018-01-18 03:34:50 +08:00
|
|
|
|
|
|
|
CODE_UNREACHABLE;
|
|
|
|
}
|
|
|
|
#endif
|
|
|
|
|
|
|
|
void smp_init(void)
|
|
|
|
{
|
|
|
|
atomic_t start_flag;
|
|
|
|
|
|
|
|
atomic_clear(&start_flag);
|
|
|
|
|
|
|
|
#if defined(CONFIG_SMP) && CONFIG_MP_NUM_CPUS > 1
|
|
|
|
_arch_start_cpu(1, _interrupt_stack1, CONFIG_ISR_STACK_SIZE,
|
2018-03-07 07:08:55 +08:00
|
|
|
smp_init_top, &start_flag);
|
2018-01-18 03:34:50 +08:00
|
|
|
#endif
|
|
|
|
|
|
|
|
#if defined(CONFIG_SMP) && CONFIG_MP_NUM_CPUS > 2
|
|
|
|
_arch_start_cpu(2, _interrupt_stack2, CONFIG_ISR_STACK_SIZE,
|
2018-03-07 07:08:55 +08:00
|
|
|
smp_init_top, &start_flag);
|
2018-01-18 03:34:50 +08:00
|
|
|
#endif
|
|
|
|
|
|
|
|
#if defined(CONFIG_SMP) && CONFIG_MP_NUM_CPUS > 3
|
|
|
|
_arch_start_cpu(3, _interrupt_stack3, CONFIG_ISR_STACK_SIZE,
|
2018-03-07 07:08:55 +08:00
|
|
|
smp_init_top, &start_flag);
|
2018-01-18 03:34:50 +08:00
|
|
|
#endif
|
|
|
|
|
|
|
|
atomic_set(&start_flag, 1);
|
|
|
|
}
|