This commit adds a new internal interfaces and fixes a problem with three APIs in the SMP configuration. The new internal interface is sched_cpu_pause(tcb). This function will pause a CPU if the task associated with 'tcb' is running on that CPU. This allows a different CPU to modify that OS data stuctures associated with the CPU. When the other CPU is resumed, those modifications can safely take place.

The three fixes are to handle cases in the SMP configuration where one CPU does need to make modifications to TCB and data structures on a task that could be running running on another CPU. Those three cases are task_delete(), task_restart(), and execution of signal handles. In all three cases the solutions is basically the same: (1) Call sched_cpu_pause(tcb) to pause the CPU on which the task is running, (2) perform the necessary operations, then (3) call up_cpu_resume() to restart the paused CPU.
2016-11-20 07:57:18 -06:00 · 2016-11-20 07:57:18 -06:00 · e24f281401
parent f5b35e0461
commit e24f281401
9 changed files with 250 additions and 143 deletions
--- a/65
+++ b/65
@ -10,7 +10,7 @@ issues related to each board port.
 nuttx/:

 (13)  Task/Scheduler (sched/)
-  (3)  SMP
+  (1)  SMP
  (1)  Memory Management (mm/)
  (1)  Power Management (drivers/pm)
  (3)  Signals (sched/signal, arch/)
@ -307,69 +307,6 @@ o Task/Scheduler (sched/)

 o SMP
  ^^^
-  Title:       SMP ISSUES WITH task_restart() AND task_delete()
-  Description: The interface task_restart() (and probably task_delete()) are
-               not usable in the SMP configuration in the current design.  In
-               the non-SMP case, these are relatively simple:  If the task is
-               are not restarting/deleting itself, then the task to-be-restarted/
-               deleted is is supended and the restart/delete operation is a
-               simple operation on data structures.
-
-               In the SMP configuration, on the other hand, the task to be
-               restarted/deleted my in fact be executing concurrently on
-               another CPU and the existing logic cannot support those
-               operations on the running another CPU.
-
-               There might be a simple way to handler this; perhaps using
-               up_cpu_pause(), you could pause all of the other CPUs, perform
-               the restart/delete operation, then restart all other CPUs.  A
-               better solution would be a new interface like up_cpu_stop().
-               This would be sent to all CPUs and if the task is running on
-               any of them, it would suspend the task and put it in the INVALID
-               state.
-
-               But this seems like a lot of work to support some garbage
-               interfaces that really should be removed anyway.  These are
-               unsafe, non-standard interfaces that really have no place in an
-               RTOS (unsafe because you don't know what resources were held
-               by the task when it was restarted or deleted).
-
-               NOTE: Currently task_restart() is not even built if CONFIG_SMP=y.
-               The task_restart() test is also disabled in apps/examples/ostest
-               in this configuration.  task_delete(), on the other hand, is
-               built (but probably should not be).
-
-  Status:      Open
-  Priority:    Low.  I do not plan to do anything with this in the near future.
-
-  Title:       SMP AND SIGNAL ACTIONS
-  Description: Suppose a task task is running on CPU1 and it is signaled from
-               another task running on CPU0.  How does the signal handler run
-               on CPU1?  I think it does not; I think that the signal will be
-               lost.  I think all testing up to this point has used a task
-               waiting for a signal vs. a running task receiving a signal.
-               This case has never been tested, but I suspect an issues.
-               Here is why... this is the signal handling sequence:
-
-               - sigueue() will set up the siginfo data structure and call
-                 sig_dipatch().
-               - sig_tcbdispach() or group_signal() depending on the
-                 configuration.  Let's assume the simpler sig_tcbdispatch().
-               - sig_tcbdispatch() will call queue the signal action (via
-                 sig_queueaction()) and then call the architecture-specific
-                 up_schedule_signaction set up the invoke the signal handler
-                 (for example in arch/arm/src/armv7-m/up_schedulesigaction.c).
-               - sig_queueaction() will assume that the other task is not
-                 running and will simply modify data structures in the TCB.
-                 This, will have no effect if the task is running and the
-                 signal action will not be performed.
-
-               This is really a variant of the problem described above under
-               "SMP ISSUES WITH task_restart() AND task_delete()" and the
-               same proposed solution applies:  Call up_cpu_pause() to stop
-               all other CPUs before up_schedule_signaction runs.
-  Status:      Open
-  Priority:    High.  This must be fixed.

  Title:       SPINLOCKS AND DATA CACHES
  Description: If spinlocks are used in a system with a data cache, then there
--- a/include/sched.h
+++ b/include/sched.h
@ -227,9 +227,7 @@ int    task_create(FAR const char *name, int priority, int stack_size,
                   main_t entry, FAR char * const argv[]);
 #endif
 int    task_delete(pid_t pid);
-#ifndef CONFIG_SMP /* Not yet supported for the SMP case */
 int    task_restart(pid_t pid);
-#endif

 /* Task Scheduling Interfaces (based on POSIX APIs) */

--- a/sched/sched/Make.defs
+++ b/sched/sched/Make.defs
@ -50,7 +50,8 @@ CSRCS += sched_reprioritize.c
 endif

 ifeq ($(CONFIG_SMP),y)
-CSRCS += sched_getaffinity.c sched_setaffinity.c sched_cpuselect.c
+CSRCS += sched_cpuselect.c sched_cpupause.c
+CSRCS += sched_getaffinity.c sched_setaffinity.c
 endif

 ifeq ($(CONFIG_SCHED_WAITPID),y)
--- a/sched/sched/sched.h
+++ b/sched/sched/sched.h
@ -419,12 +419,13 @@ void sched_sporadic_lowpriority(FAR struct tcb_s *tcb);
 #endif

 #ifdef CONFIG_SMP
-int sched_cpu_select(cpu_set_t affinity);
+int  sched_cpu_select(cpu_set_t affinity);
+int  sched_cpu_pause(FAR struct tcb_s *tcb);
 #  define sched_islocked(tcb) spin_islocked(&g_cpu_schedlock)
 #else
-#  define sched_islocked(tcb) ((tcb)->lockcount > 0)
 #  define sched_cpu_select(a) (0)
-
+#  define sched_cpu_pause(t)  (-38)  /* -ENOSYS */
+#  define sched_islocked(tcb) ((tcb)->lockcount > 0)
 #endif

 /* CPU load measurement support */
--- a/sched/sched/sched_cpupause.c
+++ b/sched/sched/sched_cpupause.c
@ -0,0 +1,119 @@
+/****************************************************************************
+ * sched/sched/sched_cpupause.c
+ *
+ *   Copyright (C) 2016 Gregory Nutt. All rights reserved.
+ *   Author: Gregory Nutt <gnutt@nuttx.org>
+ *
+ * Redistribution and use in source and binary forms, with or without
+ * modification, are permitted provided that the following conditions
+ * are met:
+ *
+ * 1. Redistributions of source code must retain the above copyright
+ *    notice, this list of conditions and the following disclaimer.
+ * 2. Redistributions in binary form must reproduce the above copyright
+ *    notice, this list of conditions and the following disclaimer in
+ *    the documentation and/or other materials provided with the
+ *    distribution.
+ * 3. Neither the name NuttX nor the names of its contributors may be
+ *    used to endorse or promote products derived from this software
+ *    without specific prior written permission.
+ *
+ * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ * "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ * LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS
+ * FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE
+ * COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT,
+ * INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING,
+ * BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS
+ * OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED
+ * AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
+ * LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN
+ * ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
+ * POSSIBILITY OF SUCH DAMAGE.
+ *
+ ****************************************************************************/
+
+/****************************************************************************
+ * Included Files
+ ****************************************************************************/
+
+#include <nuttx/config.h>
+
+#include <sys/types.h>
+#include <assert.h>
+#include <errno.h>
+
+#include <nuttx/arch.h>
+
+#include "sched/sched.h"
+
+#ifdef CONFIG_SMP
+
+/****************************************************************************
+ * Public Functions
+ ****************************************************************************/
+
+/****************************************************************************
+ * Name: sched_cpu_pause
+ *
+ * Description:
+ *   Check if task associated with 'tcb' is running on a different CPU.  If
+ *   so then pause that CPU and return its CPU index.
+ *
+ * Input Parameters:
+ *   tcb - The TCB of the task to be conditionally paused.
+ *
+ * Returned Value:
+ *   If a CPU is pauses its non-negative CPU index is returned.  This index
+ *   may then be used to resume the CPU.  If the task is not running at all
+ *   (or if an error occurs), then a negated errno value is returned.  -ESRCH
+ *   is returned in the case where the task is not running on any CPU.
+ *
+ * Assumptions:
+ *   This function was called in a critical section.  In that case, no tasks
+ *   may started or may exit until the we leave the critical section.  This
+ *   critical section should extend until up_cpu_resume() is called in the
+ *   typical case.
+ *
+ ****************************************************************************/
+
+int sched_cpu_pause(FAR struct tcb_s *tcb)
+{
+  int cpu;
+  int ret;
+
+  DEBUGASSERT(tcb != NULL);
+
+  /* If the task is not running at all then our job is easy */
+
+  cpu = tcb->cpu;
+  if (tcb->task_state != TSTATE_TASK_RUNNING)
+    {
+      return -ESRCH;
+    }
+
+  /* Check the CPU that the task is running on */
+
+  DEBUGASSERT(cpu != this_cpu() && (unsigned int)cpu < CONFIG_SMP_NCPUS);
+  if (cpu == this_cpu())
+    {
+      /* We can't pause ourself */
+
+      return -EACCES;
+    }
+
+  /* Pause the CPU that the task is running on */
+
+  ret = up_cpu_pause(cpu);
+  if (ret < 0)
+    {
+      return ret;
+    }
+
+  /* Return the CPU that the task is running on */
+
+  return cpu;
+}
+
+#endif /* CONFIG_SMP */
+
--- a/sched/signal/sig_dispatch.c
+++ b/sched/signal/sig_dispatch.c
@ -345,22 +345,48 @@ int sig_tcbdispatch(FAR struct tcb_s *stcb, siginfo_t *info)

  else
    {
+#ifdef CONFIG_SMP
+      int cpu;
+#endif
      /* Queue any sigaction's requested by this task. */

      ret = sig_queueaction(stcb, info);

+      /* Deliver of the signal must be performed in a critical section */
+
+      flags = enter_critical_section();
+
+#ifdef CONFIG_SMP
+      /* If the thread is running on another CPU, then pause that CPU.  We can
+       * then setup the for signal delivery on the running thread.  When the
+       * CPU is resumed, the signal handler will then execute.
+       */
+
+      cpu = sched_cpu_pause(stcb);
+#endif /* CONFIG_SMP */
+
      /* Then schedule execution of the signal handling action on the
       * recipient's thread.
       */

      up_schedule_sigaction(stcb, sig_deliver);

+#ifdef CONFIG_SMP
+      /* Resume the paused CPU (if any) */
+
+      if (cpu >= 0)
+        {
+          /* I am not yet sure how to handle a failure here. */
+
+          DEBUGVERIFY(up_cpu_resume(cpu));
+        }
+#endif /* CONFIG_SMP */
+
      /* Check if the task is waiting for an unmasked signal.  If so, then
       * unblock it. This must be performed in a critical section because
       * signals can be queued from the interrupt level.
       */

-      flags = enter_critical_section();
      if (stcb->task_state == TSTATE_WAIT_SIG)
        {
          memcpy(&stcb->sigunbinfo, info, sizeof(siginfo_t));
--- a/sched/task/Make.defs
+++ b/sched/task/Make.defs
@ -35,13 +35,9 @@

 CSRCS += task_create.c task_init.c task_setup.c task_activate.c
 CSRCS += task_start.c task_delete.c task_exit.c task_exithook.c
-CSRCS += task_recover.c task_spawnparms.c task_terminate.c
+CSRCS += task_recover.c task_restart.c task_spawnparms.c task_terminate.c
 CSRCS += task_getgroup.c task_prctl.c task_getpid.c exit.c

-ifneq ($(CONFIG_SMP),y)
-CSRCS +=  task_restart.c
-endif
-
 ifeq ($(CONFIG_ARCH_HAVE_VFORK),y)
 ifeq ($(CONFIG_SCHED_WAITPID),y)
 CSRCS += task_vfork.c
--- a/sched/task/task_restart.c
+++ b/sched/task/task_restart.c
@ -51,8 +51,6 @@
 #include "signal/signal.h"
 #include "task/task.h"

-#ifndef CONFIG_SMP /* Not yet supported for the SMP case */
-
 /****************************************************************************
 * Public Functions
 ****************************************************************************/
@ -86,13 +84,10 @@ int task_restart(pid_t pid)
  FAR dq_queue_t *tasklist;
  irqstate_t flags;
  int errcode;
-  int status;
-
-  /* Make sure this task does not become ready-to-run while we are futzing
-   * with its TCB
-   */
-
-  sched_lock();
+#ifdef CONFIG_SMP
+  int cpu;
+#endif
+  int ret;

  /* Check if the task to restart is the calling task */

@ -105,7 +100,15 @@ int task_restart(pid_t pid)
      goto errout_with_lock;
    }

-  /* We are restarting some other task than ourselves */
+  /* We are restarting some other task than ourselves.  Make sure that the
+   * task does not change its state while we are executing.  In the single
+   * CPU state this could be done by disabling pre-emption.  But we will
+   * a little stronger medicine on the SMP case:  The task make be running
+   * on another CPU.
+   */
+
+  flags = enter_critical_section();
+
  /* Find for the TCB associated with matching pid  */

  tcb = (FAR struct task_tcb_s *)sched_gettcb(pid);
@ -122,22 +125,12 @@ int task_restart(pid_t pid)
    }

 #ifdef CONFIG_SMP
-  /* There is currently no capability to restart a task that is actively
-   * running on another CPU.  This is not the calling task so if it is
-   * running, then it could only be running a a different CPU.
-   *
-   * Also, we will need some interlocks to assure that no tasks are
-   * rescheduled on any other CPU while we do this.
+  /* If the task is running on another CPU, then pause that CPU.  We can
+   * then manipulate the TCB of the restarted task and when we resume the
+   * that CPU, the restart take effect.
   */

-#warning Missing SMP logic
-  if (tcb->cmn.task_state == TSTATE_TASK_RUNNING)
-    {
-      /* Not implemented */
-
-      errcode = ENOSYS;
-      goto errout_with_lock;
-    }
+  cpu = sched_cpu_pause(&tcb->cmn);
 #endif /* CONFIG_SMP */

  /* Try to recover from any bad states */
@ -160,10 +153,8 @@ int task_restart(pid_t pid)
  tasklist = TLIST_HEAD(tcb->cmn.task_state);
 #endif

-  flags = enter_critical_section();
  dq_rem((FAR dq_entry_t *)tcb, tasklist);
  tcb->cmn.task_state = TSTATE_TASK_INVALID;
-  leave_critical_section(flags);

  /* Deallocate anything left in the TCB's queues */

@ -193,23 +184,35 @@ int task_restart(pid_t pid)
  dq_addfirst((FAR dq_entry_t *)tcb, (FAR dq_queue_t *)&g_inactivetasks);
  tcb->cmn.task_state = TSTATE_TASK_INACTIVE;

+#ifdef CONFIG_SMP
+  /* Resume the paused CPU (if any) */
+
+  if (cpu >= 0)
+    {
+      ret = up_cpu_resume(cpu);
+      if (ret < 0)
+        {
+          errcode = -ret;
+          goto errout_with_lock;
+        }
+    }
+#endif /* CONFIG_SMP */
+
  /* Activate the task */

-  status = task_activate((FAR struct tcb_s *)tcb);
-  if (status != OK)
+  ret = task_activate((FAR struct tcb_s *)tcb);
+  if (ret != OK)
    {
      (void)task_delete(pid);
-      errcode = -status;
+      errcode = -ret;
      goto errout_with_lock;
    }

-  sched_unlock();
+  leave_critical_section(flags);
  return OK;

 errout_with_lock:
  set_errno(errcode);
-  sched_unlock();
+  leave_critical_section(flags);
  return ERROR;
 }
-
-#endif /* CONFIG_SMP */
--- a/sched/task/task_terminate.c
+++ b/sched/task/task_terminate.c
@ -103,12 +103,17 @@ int task_terminate(pid_t pid, bool nonblocking)
  FAR struct tcb_s *dtcb;
  FAR dq_queue_t *tasklist;
  irqstate_t flags;
+#ifdef CONFIG_SMP
+  int cpu;
+#endif
+  int ret;

-  /* Make sure the task does not become ready-to-run while we are futzing with
-   * its TCB by locking ourselves as the executing task.
+  /* Make sure the task does not become ready-to-run while we are futzing
+   * with its TCB.  Within the critical section, no new task may be started
+   * or terminated (even in the SMP case).
   */

-  sched_lock();
+  flags = enter_critical_section();

  /* Find for the TCB associated with matching PID */

@ -117,26 +122,60 @@ int task_terminate(pid_t pid, bool nonblocking)
    {
      /* This PID does not correspond to any known task */

-      sched_unlock();
-      return -ESRCH;
+      ret = -ESRCH;
+      goto errout_with_lock;
    }

-#ifdef CONFIG_SMP
-  /* We will need some interlocks to assure that no tasks are rescheduled
-   * on any other CPU while we do this.
-   */
-
-#  warning Missing SMP logic
-#endif
-
  /* Verify our internal sanity */

-  if (dtcb->task_state == TSTATE_TASK_RUNNING ||
-      dtcb->task_state >= NUM_TASK_STATES)
+#ifdef CONFIG_SMP
+  DEBUGASSERT(dtcb->task_state < NUM_TASK_STATES);
+#else
+  DEBUGASSERT(dtcb->task_state != TSTATE_TASK_RUNNING &&
+              dtcb->task_state < NUM_TASK_STATES);
+#endif
+
+  /* Remove the task from the OS's task lists.  We must be in a critical
+   * section and the must must not be running to do this.
+   */
+
+#ifdef CONFIG_SMP
+  /* In the SMP case, the thread may be running on another CPU.  If that is
+   * the case, then we will pause the CPU that the thread is running on.
+   */
+
+  cpu = sched_cpu_pause(dtcb);
+
+  /* Get the task list associated with the the thread's state and CPU */
+
+  tasklist = TLIST_HEAD(dtcb->task_state, cpu);
+#else
+  /* In the non-SMP case, we can be assured that the task to be terminated
+   * is not running.  get the task list associated with the task state.
+   */
+
+  tasklist = TLIST_HEAD(dtcb->task_state);
+#endif
+
+  /* Remove the task from the task list */
+
+  dq_rem((FAR dq_entry_t *)dtcb, tasklist);
+  dtcb->task_state = TSTATE_TASK_INVALID;
+
+  /* At this point, the TCB should no longer be accessible to the system */
+
+#ifdef CONFIG_SMP
+  /* Resume the paused CPU (if any) */
+
+  if (cpu >= 0)
    {
-      sched_unlock();
-      PANIC();
+      /* I am not yet sure how to handle a failure here. */
+
+      DEBUGVERIFY(up_cpu_resume(cpu));
    }
+#endif /* CONFIG_SMP */
+
+  leave_critical_section(flags);

  /* Perform common task termination logic (flushing streams, calling
   * functions registered by at_exit/on_exit, etc.).  We need to do
@ -151,23 +190,6 @@ int task_terminate(pid_t pid, bool nonblocking)

  task_exithook(dtcb, EXIT_SUCCESS, nonblocking);

-  /* Remove the task from the OS's task lists. */
-
-#ifdef CONFIG_SMP
-  tasklist = TLIST_HEAD(dtcb->task_state, dtcb->cpu);
-#else
-  tasklist = TLIST_HEAD(dtcb->task_state);
-#endif
-
-  flags = enter_critical_section();
-  dq_rem((FAR dq_entry_t *)dtcb, tasklist);
-  dtcb->task_state = TSTATE_TASK_INVALID;
-  leave_critical_section(flags);
-
-  /* At this point, the TCB should no longer be accessible to the system */
-
-  sched_unlock();
-
  /* Since all tasks pass through this function as the final step in their
   * exit sequence, this is an appropriate place to inform any instrumentation
   * layer that the task no longer exists.
@ -178,4 +200,8 @@ int task_terminate(pid_t pid, bool nonblocking)
  /* Deallocate its TCB */

  return sched_releasetcb(dtcb, dtcb->flags & TCB_FLAG_TTYPE_MASK);
+
+errout_with_lock:
+  leave_critical_section(flags);
+  return ret;
 }