1. 26 Feb, 2013 1 commit
    • stop_machine: Mark per cpu stopper enabled early · 46c498c2
      Thomas Gleixner authored
      commit 14e568e7 (stop_machine: Use smpboot threads) introduced the
      following regression:
      
      Before this commit the stopper enabled bit was set in the online
      notifier.
      
      CPU0				CPU1
      cpu_up
      				cpu online
      hotplug_notifier(ONLINE)
        stopper(CPU1)->enabled = true;
      ...
      stop_machine()
      
      The conversion to smpboot threads moved the enablement to the wakeup
      path of the parked thread. The majority of users seem to have the
      following working order:
      
      CPU0				CPU1
      cpu_up
      				cpu online
      unpark_threads()
        wakeup(stopper[CPU1])
      ....
      				stopper thread runs
      				  stopper(CPU1)->enabled = true;
      stop_machine()
      
      But Konrad and Sander have observed:
      
      CPU0				CPU1
      cpu_up
      				cpu online
      unpark_threads()
        wakeup(stopper[CPU1])
      ....
      stop_machine()
      				stopper thread runs
      				  stopper(CPU1)->enabled = true;
      
      Now the stop machinery kicks CPU0 into the stop loop, where it gets
      stuck forever because the queue code saw stopper(CPU1)->enabled ==
      false, so CPU0 waits for CPU1 to enter stop_machine, but the CPU1
      stopper work got discarded due to enabled == false.
      
      Add a pre_unpark function to the smpboot thread descriptor and call it
      before waking the thread.
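
      An abridged sketch of the shape of that hook; the descriptor layout
      and helper below are assumed from the description, not quoted from
      the patch:

              /* smpboot thread descriptor with the new pre_unpark() hook */
              struct smp_hotplug_thread {
                      struct task_struct __percpu     **store;
                      void    (*thread_fn)(unsigned int cpu);
                      void    (*park)(unsigned int cpu);
                      void    (*unpark)(unsigned int cpu);
                      void    (*pre_unpark)(unsigned int cpu);  /* new */
                      const char      *thread_comm;
              };

              static void smpboot_unpark_thread(struct smp_hotplug_thread *ht,
                                                unsigned int cpu)
              {
                      struct task_struct *tsk = *per_cpu_ptr(ht->store, cpu);

                      /* Flip per-cpu state (e.g. stopper(cpu)->enabled = true)
                       * before the thread can be woken, so a racing
                       * stop_machine() never sees enabled == false. */
                      if (ht->pre_unpark)
                              ht->pre_unpark(cpu);
                      kthread_unpark(tsk);
              }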
      
      This fixes the problem at hand, but the stop_machine code should be
      more robust. The stopper->enabled flag smells fishy at best.
      
      Thanks to Konrad for going through a loop of debug patches and
      providing the information to decode this issue.
      Reported-and-tested-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      Reported-and-tested-by: Sander Eikelenboom <linux@eikelenboom.it>
      Cc: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
      Cc: Rusty Russell <rusty@rustcorp.com.au>
      Link: http://lkml.kernel.org/r/alpine.LFD.2.02.1302261843240.22263@ionos
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
  2. 14 Feb, 2013 2 commits
  3. 01 Nov, 2011 1 commit
  4. 31 Oct, 2011 1 commit
    • kernel: Map most files to use export.h instead of module.h · 9984de1a
      Paul Gortmaker authored
      The changed files were only including linux/module.h for the
      EXPORT_SYMBOL infrastructure, and nothing else.  Revector them
      onto the isolated export header for faster compile times.
      
      Nothing to see here but a whole lot of instances of:
      
        -#include <linux/module.h>
        +#include <linux/export.h>
      
      This commit is only changing the kernel dir; next targets
      will probably be mm, fs, the arch dirs, etc.
      Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
  5. 25 Oct, 2011 1 commit
  6. 26 Jul, 2011 1 commit
  7. 27 Jun, 2011 4 commits
  8. 23 Mar, 2011 1 commit
  9. 26 Oct, 2010 2 commits
  10. 18 Oct, 2010 1 commit
    • sched: Create special class for stop/migrate work · 34f971f6
      Peter Zijlstra authored
      In order to separate the stop/migrate work thread from the SCHED_FIFO
      implementation, create a special class for it that is of higher priority than
      SCHED_FIFO itself.
      
      This currently solves a problem where cpu-hotplug consumes so much cpu-time
      that the SCHED_FIFO class gets throttled, but has the bandwidth replenishment
      timer pending on the now dead cpu.
      
      It is also required for when we add the planned deadline scheduling class above
      SCHED_FIFO, as the stop/migrate thread still needs to transcend those tasks.
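
      An abridged sketch of such a class; hook set and field names are
      assumed:

              /* Always pick the per-cpu stop task when it is runnable; it
               * outranks SCHED_FIFO and is never bandwidth-throttled. */
              static struct task_struct *pick_next_task_stop(struct rq *rq)
              {
                      struct task_struct *stop = rq->stop;

                      if (stop && stop->se.on_rq)
                              return stop;
                      return NULL;
              }

              static const struct sched_class stop_sched_class = {
                      .next           = &rt_sched_class,  /* sits above RT */
                      .pick_next_task = pick_next_task_stop,
                      /* remaining hooks are essentially no-ops */
              };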
      Tested-by: Heiko Carstens <heiko.carstens@de.ibm.com>
      Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      LKML-Reference: <1285165776.2275.1022.camel@laptop>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
  11. 10 Aug, 2010 1 commit
  12. 31 May, 2010 1 commit
    • sched: Make sure timers have migrated before killing the migration_thread · 54e88fad
      Amit K. Arora authored
      Problem: In a stress test where some heavy tests were running along with
      regular CPU offlining and onlining, a hang was observed. The system seems
      to be hung at a point where migration_call() tries to kill the
      migration_thread of the dying CPU, which just got moved to the current
      CPU. This migration thread does not get a chance to run (and die) since
      rt_throttled is set to 1 on current, and it doesn't get cleared as the
      hrtimer which is supposed to reset the rt bandwidth
      (sched_rt_period_timer) is tied to the CPU which we just marked dead!
      
      Solution: This patch pushes the killing of migration thread to
      "CPU_POST_DEAD" event. By then all the timers (including
      sched_rt_period_timer) should have got migrated (along with other
      callbacks).
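
      A sketch of the notifier change; identifiers are illustrative,
      taken from the names the message itself uses:

              static int migration_call(struct notifier_block *nfb,
                                        unsigned long action, void *hcpu)
              {
                      int cpu = (long)hcpu;

                      switch (action & ~CPU_TASKS_FROZEN) {
                      case CPU_POST_DEAD:
                              /* All timers, including sched_rt_period_timer,
                               * have been migrated off the dead cpu by now, so
                               * the (possibly rt-throttled) thread can run
                               * and die. */
                              kthread_stop(per_cpu(migration_thread, cpu));
                              break;
                      }
                      return NOTIFY_OK;
              }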
      Signed-off-by: Amit Arora <aarora@in.ibm.com>
      Signed-off-by: Gautham R Shenoy <ego@in.ibm.com>
      Acked-by: Tejun Heo <tj@kernel.org>
      Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      LKML-Reference: <20100525132346.GA14986@amitarora.in.ibm.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
  13. 17 May, 2010 1 commit
  14. 08 May, 2010 1 commit
    • cpu_stop: add dummy implementation for UP · bbf1bb3e
      Tejun Heo authored
      When !CONFIG_SMP, the cpu_stop functions weren't defined at all, which
      could lead to build failures if UP code uses the cpu_stop facility.  Add
      a dummy cpu_stop implementation for UP.  The waiting variants execute
      the work function directly with preemption disabled, and
      stop_one_cpu_nowait() schedules a workqueue work.
      
      Makefile and ifdefs around the stop_machine implementation are updated to
      accommodate the CONFIG_SMP && !CONFIG_STOP_MACHINE case.
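
      A sketch of what the UP waiting variant looks like per that
      description (exact shape assumed):

              #ifndef CONFIG_SMP
              /* No other cpus exist: run the callback inline with preemption
               * off, matching the "monopolize the cpu" semantics on UP. */
              static inline int stop_one_cpu(unsigned int cpu,
                                             cpu_stop_fn_t fn, void *arg)
              {
                      int ret;

                      preempt_disable();
                      ret = fn(arg);
                      preempt_enable();
                      return ret;
              }
              #endif  /* !CONFIG_SMP */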
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Reported-by: Ingo Molnar <mingo@elte.hu>
  15. 06 May, 2010 3 commits
    • sched: replace migration_thread with cpu_stop · 969c7921
      Tejun Heo authored
      Currently migration_thread is serving three purposes - migration
      pusher, context to execute active_load_balance() and forced context
      switcher for expedited RCU synchronize_sched.  All three roles are
      hardcoded into migration_thread() and determining which job is
      scheduled is slightly messy.
      
      This patch kills migration_thread and replaces all three uses with
      cpu_stop.  The three different roles of migration_thread() are
      split into three separate cpu_stop callbacks -
      migration_cpu_stop(), active_load_balance_cpu_stop() and
      synchronize_sched_expedited_cpu_stop() - and each use case now simply
      asks cpu_stop to execute the callback as necessary.
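
      For instance, the migration path now reduces to a single cpu_stop
      request (argument struct shown for illustration):

              struct migration_arg arg = { .task = p, .dest_cpu = dest_cpu };

              /* run migration_cpu_stop() on p's cpu at stopper priority */
              stop_one_cpu(cpu_of(rq), migration_cpu_stop, &arg);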
      
      synchronize_sched_expedited() had been implemented with private
      preallocated resources and custom multi-cpu queueing and waiting
      logic, both of which are now provided by cpu_stop.
      synchronize_sched_expedited_count is made atomic and all other shared
      resources along with the mutex are dropped.
      
      synchronize_sched_expedited() also implemented a check to detect cases
      where not all the callbacks got executed on their assigned cpus and to
      fall back to synchronize_sched().  If called with cpu hotplug blocked,
      cpu_stop already guarantees that and the condition cannot happen;
      otherwise, stop_machine() would break.  However, this patch preserves
      the paranoid check using a cpumask to record on which cpus the stopper
      ran so that it can serve as a bisection point if something actually
      goes wrong there.
      
      Because the internal execution state is no longer visible,
      rcu_expedited_torture_stats() is removed.
      
      This patch also renames the cpu_stop threads from "stopper/%d" to
      "migration/%d".  The names of these threads ultimately don't matter
      and there's no reason to make unnecessary userland-visible changes.
      
      With this patch applied, stop_machine() and sched now share the same
      resources.  stop_machine() is faster without wasting any resources and
      sched migration users are much cleaner.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Acked-by: Peter Zijlstra <peterz@infradead.org>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Dipankar Sarma <dipankar@in.ibm.com>
      Cc: Josh Triplett <josh@freedesktop.org>
      Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Dimitri Sivanich <sivanich@sgi.com>
    • stop_machine: reimplement using cpu_stop · 3fc1f1e2
      Tejun Heo authored
      Reimplement stop_machine using cpu_stop.  As cpu stoppers are
      guaranteed to be available for all online cpus,
      stop_machine_create/destroy() are no longer necessary and removed.
      
      With resource management and synchronization handled by cpu_stop, the
      new implementation is much simpler.  Asking the cpu_stop to execute
      the stop_cpu() state machine on all online cpus with cpu hotplug
      disabled is enough.
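
      A sketch of the resulting core; the per-instance struct is abridged
      and the callback name is taken from the message:

              int stop_machine(int (*fn)(void *), void *data,
                               const struct cpumask *cpus)
              {
                      struct stop_machine_data smdata = {
                              .fn             = fn,
                              .data           = data,
                              .num_threads    = num_online_cpus(),
                              .active_cpus    = cpus,
                      };
                      int ret;

                      get_online_cpus();      /* hold off cpu hotplug */
                      ret = stop_cpus(cpu_online_mask, stop_cpu, &smdata);
                      put_online_cpus();
                      return ret;
              }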
      
      stop_machine itself doesn't need to manage any global resources
      anymore, so all per-instance information is rolled into struct
      stop_machine_data and the mutex and all static data variables are
      removed.
      
      The previous implementation created and destroyed RT workqueues as
      necessary, which made stop_machine() calls highly expensive on very
      large machines.  According to Dimitri Sivanich, preventing the dynamic
      creation/destruction makes booting more than twice as fast on very
      large machines.  cpu_stop resources are preallocated for all online
      cpus and should have the same effect.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Acked-by: Rusty Russell <rusty@rustcorp.com.au>
      Acked-by: Peter Zijlstra <peterz@infradead.org>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Dimitri Sivanich <sivanich@sgi.com>
    • cpu_stop: implement stop_cpu[s]() · 1142d810
      Tejun Heo authored
      Implement a simplistic per-cpu maximum priority cpu monopolization
      mechanism.  A non-sleeping callback can be scheduled to run on one or
      multiple cpus with maximum priority, monopolizing those cpus.  This is
      primarily to replace and unify RT workqueue usage in stop_machine and
      the scheduler's migration_thread, which currently serves multiple
      purposes.
      
      Four functions are provided - stop_one_cpu(), stop_one_cpu_nowait(),
      stop_cpus() and try_stop_cpus().
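
      In sketch form, the four prototypes as described (signatures assumed
      from the text):

              typedef int (*cpu_stop_fn_t)(void *arg);

              int stop_one_cpu(unsigned int cpu, cpu_stop_fn_t fn, void *arg);
              void stop_one_cpu_nowait(unsigned int cpu, cpu_stop_fn_t fn,
                                       void *arg,
                                       struct cpu_stop_work *work_buf);
              int stop_cpus(const struct cpumask *cpumask, cpu_stop_fn_t fn,
                            void *arg);
              int try_stop_cpus(const struct cpumask *cpumask,
                                cpu_stop_fn_t fn, void *arg);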
      
      This is to allow clean sharing of resources among stop_cpu and all the
      migration thread users.  One stopper thread per cpu is created, which
      is currently named "stopper/CPU".  This will eventually replace the
      migration thread and take on its name.
      
      * This facility was originally named cpuhog and lived in separate
        files, but Peter Zijlstra nacked the name, so it was renamed to
        cpu_stop and moved into stop_machine.c.
      
      * Better reporting of preemption leak as per Peter's suggestion.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Acked-by: Peter Zijlstra <peterz@infradead.org>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Dimitri Sivanich <sivanich@sgi.com>
  16. 17 Feb, 2010 1 commit
  17. 30 Mar, 2009 1 commit
  18. 20 Feb, 2009 1 commit
  19. 04 Jan, 2009 1 commit
    • stop_machine: introduce stop_machine_create/destroy. · 9ea09af3
      Heiko Carstens authored
      Introduce stop_machine_create/destroy. With this interface subsystems
      that need a non-failing stop_machine environment can create the
      stop_machine threads before actually calling stop_machine.
      When the threads aren't needed anymore they can be killed with
      stop_machine_destroy again.
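
      A usage sketch under those semantics (caller code assumed):

              /* create the kstop threads up front so a later
               * stop_machine() call cannot fail with -ENOMEM */
              ret = stop_machine_create();
              if (ret)
                      return ret;

              stop_machine(fn, data, NULL);   /* environment guaranteed */

              stop_machine_destroy();         /* drop our reference again */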
      
      When stop_machine gets called and the threads aren't present they
      will be created and destroyed automatically. This restores the old
      behaviour of stop_machine.
      
      This patch also converts cpu hotplug to the new interface since it
      is special: cpu_down calls __stop_machine instead of stop_machine.
      However, the kstop threads will only be created when stop_machine
      gets called.
      
      Changing the code so that the threads would be created automatically
      on __stop_machine is currently not possible: when __stop_machine gets
      called we hold cpu_add_remove_lock, which is the same lock that
      create_rt_workqueue would take. So the workqueue needs to be created
      before the cpu hotplug code locks cpu_add_remove_lock.
      Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
      Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
  20. 31 Dec, 2008 1 commit
  21. 16 Nov, 2008 1 commit
  22. 26 Oct, 2008 1 commit
    • Revert "Call init_workqueues before pre smp initcalls." · 4403b406
      Linus Torvalds authored
      This reverts commit a802dd0e by moving
      the call to init_workqueues() back where it belongs - after SMP has been
      initialized.
      
      It also moves stop_machine_init() - which needs workqueues - to a later
      phase using a core_initcall() instead of early_initcall().  That should
      satisfy all ordering requirements, and was apparently the reason why
      init_workqueues() had been moved so early in the first place.
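
      A sketch of the reordered initcall (init body assumed):

              /* core_initcall runs after init_workqueues(), so creating
               * the RT workqueue here is safe; early_initcall was not. */
              static int __init stop_machine_init(void)
              {
                      stop_machine_wq = create_rt_workqueue("kstop");
                      return 0;
              }
              core_initcall(stop_machine_init);       /* was early_initcall() */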
      
      Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
      Cc: Rusty Russell <rusty@rustcorp.com.au>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  23. 21 Oct, 2008 2 commits
  24. 12 Aug, 2008 1 commit
  25. 28 Jul, 2008 3 commits
  26. 26 Jul, 2008 1 commit
  27. 18 Jul, 2008 1 commit
    • cpumask: Replace cpumask_of_cpu with cpumask_of_cpu_ptr · 65c01184
      Mike Travis authored
        * This patch replaces the dangerous lvalue version of cpumask_of_cpu
          with new cpumask_of_cpu_ptr macros.  These are patterned after the
          node_to_cpumask_ptr macros.
      
          In general terms, if there is a cpumask_of_cpu_map[] then a pointer to
          the cpumask_of_cpu_map[cpu] entry is used.  The cpumask_of_cpu_map
          is provided when there is a large NR_CPUS count, greatly reducing
          the amount of code generated and stack space used for
          cpumask_of_cpu().  The pointer to the cpumask_t value is needed for
          calling set_cpus_allowed_ptr() to reduce the amount of stack space
          needed to pass the cpumask_t value.
      
          If there isn't a cpumask_of_cpu_map[], then a temporary variable is
          declared and filled in with value from cpumask_of_cpu(cpu) as well as
          a pointer variable pointing to this temporary variable.  Afterwards,
          the pointer is used to reference the cpumask value.  The compiler
          will optimize out the extra dereference through the pointer as well
          as the stack space used for the pointer, resulting in identical code.
      
          A good example of the orthogonal usages is in net/sunrpc/svc.c:
      
      	case SVC_POOL_PERCPU:
      	{
      		unsigned int cpu = m->pool_to[pidx];
      		cpumask_of_cpu_ptr(cpumask, cpu);
      
      		*oldmask = current->cpus_allowed;
      		set_cpus_allowed_ptr(current, cpumask);
      		return 1;
      	}
      	case SVC_POOL_PERNODE:
      	{
      		unsigned int node = m->pool_to[pidx];
      		node_to_cpumask_ptr(nodecpumask, node);
      
      		*oldmask = current->cpus_allowed;
      		set_cpus_allowed_ptr(current, nodecpumask);
      		return 1;
      	}
      Signed-off-by: Mike Travis <travis@sgi.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
  28. 23 Jun, 2008 1 commit
    • sched: add new API sched_setscheduler_nocheck: add a flag to control access checks · 961ccddd
      Rusty Russell authored
      Hidehiro Kawai noticed that sched_setscheduler() can fail in
      stop_machine: it calls sched_setscheduler() from insmod, which can
      have CAP_SYS_MODULE without CAP_SYS_NICE.
      
      Two cases could have failed, so are changed to sched_setscheduler_nocheck:
        kernel/softirq.c:cpu_callback()
      	- CPU hotplug callback
        kernel/stop_machine.c:__stop_machine_run()
      	- Called from various places, including modprobe()
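
      A sketch of the resulting split (internal helper assumed):

              int sched_setscheduler(struct task_struct *p, int policy,
                                     struct sched_param *param)
              {
                      /* user-reachable path: keep the capability checks */
                      return __sched_setscheduler(p, policy, param, true);
              }

              int sched_setscheduler_nocheck(struct task_struct *p, int policy,
                                             struct sched_param *param)
              {
                      /* in-kernel path: skip checks against current's caps */
                      return __sched_setscheduler(p, policy, param, false);
              }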
      Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
      Cc: Jeremy Fitzhardinge <jeremy@goop.org>
      Cc: Hidehiro Kawai <hidehiro.kawai.ez@hitachi.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: linux-mm@kvack.org
      Cc: sugita <yumiko.sugita.yf@hitachi.com>
      Cc: Satoshi OSHIMA <satoshi.oshima.fk@hitachi.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
  29. 23 May, 2008 1 commit
    • stop_machine: make stop_machine_run more virtualization friendly · 3401a61e
      Christian Borntraeger authored
      On kvm I have seen some rare hangs in stop_machine when I used more guest
      cpus than host cpus.  e.g. 32 guest cpus on 1 host cpu triggered the
      hang quite often.  I could also reproduce the problem on a 4 way z/VM host with
      a 64 way guest.

      It turned out that the guest was consuming all available cpus mostly for
      spinning on scheduler locks like rq->lock.  This is expected as the threads are
      calling yield all the time.
      The problem is that the host scheduling decisions, together with the guest
      scheduling decisions and spinlocks not being fair, managed to create an
      interesting scenario similar to a livelock.  (Sometimes the hang resolved
      itself after some minutes)
      
      Changing stop_machine to yield the cpu to the hypervisor when yielding inside
      the guest fixed the problem for me. While I am not completely happy with this
      patch, I think it causes no harm and it really improves the situation for me.
      
      I used cpu_relax for yielding to the hypervisor, does that work on all
      architectures?
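
      In sketch form the change amounts to a hypervisor-friendly pause in
      the busy-wait loop (loop shape assumed):

              /* each cpu spins here waiting for the next state transition */
              while (curstate != STOPMACHINE_EXIT) {
                      yield();        /* let other guest threads run */
                      cpu_relax();    /* hint the hypervisor that we spin */
              }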
      
      p.s.: If you want to reproduce the problem, cpu hotplug and kprobes use
      stop_machine_run and both triggered the problem after some retries.
      Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
      CC: Ingo Molnar <mingo@elte.hu>
      Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
  30. 21 Apr, 2008 1 commit