1. 12 Jan, 2006 1 commit
  2. 10 Jan, 2006 2 commits
  3. 09 Jan, 2006 5 commits
    • [PATCH] mutex subsystem, debugging code · 408894ee
      Ingo Molnar authored
      
      
      mutex implementation - add debugging code.
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      Signed-off-by: Arjan van de Ven <arjan@infradead.org>
    • [PATCH] keys: Permit running process to instantiate keys · b5f545c8
      David Howells authored
      
      
      Make it possible for a running process (such as gssapid) to instantiate a
      key, as was requested by Trond Myklebust for NFS4.
      
      The patch makes the following changes:
      
       (1) A new, optional key type method has been added. This permits a key type
           to intercept requests at the point /sbin/request-key is about to be
           spawned and do something else with them - passing them over the
           rpc_pipefs files or netlink sockets for instance.
      
           The uninstantiated key, the authorisation key and the intended operation
           name are passed to the method.
      
       (2) The callout_info is no longer passed as an argument to /sbin/request-key
           to prevent unauthorised viewing of this data using ps or by looking in
           /proc/pid/cmdline.
      
           This means that the old /sbin/request-key program will not work with the
           patched kernel as it will expect to see an extra argument that is no
           longer there.
      
           A revised keyutils package will be made available tomorrow.
      
       (3) The callout_info is now attached to the authorisation key. Reading this
           key will retrieve the information.
      
       (4) A new field has been added to the task_struct. This holds the
           authorisation key currently active for a thread. Searches now look here
           for the caller's set of keys rather than looking for an auth key in the
           lowest level of the session keyring.
      
           This permits a thread to be servicing multiple requests at once and to
           switch between them. Note that this is per-thread, not per-process, and
           so is usable in multithreaded programs.
      
           The setting of this field is inherited across fork and exec.
      
       (5) A new keyctl function (KEYCTL_ASSUME_AUTHORITY) has been added that
           permits a thread to assume the authority to deal with an uninstantiated
           key. Assumption is only permitted if the authorisation key associated
           with the uninstantiated key is somewhere in the thread's keyrings.
      
           This function can also clear the assumption.
      
       (6) A new magic key specifier has been added to refer to the currently
           assumed authorisation key (KEY_SPEC_REQKEY_AUTH_KEY).
      
       (7) Instantiation will only proceed if the appropriate authorisation key is
           assumed first. The assumed authorisation key is discarded if
           instantiation is successful.
      
       (8) key_validate() is moved from the file of request_key functions to the
           file of permissions functions.
      
       (9) The documentation is updated.
      
      From: <Valdis.Kletnieks@vt.edu>
      
          Build fix.
      Signed-off-by: David Howells <dhowells@redhat.com>
      Cc: Trond Myklebust <trond.myklebust@fys.uio.no>
      Cc: Alexander Zangerl <az@bond.edu.au>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • [PATCH] remove get_task_struct_rcu() · d4829cd5
      Paul E. McKenney authored
      
      
      The latest set of signal-RCU patches does not use get_task_struct_rcu().
      Attached is a patch that removes it.
      Signed-off-by: "Paul E. McKenney" <paulmck@us.ibm.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • [PATCH] RCU signal handling · e56d0903
      Ingo Molnar authored
      
      
      RCU tasklist_lock and RCU signal handling: send signals RCU-read-locked
      instead of tasklist_lock read-locked.  This is a scalability improvement on
      SMP and a preemption-latency improvement under PREEMPT_RCU.
      Signed-off-by: Paul E. McKenney <paulmck@us.ibm.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      Acked-by: William Irwin <wli@holomorphy.com>
      Cc: Roland McGrath <roland@redhat.com>
      Cc: Oleg Nesterov <oleg@tv-sign.ru>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • [PATCH] Swap Migration V5: PF_SWAPWRITE to allow writing to swap · 930d9152
      Christoph Lameter authored
      
      
      Add PF_SWAPWRITE to control a process's permission to write to swap.
      
      - Use PF_SWAPWRITE in may_write_to_queue() instead of checking for kswapd
        and pdflush
      
      - Set PF_SWAPWRITE flag for kswapd and pdflush
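The flag test described above can be sketched in a few lines of C. This is only an illustration: the flag value, the `demo_task` structure and the helper name are assumptions made for the sketch, and the real may_write_to_queue() performs additional checks that are not modelled here.

```c
#include <stdbool.h>

/* Hypothetical flag value, chosen only for this sketch. */
#define DEMO_PF_SWAPWRITE 0x01000000u

/* Minimal stand-in for the flags word of a task. */
struct demo_task {
    unsigned int flags;
};

/*
 * Instead of asking "is this kswapd or pdflush?", ask whether the task
 * carries the swap-write permission flag.
 */
static bool demo_may_write_to_swap(const struct demo_task *t)
{
    return (t->flags & DEMO_PF_SWAPWRITE) != 0;
}
```

Any thread that should be allowed to write to swap would set the flag once at startup, which is what the second bullet does for kswapd and pdflush.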
      Signed-off-by: Christoph Lameter <clameter@sgi.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
  4. 06 Jan, 2006 1 commit
    • [PATCH] atomic_long_t & include/asm-generic/atomic.h V2 · d3cb4871
      Christoph Lameter authored
      
      
      Several counters already need 64-bit atomic variables on 64-bit
      platforms (see mm_counter_t in sched.h).  We have to do ugly ifdefs to fall
      back to 32-bit atomics on 32-bit platforms.
      
      The VM statistics patch that I am working on will also make more extensive
      use of atomic64.
      
      This patch introduces a new type, atomic_long_t, by providing definitions in
      asm-generic/atomic.h that work similarly to the C "long" type.  It is 32 bits
      on 32-bit platforms and 64 bits on 64-bit platforms.
      
      Also cleans up the determination of the mm_counter_t in sched.h.
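A userspace analogue of the idea, using C11 atomics, might look like the following. It is a sketch under stated assumptions, not the kernel's definition: the `demo_` names are invented here, and the width is selected from `LONG_MAX` rather than from the kernel's own configuration macros.

```c
#include <limits.h>
#include <stdatomic.h>
#include <stdint.h>

/* Pick the atomic width that matches "long" on this platform. */
#if LONG_MAX > 0x7fffffffL
typedef _Atomic int64_t demo_atomic_long_t;   /* 64-bit long */
#else
typedef _Atomic int32_t demo_atomic_long_t;   /* 32-bit long */
#endif

/* Atomically add n to *v. */
static inline void demo_atomic_long_add(long n, demo_atomic_long_t *v)
{
    atomic_fetch_add(v, n);
}

/* Atomically read the current value of *v. */
static inline long demo_atomic_long_read(demo_atomic_long_t *v)
{
    return (long)atomic_load(v);
}
```

Callers then use one type everywhere and the ifdef lives in a single header.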
      Signed-off-by: Christoph Lameter <clameter@sgi.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
  5. 28 Nov, 2005 1 commit
    • [PATCH] clean up lock_cpu_hotplug() in cpufreq · a9d9baa1
      Ashok Raj authored
      
      
      In the cpufreq hotplug notify path, some callers end up in a lowest-level
      function that calls lock_cpu_hotplug().  The lock is already held during
      cpu_up() and cpu_down() calls when the notify calls are broadcast to
      registered clients.
      
      Ideally, we would disable preemption at the highest caller and make sure
      we don't sleep further down in the cpufreq->driver_target() calls, but the
      calls are too intertwined and cumbersome to clean up.
      
      Hence we consistently use lock_cpu_hotplug() and unlock_cpu_hotplug() in
      all places.
      
       - Removed export of cpucontrol semaphore and made it static.
       - Removed explicit uses of up/down with lock_cpu_hotplug()
         so we can keep track of the callers in the same thread context and
         just keep refcounts without calling a down() that causes a deadlock.
       - Removed current_in_hotplug() uses
       - Removed PF_HOTPLUG_CPU in sched.h introduced for the current_in_hotplug()
         temporary workaround.
      
      Tested with insmod of cpufreq_stat.ko, and logical online/offline
      to make sure we don't have any hang situations.
      Signed-off-by: Ashok Raj <ashok.raj@intel.com>
      Cc: Zwane Mwaikambo <zwane@linuxpower.ca>
      Cc: Shaohua Li <shaohua.li@intel.com>
      Cc: "Siddha, Suresh B" <suresh.b.siddha@intel.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
  6. 14 Nov, 2005 4 commits
  7. 09 Nov, 2005 1 commit
    • [PATCH] cpu hotplug: fix locking in cpufreq drivers · 90d45d17
      Ashok Raj authored
      
      
      When calling target drivers to set frequency, we take the cpucontrol lock.
      When we modified the code to accommodate CPU hotplug, there was an attempt
      to take the cpucontrol lock twice, leading to a deadlock.  Since the
      current thread context is already holding the cpucontrol lock, we don't need
      to make another attempt to acquire it.
      
      Now we leave a trace in current->flags indicating that the current thread
      already holds the cpucontrol lock, so we don't attempt to take it again.
      
      Thanks to Andrew Morton for the beating:-)
      
      From: Brice Goglin <Brice.Goglin@ens-lyon.org>
      
        Build fix
      
      (akpm: this patch is still unpleasant.  Ashok continues to look for a cleaner
      solution, doesn't he?  ;))
      Signed-off-by: Ashok Raj <ashok.raj@intel.com>
      Signed-off-by: Brice Goglin <Brice.Goglin@ens-lyon.org>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
  8. 31 Oct, 2005 3 commits
    • [PATCH] cleanup the usage of SEND_SIG_xxx constants · 621d3121
      Oleg Nesterov authored
      
      
      This patch simplifies some checks for magic siginfo values.  It should not
      change the behaviour in any way.
      Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • [PATCH] sched: hardcode non-smp set_cpus_allowed · 4098f991
      Paul Jackson authored
      
      
      Simplify the UP (1 CPU) implementation of set_cpus_allowed.
      
      The one CPU is hardcoded to be cpu 0 - so just test for that bit, and avoid
      having to pick up the cpu_online_map.
      
      Also, unexport cpu_online_map: it was only needed for set_cpus_allowed().
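As a rough illustration of the uniprocessor case, the check reduces to testing bit 0 of the requested mask. The function name and the use of a plain `unsigned long` as the mask are assumptions of this sketch; the kernel uses its own cpumask type.

```c
#include <errno.h>

/*
 * Sketch of a single-CPU affinity check: the only CPU is CPU 0, so a
 * requested mask is acceptable exactly when bit 0 is set. There is no
 * need to consult a map of online CPUs.
 */
static int demo_set_cpus_allowed_up(unsigned long new_mask)
{
    if (!(new_mask & 1UL))
        return -EINVAL;   /* mask excludes the one CPU that exists */
    return 0;
}
```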
      Signed-off-by: Paul Jackson <pj@sgi.com>
      Acked-by: Ingo Molnar <mingo@elte.hu>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • [PATCH] cpusets: dual semaphore locking overhaul · 053199ed
      Paul Jackson authored
      
      
      Overhaul cpuset locking.  Replace single semaphore with two semaphores.
      
      The suggestion to use two locks was made by Roman Zippel.
      
      Both locks are global.  Code that wants to modify cpusets must first
      acquire the exclusive manage_sem, which allows them read-only access to
      cpusets, and holds off other would-be modifiers.  Before making actual
      changes, the second semaphore, callback_sem must be acquired as well.  Code
      that needs only to query cpusets must acquire callback_sem, which is also a
      global exclusive lock.
      
      The earlier problems with double tripping are avoided, because it is
      allowed for holders of manage_sem to nest the second callback_sem lock, and
      only callback_sem is needed by code called from within __alloc_pages(),
      where the double tripping had been possible.
      
      This is not quite the same as a normal read/write semaphore, because
      obtaining read-only access with intent to change must hold off other such
      attempts, while allowing read-only access w/o such intention.  Changing
      cpusets involves several related checks and changes, which must be done
      while allowing read-only queries (to avoid the double trip), but while
      ensuring nothing changes (holding off other would-be modifiers).
      
      This overhaul of cpuset locking also makes careful use of task_lock() to
      guard access to the task->cpuset pointer, closing a couple of race
      conditions noticed while reading this code (thanks, Roman).  I've never
      seen these races fail in any use or test.
      
      See further the comments in the code.
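The two-lock protocol described in this message can be sketched in userspace with two mutexes. Everything here is illustrative: pthread mutexes stand in for kernel semaphores, and `demo_mems` is a made-up piece of cpuset state.

```c
#include <pthread.h>

static pthread_mutex_t demo_manage_sem = PTHREAD_MUTEX_INITIALIZER;
static pthread_mutex_t demo_callback_sem = PTHREAD_MUTEX_INITIALIZER;
static int demo_mems = 1;   /* hypothetical cpuset attribute */

/* Query path: only the callback lock is needed. */
static int demo_query(void)
{
    pthread_mutex_lock(&demo_callback_sem);
    int v = demo_mems;
    pthread_mutex_unlock(&demo_callback_sem);
    return v;
}

/* Modify path: manage lock first, callback lock only around the change. */
static int demo_modify(int new_mems)
{
    pthread_mutex_lock(&demo_manage_sem);
    /* Validation runs with only the manage lock held, so queries
     * (and anything they are called from) are not blocked here. */
    if (new_mems <= 0) {
        pthread_mutex_unlock(&demo_manage_sem);
        return -1;
    }
    pthread_mutex_lock(&demo_callback_sem);
    demo_mems = new_mems;
    pthread_mutex_unlock(&demo_callback_sem);
    pthread_mutex_unlock(&demo_manage_sem);
    return 0;
}
```

The lock order is always manage then callback, and the query path never takes the manage lock, which is what removes the double trip.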
      Signed-off-by: Paul Jackson <pj@sgi.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
  9. 30 Oct, 2005 4 commits
    • [PATCH] mm: fix rss and mmlist locking · f412ac08
      Hugh Dickins authored
      
      
      A couple of oddities were guarded by page_table_lock, no longer properly
      guarded when that is split.
      
      The mm_counters of file_rss and anon_rss: make those an atomic_t, or an
      atomic64_t if the architecture supports it, in such a case.  Definitions by
      courtesy of Christoph Lameter: who spent considerable effort on more scalable
      ways of counting, but found insufficient benefit in practice.
      
      And adding an mm with swap to the mmlist for swapoff: the list is well-
      guarded by its own lock, but the list_empty check now has to be repeated
      inside it.
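The "repeat the check inside the lock" point is the usual check, lock, re-check pattern. A minimal userspace sketch follows; the `demo_` names, the boolean membership flag and the length counter are assumptions standing in for the real list.

```c
#include <pthread.h>
#include <stdbool.h>

struct demo_mm {
    bool on_list;   /* stands in for !list_empty(&mm->mmlist) */
};

static pthread_mutex_t demo_mmlist_lock = PTHREAD_MUTEX_INITIALIZER;
static int demo_mmlist_len;

static void demo_add_to_mmlist(struct demo_mm *mm)
{
    if (mm->on_list)          /* cheap check without the lock */
        return;
    pthread_mutex_lock(&demo_mmlist_lock);
    if (!mm->on_list) {       /* repeat it now that we hold the lock */
        mm->on_list = true;
        demo_mmlist_len++;
    }
    pthread_mutex_unlock(&demo_mmlist_lock);
}
```

Without the second check, two racing callers could both pass the unlocked test and add the same entry twice.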
      Signed-off-by: Hugh Dickins <hugh@veritas.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • [PATCH] mm: mm_struct hiwaters moved · f449952b
      Hugh Dickins authored
      
      
      Slight and timid rearrangement of mm_struct: hiwater_rss and hiwater_vm were
      tacked on the end, but it seems better to keep them near _file_rss, _anon_rss
      and total_vm, in the same cacheline on those arches verified.
      
      There are likely to be more profitable rearrangements, but less obvious (is it
      good or bad that saved_auxv[AT_VECTOR_SIZE] isolates cpu_vm_mask and context
      from many others?), needing serious instrumentation.
      Signed-off-by: Hugh Dickins <hugh@veritas.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • [PATCH] mm: update_hiwaters just in time · 365e9c87
      Hugh Dickins authored
      
      
      update_mem_hiwater has attracted various criticisms, in particular from those
      concerned with mm scalability.  Originally it was called whenever rss or
      total_vm got raised.  Then many of those callsites were replaced by a timer
      tick call from account_system_time.  Now Frank van Maarseveen reports that to
      be found inadequate.  How about this?  Works for Frank.
      
      Replace update_mem_hiwater, a poor combination of two unrelated ops, by macros
      update_hiwater_rss and update_hiwater_vm.  Don't attempt to keep
      mm->hiwater_rss up to date at timer tick, nor every time we raise rss (usually
      by 1): those are hot paths.  Do the opposite, update only when about to lower
      rss (usually by many), or just before final accounting in do_exit.  Handle
      mm->hiwater_vm in the same way, though it's much less of an issue.  Demand
      that whoever collects these hiwater statistics do the work of taking the
      maximum with rss or total_vm.
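The convention in the paragraph above can be modelled as follows. This is a simplified sketch with invented `demo_` names and a two-field structure; it only shows the shape of "update before lowering, take the maximum when reading".

```c
struct demo_mm {
    unsigned long rss;
    unsigned long hiwater_rss;
};

/* Record the current rss as the high-water mark if it is a new peak. */
#define demo_update_hiwater_rss(mm) do {            \
    if ((mm)->hiwater_rss < (mm)->rss)              \
        (mm)->hiwater_rss = (mm)->rss;              \
} while (0)

/* Raising rss is the hot path: no high-water bookkeeping at all. */
static void demo_raise_rss(struct demo_mm *mm, unsigned long pages)
{
    mm->rss += pages;
}

/* Lowering rss is rarer: capture the peak just before it is lost. */
static void demo_lower_rss(struct demo_mm *mm, unsigned long pages)
{
    demo_update_hiwater_rss(mm);
    mm->rss -= pages;
}

/* Collectors must take the maximum themselves. */
static unsigned long demo_peak_rss(const struct demo_mm *mm)
{
    return mm->hiwater_rss > mm->rss ? mm->hiwater_rss : mm->rss;
}
```

A reader that ignored the current rss and reported only hiwater_rss would under-report any task whose rss has never been lowered.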
      
      And there has been no collector of these hiwater statistics in the tree.  The
      new convention needs an example, so match Frank's usage by adding a VmPeak
      line above VmSize to /proc/<pid>/status, and also a VmHWM line above VmRSS
      (High-Water-Mark or High-Water-Memory).
      
      There was a particular anomaly during mremap move, that hiwater_vm might be
      captured too high.  A fleeting such anomaly remains, but it's quickly
      corrected now, whereas before it would stick.
      
      What locking?  None: if the app is racy then these statistics will be racy,
      it's not worth any overhead to make them exact.  But whenever it suits,
      hiwater_vm is updated under exclusive mmap_sem, and hiwater_rss under
      page_table_lock (for now) or with preemption disabled (later on): without
      going to any trouble, minimize the time between reading current values and
      updating, to minimize those occasions when a racing thread bumps a count up
      and back down in between.
      Signed-off-by: Hugh Dickins <hugh@veritas.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • [PATCH] mm: rss = file_rss + anon_rss · 4294621f
      Hugh Dickins authored
      
      
      I was lazy when we added anon_rss, and chose to change as few places as
      possible.  So currently each anonymous page has to be counted twice, in rss
      and in anon_rss.  Which won't be so good if those are atomic counts in some
      configurations.
      
      Change that around: keep file_rss and anon_rss separately, and add them
      together (with get_mm_rss macro) when the total is needed - reading two
      atomics is much cheaper than updating two atomics.  And update anon_rss
      upfront, typically in memory.c, not tucked away in page_add_anon_rmap.
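In sketch form, the change amounts to keeping two independent counters and summing them on demand. The structure and names below are illustrative and use plain integers rather than the atomic types discussed above.

```c
struct demo_mm_counters {
    unsigned long file_rss;   /* pages backed by files */
    unsigned long anon_rss;   /* anonymous pages */
};

/* Total rss is derived at read time instead of being stored. */
#define demo_get_mm_rss(mm) ((mm)->file_rss + (mm)->anon_rss)

/* Mapping an anonymous page now touches one counter, not two. */
static void demo_add_anon_page(struct demo_mm_counters *mm)
{
    mm->anon_rss++;
}
```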
      Signed-off-by: Hugh Dickins <hugh@veritas.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
  10. 10 Oct, 2005 1 commit
    • [PATCH] Fix signal sending in usbdevio on async URB completion · 46113830
      Harald Welte authored
      
      
      If a process issues an URB from userspace and (starts to) terminate
      before the URB comes back, we run into the issue described above.  This
      is because the urb saves a pointer to "current" when it is posted to the
      device, but there's no guarantee that this pointer is still valid
      afterwards.
      
      In fact, there are three separate issues:
      
      1) the pointer to "current" can become invalid, since the task could be
         completely gone when the URB completion comes back from the device.
      
      2) Even if the saved task pointer is still pointing to a valid task_struct,
         task_struct->sighand could have gone meanwhile.
      
      3) Even if the process is perfectly fine, permissions may have changed,
         and we can no longer send it a signal.
      
      So what we do instead is save the PID and UIDs of the process and
      introduce a new kill_proc_info_as_uid() function.
      Signed-off-by: Harald Welte <laforge@gnumonks.org>
      [ Fixed up types and added symbol exports ]
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
  11. 29 Sep, 2005 2 commits
    • Revert task flag re-ordering, add comments · 4a8342d2
      Linus Torvalds authored
      Roland points out that the flags end up having non-obvious dependencies
      elsewhere, so revert aa55a086 and add some comments about why things are
      as they are.
      
      We'll just have to fix up the broken comparisons. Roland has a patch.
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • [PATCH] fix TASK_STOPPED vs TASK_NONINTERACTIVE interaction · aa55a086
      Oleg Nesterov authored
      
      
      do_signal_stop:
      
      	for_each_thread(t) {
      		if (t->state < TASK_STOPPED)
      			++sig->group_stop_count;
      	}
      
      However, TASK_NONINTERACTIVE > TASK_STOPPED, so this loop will not
      count TASK_INTERRUPTIBLE | TASK_NONINTERACTIVE threads.
      
      See also wait_task_stopped(), which checks ->state > TASK_STOPPED.
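The ordering problem is easy to reproduce with plain integers. The numeric values below are assumptions chosen for illustration (a modifier bit that is numerically larger than the stopped state), and the masked comparison is just one possible alternative, not what this patch does.

```c
#include <stdbool.h>

/* Illustrative values only; the modifier bit sits above DEMO_TASK_STOPPED. */
#define DEMO_TASK_RUNNING          0
#define DEMO_TASK_INTERRUPTIBLE    1
#define DEMO_TASK_UNINTERRUPTIBLE  2
#define DEMO_TASK_STOPPED          4
#define DEMO_TASK_NONINTERACTIVE   64

/* The comparison used in the loop quoted above. */
static bool demo_counted_by_ordering(long state)
{
    return state < DEMO_TASK_STOPPED;
}

/* Alternative: strip the modifier bit before comparing. */
static bool demo_counted_after_masking(long state)
{
    return (state & ~(long)DEMO_TASK_NONINTERACTIVE) < DEMO_TASK_STOPPED;
}
```

A thread sleeping with DEMO_TASK_INTERRUPTIBLE | DEMO_TASK_NONINTERACTIVE has state 65, so the plain ordering test skips it even though it is not stopped.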
      Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru>
      
      [ We really probably should always use the appropriate bitmasks to test
        task states, not do it like this. Using something like
      
      	#define TASK_RUNNABLE (TASK_RUNNING | TASK_INTERRUPTIBLE | \
      				TASK_UNINTERRUPTIBLE | TASK_NONINTERACTIVE)
      
        and then doing "if (task->state & TASK_RUNNABLE)" or similar. But the
        ordering of the task states is historical, and keeping the ordering
        does make sense regardless. ]
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
  12. 13 Sep, 2005 1 commit
    • [PATCH] set_current_state() commentary · 498d0c57
      Andrew Morton authored
      
      
      Explain the mysteries of set_current_state().
      
      Quoth Linus:
      
       The scheduler itself never needs the memory barrier at all.
      
       The barrier is needed only if the user itself ends up testing some other
       thing afterwards, ie if you have
      
       	set_process_state(TASK_INTERRUPTIBLE);
       	if (still_need_to_sleep())
       		schedule();
      
       then the "still_need_to_sleep()" thing may test flags and wakeup events,
       and then you _may_ want to (and often do) make sure that the write of
       TASK_INTERRUPTIBLE is serialized wrt the reads of any wakeup data (since
       the wakeup may have happened on another CPU).
      
       So the comment is somewhat wrong. We don't really _care_ whether the state
       propagates out to other CPU's since all of our actions are purely local,
       and there is nothing we do that is conditional on any other CPU: we're
       going to sleep unconditionally, and the scheduler only cares about _our_
       state, not about somebody elses state.
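The pattern Linus describes can be approximated in userspace with C11 atomics. This is a sketch under assumptions: `demo_state` stands in for the task state, `demo_wakeup_pending` for whatever wakeup data the sleeper tests, and a sequentially consistent fence plays the role of the memory barrier.

```c
#include <stdatomic.h>
#include <stdbool.h>

enum { DEMO_RUNNING = 0, DEMO_INTERRUPTIBLE = 1 };

static atomic_int demo_state;            /* stands in for the task state */
static atomic_bool demo_wakeup_pending;  /* set by a waker on another CPU */

/* Returns true if the caller should go on to call schedule(). */
static bool demo_prepare_to_sleep(void)
{
    /* Write the new state ... */
    atomic_store_explicit(&demo_state, DEMO_INTERRUPTIBLE,
                          memory_order_relaxed);
    /* ... and order that write before the read of the wakeup data. */
    atomic_thread_fence(memory_order_seq_cst);

    if (atomic_load_explicit(&demo_wakeup_pending, memory_order_relaxed)) {
        /* The wakeup already happened: do not sleep. */
        atomic_store(&demo_state, DEMO_RUNNING);
        return false;
    }
    return true;
}
```

The fence matters only because of the test that follows the state write; a caller that sleeps unconditionally has nothing to order against.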
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
  13. 12 Sep, 2005 1 commit
    • [PATCH] cpuset semaphore depth check optimize · b3426599
      Paul Jackson authored
      
      
      Optimize the deadlock avoidance check on the global cpuset
      semaphore cpuset_sem.  Instead of adding a depth counter to the
      task struct of each task, just two words are enough: one
      to store the depth and the other the current cpuset_sem holder.
      
      Thanks to Nikita Danilov for the idea.
      Signed-off-by: Paul Jackson <pj@sgi.com>
      
      [ We may want to change this further, but at least it's now
        a totally internal decision to the cpusets code ]
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
  14. 11 Sep, 2005 1 commit
  15. 10 Sep, 2005 3 commits
    • [PATCH] add schedule_timeout_{,un}interruptible() interfaces · 64ed93a2
      Nishanth Aravamudan authored
      
      
      Add schedule_timeout_{,un}interruptible() interfaces so that
      schedule_timeout() callers don't have to worry about forgetting to add the
      set_current_state() call beforehand.
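The shape of such a wrapper can be shown with a toy model. Nothing below is kernel code: the `demo_` scheduler stub merely records which state it was entered with, so the effect of the wrapper is observable.

```c
enum {
    DEMO_TASK_RUNNING,
    DEMO_TASK_INTERRUPTIBLE,
    DEMO_TASK_UNINTERRUPTIBLE
};

static int demo_current_state = DEMO_TASK_RUNNING;
static int demo_state_seen_by_scheduler = -1;

/* Stub: a real implementation would sleep; this one only records state. */
static long demo_schedule_timeout(long timeout)
{
    (void)timeout;
    demo_state_seen_by_scheduler = demo_current_state;
    demo_current_state = DEMO_TASK_RUNNING;   /* running again on return */
    return 0;
}

/* Wrapper: the state change can no longer be forgotten by the caller. */
static long demo_schedule_timeout_interruptible(long timeout)
{
    demo_current_state = DEMO_TASK_INTERRUPTIBLE;
    return demo_schedule_timeout(timeout);
}
```

Calling the stub directly without setting the state first leaves it entered as running, which is the mistake the new interfaces are meant to prevent.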
      Signed-off-by: Nishanth Aravamudan <nacc@us.ibm.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • [PATCH] sched: TASK_NONINTERACTIVE · d79fc0fc
      Ingo Molnar authored
      
      
      This patch implements a task state bit (TASK_NONINTERACTIVE), which can be
      used by blocking points to mark the task's wait as "non-interactive".  This
      does not mean the task will be considered a CPU-hog - the wait will simply
      not have an effect on the waiting task's priority - positive or negative
      alike.  Right now only pipe_wait() will make use of it, because it's a
      common source of not-so-interactive waits (kernel compilation jobs, etc.).
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • [PATCH] cpuset semaphore depth check deadlock fix · 4247bdc6
      Paul Jackson authored
      
      
      The cpusets-formalize-intermediate-gfp_kernel-containment patch
      has a deadlock problem.
      
      This patch was part of a set of four patches to make more
      extensive use of the cpuset 'mem_exclusive' attribute to
      manage kernel GFP_KERNEL memory allocations and to constrain
      the out-of-memory (oom) killer.
      
      A task that is changing cpusets in particular ways on a system
      when it is very short of free memory could double trip over
      the global cpuset_sem semaphore (get the lock and then deadlock
      trying to get it again).
      
      The second attempt to get cpuset_sem would be in the routine
      cpuset_zone_allowed().  This was discovered by code inspection.
      I cannot reproduce the problem except with an artificially
      hacked kernel and a specialized stress test.
      
      In real life you cannot hit this unless you are manipulating
      cpusets, and are very unlikely to hit it unless you are rapidly
      modifying cpusets on a memory tight system.  Even then it would
      be a rare occurrence.
      
      If you did hit it, the task double tripping over cpuset_sem
      would deadlock in the kernel, and any other task also trying
      to manipulate cpusets would deadlock there too, on cpuset_sem.
      Your batch manager would be wedged solid (if it was cpuset
      savvy), but classic Unix shells and utilities would work well
      enough to reboot the system.
      
      The unusual condition that led to this bug is that unlike most
      semaphores, cpuset_sem _can_ be acquired while in the page
      allocation code, when __alloc_pages() calls cpuset_zone_allowed.
      So it is easy to mistakenly perform the following sequence:
        1) task makes system call to alter a cpuset
        2) take cpuset_sem
        3) try to allocate memory
        4) memory allocator, via cpuset_zone_allowed, tries to take cpuset_sem
        5) deadlock
      
      The reason that this is not a serious bug for most users
      is that almost all calls to allocate memory don't require
      taking cpuset_sem.  Only some code paths off the beaten
      track require taking cpuset_sem -- which is good.  Taking
      a global semaphore on the main code path for allocating
      memory would not scale well.
      
      This patch fixes this deadlock by wrapping the up() and down()
      calls on cpuset_sem in kernel/cpuset.c with code that tracks
      the nesting depth of the current task on that semaphore, and
      only does the real down() if the task doesn't hold the lock
      already, and only does the real up() if the nesting depth
      (number of unmatched downs) is exactly one.
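A userspace sketch of the nesting wrapper described above, assuming a pthread mutex in place of the semaphore and a thread-local counter in place of a per-task field. The `demo_` names are invented for the illustration.

```c
#include <pthread.h>

static pthread_mutex_t demo_sem = PTHREAD_MUTEX_INITIALIZER;
static _Thread_local int demo_depth;   /* unmatched downs by this thread */

/* Take the lock only on the outermost acquisition. */
static void demo_down(void)
{
    if (demo_depth++ == 0)
        pthread_mutex_lock(&demo_sem);
}

/* Release the lock only when the outermost holder lets go. */
static void demo_up(void)
{
    if (--demo_depth == 0)
        pthread_mutex_unlock(&demo_sem);
}

/* Current nesting depth of the calling thread. */
static int demo_nesting(void)
{
    return demo_depth;
}
```

With this wrapper, a nested acquisition from the allocation path only bumps the counter instead of blocking on a lock the thread already holds.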
      
      The previous required use of refresh_mems(), anytime that
      the cpuset_sem semaphore was acquired and the code executed
      while holding that semaphore might try to allocate memory, is
      no longer required.  Two refresh_mems() calls were removed
      thanks to this.  This is a good change, as failing to get
      all the necessary refresh_mems() calls placed was a primary
      source of bugs in this cpuset code.  The only remaining call
      to refresh_mems() is made while doing a memory allocation,
      if certain task memory placement data needs to be updated
      from its cpuset, due to the cpuset having been changed behind
      the task's back.
      Signed-off-by: Paul Jackson <pj@sgi.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
  16. 09 Sep, 2005 1 commit
  17. 07 Sep, 2005 3 commits
  18. 13 Jul, 2005 1 commit
    • [PATCH] inotify · 0eeca283
      Robert Love authored
      
      
      inotify is intended to correct the deficiencies of dnotify, particularly
      its inability to scale and its terrible user interface:
      
              * dnotify requires the opening of one fd per each directory
                that you intend to watch. This quickly results in too many
                open files and pins removable media, preventing unmount.
              * dnotify is directory-based. You only learn about changes to
                directories. Sure, a change to a file in a directory affects
                the directory, but you are then forced to keep a cache of
                stat structures.
              * dnotify's interface to user-space is awful.  Signals?
      
      inotify provides a more usable, simple, powerful solution to file change
      notification:
      
              * inotify's interface is a system call that returns a fd, not SIGIO.
                You get a single fd, which is select()-able.
              * inotify has an event that says "the filesystem that the item
                you were watching is on was unmounted."
              * inotify can watch directories or files.
      
      Inotify is currently used by Beagle (a desktop search infrastructure),
      Gamin (a FAM replacement), and other projects.
      
      See Documentation/filesystems/inotify.txt.
      Signed-off-by: Robert Love <rml@novell.com>
      Cc: John McCutchan <ttb@tentacle.dhs.org>
      Cc: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
  19. 27 Jun, 2005 1 commit
    • [PATCH] Update cfq io scheduler to time sliced design · 22e2c507
      Jens Axboe authored
      
      
      This updates the CFQ io scheduler to the new time sliced design (cfq
      v3).  It provides full process fairness, while giving excellent
      aggregate system throughput even for many competing processes.  It
      supports io priorities, either inherited from the cpu nice value or set
      directly with the ioprio_get/set syscalls.  The latter closely mimic
      set/getpriority.
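A minimal userspace sketch of the ioprio calls, invoked through syscall() since C libraries have generally not provided wrappers. The `DEMO_` constants mirror the values found in the kernel's ioprio header as far as I know (class in the bits above a 13-bit data field, "who" value 1 for a process, class 2 for best-effort); treat them as assumptions and check your headers.

```c
#include <sys/syscall.h>
#include <unistd.h>

/* Assumed to match the kernel's ioprio definitions; verify locally. */
#define DEMO_IOPRIO_CLASS_SHIFT 13
#define DEMO_IOPRIO_WHO_PROCESS 1
#define DEMO_IOPRIO_CLASS_BE    2

/* Pack a scheduling class and a per-class level into one value. */
static int demo_ioprio_value(int class, int level)
{
    return (class << DEMO_IOPRIO_CLASS_SHIFT) | level;
}

/* Set the io priority of the calling process (pid 0 means "self"). */
static int demo_set_own_ioprio(int class, int level)
{
    return (int)syscall(SYS_ioprio_set, DEMO_IOPRIO_WHO_PROCESS, 0,
                        demo_ioprio_value(class, level));
}

/* Read back the io priority of the calling process. */
static int demo_get_own_ioprio(void)
{
    return (int)syscall(SYS_ioprio_get, DEMO_IOPRIO_WHO_PROCESS, 0);
}
```

The which/who argument pair is the part that mirrors setpriority() and getpriority().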
      
      This import is based on my latest from -mm.
      Signed-off-by: Jens Axboe <axboe@suse.de>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
  20. 26 Jun, 2005 1 commit
    • [PATCH] Cleanup patch for process freezing · 3e1d1d28
      Christoph Lameter authored
      
      
      1. Establish a simple API for process freezing defined in linux/include/sched.h:
      
         frozen(process)		Check for frozen process
         freezing(process)		Check if a process is being frozen
         freeze(process)		Tell a process to freeze (go to refrigerator)
         thaw_process(process)	Restart process
         frozen_process(process)	Process is frozen now
      
      2. Remove all references to PF_FREEZE and PF_FROZEN from all
         kernel sources except sched.h
      
      3. Fix numerous locations where try_to_freeze is manually done by a driver
      
      4. Remove the argument that is no longer necessary from two function calls.
      
      5. Some whitespace cleanup
      
      6. Clear potential race in refrigerator (provides an open window of PF_FREEZE
         cleared before setting PF_FROZEN, recalc_sigpending does not check
         PF_FROZEN).
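The helper API from item 1 can be modelled with two flag bits. The flag values, the `demo_task` structure and the return conventions are assumptions of this sketch; it is meant to show how item 6 is addressed by moving from "freezing" to "frozen" in a single assignment.

```c
/* Hypothetical flag values for the sketch. */
#define DEMO_PF_FREEZE 0x1u   /* asked to freeze */
#define DEMO_PF_FROZEN 0x2u   /* sitting in the refrigerator */

struct demo_task {
    unsigned int flags;
};

static int demo_frozen(const struct demo_task *p)
{
    return (p->flags & DEMO_PF_FROZEN) != 0;
}

static int demo_freezing(const struct demo_task *p)
{
    return (p->flags & DEMO_PF_FREEZE) != 0;
}

static void demo_freeze(struct demo_task *p)
{
    p->flags |= DEMO_PF_FREEZE;
}

/* One store: there is no window with neither flag set. */
static void demo_frozen_process(struct demo_task *p)
{
    p->flags = (p->flags & ~DEMO_PF_FREEZE) | DEMO_PF_FROZEN;
}

/* Returns 1 if the task was frozen and has been thawed, else 0. */
static int demo_thaw_process(struct demo_task *p)
{
    if (!demo_frozen(p))
        return 0;
    p->flags &= ~DEMO_PF_FROZEN;
    return 1;
}
```

Drivers then call the helpers and never touch the flag bits, which is what item 2 enforces.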
      
      This patch does not address the problem of freeze_processes() violating the rule
      that a task may only modify its own flags by setting PF_FREEZE. This is not clean
      in an SMP environment. freeze(process) is therefore not SMP safe!
      Signed-off-by: Christoph Lameter <christoph@lameter.com>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
  21. 25 Jun, 2005 2 commits
    • [PATCH] Dynamic sched domains: sched changes · 1a20ff27
      Dinakar Guniguntala authored
      
      
      The following patches add dynamic sched domains functionality that was
      extensively discussed on lkml and lse-tech.  I would like to see this added to
      -mm
      
      o The main advantage with this feature is that it ensures that the scheduler
        load balancing code only balances against the cpus that are in the sched
        domain as defined by an exclusive cpuset and not all of the cpus in the
        system. This removes any overhead due to load balancing code trying to
        pull tasks outside of the cpu exclusive cpuset only to be prevented by
        the tasks' cpus_allowed mask.
      o cpu exclusive cpusets are useful for servers running orthogonal
        workloads such as RT applications requiring low latency and HPC
        applications that are throughput sensitive
      
      o It provides a new API partition_sched_domains in sched.c
        that makes dynamic sched domains possible.
      o cpu_exclusive cpusets are now associated with a sched domain,
        which means that users can dynamically modify the sched domains
        through the cpuset file system interface
      o ia64 sched domain code has been updated to support this feature as well
      o Currently, this does not support hotplug. (However some of my tests
        indicate hotplug+preempt is currently broken)
      o I have tested it extensively on x86.
      o This should have very minimal impact on performance as none of
        the fast paths are affected
      Signed-off-by: Dinakar Guniguntala <dino@in.ibm.com>
      Acked-by: Paul Jackson <pj@sgi.com>
      Acked-by: Nick Piggin <nickpiggin@yahoo.com.au>
      Acked-by: Matthew Dobson <colpatch@us.ibm.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • [PATCH] sched: consolidate sbe sbf · 476d139c
      Nick Piggin authored
      
      
      Consolidate balance-on-exec with balance-on-fork.  This is made easy by the
      sched-domains RCU patches.
      
      As well as the general goodness of code reduction, this allows the runqueues
      to be unlocked during balance-on-fork.
      
      schedstats is a problem.  Maybe just have balance-on-event instead of
      distinguishing fork and exec?
      Signed-off-by: Nick Piggin <nickpiggin@yahoo.com.au>
      Acked-by: Ingo Molnar <mingo@elte.hu>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>