1. 29 Apr, 2015 1 commit
    • mm: Fix NULL pointer dereference in madvise(MADV_WILLNEED) support · 23f1538b
      Kirill A. Shutemov authored
      commit ee53664bda169f519ce3c6a22d378f0b946c8178 upstream.
      
      Sasha Levin found a NULL pointer dereference that is due to a missing
      page table lock, which in turn is due to the pmd entry in question being
      a transparent hugepage entry.
      
      The code - introduced in commit 1998cc04 ("mm: make
      madvise(MADV_WILLNEED) support swap file prefetch") - correctly checks
      for this situation using pmd_none_or_trans_huge_or_clear_bad(), but it
      turns out that that function doesn't work correctly.
      
      pmd_none_or_trans_huge_or_clear_bad() expected that pmd_bad() would
      trigger if the transparent hugepage bit was set, but it doesn't do that
      if pmd_numa() is also set. Note that the NUMA bit only gets set on real
      NUMA machines, so people trying to reproduce this on most normal
      development systems would never actually trigger this.
      
      Fix it by removing the very subtle (and subtly incorrect) expectation,
      and instead just checking pmd_trans_huge() explicitly.
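
      A minimal sketch of the corrected helper, lightly simplified from
      include/asm-generic/pgtable.h (the pmd_read_atomic() barrier
      details are omitted here):

      	static inline int pmd_none_or_trans_huge_or_clear_bad(pmd_t *pmd)
      	{
      		pmd_t pmdval = pmd_read_atomic(pmd);

      		/* Test pmd_trans_huge() explicitly: pmd_bad() does not
      		 * fire for a THP entry that also has the NUMA bit set. */
      		if (pmd_none(pmdval) || pmd_trans_huge(pmdval))
      			return 1;
      		if (unlikely(pmd_bad(pmdval))) {
      			pmd_clear_bad(pmd);
      			return 1;
      		}
      		return 0;
      	}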
      Reported-by: Sasha Levin <sasha.levin@oracle.com>
      Acked-by: Andrea Arcangeli <aarcange@redhat.com>
      [ Additionally remove the now stale test for pmd_trans_huge() inside the
        pmd_bad() case - Linus ]
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Wang Long <long.wanglong@huawei.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      23f1538b
  2. 31 May, 2014 1 commit
    • mm: use paravirt friendly ops for NUMA hinting ptes · 6fe8c0a0
      Mel Gorman authored
      commit 29c7787075c92ca8af353acd5301481e6f37082f upstream.
      
      David Vrabel identified a regression when using automatic NUMA balancing
      under Xen whereby page table entries were getting corrupted due to the
      use of native PTE operations.  Quoting him:
      
      	Xen PV guest page tables require that their entries use machine
      	addresses if the present bit (_PAGE_PRESENT) is set, and (for
      	successful migration) non-present PTEs must use pseudo-physical
      	addresses.  This is because on migration MFNs in present PTEs are
      	translated to PFNs (canonicalised) so they may be translated back
      	to the new MFN in the destination domain (uncanonicalised).
      
      	pte_mknonnuma(), pmd_mknonnuma(), pte_mknuma() and pmd_mknuma()
      	set and clear the _PAGE_PRESENT bit using pte_set_flags(),
      	pte_clear_flags(), etc.
      
      	In a Xen PV guest, these functions must translate MFNs to PFNs
      	when clearing _PAGE_PRESENT and translate PFNs to MFNs when setting
      	_PAGE_PRESENT.
      
      His suggested fix converted p[te|md]_[set|clear]_flags to using
      paravirt-friendly ops but this is overkill.  He suggested an alternative
      of using p[te|md]_modify in the NUMA page table operations but this
      does more work than necessary and would require looking up a VMA for
      protections.
      
      This patch modifies the NUMA page table operations to use paravirt
      friendly operations to set/clear the flags of interest.  Unfortunately
      this will take a performance hit when updating the PTEs on
      CONFIG_PARAVIRT but I do not see a way around it that does not break
      Xen.
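
      The resulting helper looks roughly like this (a sketch of the pte
      side; pmdp_set_numa() and the *_clear_numa() variants are
      analogous):

      	static inline void ptep_set_numa(struct mm_struct *mm,
      					 unsigned long addr, pte_t *ptep)
      	{
      		pte_t ptent = *ptep;

      		ptent = pte_mknuma(ptent);
      		/* set_pte_at() is paravirt-aware, so a Xen PV guest can
      		 * translate MFN <-> PFN as _PAGE_PRESENT is toggled. */
      		set_pte_at(mm, addr, ptep, ptent);
      	}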
      Signed-off-by: Mel Gorman <mgorman@suse.de>
      Acked-by: David Vrabel <david.vrabel@citrix.com>
      Tested-by: David Vrabel <david.vrabel@citrix.com>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Peter Anvin <hpa@zytor.com>
      Cc: Fengguang Wu <fengguang.wu@intel.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Steven Noonan <steven@uplinklabs.net>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: Dave Hansen <dave.hansen@intel.com>
      Cc: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
      Cc: Cyrill Gorcunov <gorcunov@gmail.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      6fe8c0a0
  3. 09 Jan, 2014 1 commit
    • mm: fix TLB flush race between migration, and change_protection_range · d303cf46
      Rik van Riel authored
      commit 20841405940e7be0617612d521e206e4b6b325db upstream.
      
      There are a few subtle races, between change_protection_range (used by
      mprotect and change_prot_numa) on one side, and NUMA page migration and
      compaction on the other side.
      
      The basic race is that there is a time window between when the PTE gets
      made non-present (PROT_NONE or NUMA), and the TLB is flushed.
      
      During that time, a CPU may continue writing to the page.
      
      This is fine most of the time, however compaction or the NUMA migration
      code may come in, and migrate the page away.
      
      When that happens, the CPU may continue writing, through the cached
      translation, to what is no longer the current memory location of the
      process.
      
      This only affects x86, which has a somewhat optimistic pte_accessible.
      All other architectures appear to be safe, and will either always flush,
      or flush whenever there is a valid mapping, even with no permissions
      (SPARC).
      
      The basic race looks like this:
      
      CPU A			CPU B			CPU C
      
      						load TLB entry
      make entry PTE/PMD_NUMA
      			fault on entry
      						read/write old page
      			start migrating page
      			change PTE/PMD to new page
      						read/write old page [*]
      flush TLB
      						reload TLB from new entry
      						read/write new page
      						lose data
      
      [*] the old page may belong to a new user at this point!
      
      The obvious fix is to flush remote TLB entries, by making sure that
      pte_accessible is aware of the fact that PROT_NONE and PROT_NUMA memory may
      still be accessible if there is a TLB flush pending for the mm.
      
      This should fix both NUMA migration and compaction.
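
      On x86 the fixed check looks roughly like this (sketch):

      	static inline bool pte_accessible(struct mm_struct *mm, pte_t a)
      	{
      		if (pte_flags(a) & _PAGE_PRESENT)
      			return true;

      		/* A PROT_NONE/NUMA pte may still sit in remote TLBs
      		 * while a flush for this mm is pending. */
      		if ((pte_flags(a) & (_PAGE_PROTNONE | _PAGE_NUMA)) &&
      		    mm_tlb_flush_pending(mm))
      			return true;

      		return false;
      	}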
      
      [mgorman@suse.de: fix build]
      Signed-off-by: Rik van Riel <riel@redhat.com>
      Signed-off-by: Mel Gorman <mgorman@suse.de>
      Cc: Alex Thorlton <athorlton@sgi.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      d303cf46
  4. 20 Aug, 2013 1 commit
    • Fix TLB gather virtual address range invalidation corner cases · 8e220cfd
      Linus Torvalds authored
      commit 2b047252d087be7f2ba088b4933cd904f92e6fce upstream.
      
      Ben Tebulin reported:
      
       "Since v3.7.2 on two independent machines a very specific Git
        repository fails in 9/10 cases on git-fsck due to an SHA1/memory
        failures.  This only occurs on a very specific repository and can be
        reproduced stably on two independent laptops.  Git mailing list ran
        out of ideas and for me this looks like some very exotic kernel issue"
      
      and bisected the failure to the backport of commit 53a59fc6 ("mm:
      limit mmu_gather batching to fix soft lockups on !CONFIG_PREEMPT").
      
      That commit itself is not actually buggy, but what it does is to make it
      much more likely to hit the partial TLB invalidation case, since it
      introduces a new case in tlb_next_batch() that previously only ever
      happened when running out of memory.
      
      The real bug is that the TLB gather virtual memory range setup is subtly
      buggered.  It was introduced in commit 597e1c35 ("mm/mmu_gather:
      enable tlb flush range in generic mmu_gather"), and the range handling
      was already fixed at least once in commit e6c495a96ce0 ("mm: fix the TLB
      range flushed when __tlb_remove_page() runs out of slots"), but that fix
      was not complete.
      
      The problem with the TLB gather virtual address range is that it isn't
      set up by the initial tlb_gather_mmu() initialization (which didn't get
      the TLB range information), but it is set up ad-hoc later by the
      functions that actually flush the TLB.  And so any such case that forgot
      to update the TLB range entries would potentially miss TLB invalidates.
      
      Rather than try to figure out exactly which particular ad-hoc range
      setup was missing (I personally suspect it's the hugetlb case in
      zap_huge_pmd(), which didn't have the same logic as zap_pte_range()
      did), this patch just gets rid of the problem at the source: make the
      TLB range information available to tlb_gather_mmu(), and initialize it
      when initializing all the other tlb gather fields.
      
      This makes the patch larger, but conceptually much simpler.  And the end
      result is much more understandable; even if you want to play games with
      partial ranges when invalidating the TLB contents in chunks, now the
      range information is always there, and anybody who doesn't want to
      bother with it won't introduce subtle bugs.
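
      Concretely, the initialization now receives the range up front
      (sketch of the reworked mm/memory.c entry point; unrelated fields
      elided):

      	void tlb_gather_mmu(struct mmu_gather *tlb, struct mm_struct *mm,
      			    unsigned long start, unsigned long end)
      	{
      		tlb->mm = mm;
      		/* Is it from 0 to ~0? */
      		tlb->fullmm = !(start | (end + 1));
      		tlb->start  = start;
      		tlb->end    = end;
      		/* ... the remaining gather fields, as before ... */
      	}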
      
      Ben verified that this fixes his problem.
      Reported-bisected-and-tested-by: Ben Tebulin <tebulin@googlemail.com>
      Build-testing-by: Stephen Rothwell <sfr@canb.auug.org.au>
      Build-testing-by: Richard Weinberger <richard.weinberger@gmail.com>
      Reviewed-by: Michal Hocko <mhocko@suse.cz>
      Acked-by: Peter Zijlstra <peterz@infradead.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      8e220cfd
  5. 06 Jun, 2013 1 commit
    • arch, mm: Remove tlb_fast_mode() · 29eb7782
      Peter Zijlstra authored
      Since the introduction of preemptible mmu_gather, TLB fast mode has been
      broken. TLB fast mode relies on there being absolutely no concurrency;
      it frees pages first and invalidates TLBs later.
      
      However now we can get concurrency and stuff goes *bang*.
      
      This patch removes all tlb_fast_mode() code; that was found to be the
      better option versus trying to patch the hole by entangling TLB
      invalidation with the scheduler.
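
      With fast mode gone every path uses the batched ordering, the only
      one that is safe under concurrency (a conceptual sketch; the real
      code walks per-batch page arrays):

      	static inline void tlb_flush_mmu(struct mmu_gather *tlb)
      	{
      		tlb_flush(tlb);		/* invalidate TLBs first... */
      		free_pages_and_swap_cache(tlb->pages, tlb->nr);
      		tlb->nr = 0;		/* ...only then free the pages */
      	}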
      
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Russell King <linux@arm.linux.org.uk>
      Cc: Tony Luck <tony.luck@intel.com>
      Reported-by: Max Filippov <jcmvbkbc@gmail.com>
      Signed-off-by: Peter Zijlstra <peterz@infradead.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      29eb7782
  6. 22 May, 2013 1 commit
    • kernel: Fix s390 absolute memory access for /dev/mem · 576ebd74
      Michael Holzheu authored
      On s390 the prefix page and absolute zero pages are not correctly
      returned when reading /dev/mem. The reason is that the s390 asm/io.h
      file includes the asm-generic/io.h file which then defines
      xlate_dev_mem_ptr() and therefore overrides the s390-specific
      version that does the correct swap operation for prefix and absolute
      zero pages. The problem is a regression that was introduced with git
      commit cd248341 (s390/pci: base support).
      
      To fix the problem add "#ifndef xlate_dev_mem_ptr" in asm-generic/io.h
      and "#define xlate_dev_mem_ptr" in asm/io.h. This ensures that the
      s390 version is used. For completeness also add the "#ifndef"
      construct for xlate_dev_kmem_ptr().
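
      The guard follows the usual override pattern (sketch):

      	/* include/asm-generic/io.h */
      	#ifndef xlate_dev_mem_ptr
      	#define xlate_dev_mem_ptr(p)	__va(p)
      	#endif

      	/* arch/s390/include/asm/io.h */
      	void *xlate_dev_mem_ptr(unsigned long phys);
      	#define xlate_dev_mem_ptr xlate_dev_mem_ptr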
      Signed-off-by: Michael Holzheu <holzheu@linux.vnet.ibm.com>
      Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
      576ebd74
  7. 29 Apr, 2013 2 commits
    • mm: allow arch code to control the user page table ceiling · 6ee8630e
      Hugh Dickins authored
      On architectures where a pgd entry may be shared between user and kernel
      (e.g.  ARM+LPAE), freeing page tables needs a ceiling other than 0.
      This patch introduces a generic USER_PGTABLES_CEILING that arch code can
      override.  It is the responsibility of the arch code setting the ceiling
      to ensure the complete freeing of the page tables (usually in
      pgd_free()).
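
      A sketch of the hook and a typical call site:

      	/* asm-generic/pgtable.h: arch code may redefine this. */
      	#ifndef USER_PGTABLES_CEILING
      	#define USER_PGTABLES_CEILING	0UL
      	#endif

      	/* e.g. in exit_mmap(): */
      	free_pgtables(&tlb, vma, FIRST_USER_ADDRESS, USER_PGTABLES_CEILING);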
      
      [catalin.marinas@arm.com: commit log; shift_arg_pages(), asm-generic/pgtables.h changes]
      Signed-off-by: Hugh Dickins <hughd@google.com>
      Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Russell King <linux@arm.linux.org.uk>
      Cc: <stable@vger.kernel.org>	[3.3+]
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      6ee8630e
    • mm/hugetlb: add more arch-defined huge_pte functions · 106c992a
      Gerald Schaefer authored
      Commit abf09bed ("s390/mm: implement software dirty bits")
      introduced another difference in the pte layout vs.  the pmd layout on
      s390, thoroughly breaking the s390 support for hugetlbfs.  This requires
      replacing some more pte_xxx functions in mm/hugetlb.c with a
      huge_pte_xxx version.
      
      This patch introduces those huge_pte_xxx functions and their generic
      implementation in asm-generic/hugetlb.h, which will now be included on
      all architectures supporting hugetlbfs apart from s390.  This change
      will be a no-op for those architectures.
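
      The generic versions are thin wrappers, e.g. (sketch from
      asm-generic/hugetlb.h):

      	static inline pte_t huge_pte_mkwrite(pte_t pte)
      	{
      		return pte_mkwrite(pte);
      	}

      	static inline pte_t huge_pte_mkdirty(pte_t pte)
      	{
      		return pte_mkdirty(pte);
      	}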
      
      [akpm@linux-foundation.org: fix warning]
      Signed-off-by: Gerald Schaefer <gerald.schaefer@de.ibm.com>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: Hillf Danton <dhillf@gmail.com>
      Acked-by: Michal Hocko <mhocko@suse.cz>	[for !s390 parts]
      Cc: Tony Luck <tony.luck@intel.com>
      Cc: Fenghua Yu <fenghua.yu@intel.com>
      Cc: Ralf Baechle <ralf@linux-mips.org>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Paul Mundt <lethal@linux-sh.org>
      Cc: "David S. Miller" <davem@davemloft.net>
      Cc: Chris Metcalf <cmetcalf@tilera.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      106c992a
  8. 26 Apr, 2013 1 commit
    • cputime_nsecs: use math64.h for nsec resolution conversion helpers · 8c23b80e
      Kevin Hilman authored
      For the nsec resolution conversions to be usable on non-64-bit
      architectures, they need to go through the helpers in <linux/math64.h>,
      so that the right arch-specific 64-bit math (e.g. do_div()) is used.
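
      For example (sketch of the pattern in
      include/asm-generic/cputime_nsecs.h):

      	#include <linux/math64.h>

      	#define cputime_to_usecs(__ct)		\
      		div_u64((__force u64) __ct, NSEC_PER_USEC)
      	#define cputime_to_secs(__ct)		\
      		div_u64((__force u64) __ct, NSEC_PER_SEC)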
      Signed-off-by: Kevin Hilman <khilman@linaro.org>
      Cc: Christoph Lameter <cl@linux.com>
      Cc: Hakan Akkan <hakanakkan@gmail.com>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Kevin Hilman <khilman@linaro.org>
      Cc: Li Zhong <zhong@linux.vnet.ibm.com>
      Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: Paul Gortmaker <paul.gortmaker@windriver.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
      8c23b80e
  9. 12 Apr, 2013 1 commit
    • x86-32: Fix possible incomplete TLB invalidate with PAE pagetables · 1de14c3c
      Dave Hansen authored
      This patch attempts to fix:
      
      	https://bugzilla.kernel.org/show_bug.cgi?id=56461
      
      The symptom is a crash and messages like this:
      
      	chrome: Corrupted page table at address 34a03000
      	*pdpt = 0000000000000000 *pde = 0000000000000000
      	Bad pagetable: 000f [#1] PREEMPT SMP
      
      Ingo guesses this got introduced by commit 611ae8e3 ("x86/tlb:
      enable tlb flush range support for x86") since that code started to free
      unused pagetables.
      
      On x86-32 PAE kernels, that new code has the potential to free an entire
      PMD page and will clear one of the four page-directory-pointer-table
      entries (aka pgd_t entries).
      
      The hardware aggressively "caches" these top-level entries and invlpg
      does not actually affect the CPU's copy.  If we clear one we *HAVE* to
      do a full TLB flush, otherwise we might continue using a freed pmd page.
      (note, we do this properly on the population side in pud_populate()).
      
      This patch tracks whenever we clear one of these entries in the 'struct
      mmu_gather', and ensures that we follow up with a full tlb flush.
      
      BTW, I disassembled and checked that:
      
      	if (tlb->fullmm == 0)
      and
      	if (!tlb->fullmm && !tlb->need_flush_all)
      
      generate essentially the same code, so there should be zero impact there
      to the !PAE case.
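
      The tracking itself is tiny (sketch):

      	/* struct mmu_gather gains a flag: */
      	unsigned int need_flush_all;

      	/* set when ___pmd_free_tlb() frees a pmd page on PAE, since
      	 * that clears a top-level entry invlpg cannot flush: */
      	tlb->need_flush_all = 1;

      	/* the x86 tlb_flush() then falls back to flush_tlb_mm(). */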
      Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: Peter Anvin <hpa@zytor.com>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Artem S Tashkinov <t.artem@mailcity.com>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      1de14c3c
  10. 15 Mar, 2013 1 commit
    • CONFIG_SYMBOL_PREFIX: cleanup. · b92021b0
      Rusty Russell authored
      We have CONFIG_SYMBOL_PREFIX, which three archs define to the string
      "_".  But Al Viro broke this in "consolidate cond_syscall and
      SYSCALL_ALIAS declarations" (in linux-next), and he's not the first to
      do so.
      
      Using CONFIG_SYMBOL_PREFIX is awkward, since we usually just want to
      prepend it to something.  So various places define helpers which are
      defined to nothing if CONFIG_SYMBOL_PREFIX isn't set:
      
      1) include/asm-generic/unistd.h defines __SYMBOL_PREFIX.
      2) include/asm-generic/vmlinux.lds.h defines VMLINUX_SYMBOL(sym)
      3) include/linux/export.h defines MODULE_SYMBOL_PREFIX.
      4) include/linux/kernel.h defines SYMBOL_PREFIX (which differs from #7)
      5) kernel/modsign_certificate.S defines ASM_SYMBOL(sym)
      6) scripts/modpost.c defines MODULE_SYMBOL_PREFIX
      7) scripts/Makefile.lib defines SYMBOL_PREFIX on the commandline if
         CONFIG_SYMBOL_PREFIX is set, so that we have a non-string version
         for pasting.
      
      (arch/h8300/include/asm/linkage.h defines SYMBOL_NAME(), too).
      
      Let's solve this properly:
      1) No more generic prefix, just CONFIG_HAVE_UNDERSCORE_SYMBOL_PREFIX.
      2) Make linux/export.h usable from asm.
      3) Define VMLINUX_SYMBOL() and VMLINUX_SYMBOL_STR().
      4) Make everyone use them.
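
      The helpers end up as (sketch from include/linux/export.h):

      	#ifdef CONFIG_HAVE_UNDERSCORE_SYMBOL_PREFIX
      	#define __VMLINUX_SYMBOL(x)	_##x
      	#define __VMLINUX_SYMBOL_STR(x)	"_" #x
      	#else
      	#define __VMLINUX_SYMBOL(x)	x
      	#define __VMLINUX_SYMBOL_STR(x)	#x
      	#endif

      	/* Indirect, so that macro arguments get expanded first. */
      	#define VMLINUX_SYMBOL(x)	__VMLINUX_SYMBOL(x)
      	#define VMLINUX_SYMBOL_STR(x)	__VMLINUX_SYMBOL_STR(x)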
      Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
      Reviewed-by: James Hogan <james.hogan@imgtec.com>
      Tested-by: James Hogan <james.hogan@imgtec.com> (metag)
      b92021b0
  11. 14 Feb, 2013 3 commits
    • s390/mm: implement software dirty bits · abf09bed
      Martin Schwidefsky authored
      The s390 architecture is unique in respect to dirty page detection,
      it uses the change bit in the per-page storage key to track page
      modifications. All other architectures track dirty bits by means
      of page table entries. This property of s390 has caused numerous
      problems in the past, e.g. see git commit ef5d437f
      "mm: fix XFS oops due to dirty pages without buffers on s390".
      
      To avoid future issues in regard to per-page dirty bits, convert
      s390 to a fault-based software dirty bit detection mechanism. All
      user page table entries which are marked as clean will be hardware
      read-only, even if the pte is supposed to be writable. A write by
      the user process will trigger a protection fault which will cause
      the user pte to be marked as dirty and the hardware read-only bit
      to be removed.
      
      With this change the dirty bit in the storage key is irrelevant
      for Linux as a host, but the storage key is still required for
      KVM guests. The effect is that page_test_and_clear_dirty and the
      related code can be removed. The referenced bit in the storage
      key is still used by the page_test_and_clear_young primitive to
      provide page age information.
      
      For page cache pages of mappings with mapping_cap_account_dirty
      there will not be any change in behavior as the dirty bit tracking
      already uses read-only ptes to control the amount of dirty pages.
      Only for swap cache pages and pages of mappings without
      mapping_cap_account_dirty there can be additional protection faults.
      To avoid an excessive number of additional faults the mk_pte
      primitive checks for PageDirty if the pgprot value allows for writes
      and pre-dirties the pte. That avoids all additional faults for
      tmpfs and shmem pages until these pages are added to the swap cache.
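
      Conceptually the scheme looks like this (a sketch only, with
      simplified bit names, not the literal s390 code):

      	/* A clean pte stays hardware read-only even when logically
      	 * writable; the first store faults, and the handler then: */
      	static inline pte_t pte_mkdirty(pte_t pte)
      	{
      		pte_val(pte) |= _PAGE_DIRTY;	/* software dirty bit */
      		if (pte_val(pte) & _PAGE_WRITE)
      			pte_val(pte) &= ~_PAGE_RO; /* drop hw read-only */
      		return pte;
      	}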
      Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
      abf09bed
    • asm-generic/io.h: convert readX defines to functions · 7292e7e0
      Heiko Carstens authored
      E.g. readl is defined like this
      
       #define readl(addr) __le32_to_cpu(__raw_readl(addr))
      
      If there is a readl() call that doesn't check the return value,
      this will cause a compile warning on big endian machines due to
      the __le32_to_cpu macro magic.
      
      E.g. code like this:
      
      	readl(addr);
      
      will generate the following compile warning:
      
      warning: value computed is not used [-Wunused-value]
      
      With this patch we get rid of dozens of compile warnings on s390.
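
      Converted to a function, the example above becomes (sketch):

      	static inline u32 readl(const volatile void __iomem *addr)
      	{
      		return __le32_to_cpu(__raw_readl(addr));
      	}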
      Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
      Acked-by: Arnd Bergmann <arnd@arndb.de>
      Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
      7292e7e0
    • burying unused conditionals · d64008a8
      Al Viro authored
      __ARCH_WANT_SYS_RT_SIGACTION,
      __ARCH_WANT_SYS_RT_SIGSUSPEND,
      __ARCH_WANT_COMPAT_SYS_RT_SIGSUSPEND,
      __ARCH_WANT_COMPAT_SYS_SCHED_RR_GET_INTERVAL - not used anymore
      CONFIG_GENERIC_{SIGALTSTACK,COMPAT_RT_SIG{ACTION,QUEUEINFO,PENDING,PROCMASK}} -
      can be assumed always set.
      d64008a8
  12. 27 Jan, 2013 3 commits
    • cputime: Generic on-demand virtual cputime accounting · abf917cd
      Frederic Weisbecker authored
      If we want to stop the tick beyond idle, we need to be
      able to account the cputime without using the tick.
      
      Virtual based cputime accounting solves that problem by
      hooking into kernel/user boundaries.
      
      However, implementing CONFIG_VIRT_CPU_ACCOUNTING requires
      low level hooks and involves more overhead. But we already
      have a generic context tracking subsystem that is required
      for RCU by archs which plan to shut down the tick
      outside idle.
      
      This patch implements a generic virtual based cputime
      accounting that relies on these generic kernel/user hooks.
      
      There are some upsides of doing this:
      
      - This requires no arch code to implement CONFIG_VIRT_CPU_ACCOUNTING
      if context tracking is already built (already necessary for RCU in full
      tickless mode).
      
      - We can rely on the generic context tracking subsystem to dynamically
      (de)activate the hooks, so that we can switch anytime between virtual
      and tick based accounting. This way we don't have the overhead
      of the virtual accounting when the tick is running periodically.
      
      And one downside:
      
      - There is probably more overhead than a native virtual based cputime
      accounting. But this relies on hooks that are already set anyway.
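
      Roughly, the accounting is driven from the context tracking
      boundaries like this (a conceptual sketch of
      kernel/context_tracking.c; the recursion/IRQ guards are elided):

      	void user_enter(void)
      	{
      		/* ...guards elided... */
      		vtime_user_enter(current);	/* close the kernel span */
      		rcu_user_enter();
      		/* ... */
      	}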
      Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Li Zhong <zhong@linux.vnet.ibm.com>
      Cc: Namhyung Kim <namhyung.kim@lge.com>
      Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: Paul Gortmaker <paul.gortmaker@windriver.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      abf917cd
    • cputime: Move default nsecs_to_cputime() to jiffies based cputime file · ae8dda5c
      Frederic Weisbecker authored
      If the architecture doesn't provide an implementation of
      nsecs_to_cputime(), the cputime accounting core uses a
      default one that converts the nanoseconds to jiffies. However
      this only makes sense if we use the jiffies based cputime.
      
      For now it doesn't matter much because this API is only
      called on code that uses jiffies based cputime accounting.
      
      But the code may evolve and this API may be used more
      broadly in the future. Keeping this default implementation
      around is very error prone as it may introduce a bug and
      hide it on architectures that don't override this API.
      
      Fix this by moving this definition to the jiffies based
      cputime headers, as that is the only place where it belongs.
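
      After the move, the default sits next to the implementation it
      assumes (sketch):

      	/* include/asm-generic/cputime_jiffies.h */
      	#define nsecs_to_cputime(__nsec)	\
      		jiffies_to_cputime(nsecs_to_jiffies(__nsec))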
      Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Li Zhong <zhong@linux.vnet.ibm.com>
      Cc: Namhyung Kim <namhyung.kim@lge.com>
      Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: Paul Gortmaker <paul.gortmaker@windriver.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      ae8dda5c
    • cputime: Librarize per nsecs resolution cputime definitions · 39613766
      Frederic Weisbecker authored
      The full dynticks cputime accounting that we'll soon introduce
      will rely on sched_clock(). And its clock can have a per
      nanosecond granularity.
      
      To prepare for this, we need to have a cputime_t implementation
      that has this precision.
      
      ia64 virtual cputime accounting already uses that granularity,
      so all we need is to librarize its implementation into the
      asm-generic headers.
      
      Also librarize the default per jiffy granularity cputime_t
      as well so that we can easily pick either implementation
      depending on the cputime accounting config we choose.
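
      The nsec-granularity flavour boils down to (sketch from
      include/asm-generic/cputime_nsecs.h):

      	typedef u64 __nocast cputime_t;
      	typedef u64 __nocast cputime64_t;

      	#define cputime_to_jiffies(__ct)	\
      		((__force u64)(__ct) / (NSEC_PER_SEC / HZ))
      	#define jiffies_to_cputime(__jif)	\
      		(__force cputime_t)((__jif) * (NSEC_PER_SEC / HZ))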
      Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Li Zhong <zhong@linux.vnet.ibm.com>
      Cc: Namhyung Kim <namhyung.kim@lge.com>
      Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: Paul Gortmaker <paul.gortmaker@windriver.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Tony Luck <tony.luck@intel.com>
      Cc: Fenghua Yu <fenghua.yu@intel.com>
      39613766
  13. 22 Jan, 2013 2 commits
    • gpio: devm_gpio_* support should not depend on GPIOLIB · 6a89a314
      Shawn Guo authored
      Some architectures (e.g. blackfin) provide a gpio API without requiring
      GPIOLIB support (ARCH_WANT_OPTIONAL_GPIOLIB).  devm_gpio_* functions
      should also work for these architectures, since they do not really
      depend on GPIOLIB.
      
      Add a new option GPIO_DEVRES (enabled by default) to control the build
      of devres.c.  It also removes the empty versions of the devm_gpio_*
      functions for the !GENERIC_GPIO build from linux/gpio.h, and moves the
      function declarations from asm-generic/gpio.h into linux/gpio.h.
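
      With GPIO_DEVRES enabled the declarations are thus always visible
      from <linux/gpio.h> (sketch):

      	int devm_gpio_request(struct device *dev, unsigned gpio,
      			      const char *label);
      	int devm_gpio_request_one(struct device *dev, unsigned gpio,
      				  unsigned long flags, const char *label);
      	void devm_gpio_free(struct device *dev, unsigned gpio);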
      Signed-off-by: Shawn Guo <shawn.guo@linaro.org>
      Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
      6a89a314
    • gpio: fix warning of 'struct gpio_chip' declaration · d59b4eaa
      Shawn Guo authored
      The struct gpio_chip is only defined inside #ifdef CONFIG_GPIOLIB,
      but it's referenced by gpiochip_add_pin_range() and
      gpiochip_remove_pin_ranges() which are outside #ifdef CONFIG_GPIOLIB.
      Thus, we see the following warnings when building a blackfin image, where
      GPIOLIB is not required.
      
        CC      arch/blackfin/kernel/bfin_gpio.o
        CC      init/version.o
      In file included from arch/blackfin/include/asm/gpio.h:321,
                       from arch/blackfin/kernel/bfin_gpio.c:15:
      include/asm-generic/gpio.h:298: warning: 'struct gpio_chip' declared inside parameter list
      include/asm-generic/gpio.h:298: warning: its scope is only this definition or declaration, which is probably not what you want
      include/asm-generic/gpio.h:304: warning: 'struct gpio_chip' declared inside parameter list
      
      Move the pinctrl chunk into #ifdef CONFIG_GPIOLIB to fix the warning,
      since it appears that pinctrl gpio range support depends on GPIOLIB.
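
      That is, the range helpers now sit inside the same guard as the
      struct they reference (sketch):

      	#ifdef CONFIG_GPIOLIB
      	struct gpio_chip {
      		/* ... */
      	};

      	int gpiochip_add_pin_range(struct gpio_chip *chip,
      				   const char *pinctl_name,
      				   unsigned int gpio_offset,
      				   unsigned int pin_offset,
      				   unsigned int npins);
      	void gpiochip_remove_pin_ranges(struct gpio_chip *chip);
      	#endif /* CONFIG_GPIOLIB */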
      Signed-off-by: Shawn Guo <shawn.guo@linaro.org>
      Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
      d59b4eaa