Commit bd9c6f7f authored by Rohit Gupta

add the rejects and new files created after patch

parent 28f6e08d
The module hwlat_detector is a special purpose kernel module that is used to
detect large system latencies induced by the behavior of certain underlying
hardware or firmware, independent of Linux itself. The code was developed
originally to detect SMIs (System Management Interrupts) on x86 systems;
however, there is nothing x86-specific about this patchset. It was
originally written for use by the "RT" patch since the Real Time
kernel is highly latency sensitive.
SMIs are usually not serviced by the Linux kernel, which typically does not
even know that they are occurring. SMIs are instead set up by BIOS code
and are serviced by BIOS code, usually for "critical" events such as
management of thermal sensors and fans. Sometimes though, SMIs are used for
other tasks and those tasks can spend an inordinate amount of time in the
handler (sometimes measured in milliseconds). Obviously this is a problem if
you are trying to keep event service latencies down in the microsecond range.
The hardware latency detector works by hogging all of the cpus for configurable
amounts of time (by calling stop_machine()), polling the CPU Time Stamp Counter
for some period, then looking for gaps in the TSC data. Any gap indicates a
time when the polling was interrupted, and since the machine is stopped and
interrupts are turned off, the only thing that could do that would be an SMI.
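The gap search described above can be illustrated in user space. The following is a minimal sketch, not code from hwlat_detector.c: the helper names `largest_gap` and `count_spikes` are hypothetical, and the timestamps are assumed to come from any monotonic counter read back-to-back.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper (not part of the module): return the largest
 * difference between two consecutive timestamp reads.
 */
static uint64_t largest_gap(const uint64_t *ts, size_t n)
{
	uint64_t max = 0;

	for (size_t i = 1; i < n; i++) {
		uint64_t gap = ts[i] - ts[i - 1];

		if (gap > max)
			max = gap;
	}
	return max;
}

/* Count the gaps that exceed a threshold given in timestamp units. */
static size_t count_spikes(const uint64_t *ts, size_t n, uint64_t threshold)
{
	size_t spikes = 0;

	for (size_t i = 1; i < n; i++)
		if (ts[i] - ts[i - 1] > threshold)
			spikes++;
	return spikes;
}
```

If nothing interrupts the polling loop, consecutive reads differ only by the cost of the loop itself, so any much larger difference points at time taken away from the CPU.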
Note that the SMI detector should *NEVER* be used in a production environment.
It is intended to be run manually to determine if the hardware platform has a
problem with long system firmware service routines.
Loading the module hwlat_detector with the parameter "enabled=1" (or toggling
on the "enable" entry in the "hwlat_detector" debugfs directory) is the only
step required to start the hwlat_detector. It is possible to redefine the
threshold in microseconds (us) above which latency spikes will be taken
into account (parameter "threshold=").
# modprobe hwlat_detector enabled=1 threshold=100
After the module is loaded, it creates a directory named "hwlat_detector" under
the debugfs mountpoint, "/debug/hwlat_detector" for this text. It is necessary
to have debugfs mounted, which might be on /sys/debug on your system.
The /debug/hwlat_detector interface contains the following files:
count     - number of latency spikes observed since last reset
enable    - a global enable/disable toggle (0/1), resets count
max       - maximum hardware latency actually observed (usecs)
sample    - a pipe from which to read current raw sample data
            in the format <timestamp> <latency observed usecs>
            (can be opened O_NONBLOCK for a single sample)
threshold - minimum latency value to be considered (usecs)
width     - time period to sample with CPUs held (usecs)
            must be less than the total window size (enforced)
window    - total period of sampling, width being inside (usecs)
By default we will set width to 500,000 and window to 1,000,000, meaning that
we will sample every 1,000,000 usecs (1s) for 500,000 usecs (0.5s). If we
observe any latencies that exceed the threshold (initially 100 usecs),
then we write to a global sample ring buffer of 8K samples, which is
consumed by reading from the "sample" (pipe) debugfs file interface.
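The threshold filter and the sample buffer can be pictured with a small user-space sketch. This is an illustration under assumptions, not the module's implementation: the struct layout, the names `ring_record` and `params_valid`, and the overwrite-oldest behavior when the buffer is full are all hypothetical. Only the 8K capacity and the "width must be less than window" rule come from the text.

```c
#include <stddef.h>
#include <stdint.h>

#define RING_SIZE 8192	/* "8K samples" from the text; layout is assumed */

struct sample {
	uint64_t timestamp;	/* when the spike was seen */
	uint64_t latency_us;	/* observed latency in usecs */
};

struct ring {
	struct sample buf[RING_SIZE];
	size_t head;		/* next slot to write */
	size_t count;		/* valid entries, at most RING_SIZE */
};

/* Store a sample only if it exceeds the threshold; returns 1 if stored. */
static int ring_record(struct ring *r, uint64_t ts, uint64_t latency_us,
		       uint64_t threshold_us)
{
	if (latency_us <= threshold_us)
		return 0;
	r->buf[r->head].timestamp = ts;
	r->buf[r->head].latency_us = latency_us;
	r->head = (r->head + 1) % RING_SIZE;	/* assumed: overwrite oldest */
	if (r->count < RING_SIZE)
		r->count++;
	return 1;
}

/* The sampling width has to fit inside the window. */
static int params_valid(uint64_t width_us, uint64_t window_us)
{
	return width_us > 0 && width_us < window_us;
}
```

With the defaults from the text, `params_valid(500000, 1000000)` holds, and a 50 usec reading against a 100 usec threshold is dropped rather than stored.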
Using the Linux Kernel Latency Histograms
This document gives a short explanation how to enable, configure and use
latency histograms. Latency histograms are primarily relevant in the
context of real-time enabled kernels (CONFIG_PREEMPT/CONFIG_PREEMPT_RT)
and are used in the quality management of the Linux real-time
* Purpose of latency histograms
A latency histogram continuously accumulates the frequencies of latency
data. There are two types of histograms
- potential sources of latencies
- effective latencies
* Potential sources of latencies
Potential sources of latencies are code segments where interrupts,
preemption or both are disabled (aka critical sections). To create
histograms of potential sources of latency, the kernel stores the time
stamp at the start of a critical section, determines the time elapsed
when the end of the section is reached, and increments the frequency
counter of that latency value - irrespective of whether any concurrently
running process is affected by latency or not.
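The bookkeeping described above (time stamp at the start, elapsed time at the end, one counter per latency value) might look roughly like this in user space. It is a sketch only: `struct latency_hist`, `hist_add` and `HIST_BUCKETS` are hypothetical names, and the 10,240 bucket bound is taken from the data format section of this document.

```c
#include <stdint.h>

#define HIST_BUCKETS 10240	/* one bucket per microsecond (assumed layout) */

struct latency_hist {
	uint64_t bucket[HIST_BUCKETS];
	uint64_t overflow;	/* samples of HIST_BUCKETS usecs or more */
	uint64_t total;		/* all samples, including overflows */
	uint64_t max_us;	/* largest latency seen so far */
};

/* Account one critical section that ran from start_us to end_us. */
static void hist_add(struct latency_hist *h, uint64_t start_us, uint64_t end_us)
{
	uint64_t lat = end_us - start_us;

	if (lat < HIST_BUCKETS)
		h->bucket[lat]++;
	else
		h->overflow++;
	if (lat > h->max_us)
		h->max_us = lat;
	h->total++;
}
```

Every section is counted whether or not another task was actually delayed by it, which is why these histograms describe potential rather than effective latencies.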
- Configuration items (in the Kernel hacking/Tracers submenu)
* Effective latencies
Effective latencies are those that actually occur during wakeup of a process. To
determine effective latencies, the kernel stores the time stamp when a
process is scheduled to be woken up, and determines the duration of the
wakeup time shortly before control is passed over to this process. Note
that the apparent latency in user space may be somewhat longer, since the
process may be interrupted after control is passed over to it but before
the execution in user space takes place. Simply measuring the interval
between enqueuing and wakeup may also not be appropriate in cases when a
process is scheduled as a result of a timer expiration. The timer may have
missed its deadline, e.g. due to disabled interrupts, but this latency
would not be registered. Therefore, the offsets of missed timers are
recorded in a separate histogram. If both wakeup latency and missed timer
offsets are configured and enabled, a third histogram may be enabled that
records the overall latency as a sum of the timer latency, if any, and the
wakeup latency. This histogram is called "timerandwakeup".
- Configuration items (in the Kernel hacking/Tracers submenu)
* Usage
The interface to the administration of the latency histograms is located
in the debugfs file system. To mount it, either enter
mount -t sysfs nodev /sys
mount -t debugfs nodev /sys/kernel/debug
from shell command line level, or add
nodev /sys sysfs defaults 0 0
nodev /sys/kernel/debug debugfs defaults 0 0
to the file /etc/fstab. All latency histogram related files are then
available in the directory /sys/kernel/debug/tracing/latency_hist. A
particular histogram type is enabled by writing non-zero to the related
variable in the /sys/kernel/debug/tracing/latency_hist/enable directory.
Select "preemptirqsoff" for the histograms of potential sources of
latencies and "wakeup" for histograms of effective latencies etc. The
histogram data - one per CPU - are available in the files
The histograms are reset by writing non-zero to the file "reset" in a
particular latency directory. To reset all latency data, use
HISTDIR=/sys/kernel/debug/tracing/latency_hist
if test -d $HISTDIR; then
  for i in `find $HISTDIR | grep /reset$`; do
    echo 1 >$i
  done
fi
* Data format
Latency data are stored with a resolution of one microsecond. The
maximum latency is 10,240 microseconds. The data are only valid if the
overflow register is empty. Every output line contains the latency in
microseconds in the first column and the number of samples in the second
column. To display only lines with a positive latency count, use, for example,
grep -v " 0$" /sys/kernel/debug/tracing/latency_hist/preemptoff/CPU0
#Minimum latency: 0 microseconds.
#Average latency: 0 microseconds.
#Maximum latency: 25 microseconds.
#Total samples: 3104770694
#There are 0 samples greater or equal than 10240 microseconds
#usecs samples
0 2984486876
1 49843506
2 58219047
3 5348126
4 2187960
5 3388262
6 959289
7 208294
8 40420
9 4485
10 14918
11 18340
12 25052
13 19455
14 5602
15 969
16 47
17 18
18 14
19 1
20 3
21 2
22 5
23 2
25 1
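Output in this format is easy to post-process. The sketch below assumes the layout shown above (header lines starting with '#', then "<usecs> <samples>" pairs); `summarize` and `struct hist_summary` are hypothetical names, and the text is passed in as a string rather than read from debugfs.

```c
#include <stdint.h>
#include <stdio.h>
#include <string.h>

struct hist_summary {
	uint64_t total;	/* sum of all sample counts */
	long max_us;	/* largest latency with a nonzero count, -1 if none */
};

/* Parse histogram text; lines starting with '#' are header comments. */
static struct hist_summary summarize(const char *text)
{
	struct hist_summary s = { 0, -1 };
	const char *p = text;

	while (*p) {
		const char *eol = strchr(p, '\n');
		size_t len = eol ? (size_t)(eol - p) : strlen(p);
		char line[128];
		unsigned long us;
		unsigned long long n;

		/* Overlong lines are skipped rather than truncated. */
		if (len < sizeof(line)) {
			memcpy(line, p, len);
			line[len] = '\0';
			if (line[0] != '#' &&
			    sscanf(line, "%lu %llu", &us, &n) == 2) {
				s.total += n;
				if (n > 0 && (long)us > s.max_us)
					s.max_us = (long)us;
			}
		}
		p += len;
		if (*p == '\n')
			p++;
	}
	return s;
}
```

Comparing `total` and `max_us` against the "#Total samples" and "#Maximum latency" header lines is a cheap consistency check on a captured histogram.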
* Wakeup latency of a selected process
To only collect wakeup latency data of a particular process, write the
PID of the requested process to
PIDs are not considered if this variable is set to 0.
* Details of the process with the highest wakeup latency so far
Selected data of the process that suffered from the highest wakeup
latency that occurred in a particular CPU are available in the file
In addition, other relevant system data at the time when the
latency occurred are given.
The format of the data is (all in one line):
<PID> <Priority> <Latency> (<Timeroffset>) <Command> \
<- <PID> <Priority> <Command> <Timestamp>
The value of <Timeroffset> is only relevant in the combined timer
and wakeup latency recording. In the wakeup recording, it is
always 0, in the missed_timer_offsets recording, it is the same
as <Latency>.
When retrospectively searching for the origin of a latency and
tracing was not enabled, it may be helpful to know the name and
some basic data of the task that (finally) was switching to the
late real-time task. In addition to the victim's data, also the
data of the possible culprit are therefore displayed after the
"<-" symbol.
Finally, the timestamp of the time when the latency occurred
in <seconds>.<microseconds> after the most recent system boot
is provided.
These data are also reset when the wakeup histogram is reset.
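A line in the format given above can be split into its fields with sscanf(). This is a sketch under assumptions: `parse_max_latency` and `struct max_latency` are hypothetical names, the command names are assumed to contain no whitespace and to fit in 15 characters, and the timestamp is assumed to be printed as <seconds>.<microseconds> as described.

```c
#include <stdio.h>

struct max_latency {
	int pid, prio;			/* the delayed ("victim") task */
	unsigned long latency_us;
	long timer_offset_us;
	char comm[16];
	int prev_pid, prev_prio;	/* the task switched away from */
	char prev_comm[16];
	unsigned long sec, usec;	/* time since boot */
};

/* Returns 1 on success, 0 if the line does not match the assumed layout. */
static int parse_max_latency(const char *line, struct max_latency *m)
{
	int n = sscanf(line, "%d %d %lu (%ld) %15s <- %d %d %15s %lu.%lu",
		       &m->pid, &m->prio, &m->latency_us, &m->timer_offset_us,
		       m->comm, &m->prev_pid, &m->prev_prio, m->prev_comm,
		       &m->sec, &m->usec);

	return n == 10;
}
```

Any sample line fed to this function, such as one naming a made-up "cyclictest" victim, is illustrative input and not output captured from a real system.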
--- arch/arm/mach-exynos/platsmp.c
+++ arch/arm/mach-exynos/platsmp.c
@@ -126,7 +126,7 @@
if (timeout == 0) {
printk(KERN_ERR "cpu1 power enable failed");
- spin_unlock(&boot_lock);
+ raw_spin_unlock(&boot_lock);
return -ETIMEDOUT;
--- arch/arm/mm/fault.c
+++ arch/arm/mm/fault.c
@@ -277,7 +277,7 @@
* If we're in an interrupt or have no user
* context, we must not take the fault..
- if (in_atomic() || !mm)
+ if (!mm || pagefault_disabled())
goto no_context;
if (user_mode(regs))
#ifndef _ASM_IRQ_WORK_H
#define _ASM_IRQ_WORK_H
#include <asm/processor.h>
static inline bool arch_irq_work_has_interrupt(void)
{
	return cpu_has_apic;
}
#endif /* _ASM_IRQ_WORK_H */
--- drivers/gpu/drm/drm_irq.c
+++ drivers/gpu/drm/drm_irq.c
@@ -628,11 +628,6 @@
* code gets preempted or delayed for some reason.
for (i = 0; i < DRM_TIMESTAMP_MAXRETRIES; i++) {
- /* Disable preemption to make it very likely to
- * succeed in the first iteration even on PREEMPT_RT kernel.
- */
- preempt_disable();
/* Get system timestamp before query. */
stime = ktime_get();
@@ -644,8 +639,6 @@
if (!drm_timestamp_monotonic)
mono_time_offset = ktime_get_monotonic_offset();
- preempt_enable();
/* Return as no-op if scanout query unsupported or failed. */
if (!(vbl_status & DRM_SCANOUTPOS_VALID)) {
DRM_DEBUG("crtc %d : scanoutpos query failed [%d].\n",
--- drivers/gpu/drm/i915/i915_gem.c
+++ drivers/gpu/drm/i915/i915_gem.c
@@ -4449,7 +4467,7 @@
if (!mutex_is_locked(mutex))
return false;
-#if defined(CONFIG_SMP) && !defined(CONFIG_DEBUG_MUTEXES) && !defined(CONFIG_PREEMPT_RT_BASE)
+#if defined(CONFIG_SMP) && !defined(CONFIG_DEBUG_MUTEXES)
return mutex->owner == task;
/* Since UP may be pre-empted, we cannot assume that we own the lock */
--- drivers/misc/Makefile
+++ drivers/misc/Makefile
@@ -53,3 +53,4 @@
obj-$(CONFIG_VMWARE_VMCI) += vmw_vmci/
obj-$(CONFIG_LATTICE_ECP3_CONFIG) += lattice-ecp3-config.o
obj-$(CONFIG_SRAM) += sram.o
+obj-$(CONFIG_HWLAT_DETECTOR) += hwlat_detector.o
On branch odroidxu3-3.10.y-patch_3.10.93-rt101
Changes to be committed:
(use "git reset HEAD <file>..." to unstage)
new file: Documentation/hwlat_detector.txt
new file: Documentation/trace/histograms.txt
new file: arch/arm/mach-exynos/platsmp.c.rej
new file: arch/arm/mm/fault.c.rej
new file: arch/x86/include/asm/irq_work.h
new file: drivers/gpu/drm/drm_irq.c.rej
new file: drivers/gpu/drm/i915/i915_gem.c.rej
new file: drivers/misc/Makefile.rej
new file: drivers/misc/hwlat_detector.c
new file: include/asm-generic/irq_work.h
new file: include/linux/locallock.h
new file: include/linux/mutex.h.rej
new file: include/linux/mutex_rt.h
new file: include/linux/rwlock_rt.h
new file: include/linux/rwlock_types_rt.h
new file: include/linux/rwsem_rt.h
new file: include/linux/spinlock_rt.h
new file: include/linux/spinlock_types_nort.h
new file: include/linux/spinlock_types_raw.h
new file: include/linux/spinlock_types_rt.h
new file: include/linux/wait-simple.h
new file: include/linux/work-simple.h
new file: include/trace/events/hist.h
new file: include/trace/events/latency_hist.h
new file: kernel/fork.c.rej
new file: kernel/hrtimer.c.rej
new file: kernel/rt.c
new file: kernel/sched/core.c.rej
new file: kernel/sched/work-simple.c
new file: kernel/softirq.c.rej
new file: kernel/trace/latency_hist.c
new file: kernel/wait-simple.c
new file: localversion-rt
new file: mm/page_alloc.c.rej
Untracked files:
(use "git add <file>..." to include in what will be committed)
#ifndef __ASM_IRQ_WORK_H
#define __ASM_IRQ_WORK_H
static inline bool arch_irq_work_has_interrupt(void)
{
	return false;
}
#endif /* __ASM_IRQ_WORK_H */
#include <linux/percpu.h>
#include <linux/spinlock.h>
# define LL_WARN(cond) WARN_ON(cond)
# define LL_WARN(cond) do { } while (0)
/*
 * per cpu lock based substitute for local_irq_*()
 */
struct local_irq_lock {
	spinlock_t lock;
	struct task_struct *owner;
	int nestcnt;
	unsigned long flags;
};
#define DEFINE_LOCAL_IRQ_LOCK(lvar) \
DEFINE_PER_CPU(struct local_irq_lock, lvar) = { \
.lock = __SPIN_LOCK_UNLOCKED((lvar).lock) }
#define DECLARE_LOCAL_IRQ_LOCK(lvar) \
DECLARE_PER_CPU(struct local_irq_lock, lvar)
#define local_irq_lock_init(lvar) \
do { \
int __cpu; \
for_each_possible_cpu(__cpu) \
spin_lock_init(&per_cpu(lvar, __cpu).lock); \
} while (0)
static inline void __local_lock(struct local_irq_lock *lv)
if (lv->owner != current) {
lv->owner = current;
#define local_lock(lvar) \
do { __local_lock(&get_local_var(lvar)); } while (0)
static inline int __local_trylock(struct local_irq_lock *lv)
{
	if (lv->owner != current && spin_trylock(&lv->lock)) {
		lv->owner = current;
		lv->nestcnt = 1;
		return 1;
	}
	return 0;
}
#define local_trylock(lvar) \
	({ \
		int __locked; \
		__locked = __local_trylock(&get_local_var(lvar)); \
		if (!__locked) \
			put_local_var(lvar); \
		__locked; \
	})
static inline void __local_unlock(struct local_irq_lock *lv)
LL_WARN(lv->nestcnt == 0);
LL_WARN(lv->owner != current);
if (--lv->nestcnt)
lv->owner = NULL;
#define local_unlock(lvar) \
do { \
__local_unlock(&__get_cpu_var(lvar)); \
put_local_var(lvar); \
} while (0)
static inline void __local_lock_irq(struct local_irq_lock *lv)
{
	spin_lock_irqsave(&lv->lock, lv->flags);
	lv->owner = current;
	lv->nestcnt = 1;
}
#define local_lock_irq(lvar) \
do { __local_lock_irq(&get_local_var(lvar)); } while (0)
#define local_lock_irq_on(lvar, cpu) \
do { __local_lock_irq(&per_cpu(lvar, cpu)); } while (0)
static inline void __local_unlock_irq(struct local_irq_lock *lv)
LL_WARN(lv->owner != current);
lv->owner = NULL;
lv->nestcnt = 0;
#define local_unlock_irq(lvar) \
do { \
__local_unlock_irq(&__get_cpu_var(lvar)); \
put_local_var(lvar); \
} while (0)
#define local_unlock_irq_on(lvar, cpu) \
do { \
__local_unlock_irq(&per_cpu(lvar, cpu)); \
} while (0)
static inline int __local_lock_irqsave(struct local_irq_lock *lv)
if (lv->owner != current) {
return 0;
} else {
return 1;
#define local_lock_irqsave(lvar, _flags) \
do { \
if (__local_lock_irqsave(&get_local_var(lvar))) \
put_local_var(lvar); \
_flags = __get_cpu_var(lvar).flags; \
} while (0)
#define local_lock_irqsave_on(lvar, _flags, cpu) \
do { \
__local_lock_irqsave(&per_cpu(lvar, cpu)); \
_flags = per_cpu(lvar, cpu).flags; \
} while (0)
static inline int __local_unlock_irqrestore(struct local_irq_lock *lv,
					    unsigned long flags)
{
	LL_WARN(lv->owner != current);
	if (--lv->nestcnt)
		return 0;
	lv->owner = NULL;
	spin_unlock_irqrestore(&lv->lock, lv->flags);
	return 1;
}
#define local_unlock_irqrestore(lvar, flags) \
do { \
if (__local_unlock_irqrestore(&__get_cpu_var(lvar), flags)) \
put_local_var(lvar); \
} while (0)
#define local_unlock_irqrestore_on(lvar, flags, cpu) \
do { \
__local_unlock_irqrestore(&per_cpu(lvar, cpu), flags); \
} while (0)
#define local_spin_trylock_irq(lvar, lock) \
	({ \
		int __locked; \
		local_lock_irq(lvar); \
		__locked = spin_trylock(lock); \
		if (!__locked) \
			local_unlock_irq(lvar); \
		__locked; \
	})
#define local_spin_lock_irq(lvar, lock) \
do { \
local_lock_irq(lvar); \
spin_lock(lock); \
} while (0)
#define local_spin_unlock_irq(lvar, lock) \
do { \
spin_unlock(lock); \
local_unlock_irq(lvar); \
} while (0)
#define local_spin_lock_irqsave(lvar, lock, flags) \
do { \
local_lock_irqsave(lvar, flags); \
spin_lock(lock); \
} while (0)
#define local_spin_unlock_irqrestore(lvar, lock, flags) \
do { \
spin_unlock(lock); \
local_unlock_irqrestore(lvar, flags); \
} while (0)
#define get_locked_var(lvar, var) \
	(*({ \
		local_lock(lvar); \
		&__get_cpu_var(var); \
	}))
#define put_locked_var(lvar, var) local_unlock(lvar)
#define local_lock_cpu(lvar) \
	({ \
		local_lock(lvar); \
		smp_processor_id(); \
	})
#define local_unlock_cpu(lvar) local_unlock(lvar)
#else /* PREEMPT_RT_BASE */
#define DEFINE_LOCAL_IRQ_LOCK(lvar) __typeof__(const int) lvar
#define DECLARE_LOCAL_IRQ_LOCK(lvar) extern __typeof__(const int) lvar
static inline void local_irq_lock_init(int lvar) { }
#define local_lock(lvar) preempt_disable()
#define local_unlock(lvar) preempt_enable()
#define local_lock_irq(lvar) local_irq_disable()
#define local_unlock_irq(lvar) local_irq_enable()
#define local_lock_irqsave(lvar, flags) local_irq_save(flags)
#define local_unlock_irqrestore(lvar, flags) local_irq_restore(flags)
#define local_spin_trylock_irq(lvar, lock) spin_trylock_irq(lock)
#define local_spin_lock_irq(lvar, lock) spin_lock_irq(lock)
#define local_spin_unlock_irq(lvar, lock) spin_unlock_irq(lock)
#define local_spin_lock_irqsave(lvar, lock, flags) \