CHROMIUM: watchdog: touch_nmi_watchdog should only touch local cpu not every one
This is a patch from Don Zickus. It has not been merged upstream yet.

I ran into a scenario where one cpu was stuck and should have panic'd because of the NMI watchdog, but it didn't. The reason was that another cpu was spewing stack dumps onto the console. Upon investigation, I noticed that the watchdog is touched both when writing to the console and when dumping the stack. This causes all the cpus to reset their NMI watchdog flags, and the 'stuck' cpu just spins forever.

This change alters the semantics of touch_nmi_watchdog slightly. Previously, I accidentally changed the semantics, and we noticed there was a codepath in which touch_nmi_watchdog could be called from a preemptible area. That caused a BUG() to happen when CONFIG_DEBUG_PREEMPT was enabled. I believe it was the acpi code.

My attempt here re-introduces the change to have the touch_nmi_watchdog() code only touch the local cpu instead of all of the cpus. But instead of using __get_cpu_var(), I use the __raw_get_cpu_var() version. This avoids the preemption problem. However, my reasoning wasn't that I was trying to be lazy. Instead, I rationalized it as follows: if preemption is enabled, then interrupts should be enabled too, and the NMI watchdog will have no reason to trigger. So if a preempted caller migrates and the wrong cpu is touched, it won't matter, because the percpu interrupt counters the NMI watchdog uses should still be incrementing.

BUG=chromium:305420
TEST=platform_KernelErrorPaths, verified on peppy
Change-Id: I3791d76be7fc13c8bf6d3e21bf6a487ce1f50210
Signed-off-by: Don Zickus <dzickus@redhat.com>
Signed-off-by: Ben Zhang <benzh@chromium.org>
Reviewed-on: https://chromium-review.googlesource.com/179418
Reviewed-by: Sameer Nanda <snanda@chromium.org>
[benzh: 3.14 rebase. Resolved trivial conflicts]
Signed-off-by: Ben Zhang <benzh@chromium.org>
@@ -163,14 +163,14 @@ void touch_all_softlockup_watchdogs(void)
 
 #ifdef CONFIG_HARDLOCKUP_DETECTOR
 void touch_nmi_watchdog(void)
 {
-	if (watchdog_user_enabled) {
-		unsigned cpu;
-
-		for_each_present_cpu(cpu) {
-			if (per_cpu(watchdog_nmi_touch, cpu) != true)
-				per_cpu(watchdog_nmi_touch, cpu) = true;
-		}
-	}
+	/*
+	 * Using __raw here because some code paths have
+	 * preemption enabled.  If preemption is enabled
+	 * then interrupts should be enabled too, in which
+	 * case we shouldn't have to worry about the watchdog
+	 * going off.
+	 */
+	__raw_get_cpu_var(watchdog_nmi_touch) = true;
 	touch_softlockup_watchdog();
 }
 EXPORT_SYMBOL(touch_nmi_watchdog);