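1. Soft lockup and Hard lockup
1.1. Definitions
The lockup detector is an important kernel mechanism for detecting kernel lockups, that is, an abnormal state in which a CPU keeps executing in kernel mode for a long time. A kernel warning like the following indicates this problem: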
watchdog: BUG: soft lockup - CPU#244 stuck for 26s!
pstate: 83400009 (Nzcv daif +PAN -UAO +TCO +DIT -SSBS BTYPE=--)
pc : arm_smmu_cmdq_issue_cmdlist+0x178/0xa50
lr : arm_smmu_cmdq_issue_cmdlist+0x150/0xa50
sp : ffff8000d83ef290
x29: ffff8000d83ef290 x28: 000000003b9aca00 x27: 0000000000000000
x26: ffff8000d83ef3c0 x25: da86c0812194a0e8 x24: 0000000000000000
x23: 0000000000000040 x22: ffff8000d83ef340 x21: ffff0000c63980c0
x20: 0000000000000001 x19: ffff0000c6398080 x18: 0000000000000000
x17: 0000000000000000 x16: 0000000000000000 x15: ffff3000b4a3bbb0
x14: ffff3000b4a30888 x13: ffff3000b4a3cf60 x12: 0000000000000000
x11: 0000000000000000 x10: 0000000000000000 x9 : ffffc08120e4d6bc
x8 : 0000000000000000 x7 : 0000000000000000 x6 : 0000000000048cfa
x5 : 0000000000000000 x4 : 0000000000000001 x3 : 000000000000000a
x2 : 0000000080000000 x1 : 0000000000000000 x0 : 0000000000000001
Call trace:
arm_smmu_cmdq_issue_cmdlist+0x178/0xa50
__arm_smmu_tlb_inv_range+0x118/0x254
arm_smmu_tlb_inv_range_asid+0x6c/0x130
arm_smmu_mm_invalidate_range+0xa0/0xa4
__mmu_notifier_invalidate_range_end+0x88/0x120
unmap_vmas+0x194/0x1e0
unmap_region+0xb4/0x144
do_mas_align_munmap+0x290/0x490
do_mas_munmap+0xbc/0x124
__vm_munmap+0xa8/0x19c
__arm64_sys_munmap+0x28/0x50
invoke_syscall+0x78/0x11c
el0_svc_common.constprop.0+0x58/0x1c0
do_el0_svc+0x34/0x60
el0_svc+0x2c/0xd4
el0t_64_sync_handler+0x114/0x140
el0t_64_sync+0x1a4/0x1a8
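First, define the two kinds of lockup.
Term | Definition | Possible causes |
---|---|---|
Hard lockup | The CPU stays in kernel mode for more than 10s; in this state it can neither schedule tasks nor respond to any interrupt. | 1. Hardware fault, such as a stalled CPU; 2. Interrupt subsystem fault; 3. Deadlock; 4. Interrupts disabled for too long |
Soft lockup | The CPU stays in kernel mode for more than 20s without giving any other task a chance to be scheduled, but it can still respond to interrupts. | 1. Preemption disabled for too long; 2. Interrupt handlers (softirq or hardirq) running for too long; 3. Abnormal lock behavior, such as holding a spinlock for too long (which amounts to disabling preemption for too long) or deadlock; 4. Long-running kernel-mode work, such as a long loop in kernel mode that never calls schedule/sleep to yield |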
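The two definitions in the table can be condensed into a small decision function. This is only a user-space sketch for illustration: the names `classify_lockup`, `lockup_kind`, and the two threshold macros are hypothetical and do not exist in the kernel, and the 10s and 20s values are simply the ones quoted in the table.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical names; thresholds are the values quoted in the table. */
enum lockup_kind { LOCKUP_NONE, LOCKUP_SOFT, LOCKUP_HARD };

#define HARD_LOCKUP_THRESH_S 10U
#define SOFT_LOCKUP_THRESH_S 20U

/*
 * stuck_s:       seconds the CPU has stayed in kernel mode without scheduling
 * irqs_serviced: whether the CPU still responds to interrupts
 */
enum lockup_kind classify_lockup(unsigned int stuck_s, bool irqs_serviced)
{
	/* No interrupts and no scheduling: hard lockup once past 10s. */
	if (!irqs_serviced)
		return stuck_s > HARD_LOCKUP_THRESH_S ? LOCKUP_HARD : LOCKUP_NONE;

	/* Interrupts still work but no task gets scheduled: soft lockup past 20s. */
	return stuck_s > SOFT_LOCKUP_THRESH_S ? LOCKUP_SOFT : LOCKUP_NONE;
}

/* Usage example: the 26s stall from the warning above, interrupts still serviced. */
void classify_lockup_example(void)
{
	assert(classify_lockup(26, true) == LOCKUP_SOFT);
}
```

The only distinguishing input is whether interrupts are still serviced, which is also why the two detectors need different observation points.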
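1.2. How lockup detection works
In the kernel, soft lockup and hard lockup detection share the same entry point. That entry point is a cpuhp (CPU hotplug) callback, which guarantees that it is invoked for every CPU as it comes online, as the following code shows: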
/* CPU hotplug callbacks */
static struct cpuhp_step cpuhp_hp_states[] = {
	/* ... */
	[CPUHP_AP_WATCHDOG_ONLINE] = {
		.name = "lockup_detector:online",
		.startup.single = lockup_detector_online_cpu,
		.teardown.single = lockup_detector_offline_cpu,
	},
	/* ... */
};

int lockup_detector_online_cpu(unsigned int cpu)
{
	if (cpumask_test_cpu(cpu, &watchdog_allowed_mask))
		watchdog_enable(cpu);
	return 0;
}
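To make the hotplug registration pattern concrete, here is a minimal user-space model of it. Everything in it is an assumption made for illustration: `hp_step`, `allowed_mask`, the `model_*` functions, and `NR_CPUS` are hypothetical stand-ins, not kernel APIs. It only shows the idea that a startup callback runs when a CPU comes online and arms the watchdog when that CPU is in the allowed mask.

```c
#include <assert.h>
#include <stdbool.h>

#define NR_CPUS 8	/* hypothetical CPU count for the model */

/* Stand-in for watchdog_allowed_mask: bit n set means CPU n may run the watchdog. */
static unsigned long allowed_mask = 0x0fUL;
/* Records which CPUs currently have the watchdog armed. */
static bool watchdog_armed[NR_CPUS];

/* Model of one hotplug step: a startup and a teardown callback. */
struct hp_step {
	const char *name;
	int (*startup)(unsigned int cpu);
	int (*teardown)(unsigned int cpu);
};

static int model_online_cpu(unsigned int cpu)
{
	assert(cpu < NR_CPUS);
	if (allowed_mask & (1UL << cpu))
		watchdog_armed[cpu] = true;	/* plays the role of watchdog_enable() */
	return 0;
}

static int model_offline_cpu(unsigned int cpu)
{
	assert(cpu < NR_CPUS);
	watchdog_armed[cpu] = false;
	return 0;
}

static const struct hp_step watchdog_step = {
	.name = "lockup_detector:online",
	.startup = model_online_cpu,
	.teardown = model_offline_cpu,
};

/* Bring a CPU online or offline by running the matching callback. */
int model_set_cpu_online(unsigned int cpu, bool online)
{
	return online ? watchdog_step.startup(cpu) : watchdog_step.teardown(cpu);
}

bool model_watchdog_armed(unsigned int cpu)
{
	assert(cpu < NR_CPUS);
	return watchdog_armed[cpu];
}
```

With the mask above, bringing CPU 2 online arms its watchdog, while CPU 5 comes online without one because its bit is not set.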
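Both soft lockup and hard lockup detection are entered through watchdog_enable. This function registers an hrtimer callback, watchdog_timer_fn, which handles the timestamp updates ("kicks") as well as the soft lockup and hard lockup detection work.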
static void watchdog_enable(unsigned int cpu)
{
	struct hrtimer *hrtimer = this_cpu_ptr(&watchdog_hrtimer);
	struct completion *done = this_cpu_ptr(&softlockup_completion);

	WARN_ON_ONCE(cpu != smp_processor_id());

	init_completion(done);
	complete(done);

	/*
	 * Start the timer first to prevent the hardlockup watchdog triggering
	 * before the timer has a chance to fire.
	 */
	hrtimer_init(hrtimer, CLOCK_MONOTONIC, HRTIMER_MODE_REL_HARD);
	hrtimer->function = watchdog_timer_fn;
	hrtimer_start(hrtimer, ns_to_ktime(sample_period),
		      HRTIMER_MODE_REL_PINNED_HARD);

	/* Initialize timestamp */
	update_touch_ts();
	/* Enable the hardlockup detector */
	if (watchdog_enabled & WATCHDOG_HARDLOCKUP_ENABLED)
		watchdog_hardlockup_enable(cpu);
}
/* watchdog kicker functions */
static enum hrtimer_restart watchdog_timer_fn(struct hrtimer *hrtimer)
{
	unsigned long touch_ts, period_ts, now;
	struct pt_regs *regs = get_irq_regs();
	int duration;
	int softlockup_all_cpu_backtrace = sysctl_softlockup_all_cpu_backtrace;

	if (!watchdog_enabled)
		return HRTIMER_NORESTART;

	watchdog_hardlockup_kick();

	/* kick the softlockup detector */
	if (completion_done(this_cpu_ptr(&softlockup_completion))) {
		reinit_completion(this_cpu_ptr(&softlockup_completion));
		stop_one_cpu_nowait(smp_processor_id(),
				softlockup_fn, NULL,
				this_cpu_ptr(&softlockup_stop_work));
	}

	/* .. and repeat */
	hrtimer_forward_now(hrtimer, ns_to_ktime(sample_period));

	/*
	 * Read the current timestamp first. It might become invalid anytime
	 * when a virtual machine is stopped by the host or when the watchog
	 * is touched from NMI.
	 */
	now = get_timestamp();
	/*
	 * If a virtual machine is stopped by the host it can look to
	 * the watchdog like a soft lockup. This function touches the watchdog.
	 */
	kvm_check_and_clear_guest_paused();
	/*
	 * The stored timestamp is comparable with @now only when not touched.
	 * It might get touched anytime from NMI. Make sure that is_softlockup()
	 * uses the same (valid) value.
	 */
	period_ts = READ_ONCE(*this_cpu_ptr(&watchdog_report_ts));

	/* Reset the interval when touched by known problematic code. */
	if (period_ts == SOFTLOCKUP_DELAY_REPORT) {
		if (unlikely(__this_cpu_read(softlockup_touch_sync))) {
			/*
			 * If the time stamp was touched atomically
			 * make sure the scheduler tick is up to date.
			 */
			__this_cpu_write(softlockup_touch_sync, false);
			sched_clock_tick();
		}

		update_report_ts();
		return HRTIMER_RESTART;
	}

	/* Check for a softlockup. */
	touch_ts = __this_cpu_read(watchdog_touch_ts);
	duration = is_softlockup(touch_ts, period_ts, now);
	if (unlikely(duration)) {
		/*
		 * Prevent multiple soft-lockup reports if one cpu is already
		 * engaged in dumping all cpu back traces.
		 */
		if (softlockup_all_cpu_backtrace) {
			if (test_and_set_bit_lock(0, &soft_lockup_nmi_warn))
				return HRTIMER_RESTART;
		}

		/* Start period for the next softlockup warning. */
		update_report_ts();

		pr_emerg("BUG: soft lockup - CPU#%d stuck for %us! [%s:%d]\n",
			smp_processor_id(), duration,
			current->comm, task_pid_nr(current));
		print_modules();
		print_irqtrace_events(current);
		if (regs)
			show_regs(regs);
		else
			dump_stack();

		if (softlockup_all_cpu_backtrace) {
			trigger_allbutcpu_cpu_backtrace(smp_processor_id());
			clear_bit_unlock(0, &soft_lockup_nmi_warn);
		}

		add_taint(TAINT_SOFTLOCKUP, LOCKDEP_STILL_OK);
		if (softlockup_panic)
			panic("softlockup: hung tasks");
	}

	return HRTIMER_RESTART;
}
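The listing calls is_softlockup() without showing it, so the following user-space sketch models the decision the timer callback makes on each expiry. It is an assumption-laden simplification, not the kernel implementation: `struct cpu_wd`, `model_timer_check`, `model_stopper_ran`, and the `MODEL_*` macros are hypothetical names, times are plain seconds, and the sentinel value is assumed to be `ULONG_MAX` in the role of SOFTLOCKUP_DELAY_REPORT.

```c
#include <assert.h>
#include <limits.h>

/* Assumed sentinel: "timestamp was touched, restart the interval". */
#define MODEL_DELAY_REPORT	ULONG_MAX
/* Soft lockup threshold in seconds, as quoted in the article. */
#define MODEL_SOFTLOCKUP_THRESH	20UL

struct cpu_wd {
	unsigned long touch_ts;		/* last time the stopper work ran */
	unsigned long report_ts;	/* start of the current report interval */
};

/* What the stopper work does when it gets to run: refresh both timestamps. */
void model_stopper_ran(struct cpu_wd *wd, unsigned long now)
{
	assert(wd);
	wd->touch_ts = now;
	wd->report_ts = now;
}

/*
 * One timer expiry. Returns the stuck duration in seconds when a
 * warning is due at time now, otherwise 0.
 */
unsigned long model_timer_check(struct cpu_wd *wd, unsigned long now)
{
	assert(wd);

	/* Touched by known slow code: restart the interval, report nothing. */
	if (wd->report_ts == MODEL_DELAY_REPORT) {
		wd->report_ts = now;
		return 0;
	}

	if (now > wd->report_ts + MODEL_SOFTLOCKUP_THRESH) {
		unsigned long duration = now - wd->touch_ts;

		wd->report_ts = now;	/* start the period for the next warning */
		return duration;
	}
	return 0;
}
```

Keeping two timestamps lets the model rate-limit repeated warnings through `report_ts` while still reporting the full stall length measured from `touch_ts`.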
1.2.1. How soft lockup detection works
At its core, soft lockup detection is built on the high-resolution timer hrtimer and the CPU stopper mechanism stop_machine.
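The soft lockup detector starts one hrtimer on every core. The hrtimer uses a sampling period of 4s (1/5 of the soft lockup threshold) and raises a hard interrupt each time it expires. That hard interrupt does two things in order:
- First, it calls stop_one_cpu_nowait to ask this CPU's stopper thread, migration, to update the timestamp.
- Then, it checks whether more than 20s have passed since the last timestamp update, and reports a soft lockup warning if so.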
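The timing arithmetic behind these two steps can be checked with a short user-space sketch. It assumes whole-second times and a stopper thread that updated the timestamp at time 0 and never runs again; `sample_period_s` and `first_warning_tick_s` are hypothetical helper names, not kernel functions.

```c
#include <assert.h>

/* Sampling period: 1/5 of the soft lockup threshold (4s for a 20s threshold). */
unsigned int sample_period_s(unsigned int soft_thresh_s)
{
	assert(soft_thresh_s >= 5);
	return soft_thresh_s / 5;
}

/*
 * Model only: the timestamp was last updated at time 0 and the stopper
 * thread never runs again. Returns the time of the first timer expiry
 * that sees more than soft_thresh_s seconds since that update.
 */
unsigned int first_warning_tick_s(unsigned int soft_thresh_s)
{
	unsigned int period = sample_period_s(soft_thresh_s);
	unsigned int now = 0;

	/* Each loop iteration is one hrtimer expiry. */
	do {
		now += period;
	} while (now <= soft_thresh_s);

	return now;
}
```

Under these assumptions a 20s threshold gives a 4s period, and the first expiry that exceeds the threshold is the one at 24s, since the expiry at exactly 20s does not yet satisfy "more than 20s".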
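Why is it done this way?
First, it helps to understand what the migration thread is.
migration is a stopper thread, and every CPU has one. It is the highest-priority task in the whole system and is self-parking. Although it has the highest priority, it does not preempt the running task directly; instead, it waits until the CPU reaches its next scheduling point and is then picked by priority. It has no notion of a time slice: as long as it does not give up the CPU voluntarily, it keeps the CPU.
If even migration, the highest-priority task, cannot be scheduled normally, the CPU must be in some abnormal state that keeps it from scheduling any task. Possible causes include:
- Preemption disabled for a long time, so no task can be scheduled
- Long-running interrupt handlers, deeply nested interrupts, or an interrupt storm, so no task can be scheduled
- Abnormal locking, such as holding a spinlock for too long (which also disables preemption) or deadlock
If anything goes wrong in either the hard interrupt raised by the hrtimer or the migration thread, so that the timestamp update or the check is delayed, the kernel reports a soft lockup.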
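Scheduling class | Scheduling policy | Priority | Preemption ability | Typical use cases |
---|---|---|---|---|
Stop class | N/A | Highest | Does not preempt; it is scheduled only after the current task reaches a scheduling point | Kernel management tasks, CPU migration, hotplug, etc. |
Deadline class | SCHED_DEADLINE | High | Preempts all real-time, fair, and idle class tasks, plus lower-priority tasks of its own class | Industrial control, audio/video processing, autonomous driving, etc. |
Real-time class | SCHED_FIFO, SCHED_RR | Medium-high | Preempts all fair and idle class tasks, plus lower-priority tasks of its own class | Real-time audio/video, low-latency tasks, data acquisition, etc. |
Fair class | SCHED_NORMAL, SCHED_BATCH, SCHED_IDLE | Normal | Can only preempt lower-priority tasks of its own class | Ordinary user applications, batch jobs, low-priority background maintenance, etc. |
Idle class | N/A (per-CPU idle task) | Lowest | Cannot preempt any other task | Runs only when the CPU has nothing else to run |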
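The priority order in the table can be sketched as a simple scan over the classes from highest to lowest. This is a user-space illustration only: `sched_class_id` and `pick_next_class` are hypothetical names, and the real scheduler is considerably more involved.

```c
#include <assert.h>
#include <stdbool.h>

/* Scheduling classes in descending priority, in the order of the table. */
enum sched_class_id {
	CLASS_STOP,
	CLASS_DEADLINE,
	CLASS_RT,
	CLASS_FAIR,
	CLASS_IDLE,
	CLASS_COUNT
};

/*
 * runnable[c] is true when class c has at least one runnable task.
 * Returns the class whose task would be picked at a scheduling point.
 */
enum sched_class_id pick_next_class(const bool runnable[CLASS_COUNT])
{
	assert(runnable);

	for (int c = CLASS_STOP; c < CLASS_COUNT; c++) {
		if (runnable[c])
			return (enum sched_class_id)c;
	}
	/* Nothing else is runnable: fall back to the idle class. */
	return CLASS_IDLE;
}
```

The scan only happens when the CPU actually reaches a scheduling point. If preemption stays disabled or interrupt handlers never return, this function is never called in the model, which is the situation in which a runnable stop-class task still makes no progress.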
In other words, each CPU periodically performs the following operations: