Sholck

Learning sysdump


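Introduction

sysdump stands for "dump system memory" and is also called a kernel dump. When the kernel hits a fatal exception, memory and register state are saved through kexec and kdump, and after the reboot the U-Boot stage writes that state out to files. The dump files let us track down the root cause of the failure.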


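When kernel exceptions occur

The kernel calls panic to produce a kernel dump. Most cases fall into one of the following categories:

  1. A simple logic error in the code that directly or indirectly leads to a panic call, such as a NULL pointer dereference
  2. A logic error that breaks task scheduling, so that other tasks cannot preempt the CPU
  3. A logic error that leaves interrupts disabled, such as a deadlock inside an interrupt handler
  4. An interrupt that fires far more often than the timer interrupt
  5. Memory corruption that leads to an illegal address access

Some common trigger scenarios follow: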

Driver calls

A driver calls panic, for example when a crash is triggered through sysrq:

echo 1 >/proc/sys/kernel/sysrq
echo c > /proc/sysrq-trigger

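The die function

BUG/BUG_ON

Executing an illegal (undefined) instruction, dereferencing a NULL pointer, and similar faults end up in die; alternatively, panic is called directly.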

page_fault

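An illegal address was accessed, possibly as a result of memory corruption. Start the analysis from the RIP (instruction pointer) reported with the exception.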

soft lockup

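Every CPU has a watchdog thread that is monitored from an hrtimer interrupt: the interrupt context triggered by the hrtimer checks the watchdog thread, and if that thread has not been scheduled within the configured time, the CPU has been occupied for too long and scheduling has gone wrong. Typical cause: preemption was disabled in process context. The detector depends on interrupts, so if interrupts are disabled it cannot work.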
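
The check described above can be modelled in a few lines. This is only a toy sketch in Python, not kernel code: the class name, the threshold, and the injectable clock are all made up for illustration. It shows the two roles involved, a thread that refreshes a timestamp whenever it gets scheduled, and a timer callback that compares that timestamp against a threshold.

```python
import time


class SoftLockupDetector:
    """Toy model of one CPU's soft-lockup check (hypothetical, not kernel code)."""

    def __init__(self, threshold_s, now=time.monotonic):
        self.threshold_s = threshold_s   # allowed time without scheduling
        self.now = now                   # clock source, injectable for testing
        self.last_touch = now()

    def watchdog_thread_ran(self):
        # Called by the per-CPU watchdog thread each time it is scheduled.
        self.last_touch = self.now()

    def timer_tick(self):
        # Called from the periodic timer "interrupt".
        # Returns True when the watchdog thread has been starved too long.
        return self.now() - self.last_touch > self.threshold_s


# Usage with a fake clock, so nothing really has to sleep.
clock = [0.0]
detector = SoftLockupDetector(threshold_s=20, now=lambda: clock[0])
clock[0] = 25.0          # 25 s pass and the watchdog thread never ran
print(detector.timer_tick())
```

The sketch also makes the limitation visible: if timer_tick is never called (interrupts disabled), nothing is ever reported.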

hard lockup

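When a CPU detects that its neighboring CPU has interrupts disabled, it triggers the panic itself; see the function watchdog_check_hardlockup_other_cpu. During the panic, the faulty CPU with interrupts disabled cannot handle the inter-processor interrupt (IPI) either, so its cache cannot be flushed and the sysdump information is therefore unreliable.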

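Feeding the hardware watchdog

Single-threaded feeding

A feeder thread has to service the hardware watchdog periodically. If servicing is delayed, the hardware watchdog raises an interrupt, and the CPU that receives it calls panic from interrupt context.

  1. If the hardware watchdog interrupt is masked, the mechanism stops working. (The hardware watchdog interrupt therefore has to be non-maskable.)

  2. If a CPU does respond to the hardware watchdog interrupt but the faulty CPU has interrupts disabled, the faulty CPU's cache cannot be flushed, so the sysdump information is unreliable as well.

  3. If one CPU is faulty while the other CPUs keep feeding the watchdog normally, the fault goes undetected.

This design has the following flaws:

  1. Suppose CPU0 is faulty while the other CPUs keep feeding the watchdog: not every CPU can be monitored.
  2. A faulty CPU with interrupts disabled cannot handle the IPI sent by panic, so its cache cannot be flushed.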

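Multi-threaded feeding with threads bound to CPUs

Bind one thread to each CPU. Every CPU updates a shared global variable in which it owns one bit, and sets that bit to 1 whenever its thread wakes up. When the number of set bits equals the number of CPUs in use, every CPU has fed its own watchdog, and only then is the hardware watchdog fed. A bitmap is enough to implement this.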
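
A minimal sketch of that bitmap scheme, written in Python for readability. The class and method names are hypothetical, and a real kernel implementation would need atomic bit operations and per-CPU kthreads, which this model leaves out.

```python
class BitmapWatchdog:
    """Toy model: feed the hardware watchdog only after every CPU checked in."""

    def __init__(self, online_cpus, feed_hw):
        self.online_mask = 0
        for cpu in online_cpus:
            self.online_mask |= 1 << cpu   # one bit per CPU in use
        self.fed_mask = 0                  # bits set since the last hardware feed
        self.feed_hw = feed_hw             # callback that kicks the hardware watchdog

    def cpu_thread_woke(self, cpu):
        # The thread bound to `cpu` woke up: set this CPU's bit.
        self.fed_mask |= 1 << cpu
        if (self.fed_mask & self.online_mask) == self.online_mask:
            self.feed_hw()                 # all CPUs reported, feed the hardware
            self.fed_mask = 0              # start a new round
            return True
        return False


# Usage: with 4 CPUs, the hardware is fed only after the fourth one reports.
feeds = []
wd = BitmapWatchdog(range(4), lambda: feeds.append("fed"))
for cpu in (0, 1, 1, 2):
    wd.cpu_thread_woke(cpu)   # CPU 3 is stuck, so nothing is fed yet
```

Compared with the single-threaded design, one stuck CPU is now enough to stop the feeding, which is what allows the hardware watchdog to fire.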

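Analyzing the dumped kernel

After U-Boot has exported the dump files, we analyze the crash scene with those files together with vmlinux, which carries the kernel symbol table.

Make sure that vmlinux and the dump file come from the same kernel version. The toolchain may be gcc or clang.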

$ strings vmlinux | grep "Linux version"                
Linux version 4.14.199+-ab131 clang version 11.0.1

$ strings sysdump.core | grep "Linux version"   # sysdump.core is the merged file produced from the kernel dump
Linux version 4.14.199+-ab131 clang version 11.0.1


Or, for a gcc build:
➜ 5.17.0+ strings vmlinux | grep "Linux version"
Linux version 5.17.0+ (root@sholck) (gcc (Ubuntu 6.5.0-2ubuntu1~16.04) 6.5.0 20181026, GNU ld (GNU Binutils for Ubuntu) 2.26.1) #7 SMP PREEMPT_DYNAMIC Thu Mar 24 14:58:15 CST 2022

➜ 5.17.0+ strings vmcore | grep "Linux version"
Linux version 5.17.0+ (root@sholck) (gcc (Ubuntu 6.5.0-2ubuntu1~16.04) 6.5.0 20181026, GNU ld (GNU Binutils for Ubuntu) 2.26.1) #7 SMP PREEMPT_DYNAMIC Thu Mar 24 14:58:15 CST 2022
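
The same comparison can be scripted. The following is a rough Python sketch, with made-up helper names, that pulls the first "Linux version ..." run of printable ASCII out of each file's bytes and compares the two. It assumes the first match is the kernel banner, which is how the strings/grep output above is being read.

```python
import re

# "Linux version " followed by printable ASCII (space through tilde).
BANNER = re.compile(rb"Linux version [ -~]+")


def linux_banner(data):
    """Return the first 'Linux version ...' string found in data, or None."""
    match = BANNER.search(data)
    return match.group(0).decode("ascii") if match else None


def same_kernel(vmlinux_bytes, dump_bytes):
    """True when both blobs carry an identical kernel banner."""
    a = linux_banner(vmlinux_bytes)
    b = linux_banner(dump_bytes)
    return a is not None and a == b
```

For real files the bytes would come from something like open("vmlinux", "rb").read(); for multi-gigabyte dumps, reading in chunks or using mmap would be the more sensible choice.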

After that, use the crash tool to analyze the crash scene.

crash

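Installation

  1. Official site: https://crash-utility.github.io

  2. Bundled with Android projects; check under vendor/xxx/tools whether the platform baseline ships it

  3. Local Ubuntu install: sudo apt-get install linux-crashdump (recommended)

Startup

The arguments must specify the physical memory start address, the kernel symbol file (vmlinux), and the dump file: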

$ crash_arm64 -m phys_offset=0x80000000 vmlinux sysdump.core --cpus 8

KERNEL: vmlinux
DUMPFILE: all
CPUS: 8 [OFFLINE: 7]
DATE: Tue Dec 14 05:44:33 2021
UPTIME: 00:02:41
LOAD AVERAGE: 5.99, 2.96, 1.16
TASKS: 1930
NODENAME: localhost
RELEASE: 4.14.199+-ab131
VERSION: #1 SMP PREEMPT Tue Dec 14 03:15:03 CST 2021
MACHINE: aarch64 (unknown Mhz)
MEMORY: 4 GB
PANIC: "Kernel panic - not syncing: sysrq triggered crash" >> triggered by sysrq
PID: 7058
COMMAND: "sh"
TASK: ffffffc06a47da00 [THREAD_INFO: ffffffc06a47da00]
CPU: 3
STATE: TASK_RUNNING (PANIC)

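Common commands

Shell commands can be run from inside the crash session once it has started.

ps

Prints the status of the tasks in the kernel, kernel threads included.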

crash_arm64> ps >ps.txt
crash_arm64> ps | grep -n "ffffffc06a47da00"
1930:> 7058 6808 3 ffffffc06a47da00 RU 0.0 10771096 2696 sh

bt

Prints a stack backtrace; by default, the backtrace of the task on the CPU that crashed.

crash_arm64> bt
PID: 7058 TASK: ffffffc06a47da00 CPU: 3 COMMAND: "sh"
#0 [ffffff801582b950] sysdump_panic_event$8bfd56c0834fe7d208b7e7c52872c4e4 at ffffff80087a218c
#1 [ffffff801582b9b0] $x.25 at ffffff800826e600
#2 [ffffff801582bc50] panic at ffffff800823a00c
#3 [ffffff801582bcb0] sysrq_handle_crash$330e89e9e6de65c311d08fb99226844d at ffffff80087d3d14
#4 [ffffff801582bcc0] __handle_sysrq at ffffff80087d3768
#5 [ffffff801582bd10] $x.53 at ffffff80087d4dc0
#6 [ffffff801582bd60] proc_reg_write$5fc6da0b4e1b06391acfa8bd9d90410e at ffffff80084fa0a4
#7 [ffffff801582be10] __vfs_write at ffffff8008454d78
#8 [ffffff801582be40] vfs_write at ffffff80084551b0
#9 [ffffff801582be90] sys_write at ffffff800845540c
#10 [ffffff801582bff0] el0_svc_naked at ffffff80080844bc
PC: 00000075815348c8 LR: 0000005f4b89b63c SP: 0000007ff5c9a450
X29: 0000007ff5c9a4d0 X28: 0000007ff5c9a490 X27: 0000005f4b8c02e8
X26: 00000075816fe000 X25: 0000000000000063 X24: 0000005f4b89e99c
X23: 0000005f4b8bf640 X22: 0000007ff5c9a4b0 X21: 0000007ff5c9a4a8
X20: b4000073f13a5328 X19: 0000000000000002 X18: 0000007581866000
X17: 00000075815348c0 X16: 0000007581552ef8 X15: 000000000000002f
X14: 0000000000000072 X13: f790000000012102 X12: 0000000032d3dd57
X11: 00000000f13a53bc X10: b4000073f13a53a8 X9: b4000073f13a5328
X8: 0000000000000040 X7: 0000000000000000 X6: 0000000000000063
X5: b4000073813a469a X4: ffffffffffffffff X3: ffffffffffffffff
X2: 0000000000000002 X1: b4000073f13a5328 X0: 0000000000000001
ORIG_X0: 0000000000000001 SYSCALLNO: 40 PSTATE: 00001000
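
log

Prints what was last saved in the kernel log buffer. It usually contains the stack and register information as well, and reading it gives a rough picture of the failure.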

crash_arm64> log >log.txt

runq

Shows the tasks on each CPU's run queue.

crash_arm64> runq -c 3  >> select CPU 3
CPU 3 RUNQUEUE: ffffffc0ffec6700
CURRENT: PID: 7058 TASK: ffffffc06a47da00 COMMAND: "sh"
RT PRIO_ARRAY: ffffffc0ffec6868
[no tasks queued]
CFS RB_ROOT: ffffffc0ffec6790
[no tasks queued]

irq

Shows interrupt data.

crash_arm64> irq
IRQ IRQ_DESC/_DATA IRQACTION NAME
0 (unused) (unused)
1 ffffffc0fa821600 (unused)
2 ffffffc0fa821800 ffffffc0faaa3d80 "/soc/aon/timer@64470000"
3 ffffffc0fa821a00 (unused)
4 ffffffc0fa821c00 ffffffc0faaa4180 "arch_timer" >> timer interrupt

crash_arm64> irq 4
IRQ IRQ_DESC/_DATA IRQACTION NAME
4 ffffffc0fa821c00 ffffffc0faaa4180 "arch_timer"

crash_arm64> irqaction ffffffc0faaa4180
struct irqaction {
handler = 0xffffff8008f9a078 <arch_timer_handler_phys$5757c1f5416e78392ea0a8126822dd28.cfi_jt>, >> the handler of the corresponding interrupt module
dev_id = 0x0,
percpu_dev_id = 0xffffff8009761480,
next = 0x0,
thread_fn = 0x0,
thread = 0x0,
secondary = 0x0,
irq = 4,
flags = 17412,
thread_flags = 0,
thread_mask = 0,
name = 0xffffff8009304dca "arch_timer",
dir = 0x0
}

Shows how many times each interrupt fired on each CPU. The output also reveals the interrupt controller driver in use, GICv3 here.

crash_arm64> irq -s
CPU0 CPU1 CPU2 CPU3 CPU4 CPU5 CPU6 CPU7
2: 5040 2761 2996 2917 4043 4225 999 1438 GICv3 /soc/aon/timer@64470000
4: 35885 27954 25117 23517 22156 20163 29762 28382 GICv3 arch_timer
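
One of the panic causes listed earlier is an interrupt firing far more often than the timer interrupt, and the irq -s output is a convenient place to check for that. Below is a small Python sketch with hypothetical helper names that parses saved irq -s text and expresses each interrupt's total count relative to the timer interrupt. It assumes numeric IRQ labels and one count column per CPU, as in the output above.

```python
def parse_irq_counts(text, ncpus):
    """Map IRQ number -> (list of per-CPU counts, trailing description)."""
    rows = {}
    for line in text.splitlines():
        fields = line.split()
        # Data rows start with "<number>:"; the CPU header row does not.
        if not fields or not fields[0].endswith(":") or not fields[0][:-1].isdigit():
            continue
        irq = int(fields[0][:-1])
        counts = [int(x) for x in fields[1:1 + ncpus]]
        rows[irq] = (counts, " ".join(fields[1 + ncpus:]))
    return rows


def ratio_to_timer(rows, timer_irq):
    """Total count of each IRQ divided by the timer IRQ's total count."""
    timer_total = sum(rows[timer_irq][0])
    return {irq: sum(counts) / timer_total for irq, (counts, _) in rows.items()}


sample = """CPU0 CPU1 CPU2 CPU3 CPU4 CPU5 CPU6 CPU7
2: 5040 2761 2996 2917 4043 4225 999 1438 GICv3 /soc/aon/timer@64470000
4: 35885 27954 25117 23517 22156 20163 29762 28382 GICv3 arch_timer
"""
rows = parse_irq_counts(sample, ncpus=8)
ratios = ratio_to_timer(rows, timer_irq=4)   # a ratio far above 1 would be suspicious
```

In this dump neither interrupt exceeds arch_timer, which fits a crash that was triggered deliberately through sysrq.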

struct

Shows the members of a data structure and prints the structure's size. Typing the structure name by itself is enough.

crash_arm64> thread_struct  >> typing the structure name directly is enough
struct thread_struct {
struct cpu_context cpu_context;
unsigned long tp_value;
unsigned long tp2_value;
struct fpsimd_state fpsimd_state;
unsigned long fault_address;
unsigned long fault_code;
struct debug_info debug;
}
SIZE: 960

crash_arm64> struct thread_struct -o >> add -o (show member offsets) or -x
struct thread_struct {
[0] struct cpu_context cpu_context;
[104] unsigned long tp_value;
[112] unsigned long tp2_value;
[128] struct fpsimd_state fpsimd_state;
[672] unsigned long fault_address;
[680] unsigned long fault_code;
[688] struct debug_info debug;
}
SIZE: 960
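
The offsets printed by -o are not simply a running sum of member sizes: tp2_value ends at 120, yet fpsimd_state starts at 128. The Python sketch below reproduces the listing with the usual C layout rule (round each offset up to the member's alignment). The member sizes are inferred from the gaps between the offsets above, and the alignment values (16 for fpsimd_state, 8 elsewhere) are assumptions rather than something crash printed.

```python
def layout(members):
    """members: list of (name, size, align). Returns (offsets, total_size)."""
    offset, max_align, offsets = 0, 1, {}
    for name, size, align in members:
        offset = (offset + align - 1) // align * align   # round up to alignment
        offsets[name] = offset
        offset += size
        max_align = max(max_align, align)
    # The struct itself is padded to a multiple of its strictest alignment.
    total = (offset + max_align - 1) // max_align * max_align
    return offsets, total


# Sizes inferred from the -o listing; alignments are assumed.
THREAD_STRUCT = [
    ("cpu_context", 104, 8),
    ("tp_value", 8, 8),
    ("tp2_value", 8, 8),
    ("fpsimd_state", 544, 16),
    ("fault_address", 8, 8),
    ("fault_code", 8, 8),
    ("debug", 272, 8),
]

offsets, total = layout(THREAD_STRUCT)
```

With these inputs the computed offsets and total size agree with the crash listing, including the 8-byte hole before fpsimd_state.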

rd

Prints the contents of memory at the given address.

crash_arm64> sysrq_crash_op
sysrq_crash_op = $23 = {
handler = 0xffffff8008f9a8fc <sysrq_handle_crash$330e89e9e6de65c311d08fb99226844d.cfi_jt>,
help_msg = 0xffffff8009364302 "crash(c)",
action_msg = 0xffffff800931823a "Trigger a crash",
enable_mask = 8
}
crash_arm64> sysrq_key_op -o
struct sysrq_key_op {
[0] void (*handler)(int);
[8] char *help_msg;
[16] char *action_msg;
[24] int enable_mask;
}
SIZE: 32


crash_arm64> rd 0xffffff8009364302 2
ffffff8009364302: 2963286873617263 28746f6f62657200 crash(c).reboot(

crash_arm64> rd 0xffffff800931823a 2
ffffff800931823a: 2072656767697254 0068736172632061 Trigger a crash.

crash_arm64> rd sysrq_crash_op 4
ffffff8009bbf798: ffffff8008f9a8fc ffffff8009364302 .........C6.....
ffffff8009bbf7a8: ffffff800931823a 0000000000000008 :.1.............

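To read the output above correctly: 0x63 is 'c', 0x72 is 'r', and 0x61 is 'a', which are the last three bytes of 2963286873617263. rd prints each 64-bit word as a little-endian value, so the byte stored at the lowest address appears at the right-hand end of the word.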
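
To avoid reversing the bytes by hand, the words can be converted back into memory order with a few lines of Python. This is just a convenience sketch with made-up function names; it assumes the words are 64-bit and little-endian, as in the rd output above.

```python
def words_to_bytes(words_hex):
    """Turn the 64-bit hex words printed by rd back into bytes in memory order."""
    return b"".join(int(word, 16).to_bytes(8, "little") for word in words_hex)


def c_string(raw):
    """Cut at the first NUL byte, the way a C string would end."""
    return raw.split(b"\0", 1)[0].decode("ascii")


help_msg = c_string(words_to_bytes(["2963286873617263", "28746f6f62657200"]))
action_msg = c_string(words_to_bytes(["2072656767697254", "0068736172632061"]))
print(help_msg)     # crash(c)
print(action_msg)   # Trigger a crash
```

The second word of the first read already belongs to the next string in memory ("reboot("), which is why the decoded text is cut at the first NUL.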

dis

Disassembles code. With -l, the source file and line number are interleaved with the instructions.

crash_arm64> dis msleep -lx
/code/bsp/kernel/kernel4.14/kernel/time/timer.c: 1919
0xffffff80082e8bdc <$x.165>: stp x29, x30, [sp,#-32]!
0xffffff80082e8be0 <msleep+0x4>: stp x20, x19, [sp,#16]
0xffffff80082e8be4 <msleep+0x8>: mov x29, sp
0xffffff80082e8be8 <msleep+0xc>: mov w19, w0
0xffffff80082e8bec <msleep+0x10>: nop
/code/bsp/kernel/kernel4.14/kernel/time/time.c: 616
0xffffff80082e8bf0 <msleep+0x14>: mov w8, w19
/code/bsp/kernel/kernel4.14/kernel/time/timer.c: 1920
0xffffff80082e8bf4 <msleep+0x18>: cmp w19, #0x0
/code/bsp/kernel/kernel4.14/kernel/time/time.c: 616
0xffffff80082e8bf8 <msleep+0x1c>: add x8, x8, #0x3
0xffffff80082e8bfc <msleep+0x20>: mov x9, #0x3fffffffffffffff // #4611686018427387903
0xffffff80082e8c00 <msleep+0x24>: lsr x8, x8, #2
/code/bsp/kernel/kernel4.14/kernel/time/timer.c: 1920
0xffffff80082e8c04 <msleep+0x28>: csinc x0, x9, x8, lt
0xffffff80082e8c08 <msleep+0x2c>: mov w19, #0x2 // #2
0xffffff80082e8c0c <msleep+0x30>: mrs x20, sp_el0
/code/bsp/kernel/kernel4.14/kernel/time/timer.c: 1793
0xffffff80082e8c10 <msleep+0x34>: str x19, [x20,#32]
/code/bsp/kernel/kernel4.14/kernel/time/timer.c: 1794
0xffffff80082e8c14 <msleep+0x38>: bl 0xffffff8008f7a4d4 <$x.136>
/code/bsp/kernel/kernel4.14/kernel/time/timer.c: 1922
0xffffff80082e8c18 <msleep+0x3c>: cbnz x0, 0xffffff80082e8c10 <msleep+0x34>
/code/bsp/kernel/kernel4.14/kernel/time/timer.c: 1924
0xffffff80082e8c1c <msleep+0x40>: ldp x20, x19, [sp,#16]
0xffffff80082e8c20 <msleep+0x44>: ldp x29, x30, [sp],#32
0xffffff80082e8c24 <msleep+0x48>: ret

vtop

Translates a virtual address to a physical address.

crash_arm64> vtop 0xffffff8009364302
VIRTUAL PHYSICAL
ffffff8009364302 81364302

PAGE DIRECTORY: ffffff800a063000
PGD: ffffff800a063000 => 17fffd003
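PMD: ffffffc0ffffd248 => 81200791 >> page middle directory; with two-level indexing it can be regarded as the page table that holds the PTEs
PAGE: 81200000 (2MB) >> page table entry + offset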

PTE PHYSICAL FLAGS
81200791 81200000 (VALID|RDONLY|SHARED|AF)

PAGE PHYSICAL MAPPING INDEX CNT FLAGS
ffffffbf0004d900 81364000 0 0 1 1000 reserved


crash_arm64> rd -p 81364302 >> print the memory contents at the physical address
81364302: 2963286873617263 crash(c)
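
The arithmetic behind this translation is short enough to check by hand. The Python sketch below redoes it for the 2 MB block mapping that vtop reported: the physical address is the block base taken from the PMD entry plus the low 21 bits of the virtual address. The constant and function names are made up, and treating bits [47:21] of the entry as the output address is an assumption based on a 4 KB granule.

```python
BLOCK_SHIFT = 21                                   # 2 MB block, as reported by vtop
BLOCK_MASK = (1 << BLOCK_SHIFT) - 1                # offset inside the block
OUT_ADDR_MASK = ((1 << 48) - 1) & ~BLOCK_MASK      # assumed output-address bits [47:21]


def block_translate(vaddr, pmd_entry):
    """Physical address for a virtual address covered by a 2 MB block entry."""
    base = pmd_entry & OUT_ADDR_MASK               # drops the low attribute bits (0x791)
    return base | (vaddr & BLOCK_MASK)


paddr = block_translate(0xFFFFFF8009364302, 0x81200791)
print(hex(paddr))   # 0x81364302
```

The result matches the PHYSICAL column of the vtop output, and the low bits that were masked off (0x791) are the ones vtop decodes into the FLAGS column.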

sym

Purpose: translates between symbols and addresses.

List the whole symbol table:

crash_arm64> sym -l | grep -n "sysrq_crash_op"
265279:ffffff8009bbf798 (d) sysrq_crash_op

This translates a symbol into its virtual address.

Abbreviations

  1. FIQ Fast Interrupt Request
  2. NMI Non-Maskable Interrupt
  3. IPI Inter-Processor Interrupt
  4. GIC Generic Interrupt Controller
  5. SGI Software Generated Interrupt, also called an IPI
  6. ISR Interrupt Service Routine