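Notes on sysdump
Introduction
sysdump stands for "dump system memory" and is also known as a kernel dump. When the kernel hits an exception, kexec and kdump save the memory and register contents; after the reboot, during the U-Boot stage, this data is written out to files. The dump files let us find the root cause of the exception.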
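When kernel exceptions occur
The kernel dump is triggered by a call to panic. Most cases fall into the following categories:
- A simple logic error in the code that directly or indirectly leads to a panic call, such as a null pointer dereference
- A logic error that breaks task scheduling, so that other tasks cannot preempt the CPU
- A logic error that leaves interrupts disabled, such as a deadlock inside an interrupt handler
- An interrupt that fires far more often than the timer interrupt
- Memory corruption that leads to an access to an illegal address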
- …
Below are some common trigger scenarios:
Driver calls
A driver calls panic, for example when triggered through sysrq
    echo 1 > /proc/sys/kernel/sysrq
    echo c > /proc/sysrq-trigger
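The die function
BUG/BUG_ON
Executing an illegal (undefined) instruction, dereferencing a null pointer and similar faults end up in die; alternatively the code calls panic directly
page_fault
An access to an illegal address, possibly caused by memory corruption. Start the analysis from the RIP reported with the exception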
soft lockup
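Each CPU has a watchdog thread that is monitored by an hrtimer interrupt: the interrupt context triggered by the hrtimer checks the watchdog thread, and if that thread has not been scheduled within the configured time, the CPU has been occupied for too long and scheduling has already gone wrong. Cause: ==preemption disabled in process context==. The check depends on interrupts, so ==if interrupts are disabled it cannot work==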
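The check can be pictured with a small simulation. This is only a sketch of the idea, not kernel code: the class name SoftLockupDetector, its method names and the 20-second threshold are assumptions made up for the example.

```python
class SoftLockupDetector:
    """Toy model of the per-CPU soft lockup check (illustrative only)."""

    def __init__(self, threshold_s=20):
        # Assumed threshold for the example; the real value is configurable.
        self.threshold_s = threshold_s
        self.last_touch = {}  # cpu -> time at which its watchdog thread last ran

    def watchdog_thread_ran(self, cpu, now):
        # Called whenever the watchdog thread gets scheduled on this CPU.
        self.last_touch[cpu] = now

    def timer_interrupt(self, cpu, now, irqs_enabled=True):
        # The timer callback runs in interrupt context. With interrupts
        # disabled it never runs, so nothing can be detected at all.
        if not irqs_enabled:
            return None
        stalled_for = now - self.last_touch.get(cpu, now)
        return stalled_for > self.threshold_s


det = SoftLockupDetector(threshold_s=20)
det.watchdog_thread_ran(cpu=0, now=0)
print(det.timer_interrupt(cpu=0, now=4))    # False: the thread ran recently
print(det.timer_interrupt(cpu=0, now=24))   # True: the thread starved for 24 s
print(det.timer_interrupt(cpu=0, now=24, irqs_enabled=False))  # None: check never runs
```

The last call shows the limitation noted above: once interrupts are off, the detector itself stops running.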
hard lockup
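A CPU panics on its own initiative when it detects that the neighboring ==CPU has interrupts disabled==; see the function watchdog_check_hardlockup_other_cpu. During the panic, the faulty CPU with interrupts disabled cannot handle the inter-processor interrupt either, so its cache cannot be flushed and ==the sysdump data is therefore unreliable==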
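A rough model of this neighbor check is sketched below. It is not a copy of watchdog_check_hardlockup_other_cpu: the class BuddyHardLockupDetector and its fields are hypothetical, and the sketch assumes the check works by watching whether the neighbor's timer interrupt counter still advances.

```python
class BuddyHardLockupDetector:
    """Toy model: each CPU watches the timer interrupt count of the next CPU."""

    def __init__(self, num_cpus):
        self.num_cpus = num_cpus
        self.timer_irq_count = [0] * num_cpus  # bumped by each CPU's own timer interrupt
        self.last_seen = [None] * num_cpus     # value the checker saw on its previous visit

    def timer_interrupt(self, cpu):
        self.timer_irq_count[cpu] += 1

    def check_neighbor(self, cpu):
        neighbor = (cpu + 1) % self.num_cpus
        current = self.timer_irq_count[neighbor]
        # No progress since the previous check means the neighbor is not
        # taking timer interrupts, so it is treated as hard locked.
        locked_up = current == self.last_seen[neighbor]
        self.last_seen[neighbor] = current
        return neighbor, locked_up


det = BuddyHardLockupDetector(num_cpus=4)
for _ in range(3):
    for cpu in (0, 1, 3):          # CPU2 has interrupts off and never ticks
        det.timer_interrupt(cpu)
    print(det.check_neighbor(1))   # CPU1 checks CPU2
```

The first check only records a baseline; from the second check on, CPU2 is reported as locked up because its counter never moves.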
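Hardware watchdog feeding
Single-thread feeding
A feeding thread has to service the hardware watchdog periodically. If the service is delayed, the hardware watchdog interrupt fires, and the CPU that receives it calls panic in interrupt context
If the hardware watchdog interrupt is masked, the mechanism fails. (==So the hardware watchdog interrupt needs to be a non-maskable interrupt==)
If a CPU does respond to the hardware watchdog interrupt but the faulty CPU has interrupts disabled, the faulty CPU's cache cannot be flushed, so the sysdump data is again unreliable
If one CPU is faulty while the other CPUs keep feeding the watchdog normally, the monitoring fails
This design has the following flaws:
- Suppose CPU0 is faulty while the other CPUs keep feeding the watchdog: not every CPU is actually monitored
- The faulty CPU has interrupts disabled and cannot handle the inter-processor interrupt sent by panic, so its cache cannot be flushed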
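Multi-thread feeding with CPU binding
Bind one thread to each CPU. Every CPU updates a shared global variable in which it owns one bit, and its thread sets that bit to 1 when it wakes up. Only when the number of set bits equals the number of CPUs in use, meaning every CPU has fed its own software watchdog, is the hardware watchdog fed. A bitmap can be used to implement this.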
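The bitmap scheme can be sketched as follows. This is a minimal single-process model, assuming one feeder thread per online CPU; PerCpuWatchdog, cpu_thread_woke and hw_feed_count are names invented for the illustration, and the counter stands in for the real hardware register write.

```python
class PerCpuWatchdog:
    """Toy model of bitmap-based feeding: feed hardware only when all CPUs reported."""

    def __init__(self, online_cpus):
        self.online_mask = 0
        for cpu in online_cpus:
            self.online_mask |= 1 << cpu   # one bit per CPU in use
        self.fed_mask = 0                  # bits of CPUs that reported this round
        self.hw_feed_count = 0             # stand-in for writes to the hardware watchdog

    def cpu_thread_woke(self, cpu):
        """Called by the thread bound to `cpu`; returns True if the hardware was fed."""
        self.fed_mask |= 1 << cpu
        if (self.fed_mask & self.online_mask) == self.online_mask:
            self.hw_feed_count += 1        # every CPU is alive: feed the hardware watchdog
            self.fed_mask = 0              # start a new round
            return True
        return False


wd = PerCpuWatchdog(online_cpus=[0, 1, 2, 3])
for cpu in (1, 2, 3, 1, 2, 3):   # CPU0 is stuck and never reports
    wd.cpu_thread_woke(cpu)
print(wd.hw_feed_count)          # 0: the healthy CPUs cannot hide CPU0
wd.cpu_thread_woke(0)
print(wd.hw_feed_count)          # 1
```

Unlike single-thread feeding, the healthy CPUs cannot keep the hardware watchdog quiet on behalf of a stuck one.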
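Analyzing the kernel dump
After U-Boot has exported the dump files, we analyze the crash scene using these files together with vmlinux, which carries the kernel symbol table
Make sure that vmlinux and the dump file come from the same kernel version. The build toolchain may be gcc or clang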
    $ strings vmlinux | grep "Linux version"
    Linux version 4.14.199+-ab131 clang version 11.0.1

    $ strings sysdump.core | grep "Linux version"    >> sysdump.core is the merged kernel dump file
    Linux version 4.14.199+-ab131 clang version 11.0.1

    Or, for a gcc build:
    ➜ 5.17.0+ strings vmlinux | grep "Linux version"
    Linux version 5.17.0+ (root@sholck) (gcc (Ubuntu 6.5.0-2ubuntu1~16.04) 6.5.0 20181026, GNU ld (GNU Binutils for Ubuntu) 2.26.1) #7 SMP PREEMPT_DYNAMIC Thu Mar 24 14:58:15 CST 2022

    ➜ 5.17.0+ strings vmcore | grep "Linux version"
    Linux version 5.17.0+ (root@sholck) (gcc (Ubuntu 6.5.0-2ubuntu1~16.04) 6.5.0 20181026, GNU ld (GNU Binutils for Ubuntu) 2.26.1) #7 SMP PREEMPT_DYNAMIC Thu Mar 24 14:58:15 CST 2022
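The same comparison can be scripted. The sketch below works on bytes already read from the two files; linux_banners and same_kernel are hypothetical helper names, and the regular expression assumes the banner is a run of printable ASCII starting with "Linux version ".

```python
import re

# "Linux version " followed by printable ASCII, up to the first non-printable byte.
_BANNER = re.compile(rb"Linux version [\x20-\x7e]+")


def linux_banners(data):
    """Return the distinct 'Linux version ...' strings found in a bytes object."""
    return sorted({m.group(0).decode("ascii") for m in _BANNER.finditer(data)})


def same_kernel(vmlinux_bytes, dump_bytes):
    """True if both blobs contain at least one identical version banner."""
    return bool(set(linux_banners(vmlinux_bytes)) & set(linux_banners(dump_bytes)))


# In-memory stand-ins for the file contents, using the banner shown above.
vmlinux = b"\x00\x01Linux version 4.14.199+-ab131 clang version 11.0.1\x00rest"
dump = b"junk\x00Linux version 4.14.199+-ab131 clang version 11.0.1\n\x00"
print(linux_banners(vmlinux))
print(same_kernel(vmlinux, dump))
```

For real files the bytes would come from something like open(path, "rb").read(); a multi-gigabyte dump is better scanned in chunks or through mmap than read whole.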
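Then use the crash tool to analyze the scene
crash
Installation
Official site: https://crash-utility.github.io
Android projects ship it; check under vendor/xxx/tools whether your platform baseline includes it
Local Ubuntu: sudo apt-get install linux-crashdump (recommended)
Startup
The arguments must give the start address of physical memory, the kernel symbol file and the kernel dump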
    $ crash_arm64 -m phys_offset=0x80000000 vmlinux sysdump.core --cpus 8

          KERNEL: vmlinux
        DUMPFILE: all
            CPUS: 8 [OFFLINE: 7]
            DATE: Tue Dec 14 05:44:33 2021
          UPTIME: 00:02:41
    LOAD AVERAGE: 5.99, 2.96, 1.16
           TASKS: 1930
        NODENAME: localhost
         RELEASE: 4.14.199+-ab131
         VERSION: #1 SMP PREEMPT Tue Dec 14 03:15:03 CST 2021
         MACHINE: aarch64 (unknown Mhz)
          MEMORY: 4 GB
           PANIC: "Kernel panic - not syncing: sysrq triggered crash"    >> triggered by sysrq
             PID: 7058
         COMMAND: "sh"
            TASK: ffffffc06a47da00 [THREAD_INFO: ffffffc06a47da00]
             CPU: 3
           STATE: TASK_RUNNING (PANIC)
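Common commands
Shell commands can also be run inside the TUI window that starts up
ps
Prints the state of the processes in the kernel, including tasklets and the like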
    crash_arm64> ps >ps.txt
    crash_arm64> ps | grep -n "ffffffc06a47da00"
    1930:> 7058 6808 3 ffffffc06a47da00 RU 0.0 10771096 2696 sh
bt
Prints the stack backtrace; by default, that of the crashing CPU
    crash_arm64> bt
    PID: 7058 TASK: ffffffc06a47da00 CPU: 3 COMMAND: "sh"
     #0 [ffffff801582b950] sysdump_panic_event$8bfd56c0834fe7d208b7e7c52872c4e4 at ffffff80087a218c
     #1 [ffffff801582b9b0] $x.25 at ffffff800826e600
     #2 [ffffff801582bc50] panic at ffffff800823a00c
     #3 [ffffff801582bcb0] sysrq_handle_crash$330e89e9e6de65c311d08fb99226844d at ffffff80087d3d14
     #4 [ffffff801582bcc0] __handle_sysrq at ffffff80087d3768
     #5 [ffffff801582bd10] $x.53 at ffffff80087d4dc0
     #6 [ffffff801582bd60] proc_reg_write$5fc6da0b4e1b06391acfa8bd9d90410e at ffffff80084fa0a4
     #7 [ffffff801582be10] __vfs_write at ffffff8008454d78
     #8 [ffffff801582be40] vfs_write at ffffff80084551b0
     #9 [ffffff801582be90] sys_write at ffffff800845540c
    #10 [ffffff801582bff0] el0_svc_naked at ffffff80080844bc
         PC: 00000075815348c8   LR: 0000005f4b89b63c   SP: 0000007ff5c9a450
        X29: 0000007ff5c9a4d0  X28: 0000007ff5c9a490  X27: 0000005f4b8c02e8
        X26: 00000075816fe000  X25: 0000000000000063  X24: 0000005f4b89e99c
        X23: 0000005f4b8bf640  X22: 0000007ff5c9a4b0  X21: 0000007ff5c9a4a8
        X20: b4000073f13a5328  X19: 0000000000000002  X18: 0000007581866000
        X17: 00000075815348c0  X16: 0000007581552ef8  X15: 000000000000002f
        X14: 0000000000000072  X13: f790000000012102  X12: 0000000032d3dd57
        X11: 00000000f13a53bc  X10: b4000073f13a53a8   X9: b4000073f13a5328
         X8: 0000000000000040   X7: 0000000000000000   X6: 0000000000000063
         X5: b4000073813a469a   X4: ffffffffffffffff   X3: ffffffffffffffff
         X2: 0000000000000002   X1: b4000073f13a5328   X0: 0000000000000001
        ORIG_X0: 0000000000000001  SYSCALLNO: 40  PSTATE: 00001000
log
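Prints what the kernel log buffer last held. It usually includes stack and register information as well, and reading it gives a rough picture of the exception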
    crash_arm64> log >log.txt
runq
Shows the tasks on each CPU's run queue
    crash_arm64> runq -c 3    >> select cpu3
    CPU 3 RUNQUEUE: ffffffc0ffec6700
      CURRENT: PID: 7058 TASK: ffffffc06a47da00 COMMAND: "sh"
      RT PRIO_ARRAY: ffffffc0ffec6868
         [no tasks queued]
      CFS RB_ROOT: ffffffc0ffec6790
         [no tasks queued]
irq
Shows interrupt data
    crash_arm64> irq
     IRQ   IRQ_DESC/_DATA      IRQACTION      NAME
      0       (unused)          (unused)
      1   ffffffc0fa821600      (unused)
      2   ffffffc0fa821800  ffffffc0faaa3d80  "/soc/aon/timer@64470000"
      3   ffffffc0fa821a00      (unused)
      4   ffffffc0fa821c00  ffffffc0faaa4180  "arch_timer"    >> timer interrupt

    crash_arm64> irq 4
     IRQ   IRQ_DESC/_DATA      IRQACTION      NAME
      4   ffffffc0fa821c00  ffffffc0faaa4180  "arch_timer"

    crash_arm64> irqaction ffffffc0faaa4180
    struct irqaction {
      handler = 0xffffff8008f9a078 <arch_timer_handler_phys$5757c1f5416e78392ea0a8126822dd28.cfi_jt>,    >> the module that handles this interrupt
      dev_id = 0x0,
      percpu_dev_id = 0xffffff8009761480,
      next = 0x0,
      thread_fn = 0x0,
      thread = 0x0,
      secondary = 0x0,
      irq = 4,
      flags = 17412,
      thread_flags = 0,
      thread_mask = 0,
      name = 0xffffff8009304dca "arch_timer",
      dir = 0x0
    }
Shows how many times each interrupt has fired on each CPU; the output also reveals the interrupt controller driver version, here GICv3
    crash_arm64> irq -s
             CPU0   CPU1   CPU2   CPU3   CPU4   CPU5   CPU6   CPU7
      2:     5040   2761   2996   2917   4043   4225    999   1438  GICv3 /soc/aon/timer@64470000
      4:    35885  27954  25117  23517  22156  20163  29762  28382  GICv3 arch_timer
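One of the panic causes listed earlier is an interrupt that fires far more often than the timer interrupt, and these counters are where that shows up. The sketch below parses text in the layout shown above and lists interrupts whose total exceeds that of arch_timer. parse_irq_stats and busier_than are hypothetical helpers, and the parser assumes each data line is "N:", then per-CPU counts, then the chip name, then the interrupt name.

```python
def parse_irq_stats(text):
    """Parse `irq -s` style text into {irq_number: (name, per_cpu_counts)}."""
    stats = {}
    for line in text.splitlines():
        fields = line.split()
        if not fields or not fields[0].endswith(":") or not fields[0][:-1].isdigit():
            continue  # header, blank line, or a line we do not understand
        irq = int(fields[0][:-1])
        counts = []
        for field in fields[1:]:
            if not field.isdigit():
                break
            counts.append(int(field))
        # Assumed layout: counts, then the chip name (e.g. GICv3), then the IRQ name.
        name = " ".join(fields[1 + len(counts) + 1:])
        stats[irq] = (name, counts)
    return stats


def busier_than(stats, reference_name="arch_timer"):
    """Return IRQ numbers whose total count exceeds the reference interrupt's total."""
    ref_total = 0
    for name, counts in stats.values():
        if name == reference_name:
            ref_total += sum(counts)
    return [irq for irq, (name, counts) in stats.items()
            if name != reference_name and sum(counts) > ref_total]


sample = """\
         CPU0   CPU1   CPU2   CPU3   CPU4   CPU5   CPU6   CPU7
  2:     5040   2761   2996   2917   4043   4225    999   1438  GICv3 /soc/aon/timer@64470000
  4:    35885  27954  25117  23517  22156  20163  29762  28382  GICv3 arch_timer
"""
stats = parse_irq_stats(sample)
print(busier_than(stats))   # [] for this dump: nothing outpaces arch_timer
```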
struct
Shows the members of a data structure and prints its size; by default, typing the structure name alone is enough
    crash_arm64> thread_struct    >> typing the structure name alone is enough
    struct thread_struct {
        struct cpu_context cpu_context;
        unsigned long tp_value;
        unsigned long tp2_value;
        struct fpsimd_state fpsimd_state;
        unsigned long fault_address;
        unsigned long fault_code;
        struct debug_info debug;
    }
    SIZE: 960

    crash_arm64> struct thread_struct -o    >> optionally add -o (show member offsets) or -x
    struct thread_struct {
        [0] struct cpu_context cpu_context;
      [104] unsigned long tp_value;
      [112] unsigned long tp2_value;
      [128] struct fpsimd_state fpsimd_state;
      [672] unsigned long fault_address;
      [680] unsigned long fault_code;
      [688] struct debug_info debug;
    }
    SIZE: 960
rd
Prints the contents of memory at a given address
    crash_arm64> sysrq_crash_op
    sysrq_crash_op = $23 = {
      handler = 0xffffff8008f9a8fc <sysrq_handle_crash$330e89e9e6de65c311d08fb99226844d.cfi_jt>,
      help_msg = 0xffffff8009364302 "crash(c)",
      action_msg = 0xffffff800931823a "Trigger a crash",
      enable_mask = 8
    }
    crash_arm64> sysrq_key_op -o
    struct sysrq_key_op {
       [0] void (*handler)(int);
       [8] char *help_msg;
      [16] char *action_msg;
      [24] int enable_mask;
    }
    SIZE: 32

    crash_arm64> rd 0xffffff8009364302 2
    ffffff8009364302:  2963286873617263 28746f6f62657200   crash(c).reboot(

    crash_arm64> rd 0xffffff800931823a 2
    ffffff800931823a:  2072656767697254 0068736172632061   Trigger a crash.

    crash_arm64> rd sysrq_crash_op 4
    ffffff8009bbf798:  ffffff8008f9a8fc ffffff8009364302   .........C6.....
    ffffff8009bbf7a8:  ffffff800931823a 0000000000000008   :.1.............
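The correct way to read the output above: 0x63 is c, 0x72 is r and 0x61 is a, which are the last three bytes of 2963286873617263. rd prints each 64-bit word most significant byte first, and since the memory is little-endian, the rightmost byte of the word is the first byte in memory.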
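The byte order can be checked by converting the printed words back into the bytes they occupy in memory. The sketch below uses the values from the rd output above; words_to_bytes is a hypothetical helper, and the struct format string is an assumption built from the offsets that `sysrq_key_op -o` reports (three 8-byte pointers, a 4-byte int, 4 bytes of padding).

```python
import struct


def words_to_bytes(words, byteorder="little"):
    """Turn 64-bit hex words as printed by rd back into their in-memory bytes."""
    return b"".join(int(w, 16).to_bytes(8, byteorder) for w in words)


# rd 0xffffff8009364302 2
help_msg = words_to_bytes(["2963286873617263", "28746f6f62657200"])
print(help_msg)                     # b'crash(c)\x00reboot('
print(help_msg.split(b"\x00")[0])   # b'crash(c)'

# rd sysrq_crash_op 4, unpacked with the layout from `sysrq_key_op -o`
raw = words_to_bytes([
    "ffffff8008f9a8fc", "ffffff8009364302",
    "ffffff800931823a", "0000000000000008",
])
handler, help_ptr, action_ptr, enable_mask = struct.unpack("<QQQi4x", raw)
print(hex(help_ptr), enable_mask)   # 0xffffff8009364302 8
```

The unpacked help_ptr matches the help_msg address that crash printed for sysrq_crash_op, which confirms that the four words are the structure's fields in declaration order.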
dis
Disassembly
    crash_arm64> dis msleep -lx
    /code/bsp/kernel/kernel4.14/kernel/time/timer.c: 1919
    0xffffff80082e8bdc <$x.165>: stp x29, x30, [sp,#-32]!
    0xffffff80082e8be0 <msleep+0x4>: stp x20, x19, [sp,#16]
    0xffffff80082e8be4 <msleep+0x8>: mov x29, sp
    0xffffff80082e8be8 <msleep+0xc>: mov w19, w0
    0xffffff80082e8bec <msleep+0x10>: nop
    /code/bsp/kernel/kernel4.14/kernel/time/time.c: 616
    0xffffff80082e8bf0 <msleep+0x14>: mov w8, w19
    /code/bsp/kernel/kernel4.14/kernel/time/timer.c: 1920
    0xffffff80082e8bf4 <msleep+0x18>: cmp w19, #0x0
    /code/bsp/kernel/kernel4.14/kernel/time/time.c: 616
    0xffffff80082e8bf8 <msleep+0x1c>: add x8, x8, #0x3
    0xffffff80082e8bfc <msleep+0x20>: mov x9, #0x3fffffffffffffff // #4611686018427387903
    0xffffff80082e8c00 <msleep+0x24>: lsr x8, x8, #2
    /code/bsp/kernel/kernel4.14/kernel/time/timer.c: 1920
    0xffffff80082e8c04 <msleep+0x28>: csinc x0, x9, x8, lt
    0xffffff80082e8c08 <msleep+0x2c>: mov w19, #0x2 // #2
    0xffffff80082e8c0c <msleep+0x30>: mrs x20, sp_el0
    /code/bsp/kernel/kernel4.14/kernel/time/timer.c: 1793
    0xffffff80082e8c10 <msleep+0x34>: str x19, [x20,#32]
    /code/bsp/kernel/kernel4.14/kernel/time/timer.c: 1794
    0xffffff80082e8c14 <msleep+0x38>: bl 0xffffff8008f7a4d4 <$x.136>
    /code/bsp/kernel/kernel4.14/kernel/time/timer.c: 1922
    0xffffff80082e8c18 <msleep+0x3c>: cbnz x0, 0xffffff80082e8c10 <msleep+0x34>
    /code/bsp/kernel/kernel4.14/kernel/time/timer.c: 1924
    0xffffff80082e8c1c <msleep+0x40>: ldp x20, x19, [sp,#16]
    0xffffff80082e8c20 <msleep+0x44>: ldp x29, x30, [sp],#32
    0xffffff80082e8c24 <msleep+0x48>: ret
vtop
Maps a virtual address to a physical address
    crash_arm64> vtop 0xffffff8009364302
    VIRTUAL           PHYSICAL
    ffffff8009364302  81364302

    PAGE DIRECTORY: ffffff800a063000
       PGD: ffffff800a063000 => 17fffd003
       PMD: ffffffc0ffffd248 => 81200791    >> page middle directory; with two-level indexing it can be seen as the page table that holds the PTE entries
      PAGE: 81200000 (2MB)    >> page table entry + offset

      PTE       PHYSICAL  FLAGS
    81200791    81200000  (VALID|RDONLY|SHARED|AF)

          PAGE        PHYSICAL  MAPPING  INDEX  CNT  FLAGS
    ffffffbf0004d900  81364000     0       0     1   1000 reserved

    crash_arm64> rd -p 81364302    >> print the memory contents at a physical address
    81364302:  2963286873617263    crash(c)
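The physical address reported by vtop can be reproduced by hand from the PMD entry. The sketch below assumes the mapping is a 2MB block, as vtop reports, and that the output address occupies bits up to bit 47 of the entry; block_phys_addr and the two constants are names made up for the illustration.

```python
BLOCK_SIZE = 2 * 1024 * 1024        # 2MB block, as reported by vtop
OUTPUT_ADDR_MASK = (1 << 48) - 1    # assumption: physical addresses fit in 48 bits


def block_phys_addr(vaddr, pmd_entry):
    """Physical address for a virtual address mapped by a 2MB block entry."""
    # Drop the attribute bits in the low part of the entry to get the block base.
    block_base = pmd_entry & OUTPUT_ADDR_MASK & ~(BLOCK_SIZE - 1)
    # The low 21 bits of the virtual address are the offset inside the block.
    offset = vaddr & (BLOCK_SIZE - 1)
    return block_base + offset


print(hex(block_phys_addr(0xffffff8009364302, 0x81200791)))   # 0x81364302
```

Here the block base is 0x81200000 and the offset is 0x164302, which add up to the 81364302 shown in the PHYSICAL column.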
sym
Purpose: translates between symbols and addresses
Print the whole symbol table:
    crash_arm64> sym -l | grep -n "sysrq_crash_op"
    265279:ffffff8009bbf798 (d) sysrq_crash_op
Translate a symbol to its virtual address
Abbreviations
- FIQ Fast Interrupt Request (fast interrupt mode)
- NMI Non-Maskable Interrupt
- IPI Inter-Processor Interrupt
- GIC Generic Interrupt Controller
- SGI Software Generated Interrupt, also called an IPI
- ISR Interrupt Service Routine