Sholck

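Without accumulating small steps, one cannot travel a thousand miles; without gathering small streams, one cannot form rivers and seas.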

Open-source contributions to the crash utility

Background

While learning to analyze a kernel 5.17.0+ dump with crash, I ran into a few problems.

crash aborts while parsing the kernel dump

Analysis with the latest crash package downloaded from GitHub failed with the following error:

5.17.0+ crash vmlinux vmcore

crash 8.0.0
...
GNU gdb (GDB) 10.2
...
please wait... (gathering kmem slab cache data)
crash: invalid structure member offset: kmem_cache_s_num
FILE: memory.c LINE: 9619 FUNCTION: kmem_cache_init()

[/usr/bin/crash] error trace: 5390ae => 51123c => 5d2a2a => 5d299c

5d299c: OFFSET_verify.part.36+92
5d2a2a: OFFSET_verify+58
51123c: kmem_cache_init+316
5390ae: vm_init+7086

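Setting breakpoints in gdb showed that the PERCPU_KMALLOC_V2 path computes the offset of the num member of the kmem_cache struct. In the kernel, kmem_cache is defined separately for SLAB and for SLUB: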

//in kernel mm/slab.h:

#ifdef CONFIG_SLAB
#include <linux/slab_def.h>
#endif

#ifdef CONFIG_SLUB
#include <linux/slub_def.h>
#endif

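However, kmem_cache has a num member under SLAB but not under SLUB. As a result, MEMBER_OFFSET_INIT(kmem_cache_s_num, "kmem_cache", "num") sets offset_table.kmem_cache_s_num to -1, and the later OFFSET(kmem_cache_s_num) fails verification. Checking .config showed that the default was CONFIG_SLUB=y; after changing it to CONFIG_SLAB=y, crash parsed the kernel dump successfully.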
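The failure mode described above can be modeled in a few lines of C. This is a simplified sketch, not crash's actual code: `member_info`, `lookup_member_offset`, `checked_offset`, and both member layouts (including every offset value) are hypothetical stand-ins for the debuginfo lookup behind MEMBER_OFFSET_INIT and the validity check behind OFFSET.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define INVALID_OFFSET (-1L)

/* Hypothetical stand-in for debuginfo: one member of a struct. */
struct member_info {
    const char *name;
    long offset;
};

/* Made-up layouts; the offsets are illustrative, not from a real kernel. */
const struct member_info slab_kmem_cache[] = {
    { "batchcount", 8 }, { "num", 28 }, { "name", 88 },
};
const struct member_info slub_kmem_cache[] = {
    { "cpu_slab", 0 }, { "flags", 8 }, { "name", 96 },
};

/* Return the member's byte offset, or INVALID_OFFSET if it is absent. */
long lookup_member_offset(const struct member_info *members, size_t count,
                          const char *name)
{
    assert(members != NULL && name != NULL);
    for (size_t i = 0; i < count; i++) {
        if (strcmp(members[i].name, name) == 0)
            return members[i].offset;
    }
    return INVALID_OFFSET;
}

/* Checked accessor: returns 0 and stores the offset, or -1 if it is invalid. */
int checked_offset(long cached, long *out)
{
    if (cached < 0)
        return -1;
    *out = cached;
    return 0;
}
```

Looking up "num" in the SLUB layout yields the -1 sentinel, and the checked accessor then refuses to use it, which mirrors the "invalid structure member offset" abort seen above.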

crash aborts on kmem -s/-S

Running kmem -s or kmem -S (with CONFIG_SLAB) produced the following error:

kmem: invalid structure member offset: page_active
FILE: memory.c LINE: 12217 FUNCTION: verify_slab_overload_page()

[/usr/bin/crash] error trace: 52eb27 => 5323ed => 533401 => 60240b

60240b: OFFSET_verify+211
533401: verify_slab_overload_page+588
5323ed: do_slab_chain_slab_overload_page+1171
52eb27: dump_kmem_cache_percpu_v2+2779

Tracing this back, kernel commit 07f910f9b had already removed s_mem, freelist, active, and other fields from the page struct.

Resolution

For the two problems above I opened an issue on GitHub, "invalid structure member offset: page_active #115". The maintainer answered by pointing to two patches that had already been merged: 14f8c46 and 5f390ed.

Analysis

patch1: distinguishes the SLAB and SLUB cases by the members that kmem_cache has under each allocator, which fixes the "invalid structure member offset: kmem_cache_s_num" problem.

patch2: adds compatibility for the SLUB case, where members of the Linux page struct were moved into the slab struct.
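The idea behind patch1, telling the allocators apart by which members kmem_cache exposes, can be sketched as follows. This is an illustration under stated assumptions rather than the merged code: `has_member` and `detect_allocator` are hypothetical helpers, the member lists are passed in by hand instead of being read from debuginfo, and using "cpu_slab" as the SLUB marker is my assumption (only "num" as a SLAB-only member comes from the analysis above).

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

enum allocator { ALLOC_UNKNOWN, ALLOC_SLAB, ALLOC_SLUB };

/* Return 1 if `name` appears in the member list, 0 otherwise. */
int has_member(const char *const *members, size_t count, const char *name)
{
    assert(members != NULL || count == 0);
    for (size_t i = 0; i < count; i++) {
        if (strcmp(members[i], name) == 0)
            return 1;
    }
    return 0;
}

/*
 * Classify a kmem_cache layout by its members.
 * "cpu_slab" as the SLUB marker is an assumption of this sketch;
 * "num" is the SLAB-only member discussed in the text.
 */
enum allocator detect_allocator(const char *const *members, size_t count)
{
    if (has_member(members, count, "cpu_slab"))
        return ALLOC_SLUB;
    if (has_member(members, count, "num"))
        return ALLOC_SLAB;
    return ALLOC_UNKNOWN;
}
```

Once the allocator is known, SLAB-only offsets such as kmem_cache_s_num only need to be initialized and read on the SLAB path, so a SLUB kernel never touches the missing member.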

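It turned out that the packaged crash release lags behind the GitHub source. A crash binary built from the latest source (with symbols by default) parses the kernel dump successfully under CONFIG_SLUB, but kmem -s still fails.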

gdb debugging analysis

➜  rootfs gdb crash
(gdb) b memory.c:543
Breakpoint 1 at 0x50a355: file memory.c, line 543.
(gdb) b memory.c:12224
Breakpoint 2 at 0x533516: file memory.c, line 12225.

(gdb) run vmlinux_normal vmcore-normal

...
Thread 1 "crash" hit Breakpoint 1, vm_init () at memory.c:543
543 ANON_MEMBER_OFFSET_INIT(page_active, "page", "active");
(gdb) n
546 if (!VALID_STRUCT(kmem_slab_s) && VALID_STRUCT(slab_s)) {
(gdb) p offset_table.page_active
$1 = -1
(gdb) c
...
crash> kmem -s
[Detaching after fork from child process 28224]
CACHE OBJSIZE ALLOCATED TOTAL SLABS SSIZE NAME

Thread 1 "crash" hit Breakpoint 2, verify_slab_overload_page (si=0x7ffffffecb90,
last=18446637560712103560, s=0) at memory.c:12225
12225 active = UINT(page_buf + OFFSET(page_active));
(gdb) p offset_table.page_active
$2 = -1

(gdb) bt
#0 verify_slab_overload_page (si=0x7ffffffecb90, last=18446637560712103560, s=0) at memory.c:12225
#1 0x0000000000532526 in do_slab_chain_slab_overload_page (cmd=32768, si=0x7ffffffecb90)
at memory.c:11936
#2 0x000000000052ec60 in dump_kmem_cache_percpu_v2 (si=0x7ffffffecb90) at memory.c:10843
#3 0x0000000000519cc6 in cmd_kmem () at memory.c:5231
#4 0x00000000004f4719 in exec_command () at main.c:892
#5 0x00000000004f490a in main_loop () at main.c:839
#6 0x000000000081d66d in captured_main (data=data@entry=0x7fffffffd980) at main.c:1284
#7 gdb_main (args=args@entry=0x7fffffffd9a0) at main.c:1313
#8 0x000000000081d735 in gdb_main_entry (argc=<optimized out>, argv=argv@entry=0x7fffffffdb28)
at main.c:1338
#9 0x000000000059a4aa in gdb_main_loop (argc=<optimized out>, argc@entry=3,
argv=argv@entry=0x7fffffffdb28) at gdb_interface.c:81
#10 0x00000000004ed7a3 in main (argc=3, argv=0x7fffffffdb28) at main.c:720


(gdb) n

kmem: invalid structure member offset: page_active
FILE: memory.c LINE: 12225 FUNCTION: verify_slab_overload_page()

[/usr/bin/crash] error trace: 532526 => 53353a => 5e0a6a => 5e09dc
[Detaching after fork from child process 28299]

5e09dc: OFFSET_verify.part.36+92
[Detaching after fork from child process 28301]
5e0a6a: OFFSET_verify+58
[Detaching after fork from child process 28303]
53353a: verify_slab_overload_page+588
[Detaching after fork from child process 28305]
532526: do_slab_chain_slab_overload_page+1171

Patch fix

With CONFIG_SLAB on kernel 5.17.0+, page_active was not resolved correctly. I submitted a patch for this through the mailing list; after review and revision it was merged as crash-utility/crash@b89f9cc.
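One way to handle a member that has moved between structs is to try the old location first and fall back to the new one. The sketch below illustrates that fallback only; it is not the merged patch. `struct_info`, `member_offset`, `resolve_offset`, and the layouts with their offsets are hypothetical, standing in for a 5.17-style kernel where active lives in the slab struct instead of page.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

struct member_info {
    const char *name;
    long offset;
};

struct struct_info {
    const char *name;
    const struct member_info *members;
    size_t count;
};

/* Made-up 5.17-style layouts: "active" is in slab, no longer in page. */
const struct member_info new_page_members[] = {
    { "flags", 0 }, { "lru", 8 },
};
const struct member_info new_slab_members[] = {
    { "freelist", 32 }, { "s_mem", 40 }, { "active", 48 },
};
const struct struct_info new_page = { "page", new_page_members, 2 };
const struct struct_info new_slab = { "slab", new_slab_members, 3 };

/* Offset of `member` inside `s`, or -1 if `s` has no such member. */
long member_offset(const struct struct_info *s, const char *member)
{
    assert(s != NULL && member != NULL);
    for (size_t i = 0; i < s->count; i++) {
        if (strcmp(s->members[i].name, member) == 0)
            return s->members[i].offset;
    }
    return -1;
}

/* Try each candidate struct in order; the first one with the member wins. */
long resolve_offset(const struct struct_info *const *candidates, size_t n,
                    const char *member)
{
    for (size_t i = 0; i < n; i++) {
        long off = member_offset(candidates[i], member);
        if (off >= 0)
            return off;
    }
    return -1;
}
```

Calling `resolve_offset` with the candidates ordered as page then slab keeps older kernels working unchanged, while newer kernels pick up the relocated member instead of leaving the cached offset at -1.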