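This article combines the T32 (TRACE32) stack backtrace of a fault, the Oops output, and the kernel source to walk through the printed log. A better understanding of the Oops output makes it quicker to locate and fix problems.
When a memory access faults, the __dabt_svc exception vector is taken and control enters do_DataAbort.
For the path from __dabt_svc to do_DataAbort, see do_DataAbort.
Starting in do_DataAbort, fsr_fs() uses the FSR (Fault Status Register) to select the handler in fsr_info.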
    asmlinkage void __exception
    do_DataAbort(unsigned long addr, unsigned int fsr, struct pt_regs *regs)
    {
        const struct fsr_info *inf = fsr_info + fsr_fs(fsr);
        struct siginfo info;

        /* Look up the handler in fsr_info by FSR and call it. */
        if (!inf->fn(addr, fsr & ~FSR_LNX_PF, regs))
            return;
        ...
    }

    static inline int fsr_fs(unsigned int fsr)
    {
        return (fsr & FSR_FS3_0) | (fsr & FSR_FS4) >> 6;
    }
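fsr_info lists every fault type. The four main ones are: section translation fault, page translation fault, section permission fault, and page permission fault.
Below is the T32 stack of a section translation fault instance:
Here fsr = 0x805, i.e. binary 100000000101. FS[3:0] is 0b0101 and FS[4] (bit 10) is 0, so fsr_fs() returns 0b00101 = 5. Bit 11 is set, which marks the access as a write.
fsr_info[5] is the section translation fault entry, so inf->fn is do_translation_fault.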
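The index computation above can be checked outside the kernel. The following is a minimal user-space sketch, assuming the bit definitions used by the 2-level (non-LPAE) ARM code (FSR_FS3_0 = 15, FSR_FS4 = bit 10, FSR_WRITE = bit 11); describe_fsr() is a hypothetical helper added only for illustration.

```c
#include <stdio.h>

/* Assumed bit layout of the 2-level (non-LPAE) FSR, as used by the kernel's fsr_fs(). */
#define FSR_WRITE  (1u << 11)
#define FSR_FS4    (1u << 10)
#define FSR_FS3_0  (15u)

/* Same expression as the kernel's fsr_fs(): FS[4] lives in bit 10, FS[3:0] in bits 3..0. */
static inline unsigned int fsr_fs(unsigned int fsr)
{
    return (fsr & FSR_FS3_0) | (fsr & FSR_FS4) >> 6;
}

/* Hypothetical helper: print how one FSR value is decoded. */
static void describe_fsr(unsigned int fsr)
{
    printf("fsr=0x%03x -> fsr_info index %u, %s access\n",
           fsr, fsr_fs(fsr), (fsr & FSR_WRITE) ? "write" : "read");
}
```

Calling describe_fsr(0x805) reports index 5 and a write access, which matches the section translation fault analyzed in this article.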
    static struct fsr_info fsr_info[] = {
        ...
        { do_translation_fault, SIGSEGV, SEGV_MAPERR, "section translation fault"   },
        { do_bad,               SIGBUS,  0,           "external abort on linefetch" },
        { do_page_fault,        SIGSEGV, SEGV_MAPERR, "page translation fault"      },
        ...
    }
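The backtrace of this fault is shown below. The handlers use the faulting address, the FSR, and pt_regs to decide whether the address lies in kernel or user space and whether the CPU was in user mode or not; the FSR selects the handler.
__dabt_svc -> do_DataAbort -> do_translation_fault -> do_bad_area -> __do_kernel_fault -> die
The handler for a section translation fault is do_translation_fault.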
    static int __kprobes
    do_translation_fault(unsigned long addr, unsigned int fsr,
                         struct pt_regs *regs)
    {
        unsigned int index;
        pgd_t *pgd, *pgd_k;
        pud_t *pud, *pud_k;
        pmd_t *pmd, *pmd_k;

        /* TASK_SIZE is the top of user space, so user addresses are handled by do_page_fault. */
        if (addr < TASK_SIZE)
            return do_page_fault(addr, fsr, regs);

        /*
         * From here on addr is a kernel address. If regs says the CPU was in
         * user mode, user code touched a kernel address: go to bad_area.
         */
        if (user_mode(regs))
            goto bad_area;

        index = pgd_index(addr);

        pgd = cpu_get_pgd() + index;
        pgd_k = init_mm.pgd + index;

        /* pgd_none() returns 0 with 2-level page tables, so this never jumps to bad_area. */
        if (pgd_none(*pgd_k))
            goto bad_area;
        if (!pgd_present(*pgd))
            set_pgd(pgd, *pgd_k);

        pud = pud_offset(pgd, addr);
        pud_k = pud_offset(pgd_k, addr);

        /* pud_none() likewise returns 0, so this never jumps to bad_area. */
        if (pud_none(*pud_k))
            goto bad_area;
        if (!pud_present(*pud))
            set_pud(pud, *pud_k);

        pmd = pmd_offset(pud, addr);
        pmd_k = pmd_offset(pud_k, addr);

    #ifdef CONFIG_ARM_LPAE
        /*
         * Only one hardware entry per PMD with LPAE.
         */
        index = 0;
    #else
        /*
         * On ARM one Linux PGD entry contains two hardware entries (see page
         * tables layout in pgtable.h). We normally guarantee that we always
         * fill both L1 entries. But create_mapping() doesn't follow the rule.
         * It can create inidividual L1 entries, so here we have to call
         * pmd_none() check for the entry really corresponded to address, not
         * for the first of pair.
         */
        index = (addr >> SECTION_SHIFT) & 1;
    #endif
        /* If pmd_k[index] is 0, the kernel has no mapping either: a genuine fault, go to bad_area. */
        if (pmd_none(pmd_k[index]))
            goto bad_area;

        copy_pmd(pmd, pmd_k);
        return 0;

    bad_area:
        do_bad_area(addr, fsr, regs);
        return 0;
    }
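To make the index arithmetic in do_translation_fault concrete, here is a small user-space sketch. It assumes the classic 2-level ARM layout (PGDIR_SHIFT = 21, SECTION_SHIFT = 20, one Linux pgd entry made of two 32-bit hardware entries, i.e. 8 bytes). The function names are hypothetical; the faulting address 0xd8660000 and the pgd base 0xc287c000 are the values printed in the Oops log analyzed in this article.

```c
#include <stdint.h>

/* Assumed constants of the 2-level (non-LPAE) ARM page table layout. */
#define PGDIR_SHIFT      21
#define SECTION_SHIFT    20
#define PGD_ENTRY_BYTES  8u   /* one Linux pgd entry = two 32-bit hardware L1 entries */

/* Index of the Linux pgd entry covering addr (each entry spans 2 MB). */
static uint32_t pgd_index_of(uint32_t addr)
{
    return addr >> PGDIR_SHIFT;
}

/* Which of the two 1 MB hardware entries inside that pgd entry addr falls into. */
static uint32_t section_slot(uint32_t addr)
{
    return (addr >> SECTION_SHIFT) & 1u;
}

/* Virtual address of the Linux pgd entry, given the pgd base printed by show_pte. */
static uint32_t pgd_entry_addr(uint32_t pgd_base, uint32_t addr)
{
    return pgd_base + pgd_index_of(addr) * PGD_ENTRY_BYTES;
}
```

For 0xd8660000 this gives pgd index 1731 (0x6c3) and slot 0, so with the pgd at 0xc287c000 the entry examined is at 0xc287f618.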
If the fault is genuine, do_bad_area() handles it, with separate paths for user mode and non-user mode.
The user-mode path is simple: it just sends SIGSEGV.
    void do_bad_area(unsigned long addr, unsigned int fsr, struct pt_regs *regs)
    {
        struct task_struct *tsk = current;
        struct mm_struct *mm = tsk->active_mm;

        /*
         * If we are in kernel mode at this point, we
         * have no context to handle this fault with.
         */
        if (user_mode(regs))
            __do_user_fault(tsk, addr, fsr, SIGSEGV, SEGV_MAPERR, regs);
        else
            __do_kernel_fault(mm, addr, fsr, regs);
    }
All other modes are handed to __do_kernel_fault. Its call flow and printed output follow.
The main job of __do_kernel_fault is to print the page table entries, pt_regs, the stack, and so on, to help find the root cause. The core function is die.
    __do_kernel_fault
        -> show_pte                  ---- 1
        -> die
            -> __die
                -> print_modules     ---- 2
                -> __show_regs       ---- 3
                -> dump_mem          ---- 4
                -> dump_backtrace    ---- 5
                -> dump_instr        ---- 6
            -> panic                 ---- 7
Below is the printed output, analyzed alongside the code. The numbers in the annotations refer to the call flow above.
    <1>[153780.197326] Unable to handle kernel paging request at virtual address d8660000    ---- 0. fault summary
    <1>[153780.204406] pgd = c287c000    ---- 1. show_pte: address of the current pgd, 0xc287c000
    <1>[153780.207183] [d8660000] *pgd=00000000    ---- faulting address 0xd8660000 and the content of its pgd entry, 0x00000000; this is where the problem lies
    <0>[153780.210845] Internal error: Oops: 805 [#1] ARM    ---- 0. die
    <4>[153780.215362] Modules linked in:    ---- 2. print_modules
    <4>[153780.218475] CPU: 0 Not tainted (3.4.110 #2)    ---- 3. __show_regs
    <4>[153780.223083] PC is at __mutex_lock_slowpath+0x34/0xb8
    <4>[153780.228118] LR is at dpm_prepare+0x58/0x1d0
    <4>[153780.232360] pc : [<c04ad5bc>] lr : [<c01a27a8>] psr: 80000013
    <4>[153780.232391] sp : c2d01e58 ip : 00000000 fp : c2cc6800
    <4>[153780.243988] r10: c0690bfc r9 : c0690c04 r8 : c3682c68
    <4>[153780.249298] r7 : c3682c64 r6 : c2c2c000 r5 : c3682c30 r4 : c3682c64
    <4>[153780.255889] r3 : d8660000 r2 : c2d01e5c r1 : 00000000 r0 : c3682c64
    <4>[153780.262512] Flags: Nzcv IRQs on FIQs on Mode SVC_32 ISA ARM Segment kernel    ---- an uppercase letter in Nzcv means the flag is set; IRQs and FIQs are both enabled; mode is SVC_32; instruction set is ARM; the segment is kernel
    <4>[153780.269866] Control: 10c5383d Table: 2287c059 DAC: 00000015
    <4>[153780.275695]    ---- the long section below is show_extra_register_data: a hex dump of the 128 bytes before and after the address held in each pt_regs register
    <4>[153780.275695] PC: 0xc04ad53c:
    <4>[153780.280120] d53c 1afffffb e3510001 0afffff6 eaffffb9 e92d4008 e5b03004 e1530000 0a000001
    <4>[153780.288360] d55c e5930008 ebee2a57 e8bd8008 e3a03001 e1901f9f e180cf93 e33c0000 1afffffb
    <4>[153780.296630] d57c e3510000 012fff1e eafffff0 e92d41f0 e24dd010 e1a0200d e1a04000 e3c23d7f
    <4>[153780.304870] d59c e3c3303f e593600c e5903008 e28d2004 e2808004 e5802008 e58d8004 e58d3008
    <4>[153780.313110] d5bc e5832000 e58d600c e3e05000 e1903f9f e1802f95 e3320000 1afffffb e3530001
    <4>[153780.321350] d5dc 0a00000e e1903f9f e1802f95 e3320000 1afffffb e3530001 0a000008 e3a07002
    <4>[153780.329620] d5fc e5867000 eb000433 e1943f9f e1842f95 e3320000 1afffffb e3530001 1afffff7
    <4>[153780.337860] d61c e99d000c e5823004 e5832000 e5943004 e1580003 03a03000 05843000 e28dd010
    <4>[153780.346099]
    <4>[153780.346130] LR: 0xc01a2728:
    <4>[153780.350524] 2728 e5812090 e587308c eaffffd2 c0690bd8 c06e4e9c c067f0e8 c0690bf4 c01a1d0c
    <4>[153780.358795] 2748 c059e114 c06e4ea0 e92d4ff8 e59f81b8 e1a00008 e288a024 eb0c2bb6 e288902c
    <4>[153780.367034] 2768 ea000003 e37b000b 1a00005e e1a00005 ebffda19 e5984024 e154000a 0a000054
    <4>[153780.375274] 2788 e2445054 e2447020 e1a00005 ebffda09 e59f0174 eb0c2b71 e1a00007 eb0c2ba5
    <4>[153780.383544] 27a8 e5543004 e2131001 0a000002 e5941014 e2911000 13a01001 e59420a4 e5d43018
    <4>[153780.391784] 27c8 e3520000 e7c03011 e5c43018 0a000038 e5926000 e3560000 0a000027 e1a00005
    <4>[153780.400024] 27e8 e12fff36 e1a0b000 e1a01006 e1a0200b e59f0118 ebfff983 e1a00007 eb0c2b57
    <4>[153780.408264] 2808 e59f0104 eb0c2b8b e35b0000 1affffd4 e5943000 e5542004 e1540003 e3822004
    <4>[153780.416534]
    <4>[153780.416534] SP: 0xc2d01dd8:
    <4>[153780.420959] 1dd8 c06be940 c067ccb8 0000000a c2d01df8 c00190b0 c00193f4 60000013 0000000a
    <4>[153780.429199] 1df8 c04ad5bc 80000013 ffffffff c2d01e44 c3682c68 c0008cd8 c3682c64 00000000
    <4>[153780.437438] 1e18 c2d01e5c d8660000 c3682c64 c3682c30 c2c2c000 c3682c64 c3682c68 c0690c04
    <4>[153780.445709] 1e38 c0690bfc c2cc6800 00000000 c2d01e58 c01a27a8 c04ad5bc 80000013 ffffffff
    <4>[153780.453948] 1e58 00000010 c3682c68 d8660000 c07b2f78 c3682c84 c3682c30 00000000 c3682c64
    <4>[153780.462188] 1e78 c0690bd8 c01a27a8 00000000 00000002 00000000 00000003 000d6508 00000000
    <4>[153780.470458] 1e98 c06d01c8 c2d00000 c2cc6800 c01a292c c06d0748 c004092c 00000003 c04b4340
    <4>[153780.478698] 1eb8 00000000 000d6508 00000000 c0040d94 c06d0834 00000000 c06e6f4c c06e8f68
    <4>[153780.486938]
    <4>[153780.486938] FP: 0xc2cc6780:
    <4>[153780.491363] 6780 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
    <4>[153780.499633] 67a0 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
    <4>[153780.507873] 67c0 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
    <4>[153780.516113] 67e0 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
    <4>[153780.524353] 6800 c06d01c8 c2cba600 00000000 ffffffff 00000001 00000000 00000000 00000000
    <4>[153780.532623] 6820 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
    <4>[153780.540863] 6840 00000000 00000000 00000000 00000001 00000001 c2cc6854 c2cc6854 c2cc6800
    <4>[153780.549102] 6860 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
    <4>[153780.557373]
    <4>[153780.557373] R0: 0xc3682be4:
    <4>[153780.561798] 2be4 d8660000 d8660000 d8660000 d8660000 d8660000 d8660000 d8660000 d8660000
    <4>[153780.570037] 2c04 d8660000 d8660000 d8660000 d8660000 d8660000 d8660000 00000000 00000000
    <4>[153780.578277] 2c24 00000000 d8660000 d8660000 d8660000 d8660000 d8660000 d8660000 d8660000
    <4>[153780.586547] 2c44 d8660000 d8660000 d8660000 d8660000 d8660001 d8660000 d8660000 d8660000
    <4>[153780.594787] 2c64 ffffffff d8660000 c2d01e5c d8660000 d8660000 d8660000 d8660000 d8660000
    <4>[153780.603027] 2c84 d8660000 c0690bfc d8660000 d8660000 d8660000 d8660000 d8660000 d8660000
    <4>[153780.611297] 2ca4 d8660000 d8660000 d8660000 d8660000 d8660000 d8660000 d8660000 d8660000
    <4>[153780.619537] 2cc4 d8660000 d8660000 d8660000 d8660000 d8660000 d8660000 d8660000 d8660000
    <4>[153780.627777]
    <4>[153780.627777] R2: 0xc2d01ddc:
    <4>[153780.632202] 1ddc c067ccb8 0000000a c2d01df8 c00190b0 c00193f4 60000013 0000000a c04ad5bc
    <4>[153780.640472] 1dfc 80000013 ffffffff c2d01e44 c3682c68 c0008cd8 c3682c64 00000000 c2d01e5c
    <4>[153780.648712] 1e1c d8660000 c3682c64 c3682c30 c2c2c000 c3682c64 c3682c68 c0690c04 c0690bfc
    <4>[153780.656951] 1e3c c2cc6800 00000000 c2d01e58 c01a27a8 c04ad5bc 80000013 ffffffff 00000010
    <4>[153780.665191] 1e5c c3682c68 d8660000 c07b2f78 c3682c84 c3682c30 00000000 c3682c64 c0690bd8
    <4>[153780.673461] 1e7c c01a27a8 00000000 00000002 00000000 00000003 000d6508 00000000 c06d01c8
    <4>[153780.681701] 1e9c c2d00000 c2cc6800 c01a292c c06d0748 c004092c 00000003 c04b4340 00000000
    <4>[153780.689941] 1ebc 000d6508 00000000 c0040d94 c06d0834 00000000 c06e6f4c c06e8f68 c06e6f4c
    <4>[153780.698211]
    <4>[153780.698211] R3: 0xd865ff80:
    <4>[153780.702636] ff80
    <4>[153780.710876] ffa0
    <4>[153780.719116] ffc0
    <4>[153780.727386] ffe0
    <4>[153780.735626] 0000
    <4>[153780.743865] 0020
    <4>[153780.752136] 0040
    <4>[153780.760375] 0060
    <4>[153780.768615]
    <4>[153780.768615] R4: 0xc3682be4:
    <4>[153780.773040] 2be4 d8660000 d8660000 d8660000 d8660000 d8660000 d8660000 d8660000 d8660000
    <4>[153780.781280] 2c04 d8660000 d8660000 d8660000 d8660000 d8660000 d8660000 00000000 00000000
    <4>[153780.789550] 2c24 00000000 d8660000 d8660000 d8660000 d8660000 d8660000 d8660000 d8660000
    <4>[153780.797790] 2c44 d8660000 d8660000 d8660000 d8660000 d8660001 d8660000 d8660000 d8660000
    <4>[153780.806030] 2c64 ffffffff d8660000 c2d01e5c d8660000 d8660000 d8660000 d8660000 d8660000
    <4>[153780.814300] 2c84 d8660000 c0690bfc d8660000 d8660000 d8660000 d8660000 d8660000 d8660000
    <4>[153780.822540] 2ca4 d8660000 d8660000 d8660000 d8660000 d8660000 d8660000 d8660000 d8660000
    <4>[153780.830780] 2cc4 d8660000 d8660000 d8660000 d8660000 d8660000 d8660000 d8660000 d8660000
    <4>[153780.839050]
    <4>[153780.839050] R5: 0xc3682bb0:
    <4>[153780.843475] 2bb0 d8660000 d8660000 d8660000 d8660000 d8660000 d8660000 d8660000 d8660000
    <4>[153780.851715] 2bd0 d8660000 d8660000 d8660000 d8660000 d8660000 d8660000 d8660000 d8660000
    <4>[153780.859954] 2bf0 d8660000 d8660000 d8660000 d8660000 d8660000 d8660000 d8660000 d8660000
    <4>[153780.868225] 2c10 d8660000 d8660000 d8660000 00000000 00000000 00000000 d8660000 d8660000
    <4>[153780.876464] 2c30 d8660000 d8660000 d8660000 d8660000 d8660000 d8660000 d8660000 d8660000
    <4>[153780.884704] 2c50 d8660000 d8660001 d8660000 d8660000 d8660000 ffffffff d8660000 c2d01e5c
    <4>[153780.892944] 2c70 d8660000 d8660000 d8660000 d8660000 d8660000 d8660000 c0690bfc d8660000
    <4>[153780.901214] 2c90 d8660000 d8660000 d8660000 d8660000 d8660000 d8660000 d8660000 d8660000
    <4>[153780.909454]
    <4>[153780.909454] R6: 0xc2c2bf80:
    <4>[153780.913879] bf80 f0000188 000181a4 00000000 00000000 00000000 00000000 c04bb340 c06aafbc
    <4>[153780.922119] bfa0 c2c2bf00 c2c2b380 00000000 c3708000 00000000 00000000 00000001 00000000
    <4>[153780.930389] bfc0 00000000 c2c2bfc4 c2c2bfc4 66756208 666e695f 72a5006f 7ae75aad 5aa55aa5
    <4>[153780.938629] bfe0 5aa55ac5 52a75aa0 4a255aa5 5aa45aa5 5aa55aa5 1aa54aa5 08a55ae5 42a552ad
    <4>[153780.946868] c000 00000000 c2d00000 00000002 04208040 00000000 00000001 00000064 00000064
    <4>[153780.955139] c020 00000064 00000000 c04b4078 00000000 00015ab9 0000bd04 00000001 00000000
    <4>[153780.963378] c040 00000000 c2c2c044 c2c2c044 00000001 be05bcc5 00008bdc 02e98615 00000000
    <4>[153780.971618] c060 adc99ea7 00000105 01f7d876 00000000 00000000 00000000 00000000 00000000
    <4>[153780.979888]
    <4>[153780.979888] R7: 0xc3682be4:
    <4>[153780.984313] 2be4 d8660000 d8660000 d8660000 d8660000 d8660000 d8660000 d8660000 d8660000
    <4>[153780.992553] 2c04 d8660000 d8660000 d8660000 d8660000 d8660000 d8660000 00000000 00000000
    <4>[153781.000793] 2c24 00000000 d8660000 d8660000 d8660000 d8660000 d8660000 d8660000 d8660000
    <4>[153781.009063] 2c44 d8660000 d8660000 d8660000 d8660000 d8660001 d8660000 d8660000 d8660000
    <4>[153781.017303] 2c64 ffffffff d8660000 c2d01e5c d8660000 d8660000 d8660000 d8660000 d8660000
    <4>[153781.025543] 2c84 d8660000 c0690bfc d8660000 d8660000 d8660000 d8660000 d8660000 d8660000
    <4>[153781.033782] 2ca4 d8660000 d8660000 d8660000 d8660000 d8660000 d8660000 d8660000 d8660000
    <4>[153781.042053] 2cc4 d8660000 d8660000 d8660000 d8660000 d8660000 d8660000 d8660000 d8660000
    <4>[153781.050292]
    <4>[153781.050292] R8: 0xc3682be8:
    <4>[153781.054718] 2be8 d8660000 d8660000 d8660000 d8660000 d8660000 d8660000 d8660000 d8660000
    <4>[153781.062957] 2c08 d8660000 d8660000 d8660000 d8660000 d8660000 00000000 00000000 00000000
    <4>[153781.071228] 2c28 d8660000 d8660000 d8660000 d8660000 d8660000 d8660000 d8660000 d8660000
    <4>[153781.079467] 2c48 d8660000 d8660000 d8660000 d8660001 d8660000 d8660000 d8660000 ffffffff
    <4>[153781.087707] 2c68 d8660000 c2d01e5c d8660000 d8660000 d8660000 d8660000 d8660000 d8660000
    <4>[153781.095977] 2c88 c0690bfc d8660000 d8660000 d8660000 d8660000 d8660000 d8660000 d8660000
    <4>[153781.104217] 2ca8 d8660000 d8660000 d8660000 d8660000 d8660000 d8660000 d8660000 d8660000
    <4>[153781.112457] 2cc8 d8660000 d8660000 d8660000 d8660000 d8660000 d8660000 d8660000 d8660000
    <4>[153781.120727]
    <4>[153781.120727] R9: 0xc0690b84:
    <4>[153781.125152] 0b84 c019fcac 00000000 c05b339c 00000124 c019fc00 00000000 c05b33b0 00000124
    <4>[153781.133392] 0ba4 c019fb54 00000000 c05b33d0 000001a4 c019f8b0 c019fb00 00000000 c0690bc0
    <4>[153781.141632] 0bc4 c0690bc0 00000000 00000001 c0690bd0 c0690bd0 00000001 c0690bdc c0690bdc
    <4>[153781.149902] 0be4 c0690be4 c0690be4 c0690bec c0690bec c0690bf4 c0690bf4 c3682c84 c283fb04
    <4>[153781.158142] 0c04 c06907c4 c36b4454 c2ae03d4 c3411e44 c0690c14 c0690c14 00000000 c0690c20
    <4>[153781.166381] 0c24 c0690c20 00000005 00000100 c0690c30 c0690c30 c01a5354 c05b31c8 00000000
    <4>[153781.174621] 0c44 c0690cb8 00000000 00000000 c3413ac0 c01a5b3c 00000000 00000000 c01a5ae4
    <4>[153781.182891] 0c64 00000000 00000000 00000000 00000000 00000000 c348bf00 0000003c c05b5d00
    <4>[153781.191131]
    <4>[153781.191131] R10: 0xc0690b7c:
    <4>[153781.195648] 0b7c c05b3388 00000124 c019fcac 00000000 c05b339c 00000124 c019fc00 00000000
    <4>[153781.203887] 0b9c c05b33b0 00000124 c019fb54 00000000 c05b33d0 000001a4 c019f8b0 c019fb00
    <4>[153781.212158] 0bbc 00000000 c0690bc0 c0690bc0 00000000 00000001 c0690bd0 c0690bd0 00000001
    <4>[153781.220397] 0bdc c0690bdc c0690bdc c0690be4 c0690be4 c0690bec c0690bec c0690bf4 c0690bf4
    <4>[153781.228637] 0bfc c3682c84 c283fb04 c06907c4 c36b4454 c2ae03d4 c3411e44 c0690c14 c0690c14
    <4>[153781.236877] 0c1c 00000000 c0690c20 c0690c20 00000005 00000100 c0690c30 c0690c30 c01a5354
    <4>[153781.245147] 0c3c c05b31c8 00000000 c0690cb8 00000000 00000000 c3413ac0 c01a5b3c 00000000
    <4>[153781.253387] 0c5c 00000000 c01a5ae4 00000000 00000000 00000000 00000000 00000000 c348bf00
    <0>[153781.261627] Process suspend (pid: 755, stack limit = 0xc2d00268)    ---- thread name is suspend, pid is 755; the lowest usable stack address is 0xc2d00268, so sp must never drop below it
    <0>[153781.267730] Stack: (0xc2d01e58 to 0xc2d02000)    ---- 4. dump_mem: the lower limit of the stack is known from the line above; the top is the next 8 KB-aligned address
    <0>[153781.272155] 1e40: 00000010 c3682c68    ---- the dump starts at the current sp and runs to the top of the stack
    <0>[153781.280395] 1e60: d8660000 c07b2f78 c3682c84 c3682c30 00000000 c3682c64 c0690bd8 c01a27a8
    <0>[153781.288635] 1e80: 00000000 00000002 00000000 00000003 000d6508 00000000 c06d01c8 c2d00000
    <0>[153781.296905] 1ea0: c2cc6800 c01a292c c06d0748 c004092c 00000003 c04b4340 00000000 000d6508
    <0>[153781.305145] 1ec0: 00000000 c0040d94 c06d0834 00000000 c06e6f4c c06e8f68 c06e6f4c c0690c1c
    <0>[153781.313385] 1ee0: 000d6508 c06e8f68 c06e6f4c c0690c1c 000d6508 c01a5390 c067eaf0 00000000
    <0>[153781.321655] 1f00: c2d01f9c c04ae1e0 00000000 c2c2c000 c067eaf0 386f67b6 1432efb3 00000000
    <0>[153781.329895] 1f20: c2d01f7c c003a910 895c6980 00000000 7bb36301 00000000 00000000 895c6980
    <0>[153781.338134] 1f40: 00000000 c2c2c000 c0690c2c c2cc79c0 00000000 c2cc6800 00000000 c002b7cc
    <0>[153781.346405] 1f60: 00000064 c2c2c000 c067eaf0 c2cc79c0 c2cc79d4 c2d00000 c2cc79d4 00000001
    <0>[153781.354644] 1f80: c06d01c8 00000002 c2cc6800 c002ba10 c06d01c4 c2cc79c0 c2cba600 c002bb28
    <0>[153781.362884] 1fa0: c002ba20 c06d01c4 00000000 c341fefc c2cba600 c002ba20 00000013 00000000
    <0>[153781.371124] 1fc0: 00000000 00000000 00000000 c0030144 00000000 00000000 c2cba600 00000000
    <0>[153781.379394] 1fe0: c2d01fe0 c2d01fe0 c341fefc c00300c0 c0009a20 c0009a20 00000000 00000000
    <4>[153781.387664] [<c04ad5bc>] (__mutex_lock_slowpath+0x34/0xb8) from [<c01a27a8>] (dpm_prepare+0x58/0x1d0)    ---- 5. dump_backtrace
    <4>[153781.396942] [<c01a27a8>] (dpm_prepare+0x58/0x1d0) from [<c01a292c>] (dpm_suspend_start+0xc/0x60)
    <4>[153781.405792] [<c01a292c>] (dpm_suspend_start+0xc/0x60) from [<c004092c>] (suspend_devices_and_enter+0x58/0x258)
    <4>[153781.415863] [<c004092c>] (suspend_devices_and_enter+0x58/0x258) from [<c0040d94>] (pm_suspend+0x268/0x2b0)
    <4>[153781.425598] [<c0040d94>] (pm_suspend+0x268/0x2b0) from [<c01a5390>] (suspend+0x3c/0xfc)    ---- each entry is printed by dump_backtrace_entry; the function on the right calls the one on the left
    <4>[153781.433654] [<c01a5390>] (suspend+0x3c/0xfc) from [<c002b7cc>] (process_one_work+0x138/0x358)
    <4>[153781.442260] [<c002b7cc>] (process_one_work+0x138/0x358) from [<c002ba10>] (process_scheduled_works+0x24/0x34)
    <4>[153781.452239] [<c002ba10>] (process_scheduled_works+0x24/0x34) from [<c002bb28>] (rescuer_thread+0x108/0x19c)
    <4>[153781.462066] [<c002bb28>] (rescuer_thread+0x108/0x19c) from [<c0030144>] (kthread+0x84/0x90)
    <4>[153781.470489] [<c0030144>] (kthread+0x84/0x90) from [<c0009a20>] (kernel_thread_exit+0x0/0x8)
    <0>[153781.478881] Code: e2808004 e5802008 e58d8004 e58d3008 (e5832000)    ---- 6. dump_instr
    <4>[153781.485168] ---[ end trace 352bcf684b277880 ]---    ---- printed by oops_exit
    <0>[153781.489746] Kernel panic - not syncing: Fatal exception    ---- 7. panic
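The "Code:" line marks the faulting instruction with parentheses, here e5832000. As a rough illustration, the sketch below decodes the ARM "load/store word or byte with 12-bit immediate offset" encoding class. The struct and function names are hypothetical, the condition field is ignored, and only this one encoding class is recognized.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical container for the fields of an ARM LDR/STR (immediate offset) instruction. */
struct ldr_str_imm {
    bool valid;      /* true if insn belongs to the LDR/STR immediate class */
    bool is_load;    /* L bit (20): 1 = LDR, 0 = STR */
    bool is_byte;    /* B bit (22): 1 = byte access, 0 = word access */
    bool add_offset; /* U bit (23): 1 = base + offset, 0 = base - offset */
    unsigned rn;     /* base register */
    unsigned rd;     /* register being loaded or stored */
    unsigned imm12;  /* unsigned 12-bit offset */
};

static struct ldr_str_imm decode_ldr_str_imm(uint32_t insn)
{
    struct ldr_str_imm d = { 0 };

    /* Bits 27..25 == 0b010 select load/store with an immediate offset. */
    if (((insn >> 25) & 7u) != 2u)
        return d;

    d.valid      = true;
    d.is_load    = (insn >> 20) & 1u;
    d.is_byte    = (insn >> 22) & 1u;
    d.add_offset = (insn >> 23) & 1u;
    d.rn         = (insn >> 16) & 0xfu;
    d.rd         = (insn >> 12) & 0xfu;
    d.imm12      = insn & 0xfffu;
    return d;
}
```

For 0xe5832000 the fields come out as a word store with rn = 3, rd = 2 and offset 0, i.e. str r2, [r3]. With r3 = d8660000 in the register dump, this is a write to exactly the address reported in the first line of the Oops, consistent with the write bit (bit 11) set in fsr = 0x805.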
__do_kernel_fault mainly prints the page table entries and then hands the work to die.
show_pte checks the pgd, pud, pmd, and pte entries in turn.
    /*
     * Oops. The kernel tried to access some page that wasn't present.
     */
    static void
    __do_kernel_fault(struct mm_struct *mm, unsigned long addr, unsigned int fsr,
                      struct pt_regs *regs)
    {
        /*
         * Are we prepared to handle this kernel fault?
         */
        if (fixup_exception(regs))
            return;

        /*
         * No handler, we'll have to terminate things with extreme prejudice.
         */
        bust_spinlocks(1);
        /* Addresses below PAGE_SIZE print "NULL pointer dereference"; all others print "paging request". */
        pr_alert("Unable to handle kernel %s at virtual address %08lx\n",
                 (addr < PAGE_SIZE) ? "NULL pointer dereference" :
                 "paging request", addr);

        show_pte(mm, addr);         /* print the page table entries */
        die("Oops", regs, fsr);     /* Oops output: modules, pt_regs, stack, backtrace, memory, ... */
        bust_spinlocks(0);
        do_exit(SIGKILL);
    }

    /*
     * This is useful to dump out the page tables associated with
     * 'addr' in mm 'mm'.
     */
    void show_pte(struct mm_struct *mm, unsigned long addr)
    {
        pgd_t *pgd;

        /* A NULL mm means the current task is a kernel thread; use init_mm. */
        if (!mm)
            mm = &init_mm;

        pr_alert("pgd = %p\n", mm->pgd);    /* address of the pgd */
        pgd = pgd_offset(mm, addr);
        /* Faulting address and the value of its pgd entry; pgd is already offset by the address. */
        pr_alert("[%08lx] *pgd=%08llx",
                 addr, (long long)pgd_val(*pgd));

        do {
            pud_t *pud;
            pmd_t *pmd;
            pte_t *pte;

            if (pgd_none(*pgd))
                break;

            if (pgd_bad(*pgd)) {
                pr_cont("(bad)");
                break;
            }

            pud = pud_offset(pgd, addr);
            if (PTRS_PER_PUD != 1)
                pr_cont(", *pud=%08llx", (long long)pud_val(*pud));

            if (pud_none(*pud))
                break;

            if (pud_bad(*pud)) {
                pr_cont("(bad)");
                break;
            }
            /* With Linux 2-level page tables, none of the checks above can trigger. */

            pmd = pmd_offset(pud, addr);
            if (PTRS_PER_PMD != 1)
                pr_cont(", *pmd=%08llx", (long long)pmd_val(*pmd));

            /*
             * With 2-level page tables pmd == pud == pgd, so *pmd == *pgd.
             * In this instance *pgd = 0x00000000, so the loop breaks here.
             */
            if (pmd_none(*pmd))
                break;

            /* Bit 1 of the pmd value must be clear: #define pmd_bad(pmd) (pmd_val(pmd) & 2) */
            if (pmd_bad(*pmd)) {
                pr_cont("(bad)");
                break;
            }

            /* We must not map this if we have highmem enabled */
            if (PageHighMem(pfn_to_page(pmd_val(*pmd) >> PAGE_SHIFT)))
                break;

            pte = pte_offset_map(pmd, addr);
            pr_cont(", *pte=%08llx", (long long)pte_val(*pte));
    #ifndef CONFIG_ARM_LPAE
            pr_cont(", *ppte=%08llx",
                    (long long)pte_val(pte[PTE_HWTABLE_PTRS]));
    #endif
            pte_unmap(pte);
        } while (0);

        pr_cont("\n");
    }
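The pmd_none() and pmd_bad() tests in show_pte look only at the raw first-level entry. The sketch below mirrors them in user space and adds a classification by the low two bits, assuming the ARM short-descriptor format (0b00 invalid, 0b01 page table, 0b10 section). The enum and function names are hypothetical, and the handling of 0b11 is labeled reserved here as a simplification.

```c
#include <stdint.h>

/* Hypothetical names for the low two bits of a first-level (L1) descriptor. */
enum l1_type {
    L1_FAULT      = 0,  /* invalid: any access raises a translation fault */
    L1_PAGE_TABLE = 1,  /* points to a second-level table */
    L1_SECTION    = 2,  /* maps a 1 MB section (or supersection) directly */
    L1_RESERVED   = 3   /* treated as reserved in this sketch */
};

static enum l1_type l1_type(uint32_t desc)
{
    return (enum l1_type)(desc & 3u);
}

/* Mirrors pmd_none(): the entry is completely empty. */
static int entry_is_none(uint32_t desc)
{
    return desc == 0;
}

/* Mirrors pmd_bad(): bit 1 set means there is no second-level table to walk. */
static int entry_is_bad(uint32_t desc)
{
    return (desc & 2u) != 0;
}
```

With *pgd=00000000 from the log, entry_is_none() is true and the type is L1_FAULT, which is why show_pte stops after printing the pgd value and why the hardware reported a section translation fault.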
die hands most of the work to __die(), which prints the information, and then calls panic() to halt, reboot, or take similar action.
    void die(const char *str, struct pt_regs *regs, int err)
    {
        struct thread_info *thread = current_thread_info();
        int ret;
        enum bug_trap_type bug_type = BUG_TRAP_TYPE_NONE;

        oops_enter();

        raw_spin_lock_irq(&die_lock);
        console_verbose();
        bust_spinlocks(1);
        if (!user_mode(regs))
            bug_type = report_bug(regs->ARM_pc, regs);
        if (bug_type != BUG_TRAP_TYPE_NONE)
            str = "Oops - BUG";
        ret = __die(str, err, thread, regs);

        if (regs && kexec_should_crash(thread->task))
            crash_kexec(regs);      /* switch to the crash (debug) kernel */

        bust_spinlocks(0);
        add_taint(TAINT_DIE);
        raw_spin_unlock_irq(&die_lock);
        oops_exit();    /* prints "---[ end trace ... ]---": the Oops is over, the panic stage begins */

        if (in_interrupt())
            panic("Fatal exception in interrupt");
        if (panic_on_oops)
            panic("Fatal exception");
        if (ret != NOTIFY_STOP)
            do_exit(SIGSEGV);
    }
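__die prints the module list, the ARM registers, a dump of the stack, the backtrace, and related information.
__show_regs prints the registers in pt_regs and dumps the 128 bytes before and after the address held in each of them.
dump_mem dumps the stack contents as hexadecimal words.
dump_backtrace unwinds the stack and prints the matching symbol information.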
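The "Flags:" line printed by __show_regs is derived entirely from the saved CPSR. The following user-space sketch rebuilds it from psr = 0x80000013. The bit masks follow the ARM CPSR layout; mode_name() and format_flags() are hypothetical helpers, only a handful of processor modes are named, and the "Segment" field is left out because it depends on get_fs() rather than on the CPSR.

```c
#include <stdint.h>
#include <stdio.h>

#define PSR_N_BIT 0x80000000u
#define PSR_Z_BIT 0x40000000u
#define PSR_C_BIT 0x20000000u
#define PSR_V_BIT 0x10000000u
#define PSR_I_BIT 0x00000080u   /* set = IRQs masked */
#define PSR_F_BIT 0x00000040u   /* set = FIQs masked */
#define PSR_T_BIT 0x00000020u   /* set = Thumb state */
#define MODE_MASK 0x0000001fu

/* Hypothetical helper: names only the 32-bit modes most often seen in an Oops. */
static const char *mode_name(uint32_t cpsr)
{
    switch (cpsr & MODE_MASK) {
    case 0x10: return "USER_32";
    case 0x11: return "FIQ_32";
    case 0x12: return "IRQ_32";
    case 0x13: return "SVC_32";
    case 0x17: return "ABT_32";
    case 0x1b: return "UND_32";
    case 0x1f: return "SYS_32";
    default:   return "UK";
    }
}

/* Hypothetical helper: format the CPSR the way the Flags line presents it. */
static void format_flags(uint32_t cpsr, char *out, size_t len)
{
    char nzcv[5];

    /* Uppercase means the condition flag is set, lowercase means clear. */
    nzcv[0] = (cpsr & PSR_N_BIT) ? 'N' : 'n';
    nzcv[1] = (cpsr & PSR_Z_BIT) ? 'Z' : 'z';
    nzcv[2] = (cpsr & PSR_C_BIT) ? 'C' : 'c';
    nzcv[3] = (cpsr & PSR_V_BIT) ? 'V' : 'v';
    nzcv[4] = '\0';

    snprintf(out, len, "Flags: %s IRQs o%s FIQs o%s Mode %s ISA %s",
             nzcv,
             (cpsr & PSR_I_BIT) ? "ff" : "n",
             (cpsr & PSR_F_BIT) ? "ff" : "n",
             mode_name(cpsr),
             (cpsr & PSR_T_BIT) ? "Thumb" : "ARM");
}
```

For 0x80000013 only N is set, the I and F bits are clear, and the mode field is 0x13, so the result reads Nzcv, IRQs on, FIQs on, SVC_32, ARM, in line with the log.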
    static int __die(const char *str, int err, struct pt_regs *regs)
    {
        struct task_struct *tsk = current;
        static int die_counter;
        int ret;

        pr_emerg("Internal error: %s: %x [#%d]" S_PREEMPT S_SMP S_ISA "\n",
                 str, err, ++die_counter);

        /* trap and error numbers are mostly meaningless on ARM */
        ret = notify_die(DIE_OOPS, str, regs, err, tsk->thread.trap_no, SIGSEGV);
        if (ret == NOTIFY_STOP)
            return 1;

        print_modules();
        __show_regs(regs);
        /* end_of_stack() is the lowest usable address of the stack. */
        pr_emerg("Process %.*s (pid: %d, stack limit = 0x%p)\n",
                 TASK_COMM_LEN, tsk->comm, task_pid_nr(tsk), end_of_stack(tsk));

        if (!user_mode(regs) || in_interrupt()) {
            /*
             * The dump runs from the current sp to the top of the stack.
             * The top is task->stack plus the fixed THREAD_SIZE.
             */
            dump_mem(KERN_EMERG, "Stack: ", regs->ARM_sp,
                     THREAD_SIZE + (unsigned long)task_stack_page(tsk));
            dump_backtrace(regs, tsk);
            dump_instr(KERN_EMERG, regs);
        }

        return 0;
    }

    void print_modules(void)
    {
        struct module *mod;
        char buf[8];

        printk(KERN_DEFAULT "Modules linked in:");
        /* Most callers should already have preempt disabled, but make sure */
        preempt_disable();
        list_for_each_entry_rcu(mod, &modules, list) {
            if (mod->state == MODULE_STATE_UNFORMED)
                continue;
            pr_cont(" %s%s", mod->name, module_flags(mod, buf));
        }
        preempt_enable();
        if (last_unloaded_module[0])
            pr_cont(" [last unloaded: %s]", last_unloaded_module);
        pr_cont("\n");
    }

    void __show_regs(struct pt_regs *regs)
    {
        unsigned long flags;
        char buf[64];

        show_regs_print_info(KERN_DEFAULT);

        print_symbol("PC is at %s\n", instruction_pointer(regs));  /* function and offset PC points to */
        print_symbol("LR is at %s\n", regs->ARM_lr);               /* function and offset LR points to */
        /* Print the register values held in pt_regs. */
        printk("pc : [<%08lx>] lr : [<%08lx>] psr: %08lx\n"
               "sp : %08lx ip : %08lx fp : %08lx\n",
               regs->ARM_pc, regs->ARM_lr, regs->ARM_cpsr,
               regs->ARM_sp, regs->ARM_ip, regs->ARM_fp);
        printk("r10: %08lx r9 : %08lx r8 : %08lx\n",
               regs->ARM_r10, regs->ARM_r9,
               regs->ARM_r8);
        printk("r7 : %08lx r6 : %08lx r5 : %08lx r4 : %08lx\n",
               regs->ARM_r7, regs->ARM_r6,
               regs->ARM_r5, regs->ARM_r4);
        printk("r3 : %08lx r2 : %08lx r1 : %08lx r0 : %08lx\n",
               regs->ARM_r3, regs->ARM_r2,
               regs->ARM_r1, regs->ARM_r0);

        /* NZCV condition flags of the cpsr */
        flags = regs->ARM_cpsr;
        buf[0] = flags & PSR_N_BIT ? 'N' : 'n';
        buf[1] = flags & PSR_Z_BIT ? 'Z' : 'z';
        buf[2] = flags & PSR_C_BIT ? 'C' : 'c';
        buf[3] = flags & PSR_V_BIT ? 'V' : 'v';
        buf[4] = '\0';

    #ifndef CONFIG_CPU_V7M
        printk("Flags: %s IRQs o%s FIQs o%s Mode %s ISA %s Segment %s\n",
               buf, interrupts_enabled(regs) ? "n" : "ff",
               fast_interrupts_enabled(regs) ? "n" : "ff",
               processor_modes[processor_mode(regs)],
               isa_modes[isa_mode(regs)],
               get_fs() == get_ds() ? "kernel" : "user");
    #else
        printk("xPSR: %08lx\n", regs->ARM_cpsr);
    #endif

    #ifdef CONFIG_CPU_CP15
        {
            unsigned int ctrl;

            buf[0] = '\0';
    #ifdef CONFIG_CPU_CP15_MMU
            {
                unsigned int transbase, dac;
                asm("mrc p15, 0, %0, c2, c0\n\t"
                    "mrc p15, 0, %1, c3, c0\n"
                    : "=r" (transbase), "=r" (dac));
                snprintf(buf, sizeof(buf), " Table: %08x DAC: %08x",
                         transbase, dac);
            }
    #endif
            asm("mrc p15, 0, %0, c1, c0\n" : "=r" (ctrl));

            printk("Control: %08x%s\n", ctrl, buf);    /* MMU-related information */
        }
    #endif

        /* Hex dump of the 128 bytes before and after the address in each pt_regs register. */
        show_extra_register_data(regs, 128);
    }

    /*
     * Dump out the contents of some memory nicely...
     */
    static void dump_mem(const char *lvl, const char *str, unsigned long bottom,
                         unsigned long top)
    {
        unsigned long first;
        mm_segment_t fs;
        int i;

        /*
         * We need to switch to kernel mode so that we can use __get_user
         * to safely read from kernel space. Note that we now dump the
         * code first, just in case the backtrace kills us.
         */
        fs = get_fs();
        set_fs(KERNEL_DS);

        printk("%s%s(0x%08lx to 0x%08lx)\n", lvl, str, bottom, top);

        for (first = bottom & ~31; first < top; first += 32) {
            unsigned long p;
            char str[sizeof(" 12345678") * 8 + 1];

            memset(str, ' ', sizeof(str));
            str[sizeof(str) - 1] = '\0';

            for (p = first, i = 0; i < 8 && p < top; i++, p += 4) {
                if (p >= bottom && p < top) {
                    unsigned long val;
                    if (__get_user(val, (unsigned long *)p) == 0)
                        sprintf(str + i * 9, " %08lx", val);
                    else
                        sprintf(str + i * 9, " ????????");
                }
            }
            printk("%s%04lx:%s\n", lvl, first & 0xffff, str);
        }

        set_fs(fs);
    }

    static inline void dump_backtrace(struct pt_regs *regs, struct task_struct *tsk)
    {
        unwind_backtrace(regs, tsk);
    }

    void dump_backtrace_entry(unsigned long where, unsigned long from, unsigned long frame)
    {
    #ifdef CONFIG_KALLSYMS
        printk("[<%08lx>] (%pS) from [<%08lx>] (%pS)\n", where, (void *)where, from, (void *)from);
    #else
        printk("Function entered at [<%08lx>] from [<%08lx>]\n", where, from);
    #endif

        if (in_exception_text(where))
            dump_mem("", "Exception stack", frame + 4, frame + 4 + sizeof(struct pt_regs));
    }

    static void dump_instr(const char *lvl, struct pt_regs *regs)
    {
        unsigned long addr = instruction_pointer(regs);
        const int thumb = thumb_mode(regs);
        const int width = thumb ? 4 : 8;
        mm_segment_t fs;
        char str[sizeof("00000000 ") * 5 + 2 + 1], *p = str;
        int i;

        /*
         * We need to switch to kernel mode so that we can use __get_user
         * to safely read from kernel space. Note that we now dump the
         * code first, just in case the backtrace kills us.
         */
        fs = get_fs();
        set_fs(KERNEL_DS);

        for (i = -4; i < 1 + !!thumb; i++) {
            unsigned int val, bad;

            if (thumb)
                bad = __get_user(val, &((u16 *)addr)[i]);
            else
                bad = __get_user(val, &((u32 *)addr)[i]);

            if (!bad)
                p += sprintf(p, i == 0 ? "(%0*x) " : "%0*x ",
                             width, val);
            else {
                p += sprintf(p, "bad PC value");
                break;
            }
        }
        printk("%sCode: %s\n", lvl, str);

        set_fs(fs);
    }
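The "Stack: (0xc2d01e58 to 0xc2d02000)" header and the partly blank first row of the stack dump both follow from simple address arithmetic. The sketch below assumes an 8 KB THREAD_SIZE and derives the stack top by masking sp, which matches task_stack_page(tsk) + THREAD_SIZE only while sp lies inside the task's own stack. All function names are hypothetical.

```c
#include <stdint.h>

#define THREAD_SIZE 8192u   /* assumed: 8 KB kernel stacks on 32-bit ARM */

/* Base of the 8 KB stack area that contains sp. */
static uint32_t stack_base(uint32_t sp)
{
    return sp & ~(THREAD_SIZE - 1u);
}

/* Upper bound passed to dump_mem(): one past the last stack byte. */
static uint32_t stack_top(uint32_t sp)
{
    return stack_base(sp) + THREAD_SIZE;
}

/* dump_mem() starts each row on a 32-byte boundary (first = bottom & ~31). */
static uint32_t first_row(uint32_t bottom)
{
    return bottom & ~31u;
}

/* Number of 4-byte slots left blank in the first row because they lie below bottom. */
static unsigned int blank_words(uint32_t bottom)
{
    return (bottom - first_row(bottom)) / 4u;
}
```

With sp = 0xc2d01e58 the top is 0xc2d02000 and the first row starts at 0xc2d01e40, leaving six blank slots. That is why the "1e40:" row in the log shows only two words, 00000010 and c3682c68.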
panic() first prints a "Kernel panic..." message and then performs some cleanup.
Finally it signals the panic through panic_blink and reboots.
    void panic(const char *fmt, ...)
    {
        static DEFINE_SPINLOCK(panic_lock);
        static char buf[1024];
        va_list args;
        long i, i_next = 0;
        int state = 0;

        /*
         * Disable local interrupts. This will prevent panic_smp_self_stop
         * from deadlocking the first cpu that invokes the panic, since
         * there is nothing to prevent an interrupt handler (that runs
         * after the panic_lock is acquired) from invoking panic again.
         */
        local_irq_disable();

        /*
         * It's possible to come here directly from a panic-assertion and
         * not have preempt disabled. Some functions called from here want
         * preempt to be disabled. No point enabling it later though...
         *
         * Only one CPU is allowed to execute the panic code from here. For
         * multiple parallel invocations of panic, all other CPUs either
         * stop themself or will wait until they are stopped by the 1st CPU
         * with smp_send_stop().
         */
        if (!spin_trylock(&panic_lock))
            panic_smp_self_stop();

        console_verbose();
        bust_spinlocks(1);
        va_start(args, fmt);
        vsnprintf(buf, sizeof(buf), fmt, args);
        va_end(args);
        /* The "Kernel panic" message; it is the last line of the log shown earlier. */
        printk(KERN_EMERG "Kernel panic - not syncing: %s\n", buf);
    #ifdef CONFIG_DEBUG_BUGVERBOSE
        /*
         * Avoid nested stack-dumping if a panic occurs during oops processing
         */
        if (!test_taint(TAINT_DIE) && oops_in_progress <= 1)
            dump_stack();
    #endif

        /*
         * If we have crashed and we have a crash kernel loaded let it handle
         * everything else.
         * Do we want to call this before we try to display a message?
         */
        crash_kexec(NULL);      /* with CONFIG_KEXEC: switch to the crash (debug) kernel image and run it */

        /*
         * Note smp_send_stop is the usual smp shutdown function, which
         * unfortunately means it may not be hardened to work in a panic
         * situation.
         */
        smp_send_stop();        /* stop the other SMP cores */

        kmsg_dump(KMSG_DUMP_PANIC);     /* run the dumpers on dump_list */

        atomic_notifier_call_chain(&panic_notifier_list, 0, buf);   /* run the notifiers on panic_notifier_list */

        bust_spinlocks(0);

        if (!panic_blink)
            panic_blink = no_blink;

        /* If panic_timeout is greater than 0, reboot after that many seconds. */
        if (panic_timeout > 0) {
            /*
             * Delay timeout seconds before rebooting the machine.
             * We can't use the "normal" timers since we just panicked.
             */
            printk(KERN_EMERG "Rebooting in %d seconds..", panic_timeout);

            for (i = 0; i < panic_timeout * 1000; i += PANIC_TIMER_STEP) {
                touch_nmi_watchdog();
                if (i >= i_next) {
                    i += panic_blink(state ^= 1);
                    i_next = i + 3600 / PANIC_BLINK_SPD;
                }
                mdelay(PANIC_TIMER_STEP);
            }
        }
        if (panic_timeout != 0) {
            /*
             * This will not be a clean reboot, with everything
             * shutting down. But if there is a chance of
             * rebooting the system it will be rebooted.
             */
            emergency_restart();    /* perform the reboot */
        }
        ...
    }
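A section permission fault means that the section-level page table entry (PMD level, a 1 MB section mapping) for the accessed virtual address exists and is valid, but the access being performed (read, write, or execute) is not allowed by the hardware permissions configured in that entry. The MMU raises a data abort. This is a permission error, which is the key difference from a translation fault, where the address is not mapped at all.
On ARM32, the FSR (Fault Status Register) status bits of a section permission fault are mapped by fsr_fs() to the matching entry in fsr_info. The relevant entries, with their indices added as comments, are:
    static struct fsr_info fsr_info[] = {
        ...
        { do_translation_fault, SIGSEGV, SEGV_MAPERR, "section translation fault"     },  /* 5 */
        { do_bad,               SIGBUS,  0,           "external abort on linefetch"   },  /* 6 */
        { do_page_fault,        SIGSEGV, SEGV_MAPERR, "page translation fault"        },  /* 7 */
        ...
        { do_sect_fault,        SIGSEGV, SEGV_ACCERR, "section permission fault"      },  /* 13 */
        { do_bad,               SIGBUS,  0,           "external abort on translation" },  /* 14 */
        { do_page_fault,        SIGSEGV, SEGV_ACCERR, "page permission fault"         },  /* 15 */
        ...
    }
Typical instance: a section permission fault on a write has fsr = 0x80d, binary 100000001101. FS[3:0] is 0b1101 (decimal 13) and FS[4] (bit 10) is 0, so fsr_fs() returns 13 and selects the section permission fault entry. Its si_code is SEGV_ACCERR (permission error), as opposed to the SEGV_MAPERR (address not mapped) of a translation fault. In the mainline 2-level fsr_info table the handler of this entry is do_sect_fault, which simply calls do_bad_area; do_page_fault is the handler for the page-level entries (7 and 15).
The backtrace of a section permission fault therefore differs from the section translation fault case only in the handler called from do_DataAbort:
__dabt_svc -> do_DataAbort -> do_sect_fault -> do_bad_area -> __do_user_fault (user mode: send SIGSEGV) or __do_kernel_fault (kernel mode: trigger the Oops) -> die
The permission checks against the VMA are performed in do_page_fault, the entry point for page permission faults. The listing below is a simplified version of that function for the ARM 3.4.110 kernel, reduced to highlight the permission-check logic: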
/* Simplified outline of the logic, not the verbatim kernel source */
static int __kprobes
do_page_fault(unsigned long addr, unsigned int fsr, struct pt_regs *regs)
{
    struct task_struct *tsk;
    struct mm_struct *mm;
    struct vm_area_struct *vma;
    unsigned int flags;
    int fault;
    siginfo_t info;

    tsk = current;
    mm = tsk->mm;
    info.si_code = SEGV_MAPERR;   /* default until a valid VMA is found */

    /* Interrupt context, or a kernel thread without an mm_struct: go straight to kernel fault handling */
    if (in_interrupt() || !mm)
        goto no_context;

    down_read(&mm->mmap_sem);
    vma = find_vma(mm, addr);
    if (!vma)
        goto bad_area;
    if (vma->vm_start <= addr)
        goto good_area;
    if (!(vma->vm_flags & VM_GROWSDOWN))
        goto bad_area;
    if (expand_stack(vma, addr))
        goto bad_area;

/* The address lies in a valid VMA: check the permissions */
good_area:
    info.si_code = SEGV_ACCERR;   /* failures from here on are permission errors */
    flags = 0;

    /* Derive the access type from the FSR and check it against the VMA permissions */
    if (fsr & FSR_WRITE) {
        flags |= FAULT_FLAG_WRITE;
        if (!(vma->vm_flags & VM_WRITE))   /* write access, VMA not writable */
            goto bad_area;
    } else if (fsr & FSR_EXEC) {
        flags |= FAULT_FLAG_EXEC;
        if (!(vma->vm_flags & VM_EXEC))    /* execute access, VMA not executable */
            goto bad_area;
    } else {
        flags |= FAULT_FLAG_READ;
        if (!(vma->vm_flags & VM_READ))    /* read access, VMA not readable */
            goto bad_area;
    }

    /* Try to repair the fault, for example copy-on-write */
    fault = handle_mm_fault(mm, vma, addr, flags);
    if (unlikely(fault & VM_FAULT_ERROR)) {
        if (fault & VM_FAULT_OOM)
            goto out_of_memory;
        else if (fault & VM_FAULT_SIGBUS)
            goto do_sigbus;
        BUG();
    }
    if (fault & VM_FAULT_MAJOR)
        tsk->maj_flt++;
    else
        tsk->min_flt++;
    up_read(&mm->mmap_sem);
    return 0;

/* No valid VMA for the address, or the permissions do not match */
bad_area:
    up_read(&mm->mmap_sem);
bad_area_nosemaphore:
    /* Fault taken in user mode: send SIGSEGV */
    if (user_mode(regs)) {
        info.si_signo = SIGSEGV;
        info.si_errno = 0;
        info.si_addr = (void __user *)addr;
        force_sig_info(SIGSEGV, &info, tsk);
        return 0;
    }
no_context:
    /* Fault taken in kernel mode: enter the Oops path */
    do_bad_area(addr, fsr, regs);
    return 0;

out_of_memory:
    up_read(&mm->mmap_sem);
    if (user_mode(regs)) {
        pagefault_out_of_memory();
        return 0;
    }
    goto no_context;

do_sigbus:
    up_read(&mm->mmap_sem);
    info.si_signo = SIGBUS;
    info.si_errno = 0;
    info.si_code = BUS_ADRERR;
    info.si_addr = (void __user *)addr;
    force_sig_info(SIGBUS, &info, tsk);
    return 0;
}
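Walk-through of the core logic:
Context check: in interrupt context, or in a kernel thread that has no mm_struct, control goes straight to no_context, which calls do_bad_area.
VMA lookup: the VMA covering the faulting address is looked up. If there is none, or if the address lies below the VMA found and the stack cannot be expanded to reach it, control goes to bad_area.
Permission check: the access type (write, execute or read) is derived from the FSR bits and compared with the VMA's vm_flags; a mismatch goes directly to bad_area.
Repair attempt: handle_mm_fault is called to repair the fault. In the copy-on-write case, for example, it updates the page table permissions and the function returns normally; faults that cannot be repaired take one of the error branches.
Wrap-up: a fault taken in user mode results in SIGSEGV; a fault taken in kernel mode goes to do_bad_area and ends in an Oops.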
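The permission check in the middle of this walk-through is a plain bit test. Below is a user-space model of it, offered as a sketch only: the enum and the function name are hypothetical, and the VM_* values are local stand-ins chosen to match the usual definitions in include/linux/mm.h.

```c
#include <assert.h>

/* Local stand-ins for the kernel's VMA permission bits. */
#define VM_READ   0x1u
#define VM_WRITE  0x2u
#define VM_EXEC   0x4u

/* Hypothetical classification of the faulting access. */
enum access_type { ACCESS_READ, ACCESS_WRITE, ACCESS_EXEC };

/* Returns 1 when vm_flags permit the access, 0 when the fault handler
 * would branch to bad_area. */
static int vma_allows(unsigned int vm_flags, enum access_type access)
{
    unsigned int needed;

    switch (access) {
    case ACCESS_WRITE:
        needed = VM_WRITE;
        break;
    case ACCESS_EXEC:
        needed = VM_EXEC;
        break;
    default:
        assert(access == ACCESS_READ);
        needed = VM_READ;
        break;
    }
    return (vm_flags & needed) != 0;
}
```

A write to a VMA mapped VM_READ | VM_EXEC is rejected at this point, before handle_mm_fault is ever called. A write to a VM_READ | VM_WRITE VMA passes the check, which is what allows a copy-on-write fault to reach the repair path.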
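A section permission fault taken in kernel mode ends up in __do_kernel_fault. The call flow from there is the same as for the Section Translation Fault. The difference is that the mapping exists and only the permission is wrong, which shows up clearly in the Oops message and in the show_pte output.
Core call flow for a kernel-mode section permission fault:
__do_kernel_fault -> show_pte -> die -> __die -> print_modules -> __show_regs -> dump_mem -> dump_backtrace -> dump_instr -> panic
A typical Section Permission Fault Oops is shown below and analysed against the code: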
<1>[23456.789012] Unable to handle kernel paging request at virtual address c0800000
<1>[23456.789500] pgd = c287c000
<1>[23456.789800] [c0800000] *pgd=0080040e, *pmd=0080040e
<0>[23456.790200] Internal error: Oops: 80d [#1] ARM
<4>[23456.790500] Modules linked in:
<4>[23456.790800] CPU: 0 Not tainted (3.4.110 #2)
<4>[23456.791100] PC is at test_write_ro_section+0x10/0x20
<4>[23456.791400] LR is at test_driver_init+0x20/0x40
<4>[23456.791700] pc : [<c04ad600>] lr : [<c01a2800>] psr: 80000013
<4>[23456.792000] sp : c2d01e60 ip : 00000000 fp : c2cc6800
<4>[23456.792300] r10: c0690bfc r9 : c0690c04 r8 : c3682c68
<4>[23456.792600] r7 : c3682c64 r6 : c2c2c000 r5 : c3682c30 r4 : c0800000
<4>[23456.792900] r3 : 00000000 r2 : 00000001 r1 : 00000000 r0 : c0800000
<4>[23456.793200] Flags: Nzcv IRQs on FIQs on Mode SVC_32 ISA ARM Segment kernel
<4>[23456.793500] Control: 10c5383d Table: 2287c059 DAC: 00000015
...
(The remaining register dump, stack backtrace, instruction dump and panic flow are the same as for the Section Translation Fault.)
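Key differences and root cause:
Fault type: the FSR value is 0x80d, which corresponds to section permission fault; the error code is SEGV_ACCERR rather than the SEGV_MAPERR of a Translation Fault.
show_pte output: for the faulting address 0xc0800000 both pgd and pmd are valid non-zero values, so a section-level mapping exists and a Translation Fault is ruled out. With a section mapping the output never reaches the PTE level, which clearly separates it from the page-level faults.
Cause of the mismatch: in this example the section descriptor shown as pmd=0x0080040e has its AP (access permission) bits configured read-only, while test_write_ro_section, the function the PC points into, attempts to write to the kernel's read-only rodata section. The hardware permission does not match the access type, so a section permission fault is raised.
Subsequent handling: die, __die and panic proceed exactly as for the Section Translation Fault and end in a kernel panic.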
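Whether do_bad_area takes the user or the kernel branch depends on the processor mode saved in the PSR, which the Oops prints as "psr: 80000013" together with "Flags: Nzcv" and "Mode SVC_32". The sketch below decodes such a value in user space. The user-mode test copies the kernel's user_mode() check (low four mode bits equal to zero) as I remember it; the function names are hypothetical, and the mode names other than SVC_32 simply follow the same naming style.

```c
#include <assert.h>
#include <stddef.h>

#define PSR_MODE_MASK 0x1fu   /* CPSR M[4:0] */

/* Mirrors the kernel's user_mode(): the low four mode bits are zero. */
static int psr_is_user_mode(unsigned long psr)
{
    return (psr & 0xfu) == 0;
}

/* Name of the processor mode encoded in the PSR. */
static const char *psr_mode_name(unsigned long psr)
{
    switch (psr & PSR_MODE_MASK) {
    case 0x10: return "USER_32";
    case 0x11: return "FIQ_32";
    case 0x12: return "IRQ_32";
    case 0x13: return "SVC_32";
    case 0x17: return "ABT_32";
    case 0x1b: return "UND_32";
    case 0x1f: return "SYS_32";
    default:   return "unknown";
    }
}

/* Writes the condition flags in Oops style: upper case when set. */
static void psr_flags(unsigned long psr, char out[5])
{
    assert(out != NULL);
    out[0] = (psr & (1ul << 31)) ? 'N' : 'n';
    out[1] = (psr & (1ul << 30)) ? 'Z' : 'z';
    out[2] = (psr & (1ul << 29)) ? 'C' : 'c';
    out[3] = (psr & (1ul << 28)) ? 'V' : 'v';
    out[4] = '\0';
}
```

For psr 0x80000013 this gives SVC_32, not user mode, and the flag string Nzcv, which agrees with the log and explains why the fault goes to __do_kernel_fault rather than __do_user_fault.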
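A Page Permission Fault occurs when the page-level entry for the accessed virtual address (the PTE of a 4 KB page mapping) is valid and the PGD and PMD levels are intact, but the hardware access permissions configured in the PTE do not allow the current operation (read, write or execute). The MMU raises a data abort. It is the most common memory permission fault on Linux; typical cases are a user-mode write to a read-only page, copy-on-write (COW), and execution of a non-executable page.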
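After fsr_fs() has processed the FSR, a page permission fault selects the last of the following fsr_info entries:
static struct fsr_info fsr_info[] = {
    ...
    { do_translation_fault, SIGSEGV, SEGV_MAPERR, "section translation fault" },   /* index 5 */
    { do_bad, SIGBUS, 0, "external abort on linefetch" },
    { do_page_fault, SIGSEGV, SEGV_MAPERR, "page translation fault" },   /* index 7 */
    ...
    { do_sect_fault, SIGSEGV, SEGV_ACCERR, "section permission fault" },   /* index 13 */
    { do_bad, SIGBUS, 0, "external abort on translation" },
    { do_page_fault, SIGSEGV, SEGV_ACCERR, "page permission fault" },   /* index 15 */
    ...
}
Typical instance: a page permission fault on a write reports fsr=0x80f, binary 100000001111. The low four bits are 0b1111 (decimal 15) and the FSR_FS4 bit is 0, so fsr_fs() returns 15 and selects the page permission fault entry; inf->fn is do_page_fault and the error type is SEGV_ACCERR. (An fsr of 0x807 would give index 7, which is the page translation fault entry.)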
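The fault status values of the MMU faults follow a regular pattern: 0b0101 and 0b0111 are the section and page translation faults, 0b1101 and 0b1111 the section and page permission faults. A small lookup makes it easy to check an Oops code by hand. This is a sketch with made-up function names; it covers only the MMU fault families and returns NULL for everything else.

```c
#include <assert.h>
#include <stddef.h>

/* Names for the MMU fault encodings (FS[4] = 0, value is FS[3:0]). */
static const char *mmu_fault_name(unsigned int fs)
{
    switch (fs) {
    case 0x5: return "section translation fault";
    case 0x7: return "page translation fault";
    case 0x9: return "section domain fault";
    case 0xb: return "page domain fault";
    case 0xd: return "section permission fault";
    case 0xf: return "page permission fault";
    default:  return NULL;   /* alignment, external aborts, ...: not covered here */
    }
}

/* Within these families bit 1 separates page-level (1) from
 * section-level (0) faults. */
static int mmu_fault_is_page_level(unsigned int fs)
{
    assert(mmu_fault_name(fs) != NULL);
    return (fs & 0x2u) != 0;
}
```

For example, mmu_fault_name(0x80f & 15) names the page permission fault, while mmu_fault_name(0x807 & 15) names the page translation fault.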
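For a page permission fault the handler is do_page_fault. Cases that cannot be repaired then follow the same do_bad_area path as the section permission fault. A typical call flow is:
__dabt_svc -> do_DataAbort -> do_page_fault -> handle_mm_fault (repairable case such as COW: normal return) or do_bad_area (not repairable) -> __do_user_fault (fault taken in user mode: SIGSEGV is sent) or __do_kernel_fault (fault taken in kernel mode: Oops) -> die
The main difference between the two permission faults is the page table level at which the mismatch is found: a section permission fault is detected in the section descriptor at the PMD level and goes straight to do_bad_area, whereas a page permission fault is detected in the page descriptor at the PTE level and is given to do_page_fault, which may be able to repair it.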
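To see which table entries a given faulting address selects, the address can be split by hand. The sketch below assumes the classic two-level (non-LPAE) ARM layout with 4 KB pages, 1 MB sections and a Linux pgd entry covering 2 MB; the struct and function names are hypothetical.

```c
#include <assert.h>

#define PAGE_SHIFT     12   /* 4 KB pages */
#define SECTION_SHIFT  20   /* 1 MB sections */
#define PGDIR_SHIFT    21   /* one Linux pgd entry spans two hardware L1 entries */

/* Hypothetical result type for the split. */
struct va_split {
    unsigned int pgd_index;     /* index into the Linux pgd array */
    unsigned int l1_half;       /* which of the two hardware L1 entries */
    unsigned int hw_pte_index;  /* index into the 256-entry hardware L2 table */
    unsigned int page_offset;   /* offset inside the 4 KB page */
};

static struct va_split split_va(unsigned long addr)
{
    struct va_split s;

    assert(addr <= 0xffffffffUL);   /* 32-bit virtual addresses only */
    s.pgd_index    = (unsigned int)(addr >> PGDIR_SHIFT);
    s.l1_half      = (unsigned int)((addr >> SECTION_SHIFT) & 1);
    s.hw_pte_index = (unsigned int)((addr >> PAGE_SHIFT) & 0xff);
    s.page_offset  = (unsigned int)(addr & 0xfff);
    return s;
}
```

For 0xc0800000 this yields pgd index 0x604 and the first L1 entry of the pair; for 0xb6f01000 it yields pgd index 0x5b7, the second L1 entry, and hardware PTE index 1. The l1_half value is the same quantity that do_translation_fault computes as (addr >> SECTION_SHIFT) & 1.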
For page permission faults the part of do_page_fault that matters is the handle_mm_fault branch:
/* Continuing the do_page_fault code from section 3.2: the branch relevant to page permission faults */
good_area:
    info.si_code = SEGV_ACCERR;
    flags = 0;

    /* Determine the access type and check the VMA's logical permissions */
    if (fsr & FSR_WRITE) {
        flags |= FAULT_FLAG_WRITE;
        if (!(vma->vm_flags & VM_WRITE))
            goto bad_area;
    } else if (fsr & FSR_EXEC) {
        flags |= FAULT_FLAG_EXEC;
        if (!(vma->vm_flags & VM_EXEC))
            goto bad_area;
    } else {
        flags |= FAULT_FLAG_READ;
        if (!(vma->vm_flags & VM_READ))
            goto bad_area;
    }

    /* Core repair and check logic for page permission faults */
    fault = handle_mm_fault(mm, vma, addr, flags);
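How handle_mm_fault deals with a page permission fault:
Full page table walk: going PGD -> PUD -> PMD -> PTE, it locates the PTE for the faulting address and confirms that the entry is present, which rules out a page translation fault.
Permission check: the permission bits of the PTE are compared with the access type, together with the logical permissions of the VMA, to establish whether there is a mismatch.
Repairable cases (no Oops):
Copy-on-write (COW): after fork, parent and child share pages mapped read-only. When the child writes to such a page a permission fault is raised; handle_mm_fault allocates a new physical page, copies the original content and updates the PTE to writable, which repairs the fault.
Demand allocation: a private anonymous page obtained with mmap faults on first access. Since no PTE exists yet, this arrives as a page translation fault rather than a permission fault, but it is repaired on the same path by allocating a physical page and installing the mapping.
Swap-in: a page that has been written out to swap faults when it is accessed, again as a translation fault; the page is read back in and the page table updated.
Cases that cannot be repaired: the VMA permissions do not match the access type, the kernel illegally accesses a read-only user page, a write hits the kernel's read-only rodata section, or code is executed from a page protected by the XN (execute never) bit. These are rejected, either by the VMA check before handle_mm_fault or by an error return from it, and end in a signal or an Oops.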
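The copy-on-write case can be observed from user space. The sketch below assumes a Linux or other POSIX system whose fork() shares pages copy-on-write; the function name is made up for illustration. The child writes to a variable inherited from the parent. The kernel repairs the resulting fault transparently, the child sees its own write, and the parent's copy stays unchanged.

```c
#include <assert.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Lives in a private, writable data page that fork() shares with the child. */
static volatile int shared_value = 42;

/* Forks, lets the child overwrite shared_value, and returns the value the
 * parent sees afterwards. Returns -1 if fork/wait fails or if the child
 * did not observe its own write. */
static int parent_value_after_child_write(void)
{
    pid_t pid;
    int status;

    assert(shared_value == 42);   /* precondition: parent copy is pristine */

    pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        shared_value = 99;        /* first write after fork: the page is copied */
        _exit(shared_value == 99 ? 0 : 1);
    }
    if (waitpid(pid, &status, 0) != pid)
        return -1;
    if (!WIFEXITED(status) || WEXITSTATUS(status) != 0)
        return -1;
    return shared_value;          /* the parent's page was never modified */
}
```

Calling parent_value_after_child_write() returns 42 even though the child stored 99: no signal and no Oops is involved, because the fault was resolved inside the fault handler.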
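A page permission fault taken in kernel mode likewise ends up in __do_kernel_fault, and the flow is the same as for the two faults discussed earlier. The difference is that the page table walk reaches the PTE level, so show_pte prints a valid PTE value and the root cause lies in a page-level permission mismatch.
A typical Page Permission Fault Oops is shown below and analysed against the code: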
<1>[34567.890123] Unable to handle kernel paging request at virtual address b6f01000
<1>[34567.890600] pgd = c287c000
<1>[34567.890900] [b6f01000] *pgd=3287c067, *pmd=3267e067, *pte=00001075, *ppte=00000000
<0>[34567.891300] Internal error: Oops: 807 [#1] ARM
<4>[34567.891600] Modules linked in:
<4>[34567.891900] CPU: 0 Not tainted (3.4.110 #2)
<4>[34567.892200] PC is at copy_from_user+0x20/0x80
<4>[34567.892500] LR is at test_ioctl+0x40/0x80
<4>[34567.892800] pc : [<c03ad200>] lr : [<c02b3400>] psr: 80000013
<4>[34567.893100] sp : c2d01e70 ip : 00000000 fp : c2cc6800
<4>[34567.893400] r10: c0690bfc r9 : c0690c04 r8 : c3682c68
<4>[34567.893700] r7 : c3682c64 r6 : 00000004 r5 : b6f01000 r4 : c3002000
<4>[34567.894000] r3 : 00000000 r2 : 00000004 r1 : b6f01000 r0 : c3002000
<4>[34567.894300] Flags: Nzcv IRQs on FIQs on Mode SVC_32 ISA ARM Segment kernel
<4>[34567.894600] Control: 10c5383d Table: 2287c059 DAC: 00000015
...
(The remaining register dump, stack backtrace, instruction dump and panic flow are the same as described earlier.)
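Key differences and root cause:
Fault type: the example is analysed as a page permission fault with error code SEGV_ACCERR. Note that the code printed in the log, 0x807, decodes to fault status 7, which the two-level fsr_info table labels page translation fault; a page permission fault on a write would be reported as 0x80f.
show_pte output: for the faulting address 0xb6f01000 the pgd, pmd and *pte values are all valid and non-zero, so the mapping exists at the PGD, PMD and PTE levels and a translation fault is ruled out. The fact that a valid PTE value is printed shows that this is a page-level mapping, which clearly separates it from the section-level faults.
Cause of the mismatch: the page descriptor behind PTE value 0x00001075 has its AP bits configured as read-only for user mode and not writable for the kernel. The PC points into copy_from_user: the kernel mistakenly attempts a write to a read-only user address (copy_to_user should have been used; the wrong function was called). The hardware permission does not match the access type, so a page permission fault is raised.
Wrap-up: this is an illegal access that cannot be repaired, so it ends in the die flow with a kernel Oops and panic. Repairable cases such as user-mode COW are fixed inside handle_mm_fault and never produce an Oops.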
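When triaging many logs, the two clues used in this analysis (the error code on the "Internal error" line and whether show_pte printed a "*pte=" field) can be extracted mechanically. The following is a small sketch with hypothetical function names; it relies only on the line formats visible in the logs above.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Extracts the hexadecimal code from a line such as
 * "<0>[...] Internal error: Oops: 80d [#1] ARM".
 * Returns 1 on success, 0 if the line has a different shape. */
static int parse_oops_code(const char *line, unsigned int *code)
{
    const char *p;

    assert(line != NULL && code != NULL);
    p = strstr(line, "Internal error: Oops: ");
    if (p == NULL)
        return 0;
    return sscanf(p, "Internal error: Oops: %x", code) == 1;
}

/* Returns 1 if a show_pte line got as far as printing a PTE value.
 * "*ppte=" does not match because the character after '*' differs. */
static int show_pte_reached_pte(const char *line)
{
    assert(line != NULL);
    return strstr(line, "*pte=") != NULL;
}
```

Applied to the section permission example, parse_oops_code() yields 0x80d and show_pte_reached_pte() returns 0 for the "*pgd=0080040e, *pmd=0080040e" line; for a line that contains "*pte=00001075" it returns 1, indicating a page-level mapping.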
Original author: ArnoldLu
Original article:
https://www.cnblogs.com/arnoldlu/p/8672139.html