Discussion:
[AT91RM9200] further run-time problems (jffs2, Oops in __update_rq_clock, IPSec)
Guennadi Liakhovetski
2007-11-12 14:45:04 UTC
Permalink
Hi,

now that our at91rm9200-based system boots 2.6.23 and runs in principle,
we're getting further bad problems:

1. jffs2. After a few reboots we get lots of

JFFS2 notice: (708) jffs2_get_inode_nodes: Node header CRC failed at ...

they do not come under 2.6.11. Looks like under 2.6.11 this case is just
not tested, but a few first reboots under 2.6.23 come also clean through,
so, it is something, that happens later.

Using physmap:

physmap platform flash device: 01000000 at 10000000
physmap-flash.0: Found 1 x16 devices at 0x0 in 16-bit bank
Amd/Fujitsu Extended Query Table at 0x0040
physmap-flash.0: CFI does not contain boot bank location. Assuming top.
number of CFI chips: 1
cfi_cmdset_0002: Disabling erase-suspend-program due to code brokenness.


2. There has been an Oops once in vi...

Unable to handle kernel paging request at virtual address e5dcc3ec
pgd = c135c000
[e5dcc3ec] *pgd=00000000
Internal error: Oops: 5 [#1]
Modules linked in:
CPU: 0 Not tainted (2.6.23.1-ga63c3b88-dirty #52)
PC is at __update_rq_clock+0x4c/0x140
LR is at __update_rq_clock+0x28/0x140
pc : [<c0033e38>] lr : [<c0033e14>] psr: 60000093
sp : c1517b08 ip : e5dcc010 fp : c0117b3c
r10: c025125c r9 : 00000001 r8 : 00000000
r7 : 00000000 r6 : c1cc9720 r5 : e5dcc3ec r4 : e2be3800
r3 : 00989680 r2 : ffffd430 r1 : 00989665 r0 : e2be3800
Flags: nZCv IRQs off nt user
Control: c000717f Table: 2135c000 DAC: 00000015
Process vi (pid: 2017, stack limit = 0xc1516258)
Stack: (0xc1517b08 to 0xc1518000)
7b00: c0314974 c1517b18 c0046378 ffffd43a c1cc9860 c1cc9720
7b20: c02f4554 c1516000 00000001 c025125c c1517b64 c1517b40 c0250a74 c0033dfc
7b40: ffffd43a 0000000a c1517b68 c02f4554 00000000 c18e4ae0 c1517b9c c1517b68
7b60: c025125c c02509ec c0319ad8 c0319ad8 ffffd43a c004238c c1cc9720 c0319900
7b80: c1517f74 00000000 00000000 c1517f54 c1517e50 c1517ba0 c00888d4 c02511d0
7ba0: c1517f74 00000000 c1517e64 c1517e68 c1517e6c c1517e58 c1517e5c c1517e60
7bc0: c1516000 00000001 00000000 00000000 00000001 00000000 00000000 00000000
7be0: 00000001 c00883d4 00000000 00000000 00000002 c18e4ae0 00000000 c1cc9720
7c00: c0036814 c1cec528 c1cec528 c1cec528 c18e4ae0 00000000 c1cc9720 c0036814
7c20: c1cec520 c1cec520 c1cec520 c1587b54 40007000 00001000 c1517c94 c1517c48
7c40: c0069688 c002a89c 00000000 01312d00 ffffffff 00000000 c039c070 c03149c0
7c60: 00000000 c1517c98 00000000 c0314990 0131795e 01ca360d 000f4240 c1517cc0
7c80: c1517c8c c00358a0 c00342b4 c03149c0 01954dbe 00000000 c1cc9720 c0314974
7ca0: c039c040 00000000 00000000 00000000 00000003 c1517cd4 c02f8024 20000013
7cc0: c02f8018 00000000 c1517d1c c1517cd8 c00603dc c00601ac 00000044 00000044
7ce0: 000a00d2 00000000 00000001 00000000 c02f83f8 c02f83f8 c1375b10 000a00d2
7d00: 00000000 c02f83f4 00000017 c1517eb4 00000001 0000001a 00000014 c03938d4
7d20: c1517d58 c1517d30 c008e054 c0143158 c1517f08 00000107 c181ad70 41ed0000
7d40: c1517d70 c1517d50 c0036798 c0035208 80000093 00000001 c1cec520 00000000
7d60: c1cec520 c1517d80 c1517d74 c0036828 c0036784 c1517dac c1517d84 c0034768
7d80: c0036824 60000013 c0319728 0000000a 00000000 c1cec00a c1cec400 0000000a
7da0: c1517dc4 c1517db0 c00347d0 c0034740 00000000 c1cec400 c1517ddc c1517dc8
7dc0: c014f7b0 c00347b4 00000000 c0319728 c1517dec c1517de0 c0163644 c014f768
7de0: c1517e00 c1517df0 c003e7f8 c016363c 00000000 c1517e1c c1517e04 c003eba8
7e00: c003e928 c02f558c 60000013 c0319728 0000000a 00000000 c0024f44 00000002
7e20: 401cd394 00000000 00000004 00000004 400b1040 00000000 00000000 00000001
7e40: c1517e54 c1517fa4 c1517e54 c0088c24 c00884f4 00000001 00000000 00000000
7e60: 00000000 00000000 00000000 c003e7f8 c016363c 00000000 c1517ea0 c1517e88
7e80: c003eba8 c003e928 c02f558c 00000008 c0329a04 c1517eb0 c1517ea4 c003ec00
7ea0: c003eb00 c1517ed0 c1517eb4 c0024054 c003ebcc ffffffff fefff000 00000008
7ec0: 0000000a c1517f58 c1517ed4 c0024a24 c0024010 c19d6a84 00000002 00000000
7ee0: 00000000 c19d698c 00000002 c19d698c 0000000a c0024f44 00000002 401cd394
7f00: c1517f58 c19d6a84 c1517f1c c007be70 c00a5b58 60000013 ffffffff c0153b44
7f20: 0000000a 00000000 00000000 c19d6a84 c18dc0b8 00000002 c19d698c 0000000a
7f40: c0024f44 c1516000 401cd394 c1517f7c c1517f5c c1517e54 c1517e58 c1517e5c
7f60: c1517e60 c1517e64 c1517e68 00000000 000186a0 00000000 00000000 400b10c0
7f80: 400b1040 00000000 0000008e c0024f44 c1516000 400a7928 00000000 c1517fa8
7fa0: c0024da0 c008895c 400b10c0 400b1040 00000001 400b1040 00000000 00000000
7fc0: 400b10c0 400b1040 00000000 beb6ad90 beb6ad28 00000000 400a7928 0000000c
7fe0: 400a7b80 beb6acc4 40032928 401bccf8 20000010 00000001 00000000 00000000
Backtrace: frame pointer underflow
Backtrace aborted due to bad frame pointer <c0117b3c>
Code: e0c88005 e51bc034 e3580000 e28c5ff7 (e8950060)

__update_rq_clock:
@ args = 0, pretend = 0, frame = 12
@ frame_needed = 1, uses_anonymous_args = 0
mov ip, sp @,
stmfd sp!, {r4, r5, r6, r7, r8, r9, sl, fp, ip, lr, pc} @,
sub fp, ip, #4 @,,
sub sp, sp, #12 @,,
ldr r3, .L33+8 @ tmp108,
add r4, r0, #996 @ prev_raw, rq,
ldmia r4, {r4-r5} @ prev_raw
str r0, [fp, #-52] @ tmp12,
mov lr, pc
bx r3 @ tmp108
str r0, [fp, #-48] @, now
str r1, [fp, #-44] @, now
mov r8, r1 @ delta,
mov r7, r0 @ delta,
subs r7, r7, r4 @ delta, delta, prev_raw
sbc r8, r8, r5 @ delta, delta, prev_raw
ldr ip, [fp, #-52] @,
cmp r8, #0 @ delta,
add r5, ip, #988 @ clock.432, rq,
ldmia r5, {r5-r6} @ clock.432

^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

bge .L19 @,
ldr r3, [ip, #1012] @ <variable>.clock_warps, <variable>.clock_warps mov sl, #0 @,
mov r9, #1 @ clock,
adds r9, r9, r5 @ clock, clock, clock.432
add r3, r3, #1 @ tmp111, <variable>.clock_warps,
adc sl, sl, r6 @ clock, clock, clock.432
str r3, [ip, #1012] @ tmp111, <variable>.clock_warps
b .L22 @

which is this:

u64 prev_raw = rq->prev_clock_raw;
u64 now = sched_clock();
s64 delta = now - prev_raw;
u64 clock = rq->clock;
^^^^^^^^^^^^^^^^^^^^^

Interestingly, a few instructions above rq has already been referenced,
so, this is very strange. No preemption.

3. IPSec doesn't work. This might or might not be related. First I would
consider it a separate problem.

The platform is otherwise absolutely non-spectacular. A standard board-*.c
file with just serial and flash devices. The only board-specific call is:

at91rm9200_initialize(18432000, AT91RM9200_BGA);

which is nothing extraordinary either. Any ideas?

Thanks
Guennadi
---
Guennadi Liakhovetski, Ph.D.

DENX Software Engineering GmbH, MD: Wolfgang Denk & Detlev Zundel
HRB 165235 Munich, Office: Kirchenstr.5, D-82194 Groebenzell, Germany
Phone: +49-8142-66989-0 Fax: +49-8142-66989-80 Email: ***@denx.de

-------------------------------------------------------------------
List admin: http://lists.arm.linux.org.uk/mailman/listinfo/linux-arm-kernel
FAQ: http://www.arm.linux.org.uk/mailinglists/faq.php
Etiquette: http://www.arm.linux.org.uk/mailinglists/etiquette.php
Russell King - ARM Linux
2007-11-12 15:19:57 UTC
Permalink
Post by Guennadi Liakhovetski
2. There has been an Oops once in vi...
Unable to handle kernel paging request at virtual address e5dcc3ec
pgd = c135c000
[e5dcc3ec] *pgd=00000000
Internal error: Oops: 5 [#1]
CPU: 0 Not tainted (2.6.23.1-ga63c3b88-dirty #52)
PC is at __update_rq_clock+0x4c/0x140
LR is at __update_rq_clock+0x28/0x140
pc : [<c0033e38>] lr : [<c0033e14>] psr: 60000093
sp : c1517b08 ip : e5dcc010 fp : c0117b3c
r10: c025125c r9 : 00000001 r8 : 00000000
r7 : 00000000 r6 : c1cc9720 r5 : e5dcc3ec r4 : e2be3800
r3 : 00989680 r2 : ffffd430 r1 : 00989665 r0 : e2be3800
Flags: nZCv IRQs off nt user
Control: c000717f Table: 2135c000 DAC: 00000015
Process vi (pid: 2017, stack limit = 0xc1516258)
Stack: (0xc1517b08 to 0xc1518000)
Backtrace: frame pointer underflow
^^^^^^^^^^^^^^ hint.
Post by Guennadi Liakhovetski
Backtrace aborted due to bad frame pointer <c0117b3c>
Code: e0c88005 e51bc034 e3580000 e28c5ff7 (e8950060)
@ args = 0, pretend = 0, frame = 12
@ frame_needed = 1, uses_anonymous_args = 0
mov lr, pc
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Notice where 'r5' comes from - 'ip', which comes from '[fp, #-52]'.
Now read through from the start of the function and see what value 'fp'
is supposed to have.

sp at the ldmia instruction is currently at 0xc1517b08. Add 12.
There's 11 registers pushed into the stack, so add 44 bytes. This
is the value of 'ip' and 'sp' after the first instruction. This
gives a value of 0xc1517b40.

To confirm this, here's the state which the stmfd instruction saved
onto the stack:

7b00: c0314974 c1517b18 c0046378 ffffd43a c1cc9860 c1cc9720
A B C r4 r5 r6
7b20: c02f4554 c1516000 00000001 c025125c c1517b64 c1517b40 c0250a74 c0033dfc
r7 r8 r9 sl fp ip lr pc

That means 'fp' after the first 'sub' instruction should be 0xc1517b3c.
However, it is actually 0xc0117b3c. Note that these two values look
very similar. The difference is only 0x01400000. Two bit errors in
SDRAM?

The other thing to consider is sched_clock() - the 'bx' instruction
you've marked above is calling that function. Is it messing up the
frame pointer?

Also note that r8, r1 and r5 values. The code immediately before which
and the state here after this cmp instruction is: r8 = 0, r5 = e5dcc3ec
r1 = 00989665. In my mind, 00989665 - e5dcc3ec is certainly not zero.

Finally, go back to the stack dump above and look at the values marked
A, B and C. A = [fp, #-52], B = [fp, #-48] (aka r0) C = [fp, #-44]
(aka r1). We can't really tell about 'A', but we can talk about 'B'
and 'C' since nothing's changed the values in those registers between
when they were stored and the point we oopsed. The conclusion I come
to is that r0 and r1 were not stored in these locations, so in all
probability, 'fp' had already been corrupted by the "str r0, [fp, #-48]"
instruction.

-------------------------------------------------------------------
List admin: http://lists.arm.linux.org.uk/mailman/listinfo/linux-arm-kernel
FAQ: http://www.arm.linux.org.uk/mailinglists/faq.php
Etiquette: http://www.arm.linux.org.uk/mailinglists/etiquette.php
Loading...