Wind River Support Network

Fixed

LIN6-10108 : PREEMPT_RT kernel on T4xxx board will hang after some time

Created: Jul 7, 2015    Updated: Dec 3, 2018
Resolved Date: Sep 25, 2015
Found In Version: 6.0.0.18
Fix Version: 6.0.0.25
Severity: Severe
Applicable for: Wind River Linux 6
Component/s: Kernel

Description

On a T4240 board running a PREEMPT_RT kernel project, after the customer's test script has run for a while (within one hour), the system becomes abnormal:
1, active ssh and serial sessions stay alive and still accept input
2, new ssh/telnet connections to the target fail
3, the top and ps commands show no output on the target
4, the target still responds to ping
5, after some more time, the system hangs completely and does not respond to any input

After enabling the following options in the kernel config:
/******************************************/
CONFIG_LOCKUP_DETECTOR
CONFIG_BOOTPARAM_SOFTLOCKUP_PANIC
CONFIG_DETECT_HUNG_TASK
CONFIG_DEBUG_RT_MUTEXES
CONFIG_DEBUG_SPINLOCK
 
CONFIG_PROVE_LOCKING
CONFIG_LOCKDEP
CONFIG_DEBUG_LOCKDEP
CONFIG_DEBUG_LOCK_ALLOC
/******************************************/
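
Before rerunning the test it can help to confirm that these options really ended up enabled in the built kernel's .config. The following is a minimal sketch in Python, assuming the config is available as plain text and that each option is a boolean set with `=y`. The helper name `missing_options` and the sample text are hypothetical and not part of this report.

```python
import re

# Debug options listed in this report (assumed to be boolean, enabled with "=y").
REQUIRED = [
    "CONFIG_LOCKUP_DETECTOR",
    "CONFIG_BOOTPARAM_SOFTLOCKUP_PANIC",
    "CONFIG_DETECT_HUNG_TASK",
    "CONFIG_DEBUG_RT_MUTEXES",
    "CONFIG_DEBUG_SPINLOCK",
    "CONFIG_PROVE_LOCKING",
    "CONFIG_LOCKDEP",
    "CONFIG_DEBUG_LOCKDEP",
    "CONFIG_DEBUG_LOCK_ALLOC",
]


def missing_options(config_text, required=REQUIRED):
    """Return the required options that are not set to 'y' in config_text."""
    enabled = set(re.findall(r"^(CONFIG_\w+)=y$", config_text, flags=re.MULTILINE))
    return [opt for opt in required if opt not in enabled]


# Hypothetical sample: two options enabled, one explicitly disabled.
sample = (
    "CONFIG_LOCKDEP=y\n"
    "# CONFIG_DEBUG_SPINLOCK is not set\n"
    "CONFIG_PROVE_LOCKING=y\n"
)
print(missing_options(sample))
```

On a real target the text would come from the project's kernel .config file instead of the `sample` string.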



As soon as the test script starts running, the following log appears:
/*********************************************************/
root@fsl-t4xxx:/media/T4XX_test# sh teststartt4xx.sh 

=============================================
[ INFO: possible recursive locking detected ]
3.10.62-ltsi-rt55-WR6.0.0.18_preempt-rt #20 Not tainted
---------------------------------------------
telnetd/1497 is trying to acquire lock:
 (&ldata->output_lock){+.+...}, at: [<c000000000574c10>] .process_echoes+0x90/0x400

but task is already holding lock:
 (&ldata->output_lock){+.+...}, at: [<c000000000575858>] .n_tty_write+0x3d8/0x570

other info that might help us debug this:
 Possible unsafe locking scenario:

       CPU0
       ----
  lock(&ldata->output_lock);
  lock(&ldata->output_lock);

 *** DEADLOCK ***

 May be due to missing lock nesting notation

2 locks held by telnetd/1497:
 #0:  (&tty->atomic_write_lock){+.+.+.}, at: [<c000000000570d30>] .tty_write_lock+0x30/0xb0
 #1:  (&ldata->output_lock){+.+...}, at: [<c000000000575858>] .n_tty_write+0x3d8/0x570

stack backtrace:
CPU: 0 PID: 1497 Comm: telnetd Not tainted 3.10.62-ltsi-rt55-WR6.0.0.18_preempt-rt #20
Call Trace:
[c0000002f5ea3340] [c00000000000b160] .show_stack+0x170/0x290 (unreliable)
[c0000002f5ea3430] [c000000000a1c51c] .dump_stack+0x28/0x3c
[c0000002f5ea34a0] [c0000000000daadc] .__lock_acquire+0x162c/0x1fc0
[c0000002f5ea3620] [c0000000000dbdf8] .lock_acquire+0xb8/0x1b0
[c0000002f5ea36f0] [c000000000a0f2d0] ._mutex_lock+0x40/0x70
[c0000002f5ea3780] [c000000000574c10] .process_echoes+0x90/0x400
[c0000002f5ea3840] [c0000000005776a4] .n_tty_receive_buf+0x4f4/0x1250
[c0000002f5ea39f0] [c00000000057d32c] .flush_to_ldisc+0x1cc/0x230
[c0000002f5ea3aa0] [c00000000057f478] .pty_write+0xa8/0xc0
[c0000002f5ea3b30] [c000000000575884] .n_tty_write+0x404/0x570
[c0000002f5ea3c30] [c000000000570ed8] .tty_write+0x128/0x2e0
[c0000002f5ea3cf0] [c0000000001c98f8] .vfs_write+0xe8/0x250
[c0000002f5ea3d90] [c0000000001ca524] .SyS_write+0x64/0xe0
[c0000002f5ea3e30] [c000000000000624] syscall_exit+0x0/0x8c
Unable to handle kernel paging request for data at address 0x100000000040
Faulting instruction address: 0xc0000000000e3504
Oops: Kernel access of bad area, sig: 11 [#1]
PREEMPT SMP NR_CPUS=24 CoreNet Generic
Modules linked in: spidev(O+) boardcpld(O) corecpld(O)
CPU: 13 PID: 1525 Comm: insmod Tainted: G           O 3.10.62-ltsi-rt55-WR6.0.0.18_preempt-rt #20
task: c0000002f549bb40 ti: c0000002f3524000 task.ti: c0000002f3524000
NIP: c0000000000e3504 LR: c0000000000e34ac CTR: c000000000a0f290
REGS: c0000002f3527070 TRAP: 0300   Tainted: G           O  (3.10.62-ltsi-rt55-WR6.0.0.18_preempt-rt)
MSR: 0000000080029000 <CE,EE,ME>  CR: 24002284  XER: 20000000
SOFTE: 0
DEAR: 0000100000000040, ESR: 0000000000000000

GPR00: c0000000000e34ac c0000002f35272f0 c000000000f66558 c0000002f549bb40 
GPR04: 8000000000222498 c0000002f3527428 0000000000000078 0000000000000078 
GPR08: 0000000000000001 c0000002f3527460 0000100000000000 0000000000000000 
GPR12: 0000000084002282 c00000000fffa180 8000000000224000 c000000000d22c38 
GPR16: 0000000000000000 0000000000000124 c0000000017be501 c000000000eca9b8 
GPR20: c0000002f7eae130 0000000000000001 8000000000222800 ffffffffffffffed 
GPR24: 0000000000000000 0000000000000001 0000000000000000 c0000002f549c530 
GPR28: c000000000e744d0 8000000000222460 c0000002f549bb40 c0000002f3527420 
NIP [c0000000000e3504] .task_blocks_on_rt_mutex+0xf4/0x300
LR [c0000000000e34ac] .task_blocks_on_rt_mutex+0x9c/0x300
Call Trace:
[c0000002f35272f0] [c0000000000e34ac] .task_blocks_on_rt_mutex+0x9c/0x300 (unreliable)
[c0000002f35273b0] [c000000000a0e8d0] .rt_mutex_slowlock+0xb0/0x230
[c0000002f35274c0] [c000000000a0f2dc] ._mutex_lock+0x4c/0x70
[c0000002f3527550] [8000000000221984] .qoriq_spi_probe+0x104/0x260 [spidev]
[c0000002f3527610] [c0000000006486b4] .spi_drv_probe+0x44/0x90
[c0000002f3527690] [c0000000005b5b90] .really_probe+0xb0/0x2d0
[c0000002f3527730] [c0000000005b5f98] .__driver_attach+0x118/0x120
[c0000002f35277c0] [c0000000005b2de4] .bus_for_each_dev+0x94/0x100
[c0000002f3527860] [c0000000005b5494] .driver_attach+0x34/0x50
[c0000002f35278e0] [c0000000005b4e78] .bus_add_driver+0x288/0x380
[c0000002f3527980] [c0000000005b6a74] .driver_register+0x94/0x170
[c0000002f3527a00] [c000000000648de8] .spi_register_driver+0x78/0x90
[c0000002f3527a80] [8000000000221b74] .qoriq_spi_init+0x94/0x120 [spidev]
[c0000002f3527b10] [c000000000001bf4] .do_one_initcall+0x164/0x1c0
[c0000002f3527bc0] [c0000000000eb234] .load_module+0x1834/0x2370
[c0000002f3527d40] [c0000000000ebf30] .SyS_finit_module+0xa0/0xd0
[c0000002f3527e30] [c000000000000624] syscall_exit+0x0/0x8c
Instruction dump:
f8ff0018 90df0000 f8ff0020 80fe0038 f95f0030 f95f0038 f93f0040 90ff0028 
f93f0048 e95d0038 7fa45040 419e0190 <e92a0040> 7fa94a78 7d290074 7929d182 
/*******************************************************/
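
When comparing several captures of this log, it may be convenient to pull out just the lock that lockdep says is being acquired and the one already held. The following is a minimal sketch in Python, assuming the report keeps the line layout shown above. The function name `parse_recursive_lock_report` and the regular expression are illustrative assumptions, not a kernel-provided tool.

```python
import re

# Matches lockdep lines such as:
#  (&ldata->output_lock){+.+...}, at: [<c000000000574c10>] .process_echoes+0x90/0x400
LOCK_LINE = re.compile(
    r"\((?P<lock>[^)]+)\)\{[^}]*\}, at: \[<(?P<addr>[0-9a-f]+)>\] \.?(?P<func>\w+)\+"
)


def parse_recursive_lock_report(text):
    """Return (wanted, held) dicts from a recursive-locking report, or None."""
    lines = text.splitlines()
    wanted = held = None
    for i, line in enumerate(lines[:-1]):
        # The lock description sits on the line after each marker sentence.
        match = LOCK_LINE.search(lines[i + 1])
        if match is None:
            continue
        if "is trying to acquire lock:" in line:
            wanted = match.groupdict()
        elif "but task is already holding lock:" in line:
            held = match.groupdict()
    if wanted is None or held is None:
        return None
    return wanted, held


# Excerpt copied from the log in this report.
excerpt = """telnetd/1497 is trying to acquire lock:
 (&ldata->output_lock){+.+...}, at: [<c000000000574c10>] .process_echoes+0x90/0x400

but task is already holding lock:
 (&ldata->output_lock){+.+...}, at: [<c000000000575858>] .n_tty_write+0x3d8/0x570
"""
wanted, held = parse_recursive_lock_report(excerpt)
print(wanted["lock"], wanted["func"], held["func"])
```

For the excerpt above, both entries name `&ldata->output_lock`, acquired in `process_echoes` while held from `n_tty_write`, which matches the call trace in the log.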

Steps to Reproduce

Configure the project with:
configure --enable-board=fsl-t4xxx --enable-build=production --enable-kernel=preempt-rt --enable-rootfs=glibc_std --enable-jobs=4 --enable-parallel-pkgbuilds=4 --enable-reconfig --with-rcpl-version=0018

After the target starts up, the customer runs their device tests from a script, which includes:
1, spi
2, i2c
3, gpio
4, localbus
5, rtc
6, flash
etc.
The test script is attached.

The customer's kernel config is attached as well.
