Fixed
Created: Dec 4, 2017
Updated: Dec 3, 2018
Resolved Date: Dec 21, 2017
Found In Version: 8.0.0.22
Fix Version: 8.0.0.25
Severity: Standard
Applicable for: Wind River Linux 8
Component/s: Kernel
Seeing this panic on an Intel blade whenever the customer resets the switch blade on an ATCA frame.
general protection fault: 0000 [#1] PREEMPT SMP
Modules linked in: 8021q xt_DSCP xt_dscp tipc ip6_udp_tunnel udp_tunnel bonding ipmi_devintf ipmi_si ipmi_msghandler
CPU: 16 PID: 91 Comm: ksoftirqd/16 Not tainted 4.1.21-WR0.1_cgl #1
Hardware name: Intel Corporation RoseCity Platform/Romley EP, BIOS 1.3.02 X64 02/20/2013
task: ffff88081b9d0000 ti: ffff88081b9d8000 task.ti: ffff88081b9d8000
RIP: 0010:[<ffffffffa00035d8>] [<ffffffffa00035d8>] handle_new_recv_msgs+0x78/0x1c0 [ipmi_msghandler]
RSP: 0018:ffff88081b9dbd48 EFLAGS: 00010046
RAX: dead000000000200 RBX: ffff8808192ec000 RCX: dead000000000100
RDX: 0000000000005f5f RSI: 000000000000005f RDI: 0000000000000001
RBP: ffff88081b9dbd98 R08: 6572617764726168 R09: 0000000000000846
R10: 736120726f662072 R11: 0000000000000846 R12: 0000000000000000
R13: ffff8808192eccb8 R14: ffff880801779c00 R15: 0000000000000000
FS: 0000000000000000(0000) GS:ffff88081fc00000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007fa380c523e2 CR3: 0000000001e0c000 CR4: 00000000000406e0
Stack:
ffff8808192eccb4 0000000000000292 ffff8800bb61b108 ffff88081fc15fe8
ffff88081b9d0068 ffff8808192ec000 0000000000000000 ffff8808192eccf8
0000000000000006 0000000000000000 ffff88081b9dbdc8 ffffffffa0003828
Call Trace:
[<ffffffffa0003828>] smi_recv_tasklet+0xa8/0x130 [ipmi_msghandler]
[<ffffffff810ad5df>] tasklet_action+0xef/0x110
[<ffffffff810acb75>] __do_softirq+0xa5/0x310
[<ffffffff810ace01>] run_ksoftirqd+0x21/0x40
[<ffffffff810c9678>] smpboot_thread_fn+0x178/0x260
[<ffffffff810c9500>] ? sort_range+0x30/0x30
[<ffffffff810c6839>] kthread+0xc9/0xe0
[<ffffffff810c0303>] ? try_to_grab_pending+0xe3/0x150
[<ffffffff810c6770>] ? flush_kthread_worker+0x70/0x70
[<ffffffff8197c162>] ret_from_fork+0x42/0x70
[<ffffffff810c6770>] ? flush_kthread_worker+0x70/0x70
Code: 00 00 4d 39 ee 74 64 45 85 e4 0f 84 eb 00 00 00 4c 89 f6 48 89 df e8 e8 f0 ff ff 41 89 c7 41 83 ff 00 7f 47 49 8b 46 08 49 8b 0e <48> 89 41 08 48 89 08 48 b8 00 01 00 00 00 00 ad de 49 89 06 48
RIP [<ffffffffa00035d8>] handle_new_recv_msgs+0x78/0x1c0 [ipmi_msghandler]
RSP <ffff88081b9dbd48>
--[ end trace fe1be1760c26ab5d ]--
Kernel panic - not syncing: Fatal exception in interrupt
Shutting down cpus with NMI
IPMI message received with no owner. This
could be because of a malformed message, or
because of a hardware error. Contact your
hardware vender for assistance
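One possible reading of the register dump above, offered as a sketch and not a confirmed diagnosis: RAX and RCX hold dead000000000200 and dead000000000100, which look like the kernel's list poison values (the same 0xdead000000000100 constant also appears as an immediate in the Code: bytes). list_del() writes those values into an entry's next/prev pointers after unlinking it, so faulting on a store through RCX would be consistent with handle_new_recv_msgs() deleting a list entry that had already been removed. The small Python helper below scans oops text for registers holding those values. The constants and the helper name find_list_poison are assumptions made for illustration; they are not taken from the Wind River kernel source.

```python
import re

# Assumed x86_64 list poison values, taken from the registers in this oops
# (0xdead000000000100 also appears as an immediate in the Code: bytes).
LIST_POISON1 = 0xDEAD000000000100  # written to entry->next by list_del()
LIST_POISON2 = 0xDEAD000000000200  # written to entry->prev by list_del()

# Matches general-purpose register dumps such as "RAX: dead000000000200".
REG_RE = re.compile(r"\b(R[A-Z0-9]{1,2}):\s*([0-9a-f]{16})\b")


def find_list_poison(oops_text):
    """Return {register: poison name} for registers holding a list poison value."""
    names = {LIST_POISON1: "LIST_POISON1", LIST_POISON2: "LIST_POISON2"}
    hits = {}
    for reg, value in REG_RE.findall(oops_text):
        poison = names.get(int(value, 16))
        if poison:
            hits[reg] = poison
    return hits


sample = """\
RAX: dead000000000200 RBX: ffff8808192ec000 RCX: dead000000000100
RDX: 0000000000005f5f RSI: 000000000000005f RDI: 0000000000000001
"""
print(find_list_poison(sample))
```

Run against the two register lines from this report, the helper flags RAX and RCX and ignores the other registers.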
The customer found that this issue is resolved with the following upstream mainline commit:
ae4ea9a2460c7fee2ae8feeb4dfe96f5f6c3e562
The commit can be seen here:
https://github.com/torvalds/linux/commit/ae4ea9a2460c7fee2ae8feeb4dfe96f5f6c3e562#diff-690ab47282b4d3de1fd7aa4411160aa1
They are asking us to integrate the commit into our next RCPL. They do not think I will be able to reproduce the problem on my side; however, if you look at the mainline kernel, you'll notice that updates to this part of the code have been very rare.
$installDir/configure --enable-addons=wr-cgp --with-layer=cgp --enable-board=intel-x86-64 --enable-kernel=cgl --enable-rootfs=glibc-cgl --enable-bootimage=iso --with-rcpl-version=0022
(Configure line taken from the comments. I suspect that the kernel has customer-specific patches.)