Status: Fixed
Created: Dec 15, 2016
Updated: Dec 3, 2018
Resolved Date: Jun 19, 2017
Found In Version: 6.0.0.23
Fix Version: 6.0.0.34
Severity: Standard
Applicable for: Wind River Linux 6
Component/s: Kernel
Architecture: IA64
While the non-boot CPU cores are being taken down with take_cpu_down() in order to enter the S3 sleep state, the BUG "sleeping function called from invalid context" shown below is sometimes triggered. It can happen on any of the non-boot CPUs, but I have never seen it happen on more than one CPU during the same take-down sequence. Here is a kernel log extract including the full BUG message, plus a little context before and after.
Nov 1 05:30:20 mariner kernel: [299112.963325] PM: suspend of devices complete after 1311.321 msecs
Nov 1 05:30:20 mariner kernel: [299112.963458] turn_disk_power_off() -- port# = 1
Nov 1 05:30:20 mariner kernel: [299112.963462] turn_disk_power_off() -- port# = 0
Nov 1 05:30:20 mariner kernel: [299112.963639] PM: late suspend of devices complete after 0.310 msecs
Nov 1 05:30:20 mariner kernel: [299112.974962] dwc3-pci 0000:00:16.0: power state changed by ACPI to D3hot
Nov 1 05:30:20 mariner kernel: [299112.975038] xhci_hcd 0000:00:14.0: System wakeup enabled by ACPI
Nov 1 05:30:20 mariner kernel: [299112.996911] PM: noirq suspend of devices complete after 33.284 msecs
Nov 1 05:30:20 mariner kernel: [299112.996957] ACPI: Preparing to enter system sleep state S3
Nov 1 05:30:20 mariner kernel: [299112.997559] PM: Saving platform NVS memory
Nov 1 05:30:20 mariner kernel: [299113.002254] Disabling non-boot CPUs ...
Nov 1 05:30:20 mariner kernel: [299113.004011] smpboot: CPU 1 is now offline
Nov 1 05:30:20 mariner kernel: [299113.006270] BUG: sleeping function called from invalid context at kernel/rtmutex.c:797
Nov 1 05:30:20 mariner kernel: [299113.006272] in_atomic(): 1, irqs_disabled(): 1, pid: 37, name: migration/2
Nov 1 05:30:20 mariner kernel: [299113.006281] Preemption disabled at:[<ffffffff8106a9ac>] smpboot_thread_fn+0x1ec/0x360
Nov 1 05:30:20 mariner kernel: [299113.006282]
Nov 1 05:30:20 mariner kernel: [299113.006285] CPU: 2 PID: 37 Comm: migration/2 Tainted: P O 3.10.62 #1
Nov 1 05:30:20 mariner kernel: [299113.006287] Hardware name: Insyde explorer60 X64/6, BIOS explorer60.13.13 10/24/2016
Nov 1 05:30:20 mariner kernel: [299113.006292] ffffea0005c61a80 ffff88017a22ba90 ffffffff8163d782 ffff88017a22baa8
Nov 1 05:30:20 mariner kernel: [299113.006295] ffffffff8106e5ff ffff88017fd0d280 ffff88017a22bac0 ffffffff816420a0
Nov 1 05:30:20 mariner kernel: [299113.006299] ffff88017fd0d280 ffff88017a22bb18 ffffffff81103420 ffff88017a000c00
Nov 1 05:30:20 mariner kernel: [299113.006302] ffff88017a22bfd8 000000000000da60 ffff88014cb5db00 ffffea0005c61a80
Nov 1 05:30:20 mariner kernel: [299113.006305] 0000000000000001 0000000000000002 00000000fffffffe ffff88017937e0c0
Nov 1 05:30:20 mariner kernel: [299113.006308] ffff88017a22bb38 ffffffff81104bb4 ffffea0005c61a80 0000000000000001
Nov 1 05:30:20 mariner kernel: [299113.006312] ffff88017a22bb48 ffffffff81104e3e ffff88017a22bb90 ffffffff8113785a
Nov 1 05:30:20 mariner kernel: [299113.006315] 000000010010000e 0000000000000000 ffff88017a22bbf0 ffffea0001dc6d00
Nov 1 05:30:20 mariner kernel: [299113.006318] 000000000000da60 0000000000000097 ffff88017a001900 ffff88017a22bba8
Nov 1 05:30:20 mariner kernel: [299113.006321] ffffffff81137996 ffff88017a22bbf0 ffff88017a22bc68 ffffffff8163b038
Nov 1 05:30:20 mariner kernel: [299113.006324] ffff88017a22bfd8 ffff88017a22bfd8 00000000000000ff 0000000000000006
Nov 1 05:30:20 mariner kernel: [299113.006328] 0000000000000006 ffff88017fd0da60 ffffffff816424ae ffff88017a22bbf0
Nov 1 05:30:20 mariner kernel: [299113.006331] ffff88017a22bbf0 ffffffff81027e5a 00000000ffffffea ffff880176032240
Nov 1 05:30:20 mariner kernel: [299113.006334] 000000018010000f ffff880176032200 ffff88017a22bc68 ffffffff810287c0
Nov 1 05:30:20 mariner kernel: [299113.006337] ffff88017a22bfd8 ffff8800771b4400 ffff88017fd14950 ffff88017a22bfd8
Nov 1 05:30:20 mariner kernel: [299113.006341] ffff88017a001900 ffffea0001dc6d00 ffff88017a22bcb0 ffffffff811396a1
Nov 1 05:30:20 mariner kernel: [299113.006344] ffffffff810172ad 000000000cd4c70a ffff88017fd09a60 0000000000000002
Nov 1 05:30:20 mariner kernel: [299113.006347] 0000000000000018 0000000000000002 0000000000000000 ffff88017a22bcd0
Nov 1 05:30:20 mariner kernel: [299113.006350] ffffffff810172ad 00000000fffffffa ffffffff81c2f050 ffff88017a22bce0
Nov 1 05:30:20 mariner kernel: [299113.006353] ffffffff81631ea9 ffff88017a22bd18 ffffffff810681fe ffff8801722e5d70
Nov 1 05:30:20 mariner kernel: [299113.006356] 0000000000000003 ffff88017a22bf01 0000000000000202 ffffffff810ab470
Nov 1 05:30:20 mariner kernel: [299113.006360] ffff88017a22bd28 ffffffff810682be ffff88017a22bd38 ffffffff8103e593
Nov 1 05:30:20 mariner kernel: [299113.006363] ffff88017a22bd50 ffffffff8162c017 ffff8801722e5ce0 ffff88017a22bd80
Nov 1 05:30:20 mariner kernel: [299113.006366] ffffffff810ab50e ffff88017fd0ba60 ffff8801722e5c90 ffff88017a22bfd8
Nov 1 05:30:20 mariner kernel: [299113.006369] ffff8801722e5ce0 ffff88017a22be48 ffffffff810ab891 ffff88017a22bfd8
Nov 1 05:30:20 mariner kernel: [299113.006373] ffff88017fd0ba80 ffffffff810712bd 0000000000000001 ffff88017a22bed8
Nov 1 05:30:20 mariner kernel: [299113.006376] 0000000300000001 ffff88017a22bed8 0000000000000286 ffff88017a1a36b0
Nov 1 05:30:20 mariner kernel: [299113.006379] ffff88017a1a36b0 ffff88017a22bed0 0000000000000002 ffff88017fd0ba60
Nov 1 05:30:20 mariner kernel: [299113.006382] 0000000000000286 ffffffff81c371a0 0000000000000286 ffff88017a22be28
Nov 1 05:30:20 mariner kernel: [299113.006385] ffffffff816424ae ffff88017a1a36b0 ffff88017a002460 ffffffff81c371a0
Nov 1 05:30:20 mariner kernel: [299113.006388] 0000000000000002 ffff88017a22bfd8 ffff88017a22bea8 ffffffff8106a9ac
Nov 1 05:30:20 mariner kernel: [299113.006392] ffff88017a22bfd8 ffff88017a1a36b0 0000000000000000 ffffffff81640eb0
Nov 1 05:30:20 mariner kernel: [299113.006395] 0000000000000001 ffff88017a121cd8 ffff88017a002460 ffffffff8106a7c0
Nov 1 05:30:20 mariner kernel: [299113.006398] 0000000000000000 0000000000000000 ffff88017a22bf48 ffffffff81062156
Nov 1 05:30:20 mariner kernel: [299113.006401] 0000000000000001 ffff880100000002 ffff88017a002460 000000000000024c
Nov 1 05:30:20 mariner kernel: [299113.006404] dead4ead00004f4f ffff8801ffffffff ffffffffffffffff ffff88017a22bef0
Nov 1 05:30:20 mariner kernel: [299113.006408] ffff88017a22bef0 ffffffff00000000 dead4ead7fc10000 ffff8801ffffffff
Nov 1 05:30:20 mariner kernel: [299113.006411] ffffffffffffffff ffff88017a22bf20 ffff88017a22bf20 ffffffff81062090
Nov 1 05:30:20 mariner kernel: [299113.006414] 0000000000000000 0000000000000000 ffff88017a121cd8 ffffffff816432b8
Nov 1 05:30:20 mariner kernel: [299113.006417] 0000000000000000 0000000000000000 0000000000000000 0000000000000000
Nov 1 05:30:20 mariner kernel: [299113.006420] ffff88017a121cd8 ffffffff81062090 0000000000000000 0000000000000000
Nov 1 05:30:20 mariner kernel: [299113.006423] 0000000000000000 0000000000000000 0000000000000000 0000000000000000
Nov 1 05:30:20 mariner kernel: [299113.006426] 0000000000000000 0000000000000000 0000000000000000 ffffffffffffffff
Nov 1 05:30:20 mariner kernel: [299113.006429] 0000000000000000 0000000000000010 0000000000000202 ffff88017a22bf58
Nov 1 05:30:20 mariner kernel: [299113.006430] 0000000000000018
Nov 1 05:30:20 mariner kernel: [299113.006431] Call Trace:
Nov 1 05:30:20 mariner kernel: [299113.006438] [<ffffffff8163d782>] dump_stack+0x19/0x1b
Nov 1 05:30:20 mariner kernel: [299113.006441] [<ffffffff8106e5ff>] __might_sleep+0xef/0x170
Nov 1 05:30:20 mariner kernel: [299113.006445] [<ffffffff816420a0>] rt_spin_lock+0x20/0x50
Nov 1 05:30:20 mariner kernel: [299113.006449] [<ffffffff81103420>] __free_pages_ok.part.66+0x60/0x4e0
Nov 1 05:30:20 mariner kernel: [299113.006452] [<ffffffff81104bb4>] __free_pages+0x54/0x60
Nov 1 05:30:20 mariner kernel: [299113.006455] [<ffffffff81104e3e>] __free_memcg_kmem_pages+0xe/0x10
Nov 1 05:30:20 mariner kernel: [299113.006459] [<ffffffff8113785a>] __free_slab+0xba/0x1a0
Nov 1 05:30:20 mariner kernel: [299113.006463] [<ffffffff81137996>] free_delayed+0x56/0x70
Nov 1 05:30:20 mariner kernel: [299113.006466] [<ffffffff8163b038>] __slab_free+0x39d/0x554
Nov 1 05:30:20 mariner kernel: [299113.006469] [<ffffffff816424ae>] ? _raw_spin_unlock_irqrestore+0x1e/0x50
Nov 1 05:30:20 mariner kernel: [299113.006473] [<ffffffff81027e5a>] ? assign_irq_vector+0x4a/0x60
Nov 1 05:30:20 mariner kernel: [299113.006477] [<ffffffff810287c0>] ? __ioapic_set_affinity+0x80/0xd0
Nov 1 05:30:20 mariner kernel: [299113.006480] [<ffffffff811396a1>] kfree+0x1f1/0x210
Nov 1 05:30:20 mariner kernel: [299113.006484] [<ffffffff810172ad>] ? intel_pmu_cpu_dying+0x6d/0x80
Nov 1 05:30:20 mariner kernel: [299113.006487] [<ffffffff810172ad>] intel_pmu_cpu_dying+0x6d/0x80
Nov 1 05:30:20 mariner kernel: [299113.006491] [<ffffffff81631ea9>] x86_pmu_notifier+0xbb/0xc9
Nov 1 05:30:20 mariner kernel: [299113.006494] [<ffffffff810681fe>] notifier_call_chain+0x4e/0x70
Nov 1 05:30:20 mariner kernel: [299113.006498] [<ffffffff810ab470>] ? cpu_stop_create+0x30/0x30
Nov 1 05:30:20 mariner kernel: [299113.006501] [<ffffffff810682be>] __raw_notifier_call_chain+0xe/0x10
Nov 1 05:30:20 mariner kernel: [299113.006505] [<ffffffff8103e593>] cpu_notify+0x23/0x40
Nov 1 05:30:20 mariner kernel: [299113.006508] [<ffffffff8162c017>] take_cpu_down+0x27/0x40
Nov 1 05:30:20 mariner kernel: [299113.006511] [<ffffffff810ab50e>] stop_machine_cpu_stop+0x9e/0xc0
Nov 1 05:30:20 mariner kernel: [299113.006514] [<ffffffff810ab891>] cpu_stopper_thread+0xb1/0x190
Nov 1 05:30:20 mariner kernel: [299113.006518] [<ffffffff810712bd>] ? get_parent_ip+0xd/0x50
Nov 1 05:30:20 mariner kernel: [299113.006522] [<ffffffff816424ae>] ? _raw_spin_unlock_irqrestore+0x1e/0x50
Nov 1 05:30:20 mariner kernel: [299113.006526] [<ffffffff8106a9ac>] smpboot_thread_fn+0x1ec/0x360
Nov 1 05:30:20 mariner kernel: [299113.006529] [<ffffffff81640eb0>] ? schedule+0x30/0x90
Nov 1 05:30:20 mariner kernel: [299113.006532] [<ffffffff8106a7c0>] ? lg_local_unlock+0x30/0x30
Nov 1 05:30:20 mariner kernel: [299113.006535] [<ffffffff81062156>] kthread+0xc6/0xd0
Nov 1 05:30:20 mariner kernel: [299113.006539] [<ffffffff81062090>] ? kthread_worker_fn+0x1b0/0x1b0
Nov 1 05:30:20 mariner kernel: [299113.006543] [<ffffffff816432b8>] ret_from_fork+0x58/0x90
Nov 1 05:30:20 mariner kernel: [299113.006546] [<ffffffff81062090>] ? kthread_worker_fn+0x1b0/0x1b0
Nov 1 05:30:20 mariner kernel: [299113.006594] smpboot: CPU 2 is now offline
Nov 1 05:30:20 mariner kernel: [299113.008750] smpboot: CPU 3 is now offline
Nov 1 05:30:20 mariner kernel: [299113.009609] ACPI -- Sleep NOW
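The call trace above mixes reliable frames with entries marked `?`, which the x86 unwinder found by scanning the stack but could not confirm as part of the call chain. Dropping those entries makes the execution path easier to follow. The following is a minimal sketch, assuming the log has been saved as plain text with one message per line in the format shown; `reliable_frames` and `FRAME_RE` are hypothetical names, not part of any existing tool.

```python
import re

# Matches a call-trace entry such as
#   [<ffffffff8163d782>] dump_stack+0x19/0x1b
# with an optional "? " marker (unreliable frame) before the symbol name.
FRAME_RE = re.compile(
    r"\[<([0-9a-f]+)>\]\s+(\?\s+)?([\w.]+)\+(0x[0-9a-f]+)/(0x[0-9a-f]+)"
)


def reliable_frames(log_lines):
    """Return the function names from the first "Call Trace:" block,
    innermost frame first, skipping entries marked with '?'."""
    frames = []
    in_trace = False
    for line in log_lines:
        if not in_trace:
            # Start collecting on the line after the "Call Trace:" header.
            in_trace = "Call Trace:" in line
            continue
        match = FRAME_RE.search(line)
        if match is None:
            break  # the first non-frame line ends the trace
        if match.group(2) is None:  # no '?' marker: keep the frame
            frames.append(match.group(3))
    return frames


if __name__ == "__main__":
    sample = [
        "... kernel: [299113.006431] Call Trace:",
        "... kernel: [299113.006438] [<ffffffff8163d782>] dump_stack+0x19/0x1b",
        "... kernel: [299113.006469] [<ffffffff816424ae>] ? _raw_spin_unlock_irqrestore+0x1e/0x50",
        "... kernel: [299113.006480] [<ffffffff811396a1>] kfree+0x1f1/0x210",
        "... kernel: [299113.006594] smpboot: CPU 2 is now offline",
    ]
    print(reliable_frames(sample))  # ['dump_stack', 'kfree']
```

Run over the full extract, this keeps the chain from ret_from_fork and smpboot_thread_fn through stop_machine_cpu_stop, take_cpu_down, cpu_notify, x86_pmu_notifier and intel_pmu_cpu_dying into kfree, and from there through the slab and page freeing functions to rt_spin_lock and __might_sleep.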
I have searched the current list of Defects for Linux 6 https://knowledge.windriver.com/en-us/000_Products/000/010/010/000_Defects_for_Linux_6 but not found this issue.
Our kernel is based on WR 6.0.0.23 with our own modifications, and built with PREEMPT_RT enabled.
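PREEMPT_RT matters here because, with it enabled, spinlock_t is backed by an rt_mutex and may sleep, whereas the BUG line reports in_atomic(): 1 and irqs_disabled(): 1 for migration/2. A purely illustrative Python model of that interaction follows; `Context`, `may_sleep_here` and `take_spinlock` are hypothetical names, and the real __might_sleep() has additional conditions that are not modelled here.

```python
from dataclasses import dataclass


@dataclass
class Context:
    """Hypothetical, simplified model of the state reported by __might_sleep():
    the preemption count and whether local interrupts are disabled."""
    preempt_count: int = 0
    irqs_disabled: bool = False

    def in_atomic(self) -> bool:
        return self.preempt_count > 0


def may_sleep_here(ctx: Context) -> bool:
    """A sleeping call is only legal with preemption and interrupts enabled."""
    return not ctx.in_atomic() and not ctx.irqs_disabled


def take_spinlock(ctx: Context, preempt_rt: bool) -> str:
    """Model the spinlock_t acquisition that the kfree() path in the trace
    reaches (rt_spin_lock() called from __free_pages_ok()).

    Without PREEMPT_RT, spinlock_t spins and is legal in atomic context.
    With PREEMPT_RT, spinlock_t may sleep, so the might-sleep check applies.
    """
    if preempt_rt and not may_sleep_here(ctx):
        return "BUG: sleeping function called from invalid context"
    return "ok"


# State reported for migration/2: in_atomic(): 1, irqs_disabled(): 1
dying_cpu = Context(preempt_count=1, irqs_disabled=True)
print(take_spinlock(dying_cpu, preempt_rt=True))   # BUG: sleeping function called from invalid context
print(take_spinlock(dying_cpu, preempt_rt=False))  # ok
print(take_spinlock(Context(), preempt_rt=True))   # ok
```

In a non-RT kernel, kfree() is allowed in atomic context because the locks it takes spin rather than sleep; the model only reports a BUG when both PREEMPT_RT and an atomic context are present.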
Reproducing the problem with a non-customised WR kernel, configured and running on a well-known supported reference board, would take a lot of work, and I don't think it is necessary: the information given should be enough to investigate it. A WR engineer familiar with the PREEMPT_RT patch could follow the execution path through our source code (already provided to WR), from the point where preemption was disabled to the point where we ended up in a function that “might sleep”, and spot the mistake or mis-implementation. To help resolve this issue, I have just uploaded a document that maps the execution path, plus another fully symbolic disassembly listing of a slightly different kernel build, which helps greatly in tracing the execution path.
This file maps the execution path in the source code (also uploaded) from just before the point where pre-emption was disabled up to the BUG
Compressed fully symbolic kernel disassembly listing of an unstripped rebuild of the kernel, where the absolute addresses of functions are different, but the source file names and line numbers are correct, as are the relative address offsets within each function.
We don’t use the WR “project” environment. At some point in the past we created a project for the WR 6 source, copied the kernel into our own source tree, customized it, and merged in WR patches up to RCPL 23. We use our own build scripts.
I’ve uploaded the exact bzImage that was booted, and a disassembly of it. I hope this will give you the information you need.