Fixed
Created: Jun 13, 2016
Updated: Feb 11, 2019
Resolved Date: Nov 28, 2016
Found In Version: 6.0.0.20
Fix Version: 6.0.0.32
Severity: Standard
Applicable for: Wind River Linux 6
Component/s: Kernel
The customer reports a problem with OVP 6.
They found that over time the number of qemu threads keeps increasing, and each new thread gets stuck at the same futex call. While the issue is occurring, qemu and guest functionality appear to work fine.
The customer wants to know whether this is a kernel defect in the futex operation.
The customer observed qemu threads stuck at futex_wait_queue_me; see the following log:
----------------------------------------------
root@sys-2816a-node:~# top -H -p 3345
top - 15:59:25 up 9:03, 3 users, load average: 6.75, 6.89, 6.91
Tasks: 28 total, 6 running, 22 sleeping, 0 stopped, 0 zombie
Cpu(s): 75.4%us, 2.3%sy, 0.0%ni, 14.2%id, 7.9%wa, 0.0%hi, 0.1%si, 0.0%st
Mem: 65383652k total, 52750712k used, 12632940k free, 89612k buffers
Swap: 0k total, 0k used, 0k free, 105316k cached
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
3392 root 20 0 51.7g 48g 26m R 100 77.5 539:22.42 qemu-system-x86
3393 root 20 0 51.7g 48g 26m R 100 77.5 541:11.79 qemu-system-x86
3394 root 20 0 51.7g 48g 26m R 100 77.5 541:22.02 qemu-system-x86
3395 root 20 0 51.7g 48g 26m R 100 77.5 540:43.50 qemu-system-x86
3397 root 20 0 51.7g 48g 26m R 100 77.5 541:31.21 qemu-system-x86
3396 root 20 0 51.7g 48g 26m R 100 77.5 540:03.27 qemu-system-x86
3345 root 20 0 51.7g 48g 26m S 8 77.5 29:24.87 qemu-system-x86
825 root 20 0 51.7g 48g 26m S 0 77.5 0:13.89 qemu-system-x86
4956 root 20 0 51.7g 48g 26m S 0 77.5 0:13.90 qemu-system-x86
20911 root 20 0 51.7g 48g 26m S 0 77.5 0:13.56 qemu-system-x86
26942 root 20 0 51.7g 48g 26m S 0 77.5 0:13.39 qemu-system-x86
29328 root 20 0 51.7g 48g 26m S 0 77.5 0:13.32 qemu-system-x86
30833 root 20 0 51.7g 48g 26m S 0 77.5 0:13.29 qemu-system-x86
28831 root 20 0 51.7g 48g 26m S 0 77.5 0:13.90 qemu-system-x86
30439 root 20 0 51.7g 48g 26m S 0 77.5 0:13.88 qemu-system-x86
31841 root 20 0 51.7g 48g 26m S 0 77.5 0:13.89 qemu-system-x86
4954 root 20 0 51.7g 48g 26m S 0 77.5 0:13.89 qemu-system-x86
17895 root 20 0 51.7g 48g 26m S 0 77.5 0:13.65 qemu-system-x86
root@sys-2816a-node:~# cat /proc/29326/stack
[<ffffffff810a49de>] futex_wait_queue_me+0xce/0x100
[<ffffffff810a585b>] futex_wait+0x17b/0x270
[<ffffffff810a7275>] do_futex+0xe5/0xb60
[<ffffffff810a7d61>] SyS_futex+0x71/0x160
[<ffffffff8196c5a6>] system_call_fastpath+0x1a/0x1f
[<ffffffffffffffff>] 0xffffffffffffffff
root@sys-2816a-node:~# cat /proc/20911/stack
[<ffffffff810a49de>] futex_wait_queue_me+0xce/0x100
[<ffffffff810a585b>] futex_wait+0x17b/0x270
[<ffffffff810a7275>] do_futex+0xe5/0xb60
[<ffffffff810a7d61>] SyS_futex+0x71/0x160
[<ffffffff8196c5a6>] system_call_fastpath+0x1a/0x1f
[<ffffffffffffffff>] 0xffffffffffffffff
root@sys-2816a-node:~# cat /proc/26942/stack
[<ffffffff810a49de>] futex_wait_queue_me+0xce/0x100
[<ffffffff810a585b>] futex_wait+0x17b/0x270
[<ffffffff810a7275>] do_futex+0xe5/0xb60
[<ffffffff810a7d61>] SyS_futex+0x71/0x160
[<ffffffff8196c5a6>] system_call_fastpath+0x1a/0x1f
[<ffffffffffffffff>] 0xffffffffffffffff
root@sys-2816a-node:~# cat /proc/31841/stack
[<ffffffff810a49de>] futex_wait_queue_me+0xce/0x100
[<ffffffff810a585b>] futex_wait+0x17b/0x270
[<ffffffff810a7275>] do_futex+0xe5/0xb60
[<ffffffff810a7d61>] SyS_futex+0x71/0x160
[<ffffffff8196c5a6>] system_call_fastpath+0x1a/0x1f
[<ffffffffffffffff>] 0xffffffffffffffff
root@sys-2816a-node:~# cat /proc/17895/stack
[<ffffffff810a49de>] futex_wait_queue_me+0xce/0x100
[<ffffffff810a585b>] futex_wait+0x17b/0x270
[<ffffffff810a7275>] do_futex+0xe5/0xb60
[<ffffffff810a7d61>] SyS_futex+0x71/0x160
[<ffffffff8196c5a6>] system_call_fastpath+0x1a/0x1f
[<ffffffffffffffff>] 0xffffffffffffffff
root@sys-2816a-node:~# cat /proc/4954/stack
[<ffffffff810a49de>] futex_wait_queue_me+0xce/0x100
[<ffffffff810a585b>] futex_wait+0x17b/0x270
[<ffffffff810a7275>] do_futex+0xe5/0xb60
[<ffffffff810a7d61>] SyS_futex+0x71/0x160
[<ffffffff8196c5a6>] system_call_fastpath+0x1a/0x1f
[<ffffffffffffffff>] 0xffffffffffffffff
root@sys-2816a-node:~# cat /proc/4956/stack
[<ffffffff810a49de>] futex_wait_queue_me+0xce/0x100
[<ffffffff810a585b>] futex_wait+0x17b/0x270
[<ffffffff810a7275>] do_futex+0xe5/0xb60
[<ffffffff810a7d61>] SyS_futex+0x71/0x160
[<ffffffff8196c5a6>] system_call_fastpath+0x1a/0x1f
[<ffffffffffffffff>] 0xffffffffffffffff
root@sys-2816a-node:~# cat /proc/825/stack
[<ffffffff810a49de>] futex_wait_queue_me+0xce/0x100
[<ffffffff810a585b>] futex_wait+0x17b/0x270
[<ffffffff810a7275>] do_futex+0xe5/0xb60
[<ffffffff810a7d61>] SyS_futex+0x71/0x160
[<ffffffff8196c5a6>] system_call_fastpath+0x1a/0x1f
[<ffffffffffffffff>] 0xffffffffffffffff
root@sys-2816a-node:~# cat /proc/3345/stack
[<ffffffff81183629>] poll_schedule_timeout+0x49/0x70
[<ffffffff81184be8>] do_sys_poll+0x3f8/0x4b0
[<ffffffff81184fd8>] SyS_ppoll+0x1c8/0x1e0
[<ffffffff8196c5a6>] system_call_fastpath+0x1a/0x1f
[<ffffffffffffffff>] 0xffffffffffffffff
root@sys-2816a-node:~# cat /proc/3396/stack
[<ffffffffffffffff>] 0xffffffffffffffff
root@sys-2816a-node:~#
------------------------------------------
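The per-thread stack checks in the log above could be automated. The following is a minimal sketch, not part of the customer's report: it assumes the kernel exposes /proc/<pid>/task/<tid>/stack (which needs CONFIG_STACKTRACE and root privileges), and the helper names top_symbol and summarize_threads are hypothetical. It counts how many threads of a process are blocked in each kernel function, so growth in the number of futex_wait_queue_me waiters can be tracked over time.

```python
import os
import re
import sys
from collections import Counter

# Matches frames such as "[<ffffffff810a49de>] futex_wait_queue_me+0xce/0x100".
# The terminator line "[<ffffffffffffffff>] 0xffffffffffffffff" does not match.
FRAME_RE = re.compile(r"^\[<[0-9a-f]+>\]\s+(\S+?)\+0x[0-9a-f]+/0x[0-9a-f]+")


def top_symbol(stack_text):
    """Return the symbol of the innermost kernel frame, or None if absent."""
    for line in stack_text.splitlines():
        match = FRAME_RE.match(line.strip())
        if match:
            return match.group(1)
    return None


def summarize_threads(pid, proc_root="/proc"):
    """Count threads of `pid` by the kernel function they are blocked in.

    `proc_root` is a parameter only so the function can be pointed at a
    copy of the files; on a live system it stays at "/proc".
    """
    task_dir = os.path.join(proc_root, str(pid), "task")
    counts = Counter()
    for tid in os.listdir(task_dir):
        try:
            with open(os.path.join(task_dir, tid, "stack")) as stack_file:
                symbol = top_symbol(stack_file.read())
        except OSError:
            continue  # thread exited in the meantime, or no permission
        # A thread running in user space shows no symbolized kernel frame.
        counts[symbol or "<no kernel frame>"] += 1
    return counts


if __name__ == "__main__" and len(sys.argv) == 2 and sys.argv[1].isdigit():
    for symbol, count in summarize_threads(int(sys.argv[1])).most_common():
        print(f"{count:5d}  {symbol}")
```

Run as root with the qemu process ID as the only argument (3345 in the log above) and repeat at intervals; a steadily rising count for futex_wait_queue_me would match the behavior the customer describes.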
They found the following discussion and kernel patches, which they suspect may be related to this issue, but they are not sure:
http://stackoverflow.com/questions/9801256/app-hangs-on-futex-wait-queue-me-every-a-couple-of-minutes
https://github.com/torvalds/linux/commit/b0c29f79ecea0b6fbcefc999e70f2843ae8306db
https://github.com/torvalds/linux/commit/11d4616bd07f38d496bd489ed8fad1dc4d928823
This issue is very hard to reproduce, so they currently cannot provide a test case for it.
This is a preempt support customer (Juniper), and they consistently request that the engineering team be involved to help analyze this issue and the kernel patches.
They want to know whether the above patches should be integrated into the OVP 6 kernel, and whether other patches are needed in order to integrate them.
Please help look into it.