Wind River Support Network

Fixed

OVP-1740 : WRL5: xfs workqueue deadlock

Created: Mar 20, 2014    Updated: Mar 11, 2016
Resolved Date: Jun 25, 2014
Found In Version: 5.0.1.11
Fix Version: 5.0.1.16
Severity: Severe
Applicable for: Wind River Linux 5
Component/s: Kernel

Description

We have a high-priority problem here. Operations on xfs-mounted filesystems may hang indefinitely (deadlock). Since xfs is used as the backing storage for glusterfs here, this becomes a serious problem.

I have done an initial analysis, and the cause seems to be a stall in workqueue processing that leads to the deadlock. I don't see this problem in a non-OVP kernel, and there is a slight change in workqueue processing between the OVP and the non-OVP kernel (introduced in this commit: "28644b7 sched: Distangle worker accounting from rq-lock"). It seems that this change is incompatible with how the 3.4 xfs code uses workqueues. Put another way, the change makes it easier to create workqueue-related deadlocks. I have not tested with any upstream bugfixes to the xfs code; that should be investigated.

For now we revert to the mainline-type workqueue processing as a temporary workaround. I don't think this is an acceptable workaround in the long run, since it affects all workqueue processing and the reason for the OVP change is to release locks faster.

Steps to Reproduce

The deadlock can be easily reproduced with the steps below (reproduced
on qemu-system-x86_64 and HP ProLiant BL460c Gen8).

1. Run:

    configure \
        --enable-build=production \
        --enable-addons=wr-ovp \
        --enable-board=intel-xeon-core \
        --enable-rootfs=glibc_std \
        --enable-bootimage=ext3

2. cp linux-windriver_3.4.bbappend layers/local/recipes-kernel/linux-windriver/
3. cp hello.c layers/local/recipes-sample/hello/
4. make -C build hello.addpkg
5. make
6. <take a coffee break>
7. Boot the kernel and rootfs with the kernel parameter "maxcpus=1"
8. Mount an xfs filesystem
9. touch <xfs mount-point>/foo
10. hello <xfs mount-point>/foo &
11. After a while you will see that a kworker and other related
processes have become stuck in the "D" state. Trying to read the
"foo" file will also hang.
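
The stuck tasks from the last step can be listed with a small shell helper. This is only a sketch and not part of the original report: the function name list_d_state is made up for illustration, and it assumes a procps-style ps that accepts "-eo pid,stat,comm" (a BusyBox ps may not).

```shell
#!/bin/sh
# Hypothetical helper (not from the original report): print only the
# tasks whose state field starts with "D" (uninterruptible sleep).
# Expects "PID STAT COMMAND" lines on stdin, e.g. from procps ps.
# The ps header line is dropped automatically because "STAT" does not
# start with "D".
list_d_state() {
    awk '$2 ~ /^D/'
}
```

On the target, something like "ps -eo pid,stat,comm | list_d_state" should then show the kworker and the hello process once the hang has occurred. If magic SysRq is enabled in the kernel, "echo w > /proc/sysrq-trigger" additionally dumps the blocked tasks to the kernel log.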