Wind River Support Network

HomeDefectsLIN5-19081
Fixed

LIN5-19081 : possible circular locking dependency when running netperf on quark

Created: Jul 16, 2014    Updated: Dec 19, 2017
Resolved Date: Aug 4, 2014
Found In Version: 5.0.1.16
Fix Version: 5.0.1.18
Severity: Standard
Applicable for: Wind River Linux 5
Component/s: BSP - Async

Description

Running netperf on 2 quark board will give the following trace.
The 2 quarks tested are: aiot-x1000 from AAEON and gwb-qx from ECS GEN2. On the AAEON unit the message will appear quite fast (after a few netperf runs) while on ECS unit it needs some time to reproduce (after more than ~10 netperf runs).

After the error, succeeding netperf runs will be ok and without printing the message again and I can see an increase in throughput.

Can it be run on other quark board to see if we can reproduce?


Here is the log:

root@localhost:~# netperf -H 192.168.12.87
TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 192.168.12.87 (192.168.12.87) port 0 AF_INET

======================================================
[ INFO: possible circular locking dependency detected ]
3.4.88-WR5.0.1.16_standard #1 Not tainted
-------------------------------------------------------
kmemleak/627 is trying to acquire lock:
(&(&dev->tx_global_lock)->rlock){+.-...}, at: [<c135b583>] stmmac_tx_clean+0x193/0x280

but task is already holding lock:
(&(&priv->tx_lock)->rlock){+.-...}, at: [<c135b413>] stmmac_tx_clean+0x23/0x280

which lock already depends on the new lock.


the existing dependency chain (in reverse order) is:

-> #2 (&(&priv->tx_lock)->rlock){+.-...}:
      [<c107966d>] lock_acquire+0x7d/0x110
      [<c14afd03>] _raw_spin_lock+0x43/0x50
      [<c135c621>] stmmac_xmit+0x61/0x390
      [<c14052b2>] dev_hard_start_xmit+0x382/0x670
      [<c1418cb0>] sch_direct_xmit+0xa0/0x200
      [<c140572a>] dev_queue_xmit+0x18a/0x6b0
      [<c1735e4a>] ip_auto_config+0x7c4/0xd9e
      [<c1001114>] do_one_initcall+0x34/0x170
      [<c1709865>] kernel_init+0x108/0x1d5
      [<c14b7b46>] kernel_thread_helper+0x6/0x10

-> #1 (_xmit_ETHER#2){+.-...}:
      [<c107966d>] lock_acquire+0x7d/0x110
      [<c14afd03>] _raw_spin_lock+0x43/0x50
      [<c1419388>] dev_deactivate_many+0xa8/0x210
      [<c141951d>] dev_deactivate+0x2d/0x40
      [<c1413318>] linkwatch_do_dev+0x78/0xc0
      [<c14135e3>] __linkwatch_run_queue+0xc3/0x1a0
      [<c14136e0>] linkwatch_event+0x20/0x30
      [<c10490fc>] process_one_work+0x17c/0x480
      [<c1049c63>] worker_thread+0x113/0x2a0
      [<c104dda1>] kthread+0x71/0x80
      [<c14b7b46>] kernel_thread_helper+0x6/0x10

-> #0 (&(&dev->tx_global_lock)->rlock){+.-...}:
      [<c1078491>] __lock_acquire+0xb21/0x1750
      [<c107966d>] lock_acquire+0x7d/0x110
      [<c14afd03>] _raw_spin_lock+0x43/0x50
      [<c135b583>] stmmac_tx_clean+0x193/0x280
      [<c135b767>] stmmac_poll+0x27/0x370
      [<c1404291>] net_rx_action+0x71/0x1e0
      [<c10355d0>] __do_softirq+0x80/0x1d0

other info that might help us debug this:

Chain exists of:
 &(&dev->tx_global_lock)->rlock --> _xmit_ETHER#2 --> &(&priv->tx_lock)->rlock

Possible unsafe locking scenario:

      CPU0                    CPU1
      ----                    ----
 lock(&(&priv->tx_lock)->rlock);
                              lock(_xmit_ETHER#2);
                              lock(&(&priv->tx_lock)->rlock);
 lock(&(&dev->tx_global_lock)->rlock);

*** DEADLOCK ***

2 locks held by kmemleak/627:
#0:  (scan_mutex){+.+...}, at: [<c1111243>] kmemleak_scan_thread+0x53/0xc0
#1:  (&(&priv->tx_lock)->rlock){+.-...}, at: [<c135b413>] stmmac_tx_clean+0x23/0x280

stack backtrace:
Pid: 627, comm: kmemleak Not tainted 3.4.88-WR5.0.1.16_standard #1
Call Trace:
[<c14a830e>] print_circular_bug+0x1bd/0x1c7
[<c1078491>] __lock_acquire+0xb21/0x1750
[<c1076b46>] ? mark_held_locks+0x66/0xf0
[<c107966d>] lock_acquire+0x7d/0x110
[<c135b583>] ? stmmac_tx_clean+0x193/0x280
[<c13f788d>] ? __kfree_skb+0x3d/0x90
[<c14afd03>] _raw_spin_lock+0x43/0x50
[<c135b583>] ? stmmac_tx_clean+0x193/0x280
[<c135b583>] stmmac_tx_clean+0x193/0x280
[<c135b767>] stmmac_poll+0x27/0x370
[<c1076b46>] ? mark_held_locks+0x66/0xf0
[<c140426e>] ? net_rx_action+0x4e/0x1e0
[<c1404291>] net_rx_action+0x71/0x1e0
[<c1076e0b>] ? trace_hardirqs_on+0xb/0x10
[<c1035550>] ? local_bh_enable_ip+0xc0/0xc0
[<c10355d0>] __do_softirq+0x80/0x1d0
[<c1035550>] ? local_bh_enable_ip+0xc0/0xc0
<IRQ>  [<c103593e>] ? irq_exit+0x7e/0xa0
[<c14b7c1d>] ? do_IRQ+0x4d/0xb0
[<c14b7b2e>] ? common_interrupt+0x2e/0x40
[<c14b0458>] ? _raw_spin_unlock_irqrestore+0x38/0x60
[<c111093e>] ? scan_gray_list+0x12e/0x180
[<c1110c2a>] ? kmemleak_scan+0x15a/0x490
[<c1110ad0>] ? kmemleak_seq_start+0x110/0x110
[<c11111f0>] ? kmemleak_write+0x290/0x290
[<c1111248>] ? kmemleak_scan_thread+0x58/0xc0
[<c104dda1>] ? kthread+0x71/0x80
[<c104dd30>] ? __init_kthread_worker+0x60/0x60
[<c14b7b46>] ? kernel_thread_helper+0x6/0x10
Recv   Send    Send                          
Socket Socket  Message  Elapsed              
Size   Size    Size     Time     Throughput  
bytes  bytes   bytes    secs.    10^6bits/sec  

87380  16384  16384    10.02       9.22  
root@localhost:~#

Workaround

I noticed that I can't reproduce if on the host side, ./netserver is run without any parameter. It seems that "-4" argument has something to do with it.



Usage: netserver [options] 

Options:
    -h                Display this text
    -d                Increase debugging output
    -L name,family    Use name to pick listen address and family for family
    -p portnum        Listen for connect requests on portnum.
    -4                Do IPv4
    -6                Do IPv6
    -v verbosity      Specify the verbosity level
    -V                Display version information and exit

Steps to Reproduce

I have build RCPL16 for intel-quark with glibc-std, standard kernel and netperf.
Create a usb or sd card connect a eth cable to the testing unit and boot it.

On the host computer I go to <proj_dir>/build/netperf-2.4.4-r3/image/usr/sbin and run ./netserver -4 -L <ip_address>

On the target run "netperf -H <ip> -T TCP_STREAM". Also try with UDP_STREAM.
After a few tries the trace from description will appear.

Other Downloads


Live chat
Online