Wind River Support Network

Recommended Type: Patch

Kernel panic in sctp_sendmsg with PNE-LE1.3

Released: Jun 17, 2009     Updated: Jun 17, 2009

Description

The customer is running the SCTP stack of PNE-LE1.3 in their product (a custom board based on Intel_mpcbl0040).
The SCTP stack they use already has the following five patches applied (these patches are provided by Wind River):
sctp-2.6.15-upgrade.patch
sctp-2.6.15-security-fix.patch
sctp-2.6.15-security-fix2.patch
sctp-2.6.19.patch
sctp-2.6.19-backport.patch

According to the customer, they have not modified the SCTP stack at all beyond the patches above.


During system operation with this SCTP stack, the following kernel panic occurred.
It does not happen regularly.



May  9 19:30:12 201 kernel: f8df3236
May  9 19:30:12 201 kernel: PREEMPT SMP
May  9 19:30:12 201 kernel: LTT NESTING LEVEL : 0
May  9 19:30:12 201 kernel: Modules linked in: gps50hz acs85xx_smbus fawkes ohci_hcd i2c_i801 i2c_core ehci_hcd uhci_hcd usbcore bonding ipmi_watchdog ipmi_si ipmi_devintf ipmi_msghandler sctp ipv6
May  9 19:30:12 201 kernel: CPU:    1
May  9 19:30:12 201 kernel: EIP:    0060:[]    Not tainted VLI
May  9 19:30:12 201 kernel: EFLAGS: 00010202   (2.6.14.7-selinux1)
May  9 19:30:12 201 kernel: EIP is at sctp_sendmsg+0x2fc/0x721 [sctp]
May  9 19:30:12 201 kernel: eax: ffffffff   ebx: d6845080   ecx: d6305320   edx: d62dc994
May  9 19:30:12 201 kernel: esi: d667a000   edi: d62dc980   ebp: dfd71c10   esp: dfd71b78
May  9 19:30:12 201 kernel: ds: 007b   es: 007b   ss: 0068
May  9 19:30:12 201 kernel: Process wag_oal (pid: 3964, threadinfo=dfd71000 task=dfc5f550)
May  9 19:30:12 201 kernel: Stack: 00000036 d7729550 00000000 d62dc994 d6305320 00001ce4 00000000 00000000
May  9 19:30:12 201 kernel:        00000000 00000000 d667a000 00000000 d693da80 dfd71dd0 d6845080 000000fa
May  9 19:30:12 201 kernel:        00000000 00000000 d66a11cc 00000000 00000000 00000000 00000000 00000000
May  9 19:30:12 201 kernel: Call Trace:
May  9 19:30:12 201 kernel:  [] show_stack+0x7c/0x92
May  9 19:30:12 201 kernel:  [] show_registers+0x153/0x1cb
May  9 19:30:12 201 kernel:  [] die+0x123/0x19e
May  9 19:30:12 201 kernel:  [] do_page_fault+0x7b5/0x4580
May  9 19:30:12 201 kernel:  [] error_code+0x4f/0x54
May  9 19:30:12 201 kernel:  [] inet_sendmsg+0x3e/0x4a
May  9 19:30:12 201 kernel:  [] sock_sendmsg+0x129/0xc0d
May  9 19:30:12 201 kernel:  [] sys_sendmsg+0x134/0x235
May  9 19:30:12 201 kernel:  [] sys_socketcall+0x9e3/0xa46
May  9 19:30:12 201 kernel:  [] sysenter_past_esp+0x54/0x75
May  9 19:30:12 201 kernel: Code: c2 0f 84 70 03 00 00 8b bd 74 ff ff ff 83 ef 14 89 f8 e8 ba 84 ff ff 8b 77 54 89 f0 8b 5e 18 e8 ce 73 ff ff 8b 47 1c f0 ff 43 18 <89> 58 08 c7 40 70 54 94 35 c0 8b 80 88 00 00 00 f0 01 43 64 8b
May  9 19:30:21 201 kernel:  <3>BUG: soft lockup detected on CPU#0!
May  9 19:30:21 201 kernel:
May  9 19:30:21 201 kernel: Pid: 29705, comm:              wag_oal
May  9 19:30:21 201 kernel: EIP: 0060:[] CPU: 0
May  9 19:30:21 201 kernel: EIP is at freeary+0x16/0x81
May  9 19:30:21 201 kernel:  EFLAGS: 00000282    Not tainted  (2.6.14.7-selinux1)
May  9 19:30:21 201 kernel: EAX: d6305320 EBX: d77cbb48 ECX: c0463e20 EDX: 38608061
May  9 19:30:21 201 kernel: ESI: 00000000 EDI: 38608061 EBP: d6335e48 DS: 007b ES: 007b
May  9 19:30:21 201 kernel: CR0: 8005003b CR2: b3eea000 CR3: 16a2c000 CR4: 000006d0


The code executes in the following sequence and finally reaches the panic:
 
[user application]   sends an SCTP message with a socket call.
[system call]         sys_socketcall -> sys_sendmsg -> sock_sendmsg -> inet_sendmsg
                            -> sctp_sendmsg
 
[part of sctp_sendmsg in net/sctp/socket.c]
 
        /* Now send the (possibly) fragmented message. */
        list_for_each(pos, &datamsg->chunks) {
                chunk = list_entry(pos, struct sctp_chunk, frag_list);
                sctp_datamsg_track(chunk);
 
                /* Do accounting for the write space.  */
                sctp_set_owner_w(chunk);      <==== (1)

                chunk->transport = chunk_tp;
 
                /* Send it to the lower layers.  Note:  all chunks
                 * must either fail or succeed.   The lower layer
                 * works that way today.  Keep it that way or this
                 * breaks.
                 */
                err = sctp_primitive_SEND(asoc, chunk);
                /* Did the lower layer accept the chunk? */
                if (err)
                        sctp_chunk_free(chunk);
                SCTP_DEBUG_PRINTK("We sent primitively.\n");
        }

 ====> from (1)
static inline void sctp_set_owner_w(struct sctp_chunk *chunk)
{
        struct sctp_association *asoc = chunk->asoc;
        struct sock *sk = asoc->base.sk;
 
        /* The sndbuf space is tracked per association.  */
        sctp_association_hold(asoc);
 
        skb_set_owner_w(chunk->skb, sk);   <===== (2)
 
        chunk->skb->destructor = sctp_wfree;
       :
       :
 
}
       
 =======> from (2)
[include/net/sock.h]
static inline void skb_set_owner_w(struct sk_buff *skb, struct sock *sk)
{
        sock_hold(sk);
        skb->sk = sk;      <== (3)  panic point
        skb->destructor = sock_wfree;
        atomic_add(skb->truesize, &sk->sk_wmem_alloc);
}

To locate the exact C source line of the panic, refer to the following excerpt of the disassembled code for the SCTP kernel module:
 
00010f3a :     <==== start address of sctp_sendmsg function
sctp_sendmsg():
net/sctp/socket.c:1370
   10f3a:       55                      push   %ebp
net/sctp/socket.c:1378
   10f3b:       31 c0                   xor    %eax,%eax
  :
  :
  :
 
net/sctp/socket.c:1722
   11212:       8b bd 74 ff ff ff       mov    0xffffff74(%ebp),%edi
   11218:       83 ef 14                sub    $0x14,%edi
net/sctp/socket.c:1723
   1121b:       89 f8                   mov    %edi,%eax
   1121d:       e8 fc ff ff ff          call   1121e
net/sctp/socket.c:143
   11222:       8b 77 54                mov    0x54(%edi),%esi
net/sctp/socket.c:147
   11225:       89 f0                   mov    %esi,%eax
net/sctp/socket.c:144
   11227:       8b 5e 18                mov    0x18(%esi),%ebx
net/sctp/socket.c:147
   1122a:       e8 fc ff ff ff          call   1122b
include/net/sock.h:1092
   1122f:       8b 47 1c                mov    0x1c(%edi),%eax
include/asm/atomic.h:103
   11232:       f0 ff 43 18             lock incl 0x18(%ebx)
include/net/sock.h:1094
   11236:       89 58 08                mov    %ebx,0x8(%eax)     <==== panic point (4)
include/net/sock.h:1095
   11239:       c7 40 70 00 00 00 00    movl   $0x0,0x70(%eax)

As the kernel panic message shows (EIP is at sctp_sendmsg+0x2fc/0x721 [sctp]), 0x11236 is the panic point: 0x10f3a (the start address of sctp_sendmsg) + 0x2fc = 0x11236.
 
Here you can also see that eax holds the skb pointer (chunk->skb, loaded by mov 0x1c(%edi),%eax) and that its value is 0xffffffff (see the C source line marked (3) and the disassembly marked (4)).
The kernel panics when it tries to store sk into skb->sk through this invalid pointer.
 
I have found the panic point, but I need to know what sets the skb pointer (the eax register here) to 0xffffffff.

What I suspect is one of the following:
heavy traffic prevents the OS from allocating memory for the skb, or the SCTP chunk handling in this older kernel version is unstable.
 
Is there another patch that makes SCTP chunk handling more stable, or any workaround for this kind of problem?


Product Version

Linux Platforms 1.x


Installation Notes

1) configure --enable-board=intel_mpcbl0040 --enable-kernel=standard --enable-rootfs=glibc_std

2) make fs

3) Run the new kernel and rootfs on the target.

