Wind River Support Network

Fixed

LIN6-13567 : wrlinux6 dpaa IPsec Rekey Failure

Created: Sep 11, 2017    Updated: Dec 3, 2018
Resolved Date: Apr 22, 2018
Previous ID: LINCCM-1608
Found In Version: 6.0
Fix Version: 6.0.0.37
Severity: Severe
Applicable for: Wind River Linux 6
Component/s: Kernel

Description

We found a DPAA driver issue on wrlinux6. The description follows:
--------------------------
1.	Root cause
The rekeying procedure consists of two sub-tasks, inbound and outbound, which run in a single thread and are both processed partly by the DPAA driver. Inbound runs first, followed by outbound. While processing inbound, the DPAA driver defers part of the procedure to a Linux kernel work queue as delayed work. The deferred part then runs concurrently with our application process, which starts the outbound procedure soon afterwards. Both require access to the PCD, which creates a race condition. The details follow.
From our traces, we found that the DPAA driver interface ‘dpa_ipsec_sa_rekeying’ returned -87 (-EUSERS). The function call chain is shown below.
execution path	function name	file name	line number (call in)	line number (call out)
inbound handle	dpa_ipsec_sa_rekeying	drivers/staging/fsl_dpa_offload/dpa_ipsec.c	4674	4887
	queue_delayed_work	drivers/staging/fsl_dpa_offload/dpa_ipsec.c	4887	5167
	sa_rekeying_work_func (delayed 100us)	drivers/staging/fsl_dpa_offload/dpa_ipsec.c	5167	5196
	sa_rekeying_inbound	drivers/staging/fsl_dpa_offload/dpa_ipsec.c	5052	5071
	remove_inbound_hash_entry	drivers/staging/fsl_dpa_offload/dpa_ipsec.c	2248	2254
	update_pre_sec_inbound_table	drivers/staging/fsl_dpa_offload/dpa_ipsec.c	2105	2215
	dpa_classif_table_delete_entry_by_ref	drivers/staging/fsl_dpa_offload/dpa_classifier.c	1564	1574
	At this point the chain reaches a DPAA PCD operation, which requires mutually exclusive access.
outbound handle	dpa_ipsec_sa_rekeying	drivers/staging/fsl_dpa_offload/dpa_ipsec.c	4674	4831
	update_outbound_policy	drivers/staging/fsl_dpa_offload/dpa_ipsec.c	1879	2092
	dpa_classif_table_modify_entry_by_ref	drivers/staging/fsl_dpa_offload/dpa_classifier.c	894	914
	At this point the chain reaches a DPAA PCD operation, which requires mutually exclusive access. But it returns -EBUSY, which is reported to the caller as -EUSERS.
Yellow highlighting marks the kernel execution path, which runs concurrently and is interleaved with the process execution path (not highlighted).
 
When the eNB device begins the rekey procedure, we call the inbound and outbound handles in succession, as shown above. The outbound call returns -EUSERS. After analyzing the function call chain, we suspect that the DPAA PCD code is being re-entered: part of the inbound procedure is deferred and executed on the kernel execution path, which runs concurrently with the process path (the outbound procedure). This causes the PCD -EBUSY error, which is returned as -EUSERS.
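
The suspected interleaving can be replayed in a small user-space C model. This is only a sketch and not the driver code: `pcd_busy`, `pcd_enter`, `pcd_exit`, `rekey_outbound`, and `replay_race` are hypothetical names, the PCD is reduced to a single busy flag, and the translation of -EBUSY into -EUSERS simply mirrors what the traces show.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical model of the PCD: only one caller may be inside at a time. */
static bool pcd_busy;

/* Try to enter the PCD; fail with -EBUSY instead of waiting. */
static int pcd_enter(void)
{
    if (pcd_busy)
        return -EBUSY;
    pcd_busy = true;
    return 0;
}

/* Leave the PCD; only valid for the caller that entered it. */
static void pcd_exit(void)
{
    assert(pcd_busy);
    pcd_busy = false;
}

/* Stand-in for the outbound path: a busy PCD surfaces as -EUSERS. */
static int rekey_outbound(void)
{
    int ret = pcd_enter();

    if (ret == -EBUSY)
        return -EUSERS;
    /* ... the classifier entry would be modified here ... */
    pcd_exit();
    return 0;
}

/*
 * Replays the suspected interleaving in a deterministic way: the delayed
 * inbound work is still inside the PCD when the outbound call arrives.
 */
static int replay_race(void)
{
    int ret;

    pcd_enter();            /* delayed inbound work enters the PCD */
    ret = rekey_outbound(); /* outbound call arrives in the meantime */
    pcd_exit();             /* inbound work finishes */
    return ret;
}
```

In this model `replay_race()` returns -EUSERS, while a call to `rekey_outbound()` made after the inbound work has left the PCD returns 0, which matches the behavior we observe.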

2.	SA (security association) resource leakage
We considered a solution based on a retry strategy: whenever we get the EBUSY error, we assume the race condition occurred and simply try again. But in outbound processing, we have no way to first release the SA resource allocated by the previous attempt. Every retry allocates a new SA resource that is never released, so eventually all SAs could be exhausted. The details follow.
Today, while reviewing the DPAA driver code to evaluate the retry strategy, we found a potential problem in the function ‘dpa_ipsec_sa_rekeying’ when it processes an outbound SA. The main line of this function first calls ‘get_new_sa’ to allocate a new SA, and then enters one of two branches, for outbound or inbound. At most error points (ret < 0), a goto statement jumps to the label ‘rekey_sa_err’, which calls ‘rollback_rekeying_sa’, which in turn calls ‘put_sa’ to release the newly allocated SA. That is why you told us that inbound has a rollback facility that lets upper-layer code retry.
Nevertheless, we believe outbound should have the same rollback facility when ‘update_outbound_policy’ fails, especially with -EUSERS, which means the PCD is busy. The upper-layer application has no way to release the new SA before retrying unless the DPAA driver handles it properly.
‘put_instance’ appears only to decrement the reference counter of ‘dpa_ipsec’; it does not release the SA. ‘put_instance’ pairs with ‘get_instance’, whereas ‘get_new_sa’ pairs with ‘put_sa’. ‘get_new_sa’ allocates an SA from ‘dpa_ipsec->sa_mng.sa’ and increments ‘dpa_ipsec->num_used_sas’ by 1; ‘put_sa’ frees an SA back to ‘dpa_ipsec->sa_mng.sa’ and decrements ‘dpa_ipsec->num_used_sas’ by 1.
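
The leak and the effect of a rollback can be illustrated with a small user-space C model. This is a sketch under stated assumptions, not the driver implementation: the names ‘get_new_sa’, ‘put_sa’ and ‘num_used_sas’ are borrowed from the driver, but `struct sa_pool`, `MAX_SAS`, `rekey_outbound`, `rekey_with_retry` and their bodies are hypothetical simplifications, and the `pcd_busy` argument merely simulates ‘update_outbound_policy’ failing with -EUSERS.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

#define MAX_SAS 4 /* hypothetical pool size, for illustration only */

/* Simplified stand-in for the driver's SA bookkeeping. */
struct sa_pool {
    bool in_use[MAX_SAS];
    int num_used_sas;
};

/* Allocate a free SA slot; returns its index, or -ENOMEM when exhausted. */
static int get_new_sa(struct sa_pool *p)
{
    for (int i = 0; i < MAX_SAS; i++) {
        if (!p->in_use[i]) {
            p->in_use[i] = true;
            p->num_used_sas++;
            return i;
        }
    }
    return -ENOMEM;
}

/* Return an SA slot to the pool. */
static void put_sa(struct sa_pool *p, int id)
{
    assert(id >= 0 && id < MAX_SAS && p->in_use[id]);
    p->in_use[id] = false;
    p->num_used_sas--;
}

/*
 * One outbound rekey attempt. `pcd_busy` simulates the policy update
 * failing; `rollback` selects whether the new SA is released on failure.
 */
static int rekey_outbound(struct sa_pool *p, bool pcd_busy, bool rollback)
{
    int id = get_new_sa(p);

    if (id < 0)
        return id;
    if (pcd_busy) {
        if (rollback)
            put_sa(p, id); /* the release the outbound path lacks */
        return -EUSERS;
    }
    return 0; /* success: the new SA stays in use */
}

/*
 * Upper-layer retry loop: retry while the result is -EUSERS. The PCD is
 * simulated as busy for the first `busy_attempts` tries.
 */
static int rekey_with_retry(struct sa_pool *p, int busy_attempts, bool rollback)
{
    int ret;

    do {
        ret = rekey_outbound(p, busy_attempts > 0, rollback);
        busy_attempts--;
    } while (ret == -EUSERS);
    return ret;
}
```

With a pool of 4 SAs and two busy attempts, the retry succeeds in both variants, but with `rollback` set to false it leaves 3 SAs in use instead of 1. If the PCD stays busy long enough, the variant without rollback exhausts the pool and the loop ends with -ENOMEM.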



