Wind River Support Network

Fixed

LIN1021-636 : marvell-cn96xx: Kernel crash with GHES enabled

Created: Jun 16, 2021    Updated: Oct 6, 2021
Resolved Date: Sep 17, 2021
Found In Version: 10.21.20.1
Fix Version: 10.21.20.5
Severity: Standard
Applicable for: Wind River Linux LTS 21
Component/s: BSP

Description

BSP: marvell-cn96xx

Kernel configured with CONFIG_OCTEONTX2_SDEI_GHES=y crashed at boot time with the following error:

[ 9.052479] -----------[ cut here ]-----------
[ 9.057101] WARNING: CPU: 1 PID: 1 at arch/arm64/kernel/acpi.c:269 acpi_os_ioremap+0x248/0x2e0
[ 9.065712] Modules linked in:
[ 9.068770] CPU: 1 PID: 1 Comm: swapper/0 Not tainted 5.10.37-yocto-standard #1
[ 9.076077] Hardware name: Marvell OcteonTX CN96XX board (DT)
[ 9.081824] pstate: 40c00089 (nZcv daIf +PAN +UAO TCO BTYPE=-)
[ 9.087828] pc : acpi_os_ioremap+0x248/0x2e0
[ 9.092097] lr : acpi_os_read_memory+0xa0/0x140
[ 9.096625] sp : ffff800011c4f7b0
[ 9.099936] x29: ffff800011c4f7b0 x28: ffff000070020010
[ 9.105252] x27: ffffffdffe6175d0 x26: ffff000070021800
[ 9.110569] x25: 0000000000000000 x24: 0000000000000008
[ 9.115886] x23: ffff800011c4f920 x22: 0000000000000040
[ 9.121201] x21: ffff00010562163c x20: ffff0001005e0000
[ 9.126518] x19: 0000000070020000 x18: 0000000000000020
[ 9.131833] x17: 0000000000000001 x16: 0000000000000048
[ 9.137147] x15: ffffffffffffffff x14: ffffffffffffff00
[ 9.142464] x13: ffffffffffffffff x12: 0000000000000008
[ 9.147780] x11: 0101010101010101 x10: 7f7f7f7f7f7f7f7f
[ 9.153095] x9 : ffff8000106d8040 x8 : fefefefefefeff72
[ 9.158412] x7 : 0000000070020000 x6 : ffff800011618e60
[ 9.163727] x5 : 0000000000000000 x4 : 0000000000000004
[ 9.169042] x3 : 0000000000000000 x2 : ffff8000114e73d0
[ 9.174356] x1 : 0000000000000008 x0 : 0000000070020000
[ 9.179672] Call trace:
[ 9.182118] acpi_os_ioremap+0x248/0x2e0
[ 9.186038] acpi_os_read_memory+0xa0/0x140
[ 9.190220] apei_read+0xc0/0xd0
[ 9.193448] __ghes_peek_estatus.isra.0+0x3c/0xd4
[ 9.198151] ghes_proc+0x48/0x1ec
[ 9.201462] ghes_probe+0x158/0x430
[ 9.204949] platform_drv_probe+0x60/0xb4
[ 9.208958] really_probe+0xf4/0x4e0
[ 9.212532] driver_probe_device+0x7c/0x170
[ 9.216713] __device_attach_driver+0xac/0x130
[ 9.221156] bus_for_each_drv+0x84/0xd4
[ 9.224990] __device_attach+0xe0/0x1c0
[ 9.228825] device_initial_probe+0x20/0x30
[ 9.233007] bus_probe_device+0xa8/0xb0
[ 9.236841] device_add+0x350/0x760
[ 9.240327] platform_device_add+0x134/0x290
[ 9.244595] hest_parse_ghes+0xbc/0x10c
[ 9.248429] apei_hest_parse+0x9c/0x190
[ 9.252262] acpi_hest_init+0x130/0x1d8
[ 9.256097] sdei_ghes_driver_init+0x9ac/0xae0
[ 9.260541] do_one_initcall+0x6c/0x2d0
[ 9.264376] kernel_init_freeable+0x1f0/0x25c
[ 9.268730] kernel_init+0x20/0x124
[ 9.272216] ret_from_fork+0x10/0x18
[ 9.275791] --[ end trace 9fcc916c6cad1711 ]--
[ 9.280410] [Firmware Warn]: GHES: Failed to read error status block address for hardware error source: 0.
[ 9.290209] [Firmware Warn]: GHES: Failed to read error status block address for hardware error source: 1.
[ 9.300005] [Firmware Warn]: GHES: Failed to read error status block address for hardware error source: 2.

The issue is caused by a kernel API change that makes access from firmware to kernel memory regions much stricter. Because the Marvell GHES driver was designed against a much older kernel version, it does not take this restriction into account.
See https://lore.kernel.org/kernel-hardening/20200626155832.2323789-2-ardb@kernel.org/
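
The stricter access policy described above can be pictured with a small userspace model. This is only a sketch under stated assumptions: the types and names (`phys_region`, `region_kind`, `firmware_map_allowed`) and the two-way region classification are hypothetical, and the code is not the kernel's `acpi_os_ioremap()` implementation. It only illustrates the general idea that a mapping requested on behalf of firmware is refused when it touches memory the kernel owns.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical classification of a physical range; not a kernel type. */
enum region_kind {
    REGION_FIRMWARE_RESERVED, /* set aside for firmware use */
    REGION_KERNEL_RAM         /* ordinary memory managed by the kernel */
};

/* Hypothetical description of one physical memory region. */
struct phys_region {
    uint64_t base;
    uint64_t size;
    enum region_kind kind;
};

/* True when [a, a+alen) and [b, b+blen) share at least one byte. */
bool ranges_overlap(uint64_t a, uint64_t alen, uint64_t b, uint64_t blen)
{
    return a < b + blen && b < a + alen;
}

/*
 * Model of a strict mapping policy (an assumption, for illustration):
 * allow the request only if it lies entirely inside one firmware-reserved
 * region. A request that touches kernel RAM, or that no region describes,
 * is refused.
 */
bool firmware_map_allowed(const struct phys_region *map, size_t count,
                          uint64_t phys, uint64_t len)
{
    bool covered = false;

    /* Reject empty requests and requests whose end address wraps around. */
    if (len == 0 || phys + len < phys)
        return false;

    for (size_t i = 0; i < count; i++) {
        const struct phys_region *r = &map[i];

        if (!ranges_overlap(phys, len, r->base, r->size))
            continue;
        if (r->kind == REGION_KERNEL_RAM)
            return false; /* any overlap with kernel memory is refused */
        if (phys >= r->base && phys + len <= r->base + r->size)
            covered = true; /* fully inside a firmware-reserved region */
    }
    return covered;
}
```

In this model, a driver written for an older, more permissive policy would place its error status block in ordinary kernel memory and expect the mapping to succeed. Under the stricter policy the same request is refused, which corresponds to the read failures reported in the log above.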

Steps to Reproduce

Configure the project with BSP: marvell-cn96xx

Configure the kernel with CONFIG_OCTEONTX2_SDEI_GHES=y

Boot the board