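Disabling NVMe Multipath for the Huawei NVMe Card (V3) on openEuler

When a Huawei NVMe card (V3) is used on openEuler 22.03-SP1, the kernel's NVMe multipath support misidentifies the device, so the performance statistics for the cache card are displayed incorrectly. This article resolves the problem by disabling NVMe multipath.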

I. Symptoms

1. System environment:

Operating system: openEuler 22.03-SP1

2. Details of the system's nvme driver module
[root@localhost ~]# modinfo nvme
filename:       /lib/modules/5.10.0-136.12.0.86.oe2203sp1.x86_64/kernel/drivers/nvme/host/nvme.ko.xz
version:        1.0
license:        GPL
author:         Matthew Wilcox <willy@linux.intel.com>
srcversion:     2D4C771ED8D5E2A71F5162F
alias:          pci:v*d*sv*sd*bc01sc08i02*
alias:          pci:v0000106Bd00002005sv*sd*bc*sc*i*
alias:          pci:v0000106Bd00002003sv*sd*bc*sc*i*
alias:          pci:v0000106Bd00002001sv*sd*bc*sc*i*
alias:          pci:v00002646d00002263sv*sd*bc*sc*i*
alias:          pci:v00002646d00002262sv*sd*bc*sc*i*
alias:          pci:v000015B7d00002001sv*sd*bc*sc*i*
alias:          pci:v00001C5Cd00001504sv*sd*bc*sc*i*
alias:          pci:v00001CC1d00008201sv*sd*bc*sc*i*
alias:          pci:v000010ECd00005762sv*sd*bc*sc*i*
alias:          pci:v00001D1Dd00002601sv*sd*bc*sc*i*
alias:          pci:v00001D1Dd00002807sv*sd*bc*sc*i*
alias:          pci:v00001D1Dd00001F1Fsv*sd*bc*sc*i*
alias:          pci:v00001B4Bd00001092sv*sd*bc*sc*i*
alias:          pci:v00001987d00005016sv*sd*bc*sc*i*
alias:          pci:v0000144Dd0000A822sv*sd*bc*sc*i*
alias:          pci:v0000144Dd0000A821sv*sd*bc*sc*i*
alias:          pci:v00001C5Fd00000540sv*sd*bc*sc*i*
alias:          pci:v00001C58d00000023sv*sd*bc*sc*i*
alias:          pci:v00001C58d00000003sv*sd*bc*sc*i*
alias:          pci:v00001BB1d00000100sv*sd*bc*sc*i*
alias:          pci:v0000126Fd00002263sv*sd*bc*sc*i*
alias:          pci:v00001B36d00000010sv*sd*bc*sc*i*
alias:          pci:v00008086d00005845sv*sd*bc*sc*i*
alias:          pci:v00008086d0000F1A6sv*sd*bc*sc*i*
alias:          pci:v00008086d0000F1A5sv*sd*bc*sc*i*
alias:          pci:v00008086d00000A55sv*sd*bc*sc*i*
alias:          pci:v00008086d00000A54sv*sd*bc*sc*i*
alias:          pci:v00008086d00000A53sv*sd*bc*sc*i*
alias:          pci:v00008086d00000953sv*sd*bc*sc*i*
depends:        nvme-core
retpoline:      Y
intree:         Y
name:           nvme
vermagic:       5.10.0-136.12.0.86.oe2203sp1.x86_64 SMP mod_unload modversions 
sig_id:         PKCS#7
signer:         openEuler kernel signing key
sig_key:        77:F3:6E:A0:8C:FA:87:F0:87:33:87:F6:05:A6:91:58:FD:D3:69:2E
sig_hashalgo:   sha256
signature:      57:DC:27:9D:DE:6E:94:BA:F9:EB:4C:77:74:78:9B:82:6A:CF:63:53:
                CF:1A:F0:E4:EE:3B:76:EA:26:A1:2A:05:5F:3E:BE:D4:02:64:58:F8:
                BC:27:6A:46:03:15:56:4A:39:CF:3B:75:23:C1:7E:95:4A:EA:27:D1:
                96:AD:C2:1B:3F:12:6A:0B:64:3C:2C:48:E1:35:4D:52:B2:20:A0:6B:
                C6:51:C9:F4:45:DE:BC:64:48:FC:B7:B4:67:87:AB:B8:28:A1:EC:8F:
                C1:22:6F:A2:CA:58:58:CF:CC:1F:41:A8:E8:4F:99:27:67:F8:93:1A:
                99:59:A7:50:8C:C2:45:A4:5E:55:7E:B4:24:2E:C3:E2:4A:C3:D8:06:
                C7:AE:76:F3:41:5F:0A:78:5B:00:A5:A7:A1:E9:01:BA:54:94:3A:42:
                E4:08:61:58:0F:03:03:49:DF:76:FB:DF:EF:96:6B:E7:7A:40:3C:1F:
                75:92:31:D1:03:74:F4:37:5E:DE:41:68:3D:53:ED:22:F0:E4:EB:83:
                55:27:F3:06:D3:21:4E:29:93:7F:36:1B:86:12:28:A6:53:55:0C:5F:
                84:00:47:8A:FC:99:39:EF:EE:03:FD:8A:32:DD:D0:00:DD:83:91:7B:
                EE:17:C4:F6:CB:AD:3C:3A:6B:6D:D4:B0:FA:F3:56:19:FE:25:B6:B4:
                A6:6C:87:11:E9:E4:BC:15:1A:C7:D4:FE:88:91:57:F1:5A:1D:74:0F:
                A2:54:5C:EA:F1:81:8F:71:2C:3F:FE:3D:A6:2F:D8:45:6F:FA:3A:58:
                C6:7C:FE:19:27:B1:78:9A:66:F9:FF:E4:80:E7:D1:73:4F:63:A7:CB:
                DC:52:18:30:49:46:09:D2:B1:CC:CC:BB:32:CF:87:30:83:28:2C:B2:
                5B:0B:94:91:FE:82:1C:1C:13:59:7D:BC:40:75:CC:63:EB:6B:31:D3:
                0D:EE:E1:F1:8F:9B:06:0C:47:35:E8:13:E0:53:62:F7:C5:CA:96:64:
                0D:47:64:33:37:19:0F:5E:71:9E:87:43:20:3A:10:5E:71:F5:83:41:
                D3:E3:58:1D:55:EF:BE:08:72:0A:BD:3A:5B:45:10:7F:1A:16:4B:3F:
                D1:B6:4C:E1:FF:37:B9:07:9F:6C:08:7F:F5:14:D3:32:DC:BB:5E:C6:
                CF:81:93:BF:16:7F:D0:BA:C4:86:C6:4B:C4:8B:8C:79:73:DE:20:15:
                7F:17:B2:B5:7D:8B:23:0B:A0:07:18:2D:93:8F:22:73:F8:73:DD:EE:
                90:5C:5C:99:4D:AE:7A:AA:90:30:D2:E0:AB:A9:FD:C1:55:95:9F:C9:
                68:9B:D8:53:67:D3:FA:A5:BD:33:96:FC
parm:           use_threaded_interrupts:int
parm:           use_cmb_sqes:use controller's memory buffer for I/O SQes (bool)
parm:           max_host_mem_size_mb:Maximum Host Memory Buffer (HMB) size per controller (in MiB) (uint)
parm:           sgl_threshold:Use SGLs when average request segment size is larger or equal to this size. Use 0 to disable SGLs. (uint)
parm:           io_queue_depth:set io queue depth, should >= 2
parm:           write_queues:Number of queues to use for writes. If not set, reads and writes will share a queue set.
parm:           poll_queues:Number of queues to use for polled IO.
parm:           noacpi:disable acpi bios quirks (bool)
3. nvme-cli version
[root@localhost ~]# rpm -qa |grep nvme
nvme-cli-1.16-2.oe2203sp1.x86_64
4. Check which nvme modules are loaded
[root@localhost ~]# lsmod |grep nvme
nvme                   49152  0
nvme_core             131072  5 nvme
t10_pi                 16384  2 sd_mod,nvme_core

The nvme module is loaded and occupies 49152 bytes of memory. It depends on the nvme_core module, which occupies 131072 bytes.

5. Check the NVMe disk statistics
[root@localhost ~]# cat /proc/diskstats |grep nvme
   0       0 nvme0c0n1 16679303528 2385937 1853227015307 3641028437 101406753596 101658052 2934276399698 3657932308 0 987326426 3003993450 0 0 0 0 0 0
 259       1 nvme0n1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
 259       2 nvme0n1p1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
 259       3 nvme0n1p2 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
 259       4 nvme0n1p3 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0

Only nvme0c0n1 shows performance data. nvme0n1 and its three partitions show none. This is the symptom.
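The same check can be done mechanically. The following is a minimal sketch and not part of the original procedure. The helper name `nvme_diskstats_summary` is hypothetical. It reads text in /proc/diskstats format from standard input and labels each NVMe entry as a per-path device (names of the form nvme0c0n1) or a regular node, and as idle or active according to the completed read and write counts (fields 4 and 8).

```shell
#!/bin/sh
# Hypothetical helper: summarize NVMe entries from /proc/diskstats-style input.
# Field 3 = device name, field 4 = reads completed, field 8 = writes completed.
nvme_diskstats_summary() {
    awk '$3 ~ /^nvme/ {
        # Per-path devices created by native multipath look like nvme0c0n1.
        kind = ($3 ~ /^nvme[0-9]+c[0-9]+n[0-9]+/) ? "path" : "node"
        # A device with no completed reads or writes is reported as idle.
        state = ($4 + $8 == 0) ? "idle" : "active"
        print $3, kind, state
    }'
}
```

On the affected host it would be called as `nvme_diskstats_summary < /proc/diskstats`. A "path" entry that is active next to "node" entries that are all idle matches the symptom described above.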

II. Analysis and Resolution

1. Check the NVMe card's model and firmware version
[root@localhost ~]# nvme list
Node                  SN                   Model                                    Namespace Usage                      Format           FW Rev  
--------------------- -------------------- ---------------------------------------- --------- -------------------------- ---------------- --------
/dev/nvme0n1          032BEHFSH8001280     HWE36P43016M000N                         1           1.60  TB /   1.60  TB    512   B +  0 B   2.52 

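Investigation shows that the card runs firmware 2.52, which is fairly old. The operating system has NVMe multipath enabled, and this changes the device name under which this node's NVMe disk is exposed (the statistics are recorded against nvme0c0n1 instead of nvme0n1), which causes the symptom. The fix is to disable NVMe multipath globally through the GRUB configuration, which suits cases where the multipath driver misbehaves.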
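Whether native multipath is actually active can also be read from the module parameter in sysfs. The sketch below assumes the kernel exposes it at /sys/module/nvme_core/parameters/multipath, which is the case when the kernel is built with NVMe multipath support. The function name `nvme_multipath_state` is hypothetical, and the optional argument exists only so that the logic can be tried against an ordinary file.

```shell
#!/bin/sh
# Hypothetical helper: report the state of nvme_core's multipath parameter.
# Boolean module parameters read back as Y or N in sysfs.
nvme_multipath_state() {
    param_file="${1:-/sys/module/nvme_core/parameters/multipath}"
    if [ ! -r "$param_file" ]; then
        # Parameter not exposed (module not loaded or built without multipath).
        echo "unknown"
        return 0
    fi
    case "$(cat "$param_file")" in
        Y|1) echo "enabled" ;;
        N|0) echo "disabled" ;;
        *)   echo "unknown" ;;
    esac
}
```

Calling `nvme_multipath_state` with no argument on the affected host should report "enabled" before the fix. `nvme list-subsys` from nvme-cli shows the subsystem and its controller paths.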

2. Solution

(1) Command to disable NVMe multipath

grubby --update-kernel=ALL --args="nvme_core.multipath=N"

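 -------------------------------------------------------- Command details -------------------------------------------------------
grubby: a Linux tool for modifying the GRUB boot configuration. It updates kernel parameters safely and avoids the risks of hand-editing configuration files such as /etc/default/grub.
--update-kernel=ALL: applies the change to every installed kernel. To modify only the default kernel, use --update-kernel=DEFAULT (the --default-kernel option only prints the path of the default kernel and changes nothing).
--args="nvme_core.multipath=N": appends nvme_core.multipath=N to the boot parameters of each selected kernel, where:
nvme_core.multipath: controls the kernel's native multipath (Multi-Path I/O, MPIO) support for NVMe devices. N (or 0) disables multipath, so each NVMe device is accessed over a single path. Y (or 1) enables it.

------------------------------------------------------------------------------------------------------------------------------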
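The options described above can be combined in other ways. The following is a minimal sketch, not taken from the original procedure, that wraps three grubby invocations in hypothetical shell functions: one that limits the change to the default boot entry, one that reverts it everywhere, and one that lists the arguments of every entry for review. It relies on grubby's `--update-kernel=DEFAULT`, `--remove-args` and `--info=ALL` options. The functions must be run as root, and nothing is executed until one of them is called.

```shell
#!/bin/sh
# Hypothetical wrappers around grubby. Defining them changes nothing;
# each one modifies or reads the boot configuration only when called.

# Add the parameter to the default boot entry only, instead of ALL.
disable_nvme_multipath_default() {
    grubby --update-kernel=DEFAULT --args="nvme_core.multipath=N"
}

# Remove the parameter again from every boot entry.
restore_nvme_multipath_all() {
    grubby --update-kernel=ALL --remove-args="nvme_core.multipath=N"
}

# Print the kernel arguments of every boot entry for review.
show_boot_args() {
    grubby --info=ALL | grep '^args='
}
```

Running `show_boot_args` before and after the change is a quick way to confirm which entries carry nvme_core.multipath=N without opening grub.cfg.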

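The command prints no output. What it actually does is modify the boot file /boot/grub2/grub.cfg. The changes are as follows:

3. Comparison of /boot/grub2/grub.cfg before and after the change (grub.cfg.old is the copy from before the change)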
[root@localhost ~]# diff /boot/grub2/grub.cfg /boot/grub2/grub.cfg.old
103c103
<       linux   /vmlinuz-5.10.0-136.12.0.86.oe2203sp1.x86_64 root=UUID=a32ea823-ac80-4335-8d06-64412d9d4ce8 ro cgroup_disable=files apparmor=0 crashkernel=512M rhgb quiet nvme_core.multipath=N
---
>       linux   /vmlinuz-5.10.0-136.12.0.86.oe2203sp1.x86_64 root=UUID=a32ea823-ac80-4335-8d06-64412d9d4ce8 ro cgroup_disable=files apparmor=0 crashkernel=512M rhgb quiet
119c119
<       linux   /vmlinuz-0-rescue-a062805e0d6a479db7c88998bf2dc586 root=UUID=a32ea823-ac80-4335-8d06-64412d9d4ce8 ro cgroup_disable=files apparmor=0 crashkernel=512M rhgb quiet nvme_core.multipath=N
---
>       linux   /vmlinuz-0-rescue-a062805e0d6a479db7c88998bf2dc586 root=UUID=a32ea823-ac80-4335-8d06-64412d9d4ce8 ro cgroup_disable=files apparmor=0 crashkernel=512M rhgb quiet
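The comparison above needs a copy of grub.cfg from before the change. The following is a minimal sketch of how such a copy could be taken and compared. The function names are hypothetical, and the default path and the .old suffix simply follow the file names used in this article. The backup function would have to be called before running grubby.

```shell
#!/bin/sh
# Hypothetical helpers for keeping and comparing a pre-change copy of grub.cfg.

# Copy the config aside before changing it, preserving mode and timestamps.
backup_grub_cfg() {
    cfg="${1:-/boot/grub2/grub.cfg}"
    cp -p "$cfg" "$cfg.old"
}

# Show the lines that differ between the current file and the backup.
# diff exits with status 1 when the files differ, so only a status
# greater than 1 (a real error) is treated as a failure.
show_grub_cfg_change() {
    cfg="${1:-/boot/grub2/grub.cfg}"
    diff "$cfg" "$cfg.old" || [ $? -eq 1 ]
}
```

In the output, lines starting with `<` come from the current file and lines starting with `>` come from the backup, which is the orientation shown in the comparison above.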
4. Check the NVMe multipath setting on the current kernel command line
[root@localhost ~]# cat /proc/cmdline | grep nvme_core.multipath

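The command prints nothing, so the parameter is not set and the default applies (when the parameter is not set, multipath is enabled).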
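Because grep prints nothing when the parameter is absent, an explicit "unset" result can be easier to use in scripts. The following is a minimal sketch with a hypothetical helper, `cmdline_param`, that reads a kernel command line from standard input and prints the value of the named parameter, or "unset" when it is absent.

```shell
#!/bin/sh
# Hypothetical helper: print the value of a kernel command-line parameter
# read from stdin, or "unset" when the parameter is absent. If the
# parameter appears more than once, the last occurrence is printed.
cmdline_param() {
    name="$1"
    # Put each space-separated parameter on its own line, then match by name.
    tr ' ' '\n' | awk -F= -v n="$name" '
        $1 == n { v = substr($0, length(n) + 2); found = 1 }
        END { print (found ? v : "unset") }'
}
```

On a running system it would be called as `cmdline_param nvme_core.multipath < /proc/cmdline`.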

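5. Reboot the server

The reboot is needed for the change that grubby made to /boot/grub2/grub.cfg to take effect.

6. Check the NVMe multipath setting on the kernel command line again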
[root@localhost ~]# cat /proc/cmdline | grep nvme_core.multipath
BOOT_IMAGE=/vmlinuz-5.10.0-136.12.0.86.oe2203sp1.x86_64 root=UUID=a32ea823-ac80-4335-8d06-64412d9d4ce8 ro cgroup_disable=files apparmor=0 crashkernel=512M rhgb quiet nvme_core.multipath=N

The nvme_core.multipath=N setting is now in effect and NVMe multipath is disabled.

7. Check the performance data of the NVMe cache disk
[root@localhost ~]#  cat /proc/diskstats |grep nvme
 259       0 nvme0n1 1271556 34988 136785541 754177 16211742 83105 603946699 725758 0 1793303 1479935 0 0 0 0 0 0
 259       1 nvme0n1p1 469978 55 17476760 98704 76957 394 69807387 30952 0 422101 129657 0 0 0 0 0 0
 259       2 nvme0n1p2 800111 34933 117972653 653577 9094310 82706 463236840 545896 0 1313903 1199473 0 0 0 0 0 0
 259       3 nvme0n1p3 1386 0 1333320 1885 7040475 5 70902472 148908 0 1751225 150794 0 0 0 0 0 0

All counters are now reported correctly, and the problem is resolved.
