```
NPUDoublePsEmbFunctor Init Done. Table: UnirecEmbedding
[E compiler_depend.ts:280] call aclnnMatmul failed, detail:EZ9903: [PID: 543476] 2025-06-16-10:46:41.752.975
rtKernelLaunchWithFlagV2 failed: 107003
Solution: In this scenario, collect the plog when the fault occurs and locate the fault based on the plog.
TraceBack (most recent call last):
Kernel launch failed, stream is not in current ctx, stream_id=3.[FUNC:KernelLaunch][FILE:api_impl.cc][LINE:465]
rtKernelLaunchWithFlagV2 execute failed, reason=[stream not in current context][FUNC:FuncErrorReason][FILE:error_message_manage.cc][LINE:53]
rtKernelLaunchWithFlagV2 failed: 107003
#### KernelLaunch failed: /usr/local/Ascend/ascend-toolkit/8.0.T60/opp/built-in/op_impl/ai_core/tbe//kernel/ascend910b/mem_set/MemSet_1a6864193b99ef93ef38616f04a712ab_high_performance.o
Memset Output Tensor failed
Kernel Run failed. opType: 3, MatMulV2
launch failed for MatMulV2, errno:361001.
[ERROR] 2025-06-16-10:46:41 (PID:543476, Device:0, RankID:-1) ERR01100 OPS call acl api failed
Exception raised from operator() at build/CMakeFiles/torch_npu.dir/compiler_depend.ts:84 (most recent call first):
frame #0: c10::Error::Error(c10::SourceLocation, std::string) + 0x57 (0x7ff5a3392617 in /usr/local/miniconda3/envs/torchrec/lib/python3.11/site-packages/torch/lib/libc10.so)
frame #1: c10::detail::torchCheckFail(char const*, char const*, unsigned int, std::string const&) + 0x64 (0x7ff5a334d98d in /usr/local/miniconda3/envs/torchrec/lib/python3.11/site-packages/torch/lib/libc10.so)
frame #2: <unknown function> + 0xed1162 (0x7ff4fcad1162 in /usr/local/miniconda3/envs/torchrec/lib/python3.11/site-packages/torch_npu/lib/libtorch_npu.so)
frame #3: <unknown function> + 0x14c27bc (0x7ff4fd0c27bc in /usr/local/miniconda3/envs/torchrec/lib/python3.11/site-packages/torch_npu/lib/libtorch_npu.so)
frame #4: <unknown function> + 0x7365b7 (0x7ff4fc3365b7 in /usr/local/miniconda3/envs/torchrec/lib/python3.11/site-packages/torch_npu/lib/libtorch_npu.so)
frame #5: <unknown function> + 0x736a76 (0x7ff4fc336a76 in /usr/local/miniconda3/envs/torchrec/lib/python3.11/site-packages/torch_npu/lib/libtorch_npu.so)
frame #6: <unknown function> + 0x73468f (0x7ff4fc33468f in /usr/local/miniconda3/envs/torchrec/lib/python3.11/site-packages/torch_npu/lib/libtorch_npu.so)
frame #7: <unknown function> + 0xdbbf4 (0x7ff5736c7bf4 in /usr/local/miniconda3/lib/libstdc++.so.6)
frame #8: <unknown function> + 0x8937a (0x7ff5bcc8b37a in /lib64/libc.so.6)
frame #9: <unknown function> + 0x109e4c (0x7ff5bcd0be4c in /lib64/libc.so.6)
[W compiler_depend.ts:286] Warning: (function ExecFunc)
Traceback (most recent call last):
  File "/workspace/doupleps_emb_test.py", line 258, in <module>
    test_simple_trainning()
  File "/workspace/doupleps_emb_test.py", line 158, in test_simple_trainning
    my_out0, my_loss, my_out1 = _training_loop(my_model, lr, "unirec")
                                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/workspace/doupleps_emb_test.py", line 119, in _training_loop
    out = model(index)
          ^^^^^^^^^^^^
  File "/usr/local/miniconda3/envs/torchrec/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/miniconda3/envs/torchrec/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/workspace/doupleps_emb_test.py", line 44, in forward
    x0 = self.linear1(emb_out)
         ^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/miniconda3/envs/torchrec/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/miniconda3/envs/torchrec/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/miniconda3/envs/torchrec/lib/python3.11/site-packages/torch/nn/modules/linear.py", line 114, in forward
    return F.linear(input, self.weight, self.bias)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
RuntimeError: The Inner error is reported as above. The process exits for this inner error, and the current working operator name is aclnnMatmul.
Since the operator is called asynchronously, the stacktrace may be inaccurate. If you want to get the accurate stacktrace, pleace set the environment variable ASCEND_LAUNCH_BLOCKING=1.
[ERROR] 2025-06-16-10:46:41 (PID:543476, Device:0, RankID:-1) ERR00100 PTA call acl api failed
[W compiler_depend.ts:305] Warning: NPU warning, error code is 107003[Error]:
[Error]: The stream is not in the current context.
        Check whether the context where the stream is located is the same as the current context.
EE9999: Inner Error!
EE9999: [PID: 543476] 2025-06-16-10:46:41.760.771 Stream destroy failed, stream is not in current ctx, stream_id=3.[FUNC:StreamDestroy][FILE:api_impl.cc][LINE:1104]
TraceBack (most recent call last):
rtStreamDestroyForce execute failed, reason=[stream not in current context][FUNC:FuncErrorReason][FILE:error_message_manage.cc][LINE:53]
destroy stream force failed, runtime result = 107003[FUNC:ReportCallError][FILE:log_inner.cpp][LINE:161]
(function operator())
I0616 10:46:42.707563 543476 hbm_table_manager.h:75] NpuHbmSparseTableManager destructor called
Segmentation fault (core dumped)
```
### Resolving the aclnnMatmul call failure
The error codes `EZ9903` and `107003` are usually related to NPU environment configuration, context management, and the parameters passed to the matrix multiplication operator. In the log above the decisive hint is `stream is not in current ctx, stream_id=3`, which points at context/stream management. The analysis and fixes below cover both codes.
#### Causes of error code EZ9903
Error code `EZ9903` means the matrix multiplication did not complete successfully when `aclnnMatmul` was called. Possible causes include:
- The input tensor shapes do not match or the data types are not supported[^1].
- Device memory is insufficient, so the resources needed for the matrix computation cannot be allocated.
- The current context is not set up correctly, so the NPU stream is not in the current context[^2].
To narrow this down, check the following (a minimal device/context sanity check is sketched after this list):
- Make sure the input tensor dimensions satisfy the matrix multiplication contract (the inner dimensions must match).
- Monitor device memory with `npu-smi info` (the Ascend counterpart of `nvidia-smi`) and confirm there is enough free memory.
- Verify that the device and context are set correctly, for example via `acl.rt.set_device` and `acl.rt.set_context`[^1].
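The snippet below is a minimal sketch of that device/context check using the pyACL runtime bindings (the `acl` package shipped with CANN). The device id `0` and the `check` helper are assumptions for illustration, not taken from the failing script.
```python
import acl

DEVICE_ID = 0  # assumed device id; use the same device as the training job


def check(ret, msg):
    # pyACL calls return an integer status code; 0 means success
    if ret != 0:
        raise RuntimeError(f"{msg} failed, ret={ret}")


check(acl.init(), "acl.init")
check(acl.rt.set_device(DEVICE_ID), "acl.rt.set_device")

# Create an explicit context and make it current before creating any stream,
# so the stream is bound to the same context that later launches kernels.
context, ret = acl.rt.create_context(DEVICE_ID)
check(ret, "acl.rt.create_context")
check(acl.rt.set_context(context), "acl.rt.set_context")

stream, ret = acl.rt.create_stream()
check(ret, "acl.rt.create_stream")

# Tear down in reverse order: stream first, then context, then device.
check(acl.rt.destroy_stream(stream), "acl.rt.destroy_stream")
check(acl.rt.destroy_context(context), "acl.rt.destroy_context")
check(acl.rt.reset_device(DEVICE_ID), "acl.rt.reset_device")
check(acl.finalize(), "acl.finalize")
print("context/stream setup and teardown succeeded on device", DEVICE_ID)
```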
#### Causes of error code 107003
Error code `107003` is usually related to context management of NPU streams. Specific causes include:
- The stream being used was created under a different context than the one that is current when the kernel is launched.
- The context failed to initialize or was not released correctly.
- The parameters passed at kernel launch exceed hardware limits.
To resolve this, try the following (a sketch for obtaining an accurate stack trace follows this list):
- Make sure the context and stream are initialized correctly, and in the same context, before `aclnnMatmul` is called.
- Check that context resources are released correctly to avoid leaks; in the log above the forced stream destroy fails with the same 107003 for the same reason.
- As suggested in reference [^2], review the environment variable settings in `.bashrc` so that only the required libraries are loaded, avoiding conflicts.
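Because the operator is launched asynchronously, the Python stack trace above stops at `F.linear` and may not show the real origin of the fault. The runtime message itself recommends setting `ASCEND_LAUNCH_BLOCKING=1`; a minimal way to apply it for a debugging run is sketched below (placing it at the top of the failing script is just one convenient option, not a requirement):
```python
# Add at the very top of doupleps_emb_test.py, before torch / torch_npu are
# imported, so every kernel launch runs synchronously and the Python stack
# trace points at the operator that actually failed. Debugging only: this
# slows training down noticeably.
import os

os.environ["ASCEND_LAUNCH_BLOCKING"] = "1"
```
Exporting the variable in the shell before launching the job works just as well.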
#### Example: verifying the context setup for MatMulV2
Below is a simple check, written with torch_npu (the stack the failing script runs on, per the trace above), to verify that the device, context, and stream used for the matmul path are consistent. On this setup `torch.matmul` is executed through `aclnnMatmul` / `MatMulV2`, the operators named in the log; the device id `npu:0` is an assumption.
```python
import torch
import torch_npu  # registers the NPU device and the ACL-backed operators

# Assumed device id; use the same device as the failing training job.
device = "npu:0"

# Bind this process to the device so torch_npu creates a single context and
# the default stream belongs to that context.
torch.npu.set_device(device)

# Inputs with the same shapes as the original example
input_a = torch.rand(3, 4, dtype=torch.float32, device=device)
input_b = torch.rand(4, 5, dtype=torch.float32, device=device)

try:
    # On this setup the matmul is dispatched to aclnnMatmul / MatMulV2
    output = torch.matmul(input_a, input_b)
    torch.npu.synchronize()  # force the asynchronous launch to finish here
    print("matmul on NPU succeeded:", tuple(output.shape))
except RuntimeError as e:
    print(f"matmul on NPU failed: {e}")
```
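If this standalone check passes while the training script still fails, a likely cause is that some component of the script (for example the functor that logs `NPUDoublePsEmbFunctor Init Done`) creates its own stream or context before the framework's device context is current; creating every stream only after `torch.npu.set_device` has run in the same process usually avoids the `stream not in current context` failure.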
#### Driver version and library compatibility
If the problem persists, the driver or runtime libraries may be incompatible with the hardware or with the installed torch_npu. As suggested in reference [^3], upgrade the NPU driver to the latest version and rebuild the environment so that the driver, CANN toolkit, and torch_npu versions match. A quick way to collect the installed versions for comparison is sketched below.
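A minimal sketch for gathering the version information: the toolkit root comes from the kernel path in the error log, and a standard CANN install layout is assumed.
```python
import os

import torch
import torch_npu

# Framework side
print("torch:", torch.__version__)
print("torch_npu:", torch_npu.__version__)

# CANN toolkit side: list the toolkit versions installed under the root that
# appears in the kernel path of the error log (adjust if installed elsewhere).
toolkit_root = "/usr/local/Ascend/ascend-toolkit"
if os.path.isdir(toolkit_root):
    print("installed CANN toolkits:", sorted(os.listdir(toolkit_root)))
```
The driver version itself can be read from the output of `npu-smi info`.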