Another error occurred:

    Traceback (most recent call last):
      File "<string>", line 1, in <module>
      File "D:\Miniconda3\envs\robocup\Lib\multiprocessing\spawn.py", line 122, in spawn_main
        exitcode = _main(fd, parent_sentinel)
      File "D:\Miniconda3\envs\robocup\Lib\multiprocessing\spawn.py", line 131, in _main
        prepare(preparation_data)
      File "D:\Miniconda3\envs\robocup\Lib\multiprocessing\spawn.py", line 246, in prepare
        _fixup_main_from_path(data['init_main_from_path'])
      File "D:\Miniconda3\envs\robocup\Lib\multiprocessing\spawn.py", line 297, in _fixup_main_from_path
        main_content = runpy.run_path(main_path,
      File "<frozen runpy>", line 287, in run_path
      File "<frozen runpy>", line 98, in _run_module_code
      File "<frozen runpy>", line 88, in _run_code
      File "D:\local_ai_projects\RoboCUP\train.py", line 6, in <module>
        results = model.train(
      File "D:\Miniconda3\envs\robocup\Lib\site-packages\ultralytics\engine\model.py", line 797, in train
        self.trainer.train()
      File "D:\Miniconda3\envs\robocup\Lib\site-packages\ultralytics\engine\trainer.py", line 227, in train
        self._do_train(world_size)
      File "D:\Miniconda3\envs\robocup\Lib\site-packages\ultralytics\engine\trainer.py", line 348, in _do_train
        self._setup_train(world_size)
      File "D:\Miniconda3\envs\robocup\Lib\site-packages\ultralytics\engine\trainer.py", line 307, in _setup_train
        self.train_loader = self.get_dataloader(
      File "D:\Miniconda3\envs\robocup\Lib\site-packages\ultralytics\models\yolo\detect\train.py", line 89, in get_dataloader
        return build_dataloader(dataset, batch_size, workers, shuffle, rank)  # return dataloader
      File "D:\Miniconda3\envs\robocup\Lib\site-packages\ultralytics\data\build.py", line 182, in build_dataloader
        return InfiniteDataLoader(
      File "D:\Miniconda3\envs\robocup\Lib\site-packages\ultralytics\data\build.py", line 58, in __init__
        self.iterator = super().__iter__()
      File "D:\Miniconda3\envs\robocup\Lib\site-packages\torch\utils\data\dataloader.py", line 493, in __iter__
        return self._get_iterator()
      File "D:\Miniconda3\envs\robocup\Lib\site-packages\torch\utils\data\dataloader.py", line 424, in _get_iterator
        return _MultiProcessingDataLoaderIter(self)
      File "D:\Miniconda3\envs\robocup\Lib\site-packages\torch\utils\data\dataloader.py", line 1171, in __init__
        w.start()
      File "D:\Miniconda3\envs\robocup\Lib\multiprocessing\process.py", line 121, in start
        self._popen = self._Popen(self)
      File "D:\Miniconda3\envs\robocup\Lib\multiprocessing\context.py", line 224, in _Popen
        return _default_context.get_context().Process._Popen(process_obj)
      File "D:\Miniconda3\envs\robocup\Lib\multiprocessing\context.py", line 337, in _Popen
        return Popen(process_obj)
      File "D:\Miniconda3\envs\robocup\Lib\multiprocessing\popen_spawn_win32.py", line 46, in __init__
        prep_data = spawn.get_preparation_data(process_obj._name)
      File "D:\Miniconda3\envs\robocup\Lib\multiprocessing\spawn.py", line 164, in get_preparation_data
        _check_not_importing_main()
      File "D:\Miniconda3\envs\robocup\Lib\multiprocessing\spawn.py", line 140, in _check_not_importing_main
        raise RuntimeError('''
    RuntimeError:
        An attempt has been made to start a new process before the
        current process has finished its bootstrapping phase.

        This probably means that you are not using fork to start your
        child processes and you have forgotten to use the proper idiom
        in the main module:

            if __name__ == '__main__':
                freeze_support()
                ...

        The "freeze_support()" line can be omitted if the program
        is not going to be frozen to produce an executable.

        To fix this issue, refer to the "Safe importing of main module"
        section in https://docs.python.org/3/library/multiprocessing.html
Time: 2025-06-09 18:28:01 Views: 21
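### Fixing RuntimeErrors caused by multiprocessing and the torchvision::nms CUDA backend during Ultralytics YOLOv8 training
When training an Ultralytics YOLOv8 model, you may run into two separate problems: `torchvision::nms` not supporting the `CUDA` backend, and an initialization error related to `multiprocessing`. The traceback above is the second one. Both can be addressed as follows.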
---
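#### 1. Fixing `torchvision::nms` not supporting the `CUDA` backend
This problem is usually caused by a mismatch between the `torchvision` and PyTorch versions. Make sure the installed `torchvision` is compatible with the PyTorch version in use and was built with CUDA support.
- **Check version compatibility**:
Use the following command to check whether the installed PyTorch and `torchvision` versions match[^1].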
```bash
pip show torch torchvision
```
If the versions do not match, uninstall `torchvision` and reinstall a matching version:
```bash
pip uninstall torchvision
pip install torchvision==<compatible_version> -f https://download.pytorch.org/whl/torch_stable.html
```
- **Verify CUDA support**:
Make sure CUDA is installed correctly and detected by PyTorch. Run the following code to verify:
```python
import torch
print(torch.cuda.is_available())  # should print True
print(torch.version.cuda)         # should print the CUDA version PyTorch was built with
```
- **Build `torchvision` from source**:
If no prebuilt binary matches your CUDA version, you can build `torchvision` from source with CUDA support enabled:
```bash
git clone https://github.com/pytorch/vision.git
cd vision
python setup.py install
```
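The version check above can also be scripted. The following is a minimal sketch using only the standard library; the helper names (`installed_version`, `build_tag`, `tags_conflict`) are made up for this illustration. It assumes that wheels from the PyTorch download index carry a local build tag such as `+cu121` or `+cpu` in their version string. Wheels without such a tag are treated as inconclusive, so a clean result here does not prove the two packages are compatible.

```python
from importlib import metadata


def installed_version(package):
    """Return the installed version string, or None if the package is missing."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def build_tag(version):
    """Return the local build tag ('cu121', 'cpu', ...) or None if there is none."""
    if version is None or "+" not in version:
        return None
    return version.split("+", 1)[1]


def tags_conflict(torch_version, torchvision_version):
    """True only when both versions carry a build tag and the tags differ."""
    a = build_tag(torch_version)
    b = build_tag(torchvision_version)
    return a is not None and b is not None and a != b


if __name__ == "__main__":
    tv = installed_version("torch")
    vv = installed_version("torchvision")
    print("torch:", tv, "| torchvision:", vv)
    if tags_conflict(tv, vv):
        print("Build tags differ; reinstall both packages from the same index.")
```

A pair such as `2.3.1+cu121` and `0.18.1+cpu` is flagged, which is one way a CUDA-enabled PyTorch can end up next to a CPU-only `torchvision`.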
---
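#### 2. Fixing the `multiprocessing` initialization problem
If training fails with a `RuntimeError` saying that a new process was started before the current process finished bootstrapping, the training code must be protected by `if __name__ == '__main__':`. On Windows, each DataLoader worker is started as a fresh interpreter that re-imports `train.py`, so an unguarded `model.train(...)` call runs again inside every worker.
- **Modify the training script**:
In `train.py`, wrap the training logic in `if __name__ == '__main__':`: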
```python
from ultralytics import YOLO

if __name__ == '__main__':
    model = YOLO('yolov8m.pt')  # load the model
    results = model.train(data='data.yaml', epochs=600, imgsz=640)
```
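- **Disable DataParallel**:
If the training code detects several GPUs and tries to parallelize across them while only one GPU is actually usable, training can fail. In training scripts that contain a `DataParallel` block like the one below, commenting it out avoids the problem[^2]: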
```python
# DP mode
# if cuda and rank == -1 and torch.cuda.device_count() > 1:
#     model = torch.nn.DataParallel(model)
```
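To see why the `if __name__ == '__main__':` guard matters on one machine but not another, it helps to check which start method `multiprocessing` uses by default. The sketch below relies on the documented behavior that `get_all_start_methods()` lists the platform default first; the helper names are made up for this illustration.

```python
import multiprocessing as mp


def default_start_method():
    """Name of the start method this interpreter uses by default."""
    # The multiprocessing docs state that the first entry is the default.
    return mp.get_all_start_methods()[0]


def needs_main_guard(method):
    """'spawn' and 'forkserver' import the main module again in each child."""
    return method in ("spawn", "forkserver")


if __name__ == "__main__":
    method = default_start_method()
    print(f"default start method: {method}")
    print(f"main guard required: {needs_main_guard(method)}")
```

Windows only offers `spawn`, which is why the traceback passes through `popen_spawn_win32.py` and why the guard is mandatory there.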
---
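#### 3. Alternatives: use the CPU or a third-party implementation
If none of the above helps, consider the following alternatives:
- **Switch to the CPU**:
Move the data from the GPU to the CPU before calling `nms`: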
```python
import torchvision
# boxes and scores are assumed to be existing CUDA tensors
boxes = boxes.cpu()
scores = scores.cpu()
indices = torchvision.ops.nms(boxes, scores, iou_threshold=0.5)
```
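- **Use a third-party NMS implementation**:
Replace the call with a third-party NMS that supports CUDA, for example the one exposed by `detectron2`. Check that it really avoids the problem in your environment, since a wrapper that delegates to torchvision's operator will fail in the same way: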
```python
import torch
from detectron2.layers import nms
boxes = torch.tensor([[0, 0, 10, 10], [1, 1, 11, 11]], dtype=torch.float32, device='cuda')
scores = torch.tensor([0.9, 0.8], dtype=torch.float32, device='cuda')
iou_threshold = 0.5
indices = nms(boxes, scores, iou_threshold)
print(indices)
```
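To make clear what the `nms` calls above compute, here is a dependency-free reference sketch of greedy non-maximum suppression. It is an illustration only, not the torchvision or detectron2 implementation, and it is far too slow for training. The function names are made up for this example. Boxes are assumed to be in `(x1, y1, x2, y2)` format, and a box is dropped when its IoU with an already kept box exceeds the threshold.

```python
def iou(a, b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def nms_reference(boxes, scores, iou_threshold):
    """Greedy NMS: return the kept indices, highest score first."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        # Keep the box only if it does not overlap too much with any kept box.
        if all(iou(boxes[i], boxes[k]) <= iou_threshold for k in keep):
            keep.append(i)
    return keep


if __name__ == "__main__":
    boxes = [[0, 0, 10, 10], [1, 1, 11, 11]]
    scores = [0.9, 0.8]
    print(nms_reference(boxes, scores, 0.5))  # prints [0]
```

The two example boxes overlap with an IoU of 81/119 (about 0.68), so with a threshold of 0.5 only the higher-scoring box survives.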
---
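#### 4. Other notes
- **Environment setup**:
Make sure all dependencies are installed correctly and the CUDA paths are configured properly. Recreating the virtual environment can help rule out package conflicts.
- **Logging**:
Use a logging tool (such as the `logging` module) to capture the full exception details during training so that problems are easier to trace.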
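The logging advice could look roughly like the sketch below. `build_logger` and `run_logged` are hypothetical helper names, not part of Ultralytics. The idea is to wrap any zero-argument callable, for example `lambda: model.train(...)`, so that the full traceback is written to the log before the exception propagates.

```python
import logging


def build_logger(handler, name="train_debug"):
    """Attach a handler with timestamps to a named logger and return it."""
    logger = logging.getLogger(name)
    logger.setLevel(logging.DEBUG)
    handler.setFormatter(
        logging.Formatter("%(asctime)s %(levelname)s %(message)s")
    )
    logger.addHandler(handler)
    return logger


def run_logged(step, logger):
    """Run a zero-argument callable; log the traceback if it raises."""
    try:
        return step()
    except Exception:
        # logger.exception records the message plus the active traceback.
        logger.exception("training step failed")
        raise


if __name__ == "__main__":
    # StreamHandler writes to stderr; swap in logging.FileHandler(path)
    # to keep the output in a file instead.
    log = build_logger(logging.StreamHandler())
    print(run_logged(lambda: 1 + 1, log))
```

Re-raising after logging keeps the original failure visible, so a fallback such as the CPU retry in the example script can still catch it.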
---
### Example code
The following is a complete example training script that includes the main-module guard and a CPU fallback:
```python
import torch
from ultralytics import YOLO

if __name__ == '__main__':
    # Load the model
    model = YOLO('yolov8m.pt')
    try:
        # Try training on the first GPU
        results = model.train(data='data.yaml', epochs=600, imgsz=640, device=0)
    except NotImplementedError:
        # If the CUDA backend is unavailable for an operator, fall back to the CPU
        print("Switching to CPU due to CUDA backend issue.")
        results = model.train(data='data.yaml', epochs=600, imgsz=640, device='cpu')
```
---