ERROR 03-08 16:32:07 engine.py:400] CUDA out of memory. Tried to allocate 896.00 MiB. GPU 0 has a total capacity of 23.59 GiB of which 307.06 MiB is free. Including non-PyTorch memory, this process has 23.28 GiB memory in use. Of the allocated memory 22.99 GiB is allocated by PyTorch, and 1.57 MiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True to avoid fragmentation. See documentation for Memory Management (https://pytorch.org/docs/stable/notes/cuda.html#environment-variables)

时间: 2025-06-14 13:08:07 浏览: 22
<think>好的,我现在遇到了PyTorch的CUDA内存不足的问题,需要解决。用户提到了设置PYTORCH_CUDA_ALLOC_CONF但无效,还有expandable_segments的配置。首先,我需要理解这个错误的原因。错误信息显示显存总容量是4GB,已经分配了6.68GB,但实际总容量只有4GB,这看起来可能有问题,或者是不是显存被其他进程占用了?或者是不是PyTorch的内存管理有问题? 根据引用[1],PyTorch会保留一部分内存,如果保留的内存远大于已分配的,可能需要设置max_split_size_mb来减少内存碎片。而用户可能已经尝试过设置这个环境变量但没用。引用[2]和[3]建议设置expandable_segments为True,这可能帮助PyTorch更灵活地分配内存,减少碎片。那用户可能没有正确设置这个环境变量,或者在Windows和Linux下的设置方式不同? 另外,引用[4]中的代码示例里,用户显式地使用del释放中间变量,可能有助于释放显存。同时,循环中的变量如果没有及时释放,可能导致累积占用。引用[5]提到共享GPU内存的问题,可能可以调整专用内存和共享内存的使用比例? 接下来,我需要整理解决这个问题的步骤。首先检查是否有其他进程占用显存,然后尝试减小batch size,或者使用更小的模型。然后正确设置环境变量,比如在Linux下用export,在Windows下用set。还可以在代码中手动清空缓存,或者使用with torch.no_grad()来减少内存占用。另外,检查是否有内存泄漏,比如不必要地累积梯度,或者没有释放不需要的张量。 可能需要分步骤来解答:1. 检查显存占用情况;2. 调整环境变量;3. 优化代码中的内存使用;4. 其他方法如使用混合精度训练,或者梯度累积。同时,要注意用户是否在推理阶段,这时候可以禁用梯度计算。另外,是否可以考虑使用内存更高效的数据类型,比如float16?或者使用checkpointing技术? 用户的问题是关于如何配置expandable_segments的,所以需要详细说明如何正确设置这个环境变量,不同的操作系统下的方法。同时,需要结合其他方法,比如及时释放中间变量,减小输入尺寸等。可能还需要解释为什么设置expandable_segments有效,比如它允许PyTorch动态扩展内存段,减少碎片。 另外,引用[4]中的例子,用户可能在循环中反复推理,但没有正确释放输出,导致显存累积。使用del语句后加上torch.cuda.empty_cache()可能会帮助释放。不过需要注意,del只是减少引用计数,而empty_cache才会真正释放缓存,但PyTorch的内存管理可能不会立即释放,所以这个方法可能效果有限。 总结可能的解决方案步骤,并确保每个步骤都有对应的引用支持。同时,要提醒用户综合使用多种方法,因为单一方法可能不足以解决问题。比如,设置环境变量加上减小batch size,再加上及时释放变量,可能更有效。</think>要解决PyTorch的`CUDA out of memory`错误,需结合显存优化策略和环境变量调整。以下是分步解决方案: --- ### 1. **检查显存占用状态** - 运行`nvidia-smi`(Linux)或通过任务管理器(Windows)查看显存占用,确认是否有其他进程占用显存。若存在,关闭无关进程。 - 若显存不足是模型/数据导致,需进一步优化代码。 --- ### 2. **设置环境变量`PYTORCH_CUDA_ALLOC_CONF`** - **作用**:通过允许内存段动态扩展,减少显存碎片[^2][^3]。 - **操作方法**: - **Linux**:在终端执行: ```bash export PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True ``` - **Windows**:在命令行或代码中设置: ```cmd set PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True ``` - **注意**:需在启动Python脚本前设置此变量,且对部分版本PyTorch可能需更新至最新版[^3]。 --- ### 3. **优化代码中的显存使用** - **减小批量大小(Batch Size)**: - 将输入数据的批量大小从`8`降至`4`或更低,例如: ```python in_ten = torch.randn(4, 3, 640, 640).cuda() # 原为8 ``` - **释放无用变量**: - 在循环中手动删除中间结果并清空缓存: ```python for i in range(8): out, out16, out32 = net(in_ten) # 释放显存 del out, out16, out32 torch.cuda.empty_cache() # 强制清空缓存[^4] ``` - **禁用梯度计算(推理阶段)**: - 使用`torch.no_grad()`减少显存占用: ```python with torch.no_grad(): out, out16, out32 = net(in_ten) ``` --- ### 4. **其他优化方法** - **混合精度训练**:使用`torch.cuda.amp`自动选择`float16`或`float32`,减少显存占用。 - **梯度检查点(Gradient Checkpointing)**:通过时间换空间,减少反向传播的中间变量[^5]。 - **共享显存配置**:调整专用显存与共享显存的比例(需结合硬件支持)[^5]。 --- ### 5. **验证是否解决** - 若仍报错,尝试逐步减小模型输入尺寸或简化模型结构,定位显存瓶颈。 --- ### 示例代码优化对比 ```python # 优化前(显存占用高) in_ten = torch.randn(8, 3, 640, 640).cuda() # 优化后(显存占用降低) in_ten = torch.randn(4, 3, 320, 320).cuda() # 减小批量和输入尺寸 with torch.no_grad(): # 禁用梯度 for i in range(8): out, out16, out32 = net(in_ten) del out, out16, out32 torch.cuda.empty_cache() ``` ---
阅读全文

相关推荐

INFO 07-06 22:09:23 __init__.py:207] Automatically detected platform cuda. (VllmWorkerProcess pid=10542) INFO 07-06 22:09:23 multiproc_worker_utils.py:229] Worker ready; awaiting tasks (VllmWorkerProcess pid=10542) INFO 07-06 22:09:24 cuda.py:178] Cannot use FlashAttention-2 backend for Volta and Turing GPUs. (VllmWorkerProcess pid=10542) INFO 07-06 22:09:24 cuda.py:226] Using XFormers backend. (VllmWorkerProcess pid=10542) INFO 07-06 22:09:25 utils.py:916] Found nccl from library libnccl.so.2 (VllmWorkerProcess pid=10542) INFO 07-06 22:09:25 pynccl.py:69] vLLM is using nccl==2.21.5 INFO 07-06 22:09:25 utils.py:916] Found nccl from library libnccl.so.2 INFO 07-06 22:09:25 pynccl.py:69] vLLM is using nccl==2.21.5 INFO 07-06 22:09:25 custom_all_reduce_utils.py:206] generating GPU P2P access cache in /root/.cache/vllm/gpu_p2p_access_cache_for_0,1.json INFO 07-06 22:09:39 custom_all_reduce_utils.py:244] reading GPU P2P access cache from /root/.cache/vllm/gpu_p2p_access_cache_for_0,1.json (VllmWorkerProcess pid=10542) INFO 07-06 22:09:39 custom_all_reduce_utils.py:244] reading GPU P2P access cache from /root/.cache/vllm/gpu_p2p_access_cache_for_0,1.json WARNING 07-06 22:09:39 custom_all_reduce.py:145] Custom allreduce is disabled because your platform lacks GPU P2P capability or P2P test failed. To silence this warning, specify disable_custom_all_reduce=True explicitly. (VllmWorkerProcess pid=10542) WARNING 07-06 22:09:39 custom_all_reduce.py:145] Custom allreduce is disabled because your platform lacks GPU P2P capability or P2P test failed. To silence this warning, specify disable_custom_all_reduce=True explicitly. INFO 07-06 22:09:39 shm_broadcast.py:258] vLLM message queue communication handle: Handle(connect_ip='127.0.0.1', local_reader_ranks=[1], buffer_handle=(1, 4194304, 6, 'psm_b99a0acb'), local_subscribe_port=57279, remote_subscribe_port=None) INFO 07-06 22:09:39 model_runner.py:1110] Starting to load model /home/user/Desktop/DeepSeek-R1-Distill-Qwen-32B... (VllmWorkerProcess pid=10542) INFO 07-06 22:09:39 model_runner.py:1110] Starting to load model /home/user/Desktop/DeepSeek-R1-Distill-Qwen-32B... ERROR 07-06 22:09:39 engine.py:400] CUDA out of memory. Tried to allocate 136.00 MiB. GPU 0 has a total capacity of 21.49 GiB of which 43.50 MiB is free. Including non-PyTorch memory, this process has 21.44 GiB memory in use. Of the allocated memory 21.19 GiB is allocated by PyTorch, and 20.25 MiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True to avoid fragmentation. See documentation for Memory Management (https://pytorch.org/docs/stable/notes/cuda.html#environment-variables) ERROR 07-06 22:09:39 engine.py:400] Traceback (most recent call last): ERROR 07-06 22:09:39 engine.py:400] File "/root/miniconda3/envs/deepseek_vllm/lib/python3.11/site-packages/vllm/engine/multiprocessing/engine.py", line 391, in run_mp_engine ERROR 07-06 22:09:39 engine.py:400] engine = MQLLMEngine.from_engine_args(engine_args=engine_args, ERROR 07-06 22:09:39 engine.py:400] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ ERROR 07-06 22:09:39 engine.py:400] File "/root/miniconda3/envs/deepseek_vllm/lib/python3.11/site-packages/vllm/engine/multiprocessing/engine.py", line 124, in from_engine_args ERROR 07-06 22:09:39 engine.py:400] return cls(ipc_path=ipc_path, ERROR 07-06 22:09:39 engine.py:400] ^^^^^^^^^^^^^^^^^^^^^^ ERROR 07-06 22:09:39 engine.py:400] File "/root/miniconda3/envs/deepseek_vllm/lib/python3.11/site-packages/vllm/engine/multiprocessing/engine.py", line 76, in __init__ ERROR 07-06 22:09:39 engine.py:400] self.engine = LLMEngine(*args, **kwargs) ERROR 07-06 22:09:39 engine.py:400] ^^^^^^^^^^^^^^^^^^^^^^^^^^ ERROR 07-06 22:09:39 engine.py:400] File "/root/miniconda3/envs/deepseek_vllm/lib/python3.11/site-packages/vllm/engine/llm_engine.py", line 273, in __init__ ERROR 07-06 22:09:39 engine.py:400] self.model_executor = executor_class(vllm_config=vllm_config, ) ERROR 07-06 22:09:39 engine.py:400] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ ERROR 07-06 22:09:39 engine.py:400] File "/root/miniconda3/envs/deepseek_vllm/lib/python3.11/site-packages/vllm/executor/executor_base.py", line 271, in __init__ ERROR 07-06 22:09:39 engine.py:400] super().__init__(*args, **kwargs) ERROR 07-06 22:09:39 engine.py:400] File "/root/miniconda3/envs/deepseek_vllm/lib/python3.11/site-packages/vllm/executor/executor_base.py", line 52, in __init__ ERROR 07-06 22:09:39 engine.py:400] self._init_executor() ERROR 07-06 22:09:39 engine.py:400] File "/root/miniconda3/envs/deepseek_vllm/lib/python3.11/site-packages/vllm/executor/mp_distributed_executor.py", line 125, in _init_executor ERROR 07-06 22:09:39 engine.py:400] self._run_workers("load_model", ERROR 07-06 22:09:39 engine.py:400] File "/root/miniconda3/envs/deepseek_vllm/lib/python3.11/site-packages/vllm/executor/mp_distributed_executor.py", line 185, in _run_workers ERROR 07-06 22:09:39 engine.py:400] driver_worker_output = run_method(self.driver_worker, sent_method, ERROR 07-06 22:09:39 engine.py:400] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ ERROR 07-06 22:09:39 engine.py:400] File "/root/miniconda3/envs/deepseek_vllm/lib/python3.11/site-packages/vllm/utils.py", line 2196, in run_method ERROR 07-06 22:09:39 engine.py:400] return func(*args, **kwargs) ERROR 07-06 22:09:39 engine.py:400] ^^^^^^^^^^^^^^^^^^^^^ ERROR 07-06 22:09:39 engine.py:400] File "/root/miniconda3/envs/deepseek_vllm/lib/python3.11/site-packages/vllm/worker/worker.py", line 183, in load_model ERROR 07-06 22:09:39 engine.py:400] self.model_runner.load_model() ERROR 07-06 22:09:39 engine.py:400] File "/root/miniconda3/envs/deepseek_vllm/lib/python3.11/site-packages/vllm/worker/model_runner.py", line 1112, in load_model ERROR 07-06 22:09:39 engine.py:400] self.model = get_model(vllm_config=self.vllm_config) ERROR 07-06 22:09:39 engine.py:400] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ ERROR 07-06 22:09:39 engine.py:400] File "/root/miniconda3/envs/deepseek_vllm/lib/python3.11/site-packages/vllm/model_executor/model_loader/__init__.py", line 14, in get_model ERROR 07-06 22:09:39 engine.py:400] return loader.load_model(vllm_config=vllm_config) ERROR 07-06 22:09:39 engine.py:400] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ ERROR 07-06 22:09:39 engine.py:400] File "/root/miniconda3/envs/deepseek_vllm/lib/python3.11/site-packages/vllm/model_executor/model_loader/loader.py", line 406, in load_model ERROR 07-06 22:09:39 engine.py:400] model = _initialize_model(vllm_config=vllm_config) ERROR 07-06 22:09:39 engine.py:400] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ ERROR 07-06 22:09:39 engine.py:400] File "/root/miniconda3/envs/deepseek_vllm/lib/python3.11/site-packages/vllm/model_executor/model_loader/loader.py", line 125, in _initialize_model ERROR 07-06 22:09:39 engine.py:400] return model_class(vllm_config=vllm_config, prefix=prefix) ERROR 07-06 22:09:39 engine.py:400] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ ERROR 07-06 22:09:39 engine.py:400] File "/root/miniconda3/envs/deepseek_vllm/lib/python3.11/site-packages/vllm/model_executor/models/qwen2.py", line 453, in __init__ ERROR 07-06 22:09:39 engine.py:400] self.model = Qwen2Model(vllm_config=vllm_config, ERROR 07-06 22:09:39 engine.py:400] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ ERROR 07-06 22:09:39 engine.py:400] File "/root/miniconda3/envs/deepseek_vllm/lib/python3.11/site-packages/vllm/compilation/decorators.py", line 151, in __init__ ERROR 07-06 22:09:39 engine.py:400] old_init(self, vllm_config=vllm_config, prefix=prefix, **kwargs) ERROR 07-06 22:09:39 engine.py:400] File "/root/miniconda3/envs/deepseek_vllm/lib/python3.11/site-packages/vllm/model_executor/models/qwen2.py", line 307, in __init__ ERROR 07-06 22:09:39 engine.py:400] self.start_layer, self.end_layer, self.layers = make_layers( ERROR 07-06 22:09:39 engine.py:400] ^^^^^^^^^^^^ ERROR 07-06 22:09:39 engine.py:400] File "/root/miniconda3/envs/deepseek_vllm/lib/python3.11/site-packages/vllm/model_executor/models/utils.py", line 557, in make_layers ERROR 07-06 22:09:39 engine.py:400] [PPMissingLayer() for _ in range(start_layer)] + [ ERROR 07-06 22:09:39 engine.py:400] ^ ERROR 07-06 22:09:39 engine.py:400] File "/root/miniconda3/envs/deepseek_vllm/lib/python3.11/site-packages/vllm/model_executor/models/utils.py", line 558, in ERROR 07-06 22:09:39 engine.py:400] maybe_offload_to_cpu(layer_fn(prefix=f"{prefix}.{idx}")) ERROR 07-06 22:09:39 engine.py:400] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ ERROR 07-06 22:09:39 engine.py:400] File "/root/miniconda3/envs/deepseek_vllm/lib/python3.11/site-packages/vllm/model_executor/models/qwen2.py", line 309, in <lambda> ERROR 07-06 22:09:39 engine.py:400] lambda prefix: Qwen2DecoderLayer(config=config, ERROR 07-06 22:09:39 engine.py:400] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ ERROR 07-06 22:09:39 engine.py:400] File "/root/miniconda3/envs/deepseek_vllm/lib/python3.11/site-packages/vllm/model_executor/models/qwen2.py", line 220, in __init__ ERROR 07-06 22:09:39 engine.py:400] self.mlp = Qwen2MLP( ERROR 07-06 22:09:39 engine.py:400] ^^^^^^^^^ ERROR 07-06 22:09:39 engine.py:400] File "/root/miniconda3/envs/deepseek_vllm/lib/python3.11/site-packages/vllm/model_executor/models/qwen2.py", line 82, in __init__ ERROR 07-06 22:09:39 engine.py:400] self.down_proj = RowParallelLinear( ERROR 07-06 22:09:39 engine.py:400] ^^^^^^^^^^^^^^^^^^ ERROR 07-06 22:09:39 engine.py:400] File "/root/miniconda3/envs/deepseek_vllm/lib/python3.11/site-packages/vllm/model_executor/layers/linear.py", line 1062, in __init__ ERROR 07-06 22:09:39 engine.py:400] self.quant_method.create_weights( ERROR 07-06 22:09:39 engine.py:400] File "/root/miniconda3/envs/deepseek_vllm/lib/python3.11/site-packages/vllm/model_executor/layers/linear.py", line 129, in create_weights ERROR 07-06 22:09:39 engine.py:400] weight = Parameter(torch.empty(sum(output_partition_sizes), ERROR 07-06 22:09:39 engine.py:400] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ ERROR 07-06 22:09:39 engine.py:400] File "/root/miniconda3/envs/deepseek_vllm/lib/python3.11/site-packages/torch/utils/_device.py", line 106, in __torch_function__ ERROR 07-06 22:09:39 engine.py:400] return func(*args, **kwargs) ERROR 07-06 22:09:39 engine.py:400] ^^^^^^^^^^^^^^^^^^^^^ ERROR 07-06 22:09:39 engine.py:400] torch.OutOfMemoryError: CUDA out of memory. Tried to allocate 136.00 MiB. GPU 0 has a total capacity of 21.49 GiB of which 43.50 MiB is free. Including non-PyTorch memory, this process has 21.44 GiB memory in use. Of the allocated memory 21.19 GiB is allocated by PyTorch, and 20.25 MiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True to avoid fragmentation. See documentation for Memory Management (https://pytorch.org/docs/stable/notes/cuda.html#environment-variables) Process SpawnProcess-1: INFO 07-06 22:09:39 multiproc_worker_utils.py:128] Killing local vLLM worker processes Traceback (most recent call last): File "/root/miniconda3/envs/deepseek_vllm/lib/python3.11/multiprocessing/process.py", line 314, in _bootstrap self.run() File "/root/miniconda3/envs/deepseek_vllm/lib/python3.11/multiprocessing/process.py", line 108, in run self._target(*self._args, **self._kwargs) File "/root/miniconda3/envs/deepseek_vllm/lib/python3.11/site-packages/vllm/engine/multiprocessing/engine.py", line 402, in run_mp_engine raise e File "/root/miniconda3/envs/deepseek_vllm/lib/python3.11/site-packages/vllm/engine/multiprocessing/engine.py", line 391, in run_mp_engine engine = MQLLMEngine.from_engine_args(engine_args=engine_args, ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/root/miniconda3/envs/deepseek_vllm/lib/python3.11/site-packages/vllm/engine/multiprocessing/engine.py", line 124, in from_engine_args return cls(ipc_path=ipc_path, ^^^^^^^^^^^^^^^^^^^^^^ File "/root/miniconda3/envs/deepseek_vllm/lib/python3.11/site-packages/vllm/engine/multiprocessing/engine.py", line 76, in __init__ self.engine = LLMEngine(*args, **kwargs) ^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/root/miniconda3/envs/deepseek_vllm/lib/python3.11/site-packages/vllm/engine/llm_engine.py", line 273, in __init__ self.model_executor = executor_class(vllm_config=vllm_config, ) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/root/miniconda3/envs/deepseek_vllm/lib/python3.11/site-packages/vllm/executor/executor_base.py", line 271, in __init__ super().__init__(*args, **kwargs) File "/root/miniconda3/envs/deepseek_vllm/lib/python3.11/site-packages/vllm/executor/executor_base.py", line 52, in __init__ self._init_executor() File "/root/miniconda3/envs/deepseek_vllm/lib/python3.11/site-packages/vllm/executor/mp_distributed_executor.py", line 125, in _init_executor self._run_workers("load_model", File "/root/miniconda3/envs/deepseek_vllm/lib/python3.11/site-packages/vllm/executor/mp_distributed_executor.py", line 185, in _run_workers driver_worker_output = run_method(self.driver_worker, sent_method, ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/root/miniconda3/envs/deepseek_vllm/lib/python3.11/site-packages/vllm/utils.py", line 2196, in run_method return func(*args, **kwargs) ^^^^^^^^^^^^^^^^^^^^^ File "/root/miniconda3/envs/deepseek_vllm/lib/python3.11/site-packages/vllm/worker/worker.py", line 183, in load_model self.model_runner.load_model() File "/root/miniconda3/envs/deepseek_vllm/lib/python3.11/site-packages/vllm/worker/model_runner.py", line 1112, in load_model self.model = get_model(vllm_config=self.vllm_config) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/root/miniconda3/envs/deepseek_vllm/lib/python3.11/site-packages/vllm/model_executor/model_loader/__init__.py", line 14, in get_model return loader.load_model(vllm_config=vllm_config) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/root/miniconda3/envs/deepseek_vllm/lib/python3.11/site-packages/vllm/model_executor/model_loader/loader.py", line 406, in load_model model = _initialize_model(vllm_config=vllm_config) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/root/miniconda3/envs/deepseek_vllm/lib/python3.11/site-packages/vllm/model_executor/model_loader/loader.py", line 125, in _initialize_model return model_class(vllm_config=vllm_config, prefix=prefix) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/root/miniconda3/envs/deepseek_vllm/lib/python3.11/site-packages/vllm/model_executor/models/qwen2.py", line 453, in __init__ self.model = Qwen2Model(vllm_config=vllm_config, ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/root/miniconda3/envs/deepseek_vllm/lib/python3.11/site-packages/vllm/compilation/decorators.py", line 151, in __init__ old_init(self, vllm_config=vllm_config, prefix=prefix, **kwargs) File "/root/miniconda3/envs/deepseek_vllm/lib/python3.11/site-packages/vllm/model_executor/models/qwen2.py", line 307, in __init__ self.start_layer, self.end_layer, self.layers = make_layers( ^^^^^^^^^^^^ File "/root/miniconda3/envs/deepseek_vllm/lib/python3.11/site-packages/vllm/model_executor/models/utils.py", line 557, in make_layers [PPMissingLayer() for _ in range(start_layer)] + [ ^ File "/root/miniconda3/envs/deepseek_vllm/lib/python3.11/site-packages/vllm/model_executor/models/utils.py", line 558, in maybe_offload_to_cpu(layer_fn(prefix=f"{prefix}.{idx}")) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/root/miniconda3/envs/deepseek_vllm/lib/python3.11/site-packages/vllm/model_executor/models/qwen2.py", line 309, in <lambda> lambda prefix: Qwen2DecoderLayer(config=config, ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/root/miniconda3/envs/deepseek_vllm/lib/python3.11/site-packages/vllm/model_executor/models/qwen2.py", line 220, in __init__ self.mlp = Qwen2MLP( ^^^^^^^^^ File "/root/miniconda3/envs/deepseek_vllm/lib/python3.11/site-packages/vllm/model_executor/models/qwen2.py", line 82, in __init__ self.down_proj = RowParallelLinear( ^^^^^^^^^^^^^^^^^^ File "/root/miniconda3/envs/deepseek_vllm/lib/python3.11/site-packages/vllm/model_executor/layers/linear.py", line 1062, in __init__ self.quant_method.create_weights( File "/root/miniconda3/envs/deepseek_vllm/lib/python3.11/site-packages/vllm/model_executor/layers/linear.py", line 129, in create_weights weight = Parameter(torch.empty(sum(output_partition_sizes), ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/root/miniconda3/envs/deepseek_vllm/lib/python3.11/site-packages/torch/utils/_device.py", line 106, in __torch_function__ return func(*args, **kwargs) ^^^^^^^^^^^^^^^^^^^^^ torch.OutOfMemoryError: CUDA out of memory. Tried to allocate 136.00 MiB. GPU 0 has a total capacity of 21.49 GiB of which 43.50 MiB is free. Including non-PyTorch memory, this process has 21.44 GiB memory in use. Of the allocated memory 21.19 GiB is allocated by PyTorch, and 20.25 MiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True to avoid fragmentation. See documentation for Memory Management (https://pytorch.org/docs/stable/notes/cuda.html#environment-variables) [rank0]:[W706 22:09:40.345098611 ProcessGroupNCCL.cpp:1250] Warning: WARNING: process group has NOT been destroyed before we destruct ProcessGroupNCCL. On normal program exit, the application should call destroy_process_group to ensure that any pending NCCL operations have finished in this process. In rare cases this process can exit before this point and block the progress of another member of the process group. This constraint has always been present, but this warning has only been added since PyTorch 2.4 (function operator()) Traceback (most recent call last): File "/root/miniconda3/envs/deepseek_vllm/bin/vllm", line 8, in <module> sys.exit(main()) ^^^^^^ File "/root/miniconda3/envs/deepseek_vllm/lib/python3.11/site-packages/vllm/entrypoints/cli/main.py", line 73, in main args.dispatch_function(args) File "/root/miniconda3/envs/deepseek_vllm/lib/python3.11/site-packages/vllm/entrypoints/cli/serve.py", line 34, in cmd uvloop.run(run_server(args)) File "/root/miniconda3/envs/deepseek_vllm/lib/python3.11/site-packages/uvloop/__init__.py", line 105, in run return runner.run(wrapper()) ^^^^^^^^^^^^^^^^^^^^^ File "/root/miniconda3/envs/deepseek_vllm/lib/python3.11/asyncio/runners.py", line 118, in run return self._loop.run_until_complete(task) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "uvloop/loop.pyx", line 1518, in uvloop.loop.Loop.run_until_complete File "/root/miniconda3/envs/deepseek_vllm/lib/python3.11/site-packages/uvloop/__init__.py", line 61, in wrapper return await main ^^^^^^^^^^ File "/root/miniconda3/envs/deepseek_vllm/lib/python3.11/site-packages/vllm/entrypoints/openai/api_server.py", line 947, in run_server async with build_async_engine_client(args) as engine_client: File "/root/miniconda3/envs/deepseek_vllm/lib/python3.11/contextlib.py", line 210, in __aenter__ return await anext(self.gen) ^^^^^^^^^^^^^^^^^^^^^ File "/root/miniconda3/envs/deepseek_vllm/lib/python3.11/site-packages/vllm/entrypoints/openai/api_server.py", line 139, in build_async_engine_client async with build_async_engine_client_from_engine_args( File "/root/miniconda3/envs/deepseek_vllm/lib/python3.11/contextlib.py", line 210, in __aenter__ return await anext(self.gen) ^^^^^^^^^^^^^^^^^^^^^ File "/root/miniconda3/envs/deepseek_vllm/lib/python3.11/site-packages/vllm/entrypoints/openai/api_server.py", line 233, in build_async_engine_client_from_engine_args raise RuntimeError( RuntimeError: Engine process failed to start. See stack trace for the root cause. (deepseek_vllm) root@user-X99:/home/user/Desktop# /root/miniconda3/envs/deepseek_vllm/lib/python3.11/multiprocessing/resource_tracker.py:254: UserWarning: resource_tracker: There appear to be 1 leaked shared_memory objects to clean up at shutdown warnings.warn('resource_tracker: There appear to be %d ' (deepseek_vllm) root@user-X99:/home/user/Desktop#

大家在看

recommend-type

MATLAB 2019A 中文文档.pdf

文件包含2019年最新版本的matlab 2019a 的中文参考文档,本文档未超级清晰版本,可以供大家学习matlab参考。
recommend-type

KYN61-40.5安装维护手册

KYN61-40.5安装维护手册
recommend-type

Local Dimming LED TV 背光驱动整体方案

目前,液晶电视的使用越来越广泛,在人们的日常生活中占据越来越重要的位置,而其消耗的能量也越来越引起人们的关注。因此,各个电视、液晶厂商都投入极大的物力、人力、财力加大研发力量去降低功耗,从技术发展趋势上来说,如何降低背光的功耗受到关注。因为背光源是的能量消耗者,降低了背光的功耗,也就大大降低了整机的功耗。这其中的技术包括改善背光源的驱动电路,改善LED的发光效率,开发新的LED种类,目前,Local Dimming是这些技术中易于实现,效果明显的一项技术。尤其是直下式LED背光搭配Local Dimming技术,可大幅度降低电量、提高显示画面对比值、灰阶数、及减少残影等。   1. Loca
recommend-type

ISO/IEC 27005:2022 英文原版

ISO/IEC 27005:2022 英文原版 ISO/IEC 27005:2022 — Information security, cybersecurity and privacy protection — Guidance on managing information security risks (fourth edition) ISO/IEC 27005:2022 — 信息安全、网络安全和隐私保护 — 管理信息安全风险指南(第四版)
recommend-type

Sublime Text 3.1.1 build 3176

Sublime Text 3.1.1 build 3176 64位绿色最新版,在 Windows 平台上,下载后直接解压,双击解压包内的邮件菜单.reg即可加入邮件菜单。

最新推荐

recommend-type

Screenshot_20250709_163758_com.tencent.tmgp.pubgmhd.jpg

Screenshot_20250709_163758_com.tencent.tmgp.pubgmhd.jpg
recommend-type

射击.cpp

射击
recommend-type

基于EasyX图形库的动画设计与C语言课程改革.docx

基于EasyX图形库的动画设计与C语言课程改革.docx
recommend-type

网络爬虫源代码.doc

网络爬虫源代码.doc
recommend-type

llcom-硬件开发资源

LuaTCPMQTT
recommend-type

飞思OA数据库文件下载指南

根据给定的文件信息,我们可以推断出以下知识点: 首先,从标题“飞思OA源代码[数据库文件]”可以看出,这里涉及的是一个名为“飞思OA”的办公自动化(Office Automation,简称OA)系统的源代码,并且特别提到了数据库文件。OA系统是用于企事业单位内部办公流程自动化的软件系统,它旨在提高工作效率、减少不必要的工作重复,以及增强信息交流与共享。 对于“飞思OA源代码”,这部分信息指出我们正在讨论的是OA系统的源代码部分,这通常意味着软件开发者或维护者拥有访问和修改软件底层代码的权限。源代码对于开发人员来说非常重要,因为它是软件功能实现的直接体现,而数据库文件则是其中的一个关键组成部分,用来存储和管理用户数据、业务数据等信息。 从描述“飞思OA源代码[数据库文件],以上代码没有数据库文件,请从这里下”可以分析出以下信息:虽然文件列表中提到了“DB”,但实际在当前上下文中,并没有提供包含完整数据库文件的下载链接或直接说明,这意味着如果用户需要获取完整的飞思OA系统的数据库文件,可能需要通过其他途径或者联系提供者获取。 文件的标签为“飞思OA源代码[数据库文件]”,这与标题保持一致,表明这是一个与飞思OA系统源代码相关的标签,而附加的“[数据库文件]”特别强调了数据库内容的重要性。在软件开发中,标签常用于帮助分类和检索信息,所以这个标签在这里是为了解释文件内容的属性和类型。 文件名称列表中的“DB”很可能指向的是数据库文件。在一般情况下,数据库文件的扩展名可能包括“.db”、“.sql”、“.mdb”、“.dbf”等,具体要看数据库的类型和使用的数据库管理系统(如MySQL、SQLite、Access等)。如果“DB”是指数据库文件,那么它很可能是以某种形式的压缩文件或包存在,这从“压缩包子文件的文件名称列表”可以推测。 针对这些知识点,以下是一些详细的解释和补充: 1. 办公自动化(OA)系统的构成: - OA系统由多个模块组成,比如工作流管理、文档管理、会议管理、邮件系统、报表系统等。 - 系统内部的流程自动化能够实现任务的自动分配、状态跟踪、结果反馈等。 - 通常,OA系统会提供用户界面来与用户交互,如网页形式的管理界面。 2. 数据库文件的作用: - 数据库文件用于存储数据,是实现业务逻辑和数据管理的基础设施。 - 数据库通常具有数据的CRUD(创建、读取、更新、删除)功能,是信息检索和管理的核心组件。 - 数据库文件的结构和设计直接关系到系统的性能和可扩展性。 3. 数据库文件类型: - 根据数据库管理系统不同,数据库文件可以有不同格式。 - 例如,MySQL数据库的文件通常是“.frm”文件存储表结构,“.MYD”存储数据,“.MYI”存储索引。 - 对于SQLite,数据库就是一个单独的“.sqlite”文件。 4. 数据库设计和管理: - 数据库设计需要遵循一定的规范和最佳实践,如范式化以减少数据冗余。 - 管理数据库包括数据备份、恢复、性能调优、安全性管理等。 5. OA系统开发及源代码维护: - 开发一个OA系统涉及需求分析、系统设计、编码实现、测试、部署和维护等多个阶段。 - OA系统源代码的维护工作包括代码审查、重构、版本控制和更新等。 综上所述,我们可以知道“飞思OA源代码[数据库文件]”所提供的信息包括了对飞思OA系统及其数据库文件的基础介绍和可能的获取途径。对于开发者而言,需要了解如何管理和操作这些数据库文件以确保系统的正常运行和数据的完整性。此外,对于那些希望进行系统定制或二次开发的用户,源代码的可访问性是非常重要的,这可以允许用户根据自己的需求来修改和增强系统功能。
recommend-type

Qt信号与槽优化:提升系统性能与响应速度的实战技巧

# 摘要 本文系统地探讨了Qt框架中信号与槽机制的原理、高级特性、性能优化以及在大型项目中的应用。首先,概述了信号与槽的基本概念和工作原理,包括信号的发射、槽函数的绑定与调用过程。随后,本文深入分析了信号与槽的内部机制,如数据类型兼容性和连接类型,并探讨了优化信
recommend-type

D8流向算法

<think>我们被要求解释或实现D8流向算法。D8流向算法是一种用于水文分析的数字高程模型(DEM)处理方法,用于确定地表水流的方向。每个像元的水流方向被指定为指向周围8个相邻像元中坡度最陡的那个方向。 ### 算法原理 在D8算法中,每个像元的水流方向被定义为指向其8个相邻像元(包括对角线方向)中坡度最大的方向。坡度由高程差除以距离计算,其中相邻像元的距离为1(水平和垂直方向)或√2(对角线方向)。具体步骤如下: 1. 对于中心像元,计算其与8个相邻像元的高程差(中心像元高程减去相邻像元高程,得到正值表示下坡)。 2. 计算每个相邻方向的坡度:坡度 = 高程差 / 距离(水平/垂直方向
recommend-type

精选36个精美ICO图标免费打包下载

在当今的软件开发和应用程序设计中,图标作为图形用户界面(GUI)的一个重要组成部分,承担着向用户传达信息、增加美观性和提高用户体验的重要角色。图标不仅仅是一个应用程序或文件的象征,它还是品牌形象在数字世界中的延伸。因此,开发人员和设计师往往会对默认生成的图标感到不满意,从而寻找更加精美和个性化的图标资源。 【标题】中提到的“精美ICO图标打包下载”,指向用户提供的是一组精选的图标文件,这些文件格式为ICO。ICO文件是一种图标文件格式,主要被用于Windows操作系统中的各种文件和应用程序的图标。由于Windows系统的普及,ICO格式的图标在软件开发中有着广泛的应用。 【描述】中提到的“VB、VC编写应用的自带图标很难看,换这些试试”,提示我们这个ICO图标包是专门为使用Visual Basic(VB)和Visual C++(VC)编写的应用程序准备的。VB和VC是Microsoft公司推出的两款编程语言,其中VB是一种主要面向初学者的面向对象编程语言,而VC则是更加专业化的C++开发环境。在这些开发环境中,用户可以选择自定义应用程序的图标,以提升应用的视觉效果和用户体验。 【标签】中的“.ico 图标”直接告诉我们,这些打包的图标是ICO格式的。在设计ICO图标时,需要注意其独特的尺寸要求,因为ICO格式支持多种尺寸的图标,例如16x16、32x32、48x48、64x64、128x128等像素尺寸,甚至可以包含高DPI版本以适应不同显示需求。此外,ICO文件通常包含多种颜色深度的图标,以便在不同的背景下提供最佳的显示效果。 【压缩包子文件的文件名称列表】显示了这些精美ICO图标的数量,即“精美ICO图标36个打包”。这意味着该压缩包内包含36个不同的ICO图标资源。对于软件开发者和设计师来说,这意味着他们可以从这36个图标中挑选适合其应用程序或项目的图标,以替代默认的、可能看起来不太吸引人的图标。 在实际应用中,将这些图标应用到VB或VC编写的程序中,通常需要编辑程序的资源文件或使用相应的开发环境提供的工具进行图标更换。例如,在VB中,可以通过资源编辑器选择并替换程序的图标;而在VC中,则可能需要通过设置项目属性来更改图标。由于Windows系统支持在编译应用程序时将图标嵌入到可执行文件(EXE)中,因此一旦图标更换完成并重新编译程序,新图标就会在程序运行时显示出来。 此外,当谈及图标资源时,还应当了解图标制作的基本原则和技巧,例如:图标设计应简洁明了,以传达清晰的信息;色彩运用需考虑色彩搭配的美观性和辨识度;图标风格要与应用程序的整体设计风格保持一致,等等。这些原则和技巧在选择和设计图标时都非常重要。 总结来说,【标题】、【描述】、【标签】和【压缩包子文件的文件名称列表】共同勾勒出了一个为VB和VC编程语言用户准备的ICO图标资源包。开发者通过下载和使用这些图标,能够有效地提升应用程序的外观和用户体验。在这一过程中,了解和应用图标设计与应用的基本知识至关重要。
recommend-type

【Qt数据库融合指南】:MySQL与Qt无缝集成的技巧

# 摘要 本文全面探讨了Qt数据库集成的基础知识与进阶应用,从Qt与MySQL的基础操作讲起,深入到Qt数据库编程接口的配置与使用,并详细介绍了数据模型和视图的实现。随着章节的深入,内容逐渐从基础的数据操作界面构建过渡到高级数据库操作实践,涵盖了性能优化、安全性策略和事务管理。本文还特别针对移动设备上的数据库集成进行了讨