
I ran the command `./bin/ollama create qwen2.5_3b -f Modelfile`, but it fails with `Error: path or modelfile are required`. To investigate, I removed the `FROM ./path to model` line from the Modelfile, and the error changed to `Error: no FROM line`. That shows Ollama is actually reading the Modelfile contents, so why does `Error: path or modelfile are required` still appear?

Posted: 2025-07-19 09:30:45 · Views: 2
<think>We are troubleshooting the error `Error: path or modelfile are required` when creating a model with Ollama. The user has confirmed that removing the `FROM` line from the Modelfile changes the error to a missing-`FROM` complaint, which means Ollama is reading the Modelfile. So why does the path/Modelfile error still appear?

### Problem analysis

The `ollama create` command needs two pieces of information:
1. A model name (here `qwen2.5_3b`)
2. The path to a Modelfile (via the `-f` flag)

The error `path or modelfile are required` usually means one of the following:
- The specified Modelfile path does not exist
- The path is misspelled, or a permission problem prevents access
- The `-f` flag is not followed by a valid file path

### Possible causes and fixes

#### 1. Incorrect file path
- Make sure the Modelfile exists in the current working directory and that the path is spelled correctly (watch case and special characters).
- With a relative path, `./Modelfile` refers to a file in the current directory; adjust the path if the file lives elsewhere.
- **Verify**: run `ls ./Modelfile` (or the actual path) to confirm the file exists.

#### 2. File permission problem
- If the Modelfile is not readable, Ollama cannot load its contents.
- **Fix**: grant read permission with `chmod +r ./Modelfile`.

#### 3. Malformed Modelfile content
- Even though removing the `FROM` line produces a different error, other formatting problems can still break parsing, for example:
  - A misspelled instruction (`FRMO` instead of `FROM`)
  - Badly formatted arguments (e.g. a missing space)
- **Verify**: check the file line by line against the [Ollama Modelfile reference](https://github.com/ollama/ollama/blob/main/docs/modelfile.md).

#### 4. Ollama version issue
- Older Ollama releases may contain parsing bugs. Upgrade to the latest version:
```bash
curl -fsSL https://ollama.com/install.sh | sh
```

#### 5. Execution environment
- When running inside Docker or a virtual environment, paths may be mapped incorrectly. Make sure the container can actually access the host file.

---

### Step-by-step resolution

#### Step 1: Verify the Modelfile path
Run:
```bash
ls -l ./Modelfile
```
If the file exists, the output looks like:
```
-rw-r--r-- 1 user group 1024 Jun 15 10:00 ./Modelfile
```
If it does not, fix the path or use an absolute path (e.g. `/home/user/Modelfile`).

#### Step 2: Inspect the file contents
Print the Modelfile:
```bash
cat ./Modelfile
```
and make sure it contains valid instructions, for example:
```modelfile
# Correct example
FROM qwen2.5:3b           # base model name
PARAMETER temperature 0.8
```

#### Step 3: Use an absolute path
If the relative path does not work, switch to an absolute one:
```bash
ollama create qwen2.5_3b -f /absolute/path/to/Modelfile
```

#### Step 4: Run in debug mode
Set the `OLLAMA_DEBUG=1` environment variable to get verbose logging:
```bash
OLLAMA_DEBUG=1 ollama create qwen2.5_3b -f ./Modelfile
```

---

### Example: a successful model creation

Given a Modelfile such as:
```modelfile
# Modelfile for Qwen2.5-3B
FROM qwen2.5:3b
PARAMETER num_ctx 4096
```
run:
```bash
ollama create qwen2.5_3b -f ./Modelfile
```
On success you should see output along the lines of:
```
>>> Successfully created model: qwen2.5_3b
```

---

### Common error reference

| Symptom                    | Likely cause          | Fix                                  |
|----------------------------|-----------------------|--------------------------------------|
| `file not found`           | Wrong path            | Verify the file exists with `ls`     |
| `permission denied`        | Insufficient rights   | Run `chmod +r ./Modelfile`           |
| `invalid modelfile format` | Instruction syntax    | Check `FROM`/`PARAMETER` spelling    |
| Service not responding     | Ollama not running    | Start it with `ollama serve` first   |

---

Related questions:
1. What is the complete syntax of an Ollama Modelfile?
2. How can permission problems during Ollama model creation be resolved?
3. Does Ollama support loading a Modelfile from a remote URL?
4. How can the configuration of an already-created Ollama model be updated?
</think>

### Resolving Ollama's "path or modelfile are required" error when creating a model

Based on your description and debugging, Ollama clearly sees the Modelfile (removing `FROM` produces a different error), yet it still reports `Error: path or modelfile are required`. This is usually caused by one of the following:

#### 1. Path format problems (the most common cause)
- **Relative path resolution**: Ollama may fail to resolve a path that starts with `./`
- **Fix**:
```bash
# Use an absolute path instead of a relative one
./bin/ollama create qwen2.5_3b -f $(pwd)/Modelfile

# Or cd into the directory that contains the Modelfile first
cd /path/to/modelfile/directory
./bin/ollama create qwen2.5_3b -f Modelfile
```

#### 2. File permission problems
- The Ollama process may not be allowed to read the Modelfile
- **Check and fix**:
```bash
# Inspect the file permissions
ls -l Modelfile

# Grant read/write permission (example)
chmod 644 Modelfile
```

#### 3. Encoding or hidden characters in the Modelfile
- The file may contain a UTF-8 BOM or Windows line endings (`\r\n`); a quick detection sketch follows point 4 below
- **Fix**:
```bash
# Convert the line endings
dos2unix Modelfile

# Or recreate the file from scratch
echo "FROM ./your_model_path" > Modelfile
echo "PARAMETER temperature 0.7" >> Modelfile
```

#### 4. Ollama version compatibility
- Some releases have path-parsing bugs
- **Fix**:
```bash
# Upgrade to the latest version
curl -fsSL https://ollama.com/install.sh | sh

# Or run in debug mode to see what path Ollama actually receives
OLLAMA_DEBUG=1 ./bin/ollama create qwen2.5_3b -f Modelfile
```
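As a quick sanity pass covering points 3 and 4 above, the commands below reveal a UTF-8 BOM, CRLF line endings, and the Ollama build actually being invoked. This is a sketch that assumes standard GNU/BSD tools (`head`, `hexdump`, `file`, `cat`) are installed:

```bash
head -c 3 Modelfile | hexdump -C   # "ef bb bf" at the start means a UTF-8 BOM is present
file Modelfile                     # reports "with CRLF line terminators" for Windows-saved files
cat -A Modelfile | head -n 5       # CRLF lines end in "^M$", plain LF lines in "$"
./bin/ollama --version             # confirm which Ollama build is actually being run
```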
#### 5. Special characters in the file path
- Spaces or special symbols (`!@#$%^&*`) in the path can break parsing
- **Fix**:
```bash
# Rename the path, or wrap it in quotes
./bin/ollama create qwen2.5_3b -f "My Modelfile"
```

### Full troubleshooting flow

1. **Verify the file is accessible**:
```bash
./bin/ollama serve &                                  # make sure the server is running
cat Modelfile | ./bin/ollama create qwen2.5_3b -f -   # test via stdin (if your Ollama version supports it)
```

2. **Test with a minimal Modelfile**:
```modelfile
# Smallest possible example
FROM ./your_model_directory
```

3. **Check environment variables**:
```bash
echo $OLLAMA_MODELS   # check for a conflicting models directory
unset OLLAMA_MODELS   # temporarily unset it
```

> **Important**: what a local `FROM` reference may point to depends on the Ollama version:
> - a single GGUF file, e.g. `FROM ./model.gguf`
> - a directory of Safetensors weights, typically laid out like:
>
> ```
> your_model_directory/
> ├── model.safetensors
> ├── tokenizer.json
> └── config.json
> ```

### Verifying the model after creation

```bash
# List installed models
./bin/ollama list

# Run a quick test
./bin/ollama run qwen2.5_3b "Hello"
```

These steps resolve the vast majority of such errors. If the problem persists, capture the output of a run with `OLLAMA_DEBUG=1` for further analysis.

---
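To tie the checks above together, here is a small pre-flight sketch that validates the Modelfile before attempting the create. The `./bin/ollama` location and the `qwen2.5_3b` name come from the command in the question; everything else (the script name, `realpath` from GNU coreutils, GNU grep) is an assumption, and this is only a sketch, not part of Ollama itself:

```bash
#!/usr/bin/env bash
# Pre-flight checks before `ollama create`, assuming the layout from the question:
# the ollama binary at ./bin/ollama and the Modelfile in the current directory.
set -euo pipefail

MODELFILE="${1:-Modelfile}"
MODEL_NAME="${2:-qwen2.5_3b}"

# 1. The file must exist and be readable by the current user.
[ -r "$MODELFILE" ] || { echo "Cannot read $MODELFILE" >&2; exit 1; }

# 2. Warn about a UTF-8 BOM or CRLF line endings.
if head -c 3 "$MODELFILE" | grep -q $'\xef\xbb\xbf'; then
  echo "Warning: $MODELFILE starts with a UTF-8 BOM" >&2
fi
if grep -q $'\r' "$MODELFILE"; then
  echo "Warning: $MODELFILE contains CRLF line endings (try dos2unix)" >&2
fi

# 3. If the FROM target is a local path, it must exist.
FROM_TARGET=$(awk 'toupper($1) == "FROM" { print $2; exit }' "$MODELFILE")
case "$FROM_TARGET" in
  ./*|/*) [ -e "$FROM_TARGET" ] || { echo "FROM target $FROM_TARGET not found" >&2; exit 1; } ;;
esac

# 4. Run the create with debug logging and an absolute Modelfile path.
OLLAMA_DEBUG=1 ./bin/ollama create "$MODEL_NAME" -f "$(realpath "$MODELFILE")"
```

Saved under a name of your choosing (e.g. `check_modelfile.sh`) and run from the same directory as the original command, it surfaces the most common causes before Ollama ever parses the file.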