At a customer's request, this document covers deploying the DeepSeek-R1 14B model on Huawei Atlas 300I Pro NPUs under the Kylin V10 operating system.
The deployment described here requires internet access; for a fully offline environment, prepare the packages in advance.
Configuration
CPU | Memory | NPUs |
---|---|---|
32C | 512G | Atlas 300I Pro 24GB * 4 |
OS version
[root@lolicp ~]# cat /etc/os-release
NAME="Kylin Linux Advanced Server"
VERSION="V10 (Tercel)"
ID="kylin"
VERSION_ID="V10"
PRETTY_NAME="Kylin Linux Advanced Server V10 (Tercel)"
ANSI_COLOR="0;31"
NPU information
[root@lolicp ~]# npu-smi info
+--------------------------------------------------------------------------------------------------------+
| npu-smi 24.1.rc2 Version: 24.1.rc2 |
+-------------------------------+-----------------+------------------------------------------------------+
| NPU Name | Health | Power(W) Temp(C) Hugepages-Usage(page) |
| Chip Device | Bus-Id | AICore(%) Memory-Usage(MB) |
+===============================+=================+======================================================+
| 2 310P3 | OK | NA 45 9014 / 9014 |
| 0 2 | 0000:2F:00.0 | 0 19868/ 21527 |
+===============================+=================+======================================================+
| 3 310P3 | OK | NA 48 9014 / 9014 |
| 0 0 | 0000:06:00.0 | 0 19867/ 21527 |
+===============================+=================+======================================================+
| 5 310P3 | OK | NA 49 9014 / 9014 |
| 0 3 | 0000:D8:00.0 | 0 19866/ 21527 |
+===============================+=================+======================================================+
| 6 310P3 | OK | NA 46 9014 / 9014 |
| 0 1 | 0000:07:00.0 | 0 19868/ 21527 |
+===============================+=================+======================================================+
+-------------------------------+-----------------+------------------------------------------------------+
| NPU Chip | Process id | Process name | Process memory(MB) |
+===============================+=================+======================================================+
| 2 0 | 529485 | mindie_llm_back | 18114 |
+===============================+=================+======================================================+
| 3 0 | 529481 | mindie_llm_back | 18113 |
+===============================+=================+======================================================+
| 5 0 | 529487 | mindie_llm_back | 18113 |
+===============================+=================+======================================================+
| 6 0 | 529483 | mindie_llm_back | 18114 |
+===============================+=================+======================================================+
Environment setup
The server here is x86; in principle an ARM server can also be used.
Install the driver
Ascend-hdk-310p-npu-driver_24.1.rc1_linux-x86-64.run or Ascend-hdk-310p-npu-driver-24.1.rc2-1.x86-64.rpm
Install the firmware
Ascend-hdk-310p-npu-firmware_7.1.0.12.220.run or Ascend-hdk-310p-npu-firmware-7.3.0.1.231-1.noarch.rpm
Install the container toolkit
Ascend Docker Runtime container engine plugin
Ascend-docker-runtime_6.0.0_linux-x86_64.run
Download the model
7B: runs on 1–2 Atlas 300I Pro cards (using more than 2 fails)
14B: runs on 4 Atlas 300I Pro cards
32B: the available NPU memory is insufficient
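The per-model card counts above follow from simple arithmetic: FP16 weights cost 2 bytes per parameter, and npu-smi above reports roughly 21.5 GB per card. A rough sanity check (the 40% overhead fraction for KV cache, activations, and runtime is an assumption, not a measured figure):

```python
def fits_in_memory(params_billion, num_cards, card_mem_gb=21.5,
                   overhead_frac=0.4):
    """Rough feasibility check: FP16 weights take 2 bytes per parameter,
    plus an assumed ~40% extra for KV cache, activations and runtime."""
    weight_gb = params_billion * 2  # 1e9 params * 2 bytes = 2 GB per billion
    return weight_gb * (1 + overhead_frac) <= num_cards * card_mem_gb

print(fits_in_memory(14, 4))  # True: ~28 GB of weights across ~86 GB
print(fits_in_memory(32, 4))  # False: ~64 GB of weights leaves no headroom
```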
This site's Hugging Face mirror can be used to speed up downloads: https://lolicp.com/api/huggingface/
Download via Python
Install the script's dependencies (argparse ships with the Python standard library and needs no install):
pip3 install requests tqdm
Download the script:
wget https://file.lolicp.com/script/python/huggingface_down.py
Download the model. Here deepseek-ai/DeepSeek-R1-Distill-Qwen-14B is the model path on the Hugging Face site, and --download-dir sets the download directory.
Run in the /models directory:
python3 huggingface_down.py deepseek-ai/DeepSeek-R1-Distill-Qwen-14B --download --download-dir /models/DeepSeek-R1-Distill-Qwen-14B
Prepare the image
Download the MindIE image from the official site: https://www.hiascend.com/developer/ascendhub/detail/mindie
Or use this site's x86 image: https://public.lolicp.com/container_images/mindie/mindie_1_0_0-300I-Duo-py311-openeuler24_03-lts.tgz
Deploy the model
Get the NPU device IDs
Record the IDs; they are substituted into the container start command below.
ls -l /dev/davinci*
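The IDs can also be pulled out of the device list programmatically; a small sketch (the helper name is illustrative; on the host you would pass in `glob.glob("/dev/davinci*")`):

```python
import re

def davinci_ids(paths):
    """Extract the numeric IDs from davinci device node paths,
    skipping non-numeric nodes such as /dev/davinci_manager."""
    ids = []
    for path in paths:
        m = re.search(r"davinci(\d+)$", path)
        if m:
            ids.append(int(m.group(1)))
    return sorted(ids)

print(davinci_ids(["/dev/davinci0", "/dev/davinci2",
                   "/dev/davinci4", "/dev/davinci6",
                   "/dev/davinci_manager"]))  # [0, 2, 4, 6]
```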
Pre-launch changes
Fix the file permissions:
chmod 750 /models/DeepSeek-R1-Distill-Qwen-14B/config.json
Set the torch_dtype field in the model weights' config.json to float16.
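Both pre-launch changes can be scripted together; a sketch (the function name is illustrative, the path is this guide's weight directory):

```python
import json
import os

def prepare_weight_config(config_path):
    """Set torch_dtype to float16 and tighten permissions to 750,
    which the mindie-service startup check requires."""
    with open(config_path) as f:
        cfg = json.load(f)
    cfg["torch_dtype"] = "float16"
    with open(config_path, "w") as f:
        json.dump(cfg, f, indent=2)
    os.chmod(config_path, 0o750)

# prepare_weight_config("/models/DeepSeek-R1-Distill-Qwen-14B/config.json")
```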
Start the container
Pull (or load) the image before starting the container.
docker run -it -d --privileged --net=host --shm-size=32g \
--hostname lolicp \
--name deepseek-r1-lolicp \
--device=/dev/davinci_manager \
--device=/dev/hisi_hdc \
--device=/dev/devmm_svm \
--device=/dev/davinci0 \
--device=/dev/davinci2 \
--device=/dev/davinci4 \
--device=/dev/davinci6 \
-v /etc/vnpu.cfg:/etc/vnpu.cfg:ro \
-v /usr/local/Ascend/driver:/usr/local/Ascend/driver \
-v /usr/local/dcmi/:/usr/local/dcmi/ \
-v /usr/local/bin/npu-smi:/usr/local/bin/npu-smi \
-v /etc/ascend_install.info:/etc/ascend_install.info:ro \
-v /models:/model_weights \
swr.cn-south-1.myhuaweicloud.com/ascendhub/mindie:1.0.0-300I-Duo-py311-openeuler24.03-lts bash
Enter the container
docker exec -it deepseek-r1-lolicp bash
Configure mindie-service
Edit the configuration file:
vi /usr/local/Ascend/mindie/latest/mindie-service/conf/config.json
Adjust for your environment: the HTTPS setting, NPU device IDs, NPU count, model name, and model weight path. (The # comments below are annotations only; the actual file must be plain JSON.)
{
"Version" : "1.1.0",
"LogConfig" :
{
"logLevel" : "Info",
"logFileSize" : 20,
"logFileNum" : 20,
"logPath" : "logs/mindservice.log"
},
"ServerConfig" :
{
"ipAddress" : "127.0.0.1",
"managementIpAddress" : "127.0.0.2",
"port" : 1025, # 监听端口
"managementPort" : 1026,
"metricsPort" : 1027,
"allowAllZeroIpListening" : false,
"maxLinkNum" : 1000,
"httpsEnabled" : false, # 开启https
"fullTextEnabled" : false,
"tlsCaPath" : "security/ca/",
"tlsCaFile" : ["ca.pem"],
"tlsCert" : "security/certs/server.pem",
"tlsPk" : "security/keys/server.key.pem",
"tlsPkPwd" : "security/pass/key_pwd.txt",
"tlsCrlPath" : "security/certs/",
"tlsCrlFiles" : ["server_crl.pem"],
"managementTlsCaFile" : ["management_ca.pem"],
"managementTlsCert" : "security/certs/management/server.pem",
"managementTlsPk" : "security/keys/management/server.key.pem",
"managementTlsPkPwd" : "security/pass/management/key_pwd.txt",
"managementTlsCrlPath" : "security/management/certs/",
"managementTlsCrlFiles" : ["server_crl.pem"],
"kmcKsfMaster" : "tools/pmt/master/ksfa",
"kmcKsfStandby" : "tools/pmt/standby/ksfb",
"inferMode" : "standard",
"interCommTLSEnabled" : true,
"interCommPort" : 1121,
"interCommTlsCaPath" : "security/grpc/ca/",
"interCommTlsCaFiles" : ["ca.pem"],
"interCommTlsCert" : "security/grpc/certs/server.pem",
"interCommPk" : "security/grpc/keys/server.key.pem",
"interCommPkPwd" : "security/grpc/pass/key_pwd.txt",
"interCommTlsCrlPath" : "security/grpc/certs/",
"interCommTlsCrlFiles" : ["server_crl.pem"],
"openAiSupport" : "vllm"
},
"BackendConfig" : {
"backendName" : "mindieservice_llm_engine",
"modelInstanceNumber" : 1,
"npuDeviceIds" : [[0,1,2,3]], # 显卡ID
"tokenizerProcessNumber" : 8,
"multiNodesInferEnabled" : false,
"multiNodesInferPort" : 1120,
"interNodeTLSEnabled" : true,
"interNodeTlsCaPath" : "security/grpc/ca/",
"interNodeTlsCaFiles" : ["ca.pem"],
"interNodeTlsCert" : "security/grpc/certs/server.pem",
"interNodeTlsPk" : "security/grpc/keys/server.key.pem",
"interNodeTlsPkPwd" : "security/grpc/pass/mindie_server_key_pwd.txt",
"interNodeTlsCrlPath" : "security/grpc/certs/",
"interNodeTlsCrlFiles" : ["server_crl.pem"],
"interNodeKmcKsfMaster" : "tools/pmt/master/ksfa",
"interNodeKmcKsfStandby" : "tools/pmt/standby/ksfb",
"ModelDeployConfig" :
{
"maxSeqLen" : 2560,
"maxInputTokenLen" : 2048,
"truncation" : false,
"ModelConfig" : [
{
"modelInstanceType" : "Standard",
"modelName" : "DeepSeek-R1-Distill-Qwen-14B", # 模型名称,请求时使用
"modelWeightPath" : "/model_weights/DeepSeek-R1-Distill-Qwen-14B", # 模型权重目录
"worldSize" : 4, # 使用的显卡数
"cpuMemSize" : 10,
"npuMemSize" : -1,
"backendType" : "atb",
"trustRemoteCode" : false
}
]
},
"ScheduleConfig" :
{
"templateType" : "Standard",
"templateName" : "Standard_LLM",
"cacheBlockSize" : 128,
"maxPrefillBatchSize" : 50,
"maxPrefillTokens" : 8192,
"prefillTimeMsPerReq" : 150,
"prefillPolicyType" : 0,
"decodeTimeMsPerReq" : 50,
"decodePolicyType" : 0,
"maxBatchSize" : 200,
"maxIterTimes" : 512,
"maxPreemptCount" : 0,
"supportSelectBatch" : false,
"maxQueueDelayMicroseconds" : 5000
}
}
}
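Once the annotation comments are removed and the file parses as JSON, the handful of fields called out above can be patched programmatically instead of by hand; a minimal sketch (the function name is illustrative):

```python
def patch_mindie_config(cfg, model_name, weight_path, device_ids):
    """Apply the edits described above to a parsed MindIE config dict:
    disable HTTPS, set the device IDs, model name and weight path,
    and keep worldSize consistent with the ID count."""
    cfg["ServerConfig"]["httpsEnabled"] = False
    backend = cfg["BackendConfig"]
    backend["npuDeviceIds"] = [list(device_ids)]
    model = backend["ModelDeployConfig"]["ModelConfig"][0]
    model["modelName"] = model_name
    model["modelWeightPath"] = weight_path
    model["worldSize"] = len(device_ids)  # must equal the ID count
    return cfg
```

Loading the file with json.load, patching, and writing it back with json.dump keeps worldSize and npuDeviceIds consistent automatically, which avoids two of the startup errors listed in the troubleshooting section.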
Start the model service
After completing the configuration changes, start the service:
cd /usr/local/Ascend/mindie/latest/mindie-service/bin
./mindieservice_daemon
Call the API
Option 1
The OpenAI-compatible interface can be used:
curl -H "Content-type: application/json" -X POST -d '{
"model": "DeepSeek-R1-Distill-Qwen-14B",
"messages": [{
"role": "user",
"content": "我有五天假期,我想去海南玩,请给我一个攻略"
}],
"max_tokens": 512,
"presence_penalty": 1.03,
"frequency_penalty": 1.0,
"seed": null,
"temperature": 0.5,
"top_p": 0.95,
"stream": false
}' http://127.0.0.1:1025/v1/chat/completions
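The same call can be made from Python with only the standard library; a sketch mirroring the curl example above (the helper names are illustrative):

```python
import json
import urllib.request

def build_chat_request(prompt, model="DeepSeek-R1-Distill-Qwen-14B",
                       max_tokens=512):
    """Build the same OpenAI-style payload as the curl example."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
        "temperature": 0.5,
        "top_p": 0.95,
        "stream": False,
    }

def chat(prompt, url="http://127.0.0.1:1025/v1/chat/completions"):
    """POST the request and return the assistant's reply text."""
    req = urllib.request.Request(
        url,
        data=json.dumps(build_chat_request(prompt)).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["choices"][0]["message"]["content"]

# print(chat("I have five days off; plan a Hainan itinerary for me"))
```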
Option 2
curl -v -X POST -H 'Content-Type: application/json' -d '{"prompt":"Introduce yourself","max_tokens":150,"stream":false,"do_sample":true,"repetition_penalty":1.5,"temperature":0.7,"top_p":0.95,"top_k":100,"model":"DeepSeek-R1-Distill-Qwen-14B"}' 127.0.0.1:1025/generate
Troubleshooting
Startup fails with "Operation not permitted"
[root@lolicp bin]# ./mindieservice_daemon
terminate called after throwing an instance of 'std::system_error'
what(): Operation not permitted
Aborted (core dumped)
Start the container with --privileged.
npu-smi fails inside the container
[root@lolicp /]# npu-smi info
dcmi module initialize failed. ret is -8005
Start the container with --privileged.
Startup fails the permission check
[root@lolicp bin]# ./mindieservice_daemon
Check path: config.json failed, by: Check Other group permission failed: Current permission is 4, but required no greater than 0. Required permission: 750, but got 644
Failed to check config.json under model weight path.
ERR: Failed to init endpoint! Please check the service log or console output.
Killed
Fix the permissions:
chmod 750 /models/DeepSeek-R1-Distill-Qwen-14B/config.json
Startup reports missing certificates
[root@lolicp bin]# ./mindieservice_daemon
The serverConfig.kmcKsfMaster path is invalid by: The input file: ksfa is not a regular file or not exists
The serverConfig.kmcKsfStandby path is invalid by: The input file: ksfb is not a regular file or not exists
The serverConfig_.tlsCert path is invalid by: The input file: server.pem is not a regular file or not exists
ERR: serverConfig_.tlsCrlFiles file not exit .
The serverConfig_.tlsCaFile path is invalid by: The input file: ca.pem is not a regular file or not exists
The serverConfig_.tlsPk path is invalid by: The input file: server.key.pem is not a regular file or not exists
The serverConfig_.tlsPkPwd path is invalid by: The input file: key_pwd.txt is not a regular file or not exists
The ServerConfig.managementTlsCert path is invalid by: The input file: server.pem is not a regular file or not exists
The ServerConfig.managementTlsCrlPath path is not a dir by:
ERR: serverConfig_.managementTlsCrlFiles file not exit .
ERR: serverConfig_.managementTlsCaFile file not exit .
The ServerConfig.managementTlsPk path is invalid by: The input file: server.key.pem is not a regular file or not exists
The ServerConfig.managementTlsPkPwd path is invalid by: The input file: key_pwd.txt is not a regular file or not exists
ERR: Failed to init endpoint! Please check the service log or console output.
Killed
Set httpsEnabled to false in /usr/local/Ascend/mindie/latest/mindie-service/conf/config.json.
NPU count mismatch
[root@lolicp bin]# ./mindieservice_daemon
the size of npuDeviceIds (subset) does not equal to worldSize
ERR: Failed to init endpoint! Please check the service log or console output.
Killed
worldSize does not match the number of IDs in npuDeviceIds; fix /usr/local/Ascend/mindie/latest/mindie-service/conf/config.json.
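The daemon's startup check can be mirrored in a couple of lines to validate a config before launching the service (the helper name is illustrative):

```python
def world_size_consistent(npu_device_ids, world_size):
    """True when every device-ID group matches worldSize, mirroring
    the 'does not equal to worldSize' startup check."""
    return all(len(group) == world_size for group in npu_device_ids)

print(world_size_consistent([[0, 1, 2, 3]], 4))  # True
print(world_size_consistent([[0, 1, 2]], 4))     # False
```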
Invalid NPU device ID
[root@lolicp bin]# ./mindieservice_daemon
invalid argument: npuDeviceId
Daemon is killing...
Killed
An ID listed in npuDeviceIds is invalid; fix /usr/local/Ascend/mindie/latest/mindie-service/conf/config.json. Valid IDs are typically in the 0–4 range.
Warmup failure
2025-02-25 11:59:11,563 [ERROR] model.py:39 - [Model] >>> Exception:Warmup failed. This issue could be caused by setting both `max_prefill_tokens` and `max_input_length` to very large values, or setting insufficient `npu_mem`. Reducing either `max_prefill_tokens` or `max_input_length` may help resolve this issue. If `npu_mem` is -1, try to increase the environment value `NPU_MEMORY_FRACTION` or `npu_mem` in configuration directly. Increase `world_size` can be another choice.
Traceback (most recent call last):
File "/usr/local/lib/python3.11/site-packages/model_wrapper/model.py", line 37, in initialize
return self.python_model.initialize(config)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.11/site-packages/model_wrapper/standard_model.py", line 146, in initialize
self.generator = Generator(
^^^^^^^^^^
File "/usr/local/lib/python3.11/site-packages/mindie_llm/text_generator/generator.py", line 119, in __init__
self.warm_up(max_prefill_tokens, max_seq_len, max_input_len, max_iter_times, inference_mode)
File "/usr/local/lib/python3.11/site-packages/mindie_llm/text_generator/generator.py", line 281, in warm_up
prefill_reqs, block_tables, prefill_blocks = self._get_warm_up_reqs(total_blocks, max_prefill_tokens,
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.11/site-packages/mindie_llm/text_generator/generator.py", line 427, in _get_warm_up_reqs
raise RuntimeError(message)
RuntimeError: Warmup failed. This issue could be caused by setting both `max_prefill_tokens` and `max_input_length` to very large values, or setting insufficient `npu_mem`. Reducing either `max_prefill_tokens` or `max_input_length` may help resolve this issue. If `npu_mem` is -1, try to increase the environment value `NPU_MEMORY_FRACTION` or `npu_mem` in configuration directly. Increase `world_size` can be another choice.
2025-02-25 11:59:11,565 [ERROR] model.py:42 - [Model] >>> return initialize error result: {'status': 'error', 'npuBlockNum': '0', 'cpuBlockNum': '0'}
Reduce the offending parameters (if unsure, keep the defaults):
[root@lolicp bin]# cat ../conf/config.json |python -m json.tool|grep max
"maxSeqLen": 32768,
"maxInputTokenLen": 32768,
"maxPrefillTokens": 32768,
"maxIterTimes": 32768,
NPU failure at startup
2025-02-25 12:32:32,229 [ERROR] model.py:39 - [Model] >>> Exception:npuSynchronizeDevice:build/CMakeFiles/torch_npu.dir/compiler_depend.ts:467 NPU function error: AclrtSynchronizeDeviceWithTimeout, error code is 507899
[ERROR] 2025-02-25-12:32:31 (PID:3163, Device:0, RankID:-1) ERR00100 PTA call acl api failed
[Error]: An internal error occurs in the Driver module.
Rectify the fault based on the error information in the ascend log.
EL0004: [PID: 3163] 2025-02-25-12:32:32.111.528 Failed to allocate memory.
Possible Cause: Available memory is insufficient.
Solution: Close applications not in use.
TraceBack (most recent call last):
rtDeviceSynchronizeWithTimeout execute failed, reason=[driver error:internal error][FUNC:FuncErrorReason][FILE:error_message_manage.cc][LINE:53]
wait for compute device to finish failed, runtime result = 507899.[FUNC:ReportCallError][FILE:log_inner.cpp][LINE:161]
Failed to allocate memory.
Init device error msg handler failed, retCode=0x7020016.[FUNC:GetDevErrMsg][FILE:api_impl.cc][LINE:5380]
rtGetDevMsg execute failed, reason=[driver error:out of memory][FUNC:FuncErrorReason][FILE:error_message_manage.cc][LINE:53]
Traceback (most recent call last):
File "/usr/local/lib/python3.11/site-packages/model_wrapper/model.py", line 37, in initialize
return self.python_model.initialize(config)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.11/site-packages/model_wrapper/standard_model.py", line 146, in initialize
self.generator = Generator(
^^^^^^^^^^
File "/usr/local/lib/python3.11/site-packages/mindie_llm/text_generator/generator.py", line 119, in __init__
self.warm_up(max_prefill_tokens, max_seq_len, max_input_len, max_iter_times, inference_mode)
File "/usr/local/lib/python3.11/site-packages/mindie_llm/text_generator/generator.py", line 305, in warm_up
npu.synchronize()
File "/usr/local/lib64/python3.11/site-packages/torch_npu/npu/utils.py", line 35, in synchronize
return torch_npu._C._npu_synchronize()
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
RuntimeError: npuSynchronizeDevice:build/CMakeFiles/torch_npu.dir/compiler_depend.ts:467 NPU function error: AclrtSynchronizeDeviceWithTimeout, error code is 507899
[ERROR] 2025-02-25-12:32:31 (PID:3163, Device:0, RankID:-1) ERR00100 PTA call acl api failed
[Error]: An internal error occurs in the Driver module.
Rectify the fault based on the error information in the ascend log.
EL0004: [PID: 3163] 2025-02-25-12:32:32.111.528 Failed to allocate memory.
Possible Cause: Available memory is insufficient.
Solution: Close applications not in use.
TraceBack (most recent call last):
rtDeviceSynchronizeWithTimeout execute failed, reason=[driver error:internal error][FUNC:FuncErrorReason][FILE:error_message_manage.cc][LINE:53]
wait for compute device to finish failed, runtime result = 507899.[FUNC:ReportCallError][FILE:log_inner.cpp][LINE:161]
Failed to allocate memory.
Init device error msg handler failed, retCode=0x7020016.[FUNC:GetDevErrMsg][FILE:api_impl.cc][LINE:5380]
rtGetDevMsg execute failed, reason=[driver error:out of memory][FUNC:FuncErrorReason][FILE:error_message_manage.cc][LINE:53]
2025-02-25 12:32:32,231 [ERROR] model.py:42 - [Model] >>> return initialize error result: {'status': 'error', 'npuBlockNum': '0', 'cpuBlockNum': '0'}
An NPU has dropped from the system, so device memory allocation fails.
Errors at model startup or on requests
2025-02-25 12:44:21,818 [ERROR] standard_model.py:194 - [Model] >>>Execute type:1, Exception:Execute fail, enable log: export ASDOPS_LOG_LEVEL=ERROR, export ASDOPS_LOG_TO_STDOUT=1 to findthe first error. For more details, see the MindIE official document.
Traceback (most recent call last):
File "/usr/local/lib/python3.11/site-packages/model_wrapper/standard_model.py", line 183, in execute
return self._prefill(requests)
^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.11/site-packages/model_wrapper/standard_model.py", line 217, in _prefill
return self.__generate(requests, is_prefill=True, is_mix=False)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.11/site-packages/model_wrapper/standard_model.py", line 348, in __generate
request_tokens_np, eos_np, truncation_indices = self.__handle_requests(requests, is_prefill, is_mix)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.11/site-packages/model_wrapper/standard_model.py", line 532, in __handle_requests
request_tokens_np, eos_np, _, truncation_indices = self.generator.generate_token(metadata)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.11/site-packages/mindie_llm/text_generator/generator.py", line 181, in generate_token
raise e
File "/usr/local/lib/python3.11/site-packages/mindie_llm/text_generator/generator.py", line 157, in generate_token
self.plugin.generate_token(input_metadata))
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.11/site-packages/mindie_llm/utils/decorators/time_decorator.py", line 38, in wrapper
return func(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.11/site-packages/mindie_llm/text_generator/plugins/plugin.py", line 17, in generate_token
logits = self.generator_backend.forward(model_inputs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.11/site-packages/mindie_llm/utils/decorators/time_decorator.py", line 38, in wrapper
return func(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.11/site-packages/mindie_llm/text_generator/adapter/generator_torch.py", line 153, in forward
logits = self.model_wrapper.forward(model_inputs, self.cache_pool.npu_cache, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.11/site-packages/mindie_llm/modeling/model_wrapper/atb/atb_model_wrapper.py", line 89, in forward
logits = self.forward_tensor(
^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.11/site-packages/mindie_llm/modeling/model_wrapper/atb/atb_model_wrapper.py", line 116, in forward_tensor
logits = self.model_runner.forward(
^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/Ascend/atb-models/atb_llm/runner/model_runner.py", line 193, in forward
return self.model.forward(**kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/Ascend/atb-models/atb_llm/models/base/flash_causal_lm.py", line 458, in forward
logits = self.execute_ascend_operator(acl_inputs, acl_param, is_prefill)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/Ascend/atb-models/atb_llm/models/qwen2/flash_causal_qwen2.py", line 374, in execute_ascend_operator
acl_model_out = model_operation.execute(acl_inputs, acl_param)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
RuntimeError: Execute fail, enable log: export ASDOPS_LOG_LEVEL=ERROR, export ASDOPS_LOG_TO_STDOUT=1 to findthe first error. For more details, see the MindIE official document.
A multi-NPU coordination failure. Enable debug logging with the exports shown in the message to locate the first error; host or NPU memory exhaustion is a likely cause.
References
https://gitee.com/youleatherman/ascend-docker-image
https://www.hiascend.com/developer/download/community/result?module=cluster+cann
https://www.hiascend.com/software/modelzoo/models/detail/2a0d0cb1bc644eee8318a12429de67b6