
Ascend CANN asc-devkit Toolchain: From Environment Setup to the First Inference Result

news 2026/5/28 7:23:58

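Preface

You bought an Atlas server and want to run a PyTorch model on the Ascend NPU. What do you install first, how do you configure the environment, and how do you get the first demo running? asc-devkit gives you a complete toolchain for this. This article walks through the whole path from scratch, from environment setup to ResNet50 inference.

Environment preparation: matching driver and CANN versions

Checking hardware and software versions

The Ascend NPU runtime environment has a strict version correspondence across its layers:

- Hardware layer: Atlas training server, Ascend 910 × 8
- Driver layer: driver version 23.0.rc3
- CANN layer: CANN 8.0.RC3
- Framework layer: PyTorch 2.1.0 + torch_npu
- Tool layer: asc-devkit

Version mismatch is the most common cause of errors. Before upgrading anything, check the compatibility matrix in the community.

```bash
# 1. Check NPU status
npu-smi info
# Expected output
# -----------------------------------------------------------------------------
# | NPU 0 Card Type: Ascend 910P8 | 0 Used 32GB | Product: Atlas |
# | NPU 1 Card Type: Ascend 910P8 | 0 Free 32GB | Product: Atlas |
# -----------------------------------------------------------------------------

# 2. Check the driver version
cat /usr/local/Ascend/driver/version.info

# 3. Check the CANN version
python -c "import acl; print(acl.__version__)"
# Or check the framework versions
python -c "import torch; print(torch.__version__); import torch_npu; print(torch_npu.__version__)"
```

Driver installation

If the driver is not installed yet, install it as root in the following order:

```bash
# 1. Download the driver package for your version from the Ascend website
# Note: the driver version must match the CANN version
wget https://www.hiascend.com/document/detail/Ascend/Resources/drivepack/ATlas800-9000/...

# 2. Install the driver
sudo bash Ascend-driver-{version}-linux.run --full

# 3. Verify the installation
ls /usr/local/Ascend/driver/
# You should see the driver/ directory and version.info

# 4. Check the version
cat /usr/local/Ascend/driver/version.info
# Example output: Driver Version 23.0.rc3, Build Date 2024-03-15
```

Driver and CANN version correspondence

| CANN version | Required driver version | Notes |
|---|---|---|
| CANN 8.0.RC3 | Driver 23.0.rc3 | Current latest |
| CANN 7.1 | Driver 22.0.x | Long-term support release |
| CANN 6.4 | Driver 21.0.x | Legacy compatibility |

A version mismatch makes ACL initialization fail with error code 101.

CANN installation

CANN is Ascend's heterogeneous computing architecture. It bundles the operator libraries, the compiler, the Runtime, and related components.

```bash
# 1. Download the CANN package (community or commercial edition)
# Community edition download page: https://www.hiascend.com/document/detail/Ascend/Resources/cann/
wget https://www.hiascend.com/document/detail/Ascend/Resources/cann/...

# 2. Install CANN (the community edition does not require root)
# A .run package is a self-extracting installer, so run it with bash rather than pip
bash Ascend-cann-community-8.0.RC3-linux.x86_64.run --install

# 3. Set the environment variables
# (a non-root install typically lands under $HOME/Ascend instead of /usr/local/Ascend)
source /usr/local/Ascend/ascend-toolkit/set_env.sh

# 4. Verify the CANN installation
python -c "import acl; print(acl.__version__)"
# Or from the command line
atc --version
# Expected output: Ascend CANN 8.0.RC3

# 5. Persist the environment variables (recommended)
echo "source /usr/local/Ascend/ascend-toolkit/set_env.sh" >> ~/.bashrc
```

CANN components

| Component | Function | Key location |
|---|---|---|
| ACL | Ascend Computing Language, the unified API layer | /usr/local/Ascend/ascend-toolkit |
| HCCL | Collective communication library, used for distributed training | |
| GE | Graph Engine, the graph compiler used for model conversion | |
| Runtime | Runtime used to execute inference | |
| Operator libraries | Implementations of the individual operators | ops-* repositories |

asc-devkit installation

pip install (recommended)

```bash
# Install the latest stable release
pip install ascend-mindx-sdk -i https://repo.huaweicloud.com/repository/pypi/simple/

# Or install from source to try the latest features
git clone https://atomgit.com/cann/asc-devkit
cd asc-devkit
pip install -e .

# Verify the installation
python -c "import asc_devkit; print('asc-devkit version:', asc_devkit.__version__)"

# If the import fails, check the install path
pip show ascend-mindx-sdk
```

conda environment isolation

Use a separate conda environment per project. This is strongly recommended because it avoids dependency conflicts and version pollution.

```bash
# Create a dedicated Ascend environment
conda create -n ascend-env python=3.10 -y
conda activate ascend-env

# Install the NPU build of PyTorch (mind the version correspondence)
pip install torch==2.1.0
pip install torch-npu==5.1.rc3 -i https://repo.huaweicloud.com/repository/pypi/simple/

# Verify that PyTorch sees the NPU
python -c "import torch; print('PyTorch version:', torch.__version__)"
python -c "import torch_npu; print('torch_npu version:', torch_npu.__version__)"

# Confirm the NPU is available (torch_npu must be imported to register torch.npu)
python -c "import torch, torch_npu; print('NPU' if torch.npu.is_available() else 'CPU')"
# Output should be NPU, meaning the Ascend NPU is recognized
```

Model conversion: .onnx → .om

Step 1: export ONNX from PyTorch

asc-devkit supports several model formats. Taking PyTorch ResNet50 as the example, first export it to ONNX:

```python
# 1_pytorch_to_onnx.py
import torch
import torchvision.models as models

# Load the pretrained model
model = models.resnet50(weights=models.ResNet50_Weights.DEFAULT)
model.eval()

# Prepare an input with the standard ImageNet preprocessing size
dummy_input = torch.randn(1, 3, 224, 224)

# Export to ONNX
torch.onnx.export(
    model,
    dummy_input,
    "resnet50.onnx",
    input_names=["input"],
    output_names=["output"],
    opset_version=13,  # 13 or above is recommended
    dynamic_axes={
        # Dynamic batch size, so it can be adjusted at inference time
        "input": {0: "batch_size"},
        "output": {0: "batch_size"},
    },
)
print("ONNX export succeeded: resnet50.onnx")
```

Step 2: convert ONNX to OM (the Ascend offline model format)

```python
# 2_onnx_to_om.py
import cann

# Model conversion configuration
config = cann.ModelConvertConfig(
    input_format="NCHW",
    input_shape="input:1,3,224,224",
    output_path="resnet50.om",
    soc_version="Ascend910P8",
    precision_mode="force_fp16",  # force FP16 inference
    op_debug_level=0,
)

# Run the conversion
converter = cann.ModelConverter()
converter.convert("resnet50.onnx", config)
print("OM conversion succeeded: resnet50.om")
```

Step 3: convert with the ATC command line (alternative)

If the Python API gives you trouble, use the ATC command line instead:

```bash
# Set the environment variables
source /usr/local/Ascend/ascend-toolkit/set_env.sh

# Run the conversion
atc \
  --model=resnet50.onnx \
  --framework=5 \
  --output=resnet50 \
  --input_shape="input:1,3,224,224" \
  --soc_version=Ascend910P8 \
  --precision_mode=force_fp16 \
  --op_debug_level=0

# Parameters
# --framework=5      the input model is in ONNX format
# --soc_version      chip model; it should match the chip reported by npu-smi info
# --precision_mode   precision mode: force_fp16/auto/force_fp32
```

Inference deployment: calling the ACL interface

Basic inference flow

```python
# 3_inference.py
import cann
import numpy as np
from PIL import Image

# 1. Load the OM model
model = cann.model.load_model("resnet50.om")


# 2. Image preprocessing (ImageNet standard)
def preprocess(image_path):
    img = Image.open(image_path).convert("RGB")
    img = img.resize((224, 224))
    img_array = np.array(img).astype(np.float32) / 255.0
    # Normalize with the ImageNet statistics; float32 keeps the result float32
    mean = np.array([0.485, 0.456, 0.406], dtype=np.float32)
    std = np.array([0.229, 0.224, 0.225], dtype=np.float32)
    img_array = (img_array - mean) / std
    # HWC → CHW
    img_array = img_array.transpose(2, 0, 1)
    # Add the batch dimension
    return img_array[np.newaxis, :, :, :]


# 3. Run inference
image = preprocess("test_image.jpg")
outputs = model.execute(image)

# 4. Post-processing: take the class with the highest score
pred_class = int(np.argmax(outputs[0]))
print(f"Predicted class: {pred_class}")
```

Batch inference

The OM file above was converted with a fixed input shape of 1,3,224,224. To feed batches of 8 as in this script, convert the model with a matching batch dimension (for example input:8,3,224,224).

```python
# 4_batch_inference.py
import cann
import numpy as np
import glob

# preprocess() is the function defined in 3_inference.py; copy it into this file

# Load the model
model = cann.model.load_model("resnet50.om")

# Process every image in the folder in batches
image_paths = glob.glob("test_images/*.jpg")
batch_size = 8

for i in range(0, len(image_paths), batch_size):
    batch_paths = image_paths[i:i + batch_size]

    # Read and preprocess the batch
    batch_images = []
    for path in batch_paths:
        img = preprocess(path)
        batch_images.append(img)

    # Concatenate along the batch dimension
    batch_tensor = np.concatenate(batch_images, axis=0)

    # Inference
    outputs = model.execute(batch_tensor)

    # Batch post-processing
    for j, out in enumerate(outputs):
        pred = int(np.argmax(out))
        print(f"Image {batch_paths[j]}: class {pred}")
```

Performance validation

```python
# 5_performance_test.py
import cann
import numpy as np
import time

model = cann.model.load_model("resnet50.om")

# Warmup (the first call includes JIT compilation)
_ = model.execute(np.random.randn(1, 3, 224, 224).astype(np.float32))

# Time 100 inference calls
iterations = 100
times = []
for _ in range(iterations):
    dummy = np.random.randn(1, 3, 224, 224).astype(np.float32)
    start = time.perf_counter()
    _ = model.execute(dummy)
    elapsed = (time.perf_counter() - start) * 1000  # ms
    times.append(elapsed)

# Statistics
times.sort()
print(f"Avg: {np.mean(times):.2f} ms")
print(f"P50: {times[iterations // 2]:.2f} ms")
print(f"P95: {times[int(iterations * 0.95)]:.2f} ms")
print(f"P99: {times[int(iterations * 0.99)]:.2f} ms")

# Throughput (batch size 1, so calls per second equals images per second)
print(f"Throughput: {1000 / np.mean(times):.2f} FPS")
```

Common error codes and fixes

Error 1: driver version mismatch

Error: aclInit failed with error code 101

This is the most common error. To fix it:

```bash
# Check the driver/CANN version correspondence
# CANN 8.0 pairs with driver 23.0.rc3
# Downgrade or upgrade CANN or the driver to matching versions

# Temporary workaround: skip the version check (not recommended in production)
export ASCEND_SKIP_VERSION_CHECK=1
```

Error 2: shape mismatch at model conversion

Error: Invalid input shape, expected [1,3,224,224]

Check the actual shape of the input data:

```python
import numpy as np

# input_tensor is the array you pass to model.execute()
print(f"Actual shape: {input_tensor.shape}")
print("Expected shape: (1, 3, 224, 224)")
```

Error 3: OM load failure

Error: Model file not found or format error

Check whether the OM file is missing or corrupted:

```bash
# Check that the file exists
ls -lh resnet50.om

# Reconvert with ATC with verbose logging enabled
atc ... --log_level=3
```

Error 4: NPU out of memory

Error: Out of memory in device 0

Free device memory or reduce the batch size:

```python
# Inspect memory usage
import cann

info = cann.rt.get_mem_info()
print(f"Used: {info.used / 1024**3:.2f} GB")

# Reduce the batch size
model.set_option("batch_size", 4)
```

Pitfalls: Atlas server specifics

Environment setup on an Atlas server differs in a few ways from an ordinary development machine:

| Problem | Cause | Fix |
|---|---|---|
| Wrong image version selected | Atlas A2 and A3 servers use different images | On the community download page, pick the package for your machine model |
| Environment variables do not take effect | Several users working at once overwrite each other's settings | Use a separate conda environment per project |
| First inference is very slow | JIT compilation | Warm up 10 times before measuring performance |
| OOM at larger batch sizes | Default batch_size configuration | Increase the batch size step by step and watch peak memory |

```bash
# Atlas server checks

# 1. Confirm the machine model (it determines the driver version)
cat /proc/device-tree/model

# 2. Confirm NUMA affinity (critical for multi-card performance)
numactl --hardware

# 3. Set NPU visibility (with 8 cards)
export ASCEND_VISIBLE_DEVICES=0,1,2,3,4,5,6,7
```

Complete script

```bash
#!/bin/bash
# Full workflow script: run_resnet.sh
set -e

# Activate the environment
source /usr/local/Ascend/ascend-toolkit/set_env.sh

# 1. Inference (when the OM file already exists)
python 3_inference.py

# 2. Performance test
python 5_performance_test.py

# 3. Uncomment if the model needs to be reconverted
# python 1_pytorch_to_onnx.py
# python 2_onnx_to_om.py

echo "Done"
```

Summary

The core flow of the asc-devkit toolchain:

1. Check versions: make sure the driver, CANN, and PyTorch versions correspond.
2. Install the tools: asc-devkit + torch_npu.
3. Convert the model: ONNX → OM, with ATC or the Python API.
4. Run inference: call the ACL interface.
5. Measure performance: validate batch size and throughput.

When you hit a problem, check the community FAQ first; most common problems are already answered there.

Repository: https://atomgit.com/cann/asc-devkit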
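The performance test in this article reports average latency, P50/P95/P99 and throughput inline. As a minimal sketch, the same statistics can be factored into a small helper that takes a list of measured latencies in milliseconds. It uses only the Python standard library and the same index-based percentile as the test script. The names `percentile`, `summarize_latencies`, and the `batch_size` parameter are illustrative assumptions, not part of asc-devkit or CANN.

```python
from statistics import mean


def percentile(sorted_samples, fraction):
    """Index-based percentile on pre-sorted samples (hypothetical helper).

    Mirrors times[int(iterations * 0.95)] from the test script, with the
    index clamped so that small sample counts cannot run out of range.
    """
    index = min(int(len(sorted_samples) * fraction), len(sorted_samples) - 1)
    return sorted_samples[index]


def summarize_latencies(latencies_ms, batch_size=1):
    """Return average, P50/P95/P99 latency (ms) and throughput (images/s)."""
    if not latencies_ms:
        raise ValueError("need at least one latency sample")
    samples = sorted(latencies_ms)
    avg_ms = mean(samples)
    return {
        "avg_ms": avg_ms,
        "p50_ms": percentile(samples, 0.50),
        "p95_ms": percentile(samples, 0.95),
        "p99_ms": percentile(samples, 0.99),
        # Each timed call processes batch_size images.
        "throughput_fps": batch_size * 1000.0 / avg_ms,
    }


if __name__ == "__main__":
    # Synthetic latencies stand in for real NPU measurements.
    fake_latencies = [2.0, 2.1, 1.9, 2.0, 2.5, 2.0, 2.2, 1.8, 2.0, 3.5]
    for name, value in summarize_latencies(fake_latencies).items():
        print(f"{name}: {value:.2f}")
```

Passing `batch_size` matters once you benchmark batched inference: a call that takes 2.5 ms for 2 images delivers 800 images per second, not 400.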
