//Description: 记录在银河Kylin系统+华为昇腾一体机进行模型推理部署过程中的笔记,写于2025年3月
//Create Date: 2025-03-18 14:20:28
//Author: channy
[toc]
Edge-FHDS2300-Z1算力机。
计划在原ubuntu 20.04 (x8664) 服务器上继续训练,只在算力机 (Kylin V10 aarch64) 上做推理部署,不在算力机上训练(机器风扇太太太吵了@@)。
export LD_LIBRARY_PATH=/home/channy/Ascend/ascend-toolkit/latest/x86_64-linux/devlib/:$LD_LIBRARY_PATH
或者把
```sh
source /home/channy/Ascend/ascend-toolkit/set_env.sh
加入到~/.bashrc
中
$ atc
ATC start working now, please wait for a moment.
...
ATC run failed, Please check the detail log, Try 'atc --help' for more information
E10007: [PID: 10490] 2025-03-18-09:02:18.501.341 [--framework] is required. The value must be [0(Caffe) or 1(MindSpore) or 3(TensorFlow) or 5(Onnx)].
官网的第三方库支持版本列表:完整清单列表
torch_npu 这时还只支持torch 2.3.1以下的。
torch | torchvision | torch_npu |
---|---|---|
2.1.0 | 0.16.0 | 5.0.rc2 |
同时建议训练机和算力机的toolkit版本也保持一致,否则同样有概率模型推理时会报错,模型转换过程中没有报错。
选择samples中的样例resnet50_imagenet_classification
,可以根据Readme.md的指示直接下载.caffemodel、.prototxt和.om模型。
直接跑python脚本能够跑成功
跑pyACL加载模型
最开始看到拿到的机器里下载好了一个Ascend-cann-toolkit-8.0.RC1.alpha001_linux-aarch64.run
,没有怀疑直接安装,然后使用pyACL
加载模型一直加载失败返回错误码500002内部错误,见附录1。没有头绪尝试着重新下载安装RC3高一点版本的Ascend-cann-toolkit-8.0.RC3_linux-aarch64.run
然后加载就好了。。。好了。。。了。。。
$ npu-smi info
+--------------------------------------------------------------------------------------------------------+
| npu-smi 23.0.0 Version: 23.0.0 |
+-------------------------------+-----------------+------------------------------------------------------+
| NPU Name | Health | Power(W) Temp(C) Hugepages-Usage(page) |
| Chip Device | Bus-Id | AICore(%) Memory-Usage(MB) |
+===============================+=================+======================================================+
| 24 310P3 | OK | NA 29 0 / 0 |
| 0 0 | 0000:04:00.0 | 0 1832 / 21527 |
+===============================+=================+======================================================+
+-------------------------------+-----------------+------------------------------------------------------+
| NPU Chip | Process id | Process name | Process memory(MB) |
+===============================+=================+======================================================+
| No running processes found in NPU 24 |
+===============================+=================+======================================================+
其中Name
下方的310P3
前加上Ascend
即为版本号Ascend310P3
atc --model=caffe_model/resnet50.prototxt --weight=caffe_model/resnet50.caffemodel --framework=0 --output=model/resnet50 --soc_version=Ascendxxx --input_format=NCHW --input_fp16_nodes=data --output_type=FP32 --out_nodes=prob:0
发现同样的sample脚本代码,直接下载的.om模型能够跑成功,但自己转换的.om模型跑失败。开始以为是转换操作出问题,甚至还用netron对比了直接下载的模型和自己转换的模型,确实不一样。但实际上并不要紧,只是CANN的toolkit等软件的版本不一致导致的。全部统一成8.0.0后就可以正常跑了。
resnet50.om直接下载的模型(左)和自己转换的模型(右)对比图:
统一版本后自己转换的模型依旧比较“杂乱”,但其实是可以正常推理使用的。
torch.onnx.export
转换。
torch.onnx.export(model,
# torch.randn(1, 256, 95, 127).cpu(),
torch.randn(1, 3, 3040, 4064).cuda(),
output_name,
export_params=True,
input_names=['input'],
output_names=['output'],
opset_version = 15
)
.onnx模型到.om模型使用ATC工具转换(安装完toolkit后即有)
atc --model=encodeModel.onnx --framework=5 --output=encodeModel --soc_version=Ascend310B4
atc --model=encodeModel.onnx --framework=5 --output=encodeModel --soc_version=Ascend310P3
--framework
原始框架类型,各框架对应的数值如下:
0:Caffe; 1:MindSpore; 3:Tensorflow; 5:ONNX
torch.onnx.export
导出的.onnx输入输出都是双精度float32,可通过netron工具查看。当输入维度过大时,可能会出现在copy输入数据到Device时报错错误码207001,在转换时增加参数--input_fp16_nodes
转成单精度可以解决。模型输入是1x3x4064x3040的尺寸,写入数据一直报内存不够,刚开机16G内存也不够
ret = acl.rt.memcpy(input_data[0]["buffer"], input_data[0]["size"], np_ptr, input_data[0]["size"], ACL_MEMCPY_HOST_TO_DEVICE)
重新按样例中的转换增加参数后内存失败错误消失
atc --model=encodeModel.onnx --framework=5 --output=encodeModel --soc_version=Ascend310P3 --input_format=NCHW --input_fp16_nodes="input" --output_type=FP32 --out_nodes="output"
也有可能是上一次内存没有正常释放,过一段时间或重启后能正常调用。
把cuda改成npu,如torch.cuda.is_available
改成torch.npu.is_available
,xxx.cuda()
改成xxx.npu()
等。。。是不够的。。。
直接用torch.load
加载.om模型,报错
encodeModel = torch.load(sEncodeModel).float()
File "/home/edge/.local/lib/python3.8/site-packages/torch/serialization.py", line 1028, in load
return _legacy_load(opened_file, map_location, pickle_module, **pickle_load_args)
File "/home/edge/.local/lib/python3.8/site-packages/torch/serialization.py", line 1246, in _legacy_load
magic_number = pickle_module.load(f, **pickle_load_args)
ValueError: could not convert string to int
推理代码直接参考samples中的样例即可。Python可以参考resnet50_imagenet_classification/src/acl_net.py
。
--input_fp16_nodes
,则输入图像数据也需要转换成float16推理一直报错,见附录2
ret = acl.mdl.execute(model_id, load_input_dataset, load_output_dataset)
如果CANN各软件版本不一致时,在ubuntu训练机上转换模型并不会报任何错误,可以转换成功。但在算力机上运行时会报推理错误,错误码507011。统一版本即可解决,包括训练机上的x86_64的toolkit、算力机上的aarch64的toolkit、算力机上的kernel、算力机上的nnrt共四样,最好版本都保持一致。
atc --model=encodeModel.onnx --framework=5 --output=encodeModel --soc_version=Ascend310P3 --input_format=ND --input_fp16_nodes="input" --output_type=FP32 --out_nodes="output" --input_shape="input:1,3,-1,-1" --dynamic_dims="3040,4064;1088,1952"
atc --model=decodeModel.onnx --framework=5 --output=decodeModel --soc_version=Ascend310P3 --input_format=ND --input_fp16_nodes="input" --output_type=FP32 --out_nodes="output" --input_shape="input:1,256,-1,-1" --dynamic_dims="95,127;34,61"
aipp没有缩放,The max padding size is 32
|操作|aipp|
|:—:|:—:|:—:|
| 缩放 | 无 | 无 |
| RGB->BGR | 通道转换 | rbuv_swap_switch: true |
| /255.0 np.astype(“float16) | 归一化 | var_reci_chn_0: 0.00392156862745098 |
| cv::copyMakeBorder | padding | padding: true |
可见附录3 aipp配置
dvpp并不一定支持所有jpg图像,见图片格式不支持
从sample
中取出AclLiteImage
、AclLiteResource
和AclLiteImageProc
,对图像做缩放,发现并不是所有缩放保存成jpg的图像都正常。如opencv的/samples/data
下的ela_original.jpg
和orange.jpg
,暂未知两图像的区别。
解码JPEG图片,只支持JPEG图片为huffman编码(colorspace: yuv, subsample: 444/440/422/420/400 ),不支持算术编码,不支持渐进编码,不支持jpeg2000格式。 ```py import os import sys sys.path.append(os.path.abspath(os.path.join(os.path.dirname(file), ‘..’)))
from acl_net.acllite_image import AclLiteImage from acl_net.acllite_resource import AclLiteResource from acl_net.acllite_imageproc import AclLiteImageProc
import inspect import cv2
class DvppCls(object): def init(self): self.dvpp = None
def init_resource(self):
self.acl_resource = AclLiteResource()
self.acl_resource.init()
self.dvpp = AclLiteImageProc(self.acl_resource)
def dvpp_resize(self, image_path, save_path = 'dvpp.jpg'):
image_acl = AclLiteImage(image_path)
image_dvpp = image_acl.copy_to_dvpp()
yuv_image = self.dvpp.jpegd(image_dvpp)
image_resize = self.dvpp.resize(yuv_image, 640, 640)
print('line', inspect.stack()[0].lineno)
jpeg_image = self.dvpp.jpege(image_resize)
jpeg_image.save(save_path)
def dvpp_resize2(self, image_path, save_path = 'dvpp.jpg'):
image = cv2.imread(image_path)
ret, image_bytes = cv2.imencode('.jpg', image)
image_acl = AclLiteImage(image_bytes, image.shape[1], image.shape[0], size = image.size)
image_dvpp = image_acl.copy_to_dvpp()
yuv_image = self.dvpp.jpegd(image_dvpp)
image_resize = self.dvpp.resize(yuv_image, 640, 640)
print('line', inspect.stack()[0].lineno)
jpeg_image = self.dvpp.jpege(image_resize)
jpeg_image.save(save_path)
def release_source(self):
self.dvpp.__del__()
self.acl_resource.__del__()
AclLiteResource.__del__ = lambda x:0
AclLiteImage.__del__ = lambda x:0
AclLiteImageProc.__del__ = lambda x:0
if name == ‘main’: cls = DvppCls()
cls.init_resource()
cls.dvpp_resize('/home/edge/HGInference/opencv/samples/data/ela_original.jpg', '1.jpg')
cls.dvpp_resize('/home/edge/HGInference/opencv/samples/data/orange.jpg', '2.jpg')
cls.dvpp_resize2('/home/edge/HGInference/opencv/samples/data/orange.jpg', '3.jpg')
cls.release_source() ``` ### yuv查看 ```py def image2yuv():
input_image = cv2.imread('./tmp.jpg')
input_image = cv2.resize(input_image, (640, 640))
yuv_cv = cv2.cvtColor(input_image, cv2.COLOR_RGB2YUV_I420)
h, w = input_image.shape[:2]
y = yuv_cv[:h, :w]
uv = yuv_cv[h:, :]
uv_interleaved = uv.reshape(-1, w)
image_cv = np.vstack((y, uv_interleaved))
image_cv.tofile('output.yuv')
return image_cv ``` 使用ffmpeg转换成mp4后查看 ```sh ffmpeg -pixel_format yuv420p -s 640x640 -framerate 30 -i output.yuv -c:v libx264 output.mp4 ```
使用Yolov8n-seg的语义分割模型,发现不同的export导出onnx参数影响.om模型转换的成功率。
.onnx模型导出代码
# 直接导出,后面模型转换失败,报算子未注册 No parser is registered for Op
model.export(format = 'onnx', amp = False)
# 隆低op_version,后面模型转换成功
model.export(format = 'onnx', amp = False, dynamic=False, opset=9)
模型转换命令:
atc --model=yolov8n-seg.onnx --framework=5 --output=yolov8n-seg --input_format=NCHW --input_fp16_nodes="images" --input_shape="images:1,3,640,640" --log=error --soc_version=Ascend310P3
貌似只能指定[1,100]个输入维度,未找到像pt或onnx那种可以任意维度输入的设置。。。
ATC run failed, Please check the detail log, Try ‘atc –help’ for more information EC0010: [PID: 6039] 2025-04-21-15:01:07.931.973 Failed to import Python module [ModuleNotFoundError: No module named ‘tbe.common’.].
ATC run failed, Please check the detail log, Try ‘atc –help’ for more information
EC0010: [PID: 7651] 2025-04-21-15:18:29.050.937 Failed to import Python module [AttributeError: np.float_
was removed in the NumPy 2.0 release. Use np.float64
instead..].
使用miniconda,自己pip安装的tbe依旧报错,需要把`Ascend/ascend-toolkit/8.0.0/python/site-packages`中的tbe复制到对应环境目录下`miniconda3/envs/segment/lib/python3.10/site-package`。numpy版本不能太高,否则会报np.float相关错误
### run_model是ACL_HOST
`acl.rt.get_run_mode()`返回的是1即ACL_HOST,不管是样例还是自己的模型都是。
### 带无线网卡开机卡在开机画面
插着无线网卡开机会卡在“银河Kylin”那个蓝色标志的开机画面,只有先拔掉无线网卡再开机等开机进桌面后再插上无线网卡才能正常。不知道是无线网卡的原因还是什么原因。
## OpenVino
OpenVino更多支持Intel的CPU/GPU
[推理设备支持](https://docs.openvino.ai/cn/2022.3/openvino_workflow_zh_CN/deployment_intro_zh_CN/openvino_intro_zh_CN/GPU_zh_CN.html)
### apt install 安装依赖
libcurl4-openssl-dev
iperf3
llvm-12-dev
clang-12
libclang-12-dev
scons
### 编译调试
能够编译成功,但发现无法在aarch64的华为盒子里跑OpenVino,无论是CPU还是GPU或是NPU。CPU直接崩溃,GPU报找不到
```sh
Device with "GPU" name is not registedred in the OpenVINO Runtime
onnxruntime能够调用起华为盒子的CPU做推理,但速度巨慢,近10s(对应于调用Intel的CPU用时2s,Intel的GPU用时0.5s)
使用python的acl.mdl.load_from_file('xxx.om')
一直失败,错误码500002内部错误。
/home/xxx/ascend/log/debug/plog/里面的日志报错
[ERROR] DRV(69072,python3):2025-03-18-14:35:48.599.331 [ascend][curpid: 69072, 69072][drv][devmm][_devmm_mem_remote_map 1888]<errno:22, 8> Mem_remote_map ioctl error. (src_va=0x124080000000; size=4194304; devid=0; ret=8)
[ERROR] RUNTIME(69072,python3):2025-03-18-14:35:48.599.713 [pool.cc:1031]69072 MallocPcieBarBuffer:Pcie Host Register failed, retCode=0x7020010, size=4194304(Byte), dev_id=0.
[ERROR] RUNTIME(69072,python3):2025-03-18-14:35:48.599.725 [pool.cc:190]69072 BufferAllocator:allocFunc failed, init count=1024, item size=4096(Byte)
[ERROR] GE(69072,python3):2025-03-18-14:35:48.619.348 [model_utils.cc:1313]69072 GetHbmFeatureMapMemInfo: ErrorNo: 4294967295(failed) [LOAD][DEFAULT]Assert (sub_memory_info.size() == 3U) failed, expect 3 actual 4
[ERROR] GE(69072,python3):2025-03-18-14:35:48.619.603 [model_utils.cc:1329]69072 GetAllMemoryTypeSize: ErrorNo: 4294967295(failed) [LOAD][DEFAULT]Assert ((GetHbmFeatureMapMemInfo(ge_model, all_mem_info)) == ge::SUCCESS) failed
[ERROR] GE(69072,python3):2025-03-18-14:35:48.619.628 [model_utils.cc:1286]69072 InitRuntimeParams: ErrorNo: 4294967295(failed) [LOAD][DEFAULT]Assert (total_hbm_size == (static_cast<int64_t>(runtime_param.mem_size) - runtime_param.zero_copy_size)) failed, expect 469474304 actual 0
[ERROR] GE(69072,python3):2025-03-18-14:35:48.619.645 [davinci_model.cc:481]69072 InitRuntimeParams: ErrorNo: 4294967295(failed) [LOAD][DEFAULT]Assert ((ModelUtils::InitRuntimeParams(ge_model_, runtime_param_, device_id_)) == ge::SUCCESS) failed
[ERROR] GE(69072,python3):2025-03-18-14:35:48.619.661 [davinci_model.cc:686]69072 Init: ErrorNo: 4294967295(failed) [LOAD][DEFAULT]Assert ((InitRuntimeParams()) == ge::SUCCESS) failed
[ERROR] GE(69072,python3):2025-03-18-14:35:48.619.670 [model_manager.cc:1210]69072 LoadModelOffline: ErrorNo: 4294967295(failed) [LOAD][DEFAULT][Init][DavinciModel] failed, ret:1343225857.
[ERROR] GE(69072,python3):2025-03-18-14:35:48.620.302 [graph_loader.cc:143]69072 LoadModelFromData: ErrorNo: 1343225857(Parameter's invalid!) [LOAD][DEFAULT][Load][Model] failed, model_id:1.
[ERROR] ASCENDCL(69072,python3):2025-03-18-14:35:48.620.316 [model.cpp:280]69072 ModelLoadFromFileWithMem: [LOAD][DEFAULT][Model][FromData]load model from data failed, ge result[1343225857]
[ERROR] ASCENDCL(69072,python3):2025-03-18-14:35:48.620.518 [model.cpp:1637]69072 aclmdlLoadFromFile: [LOAD][DEFAULT]Load model from file failed!
[ERROR] GE(69072,python3):2025-03-18-14:35:48.620.624 [model_manager.cc:954]69072 GetInputOutputDescInfo: ErrorNo: 145003(Model id invalid.) [GET][DEFAULT][Get][Model] Failed, Invalid model id 0!
[ERROR] GE(69072,python3):2025-03-18-14:35:48.620.634 [graph_executor.cc:409]69072 GetInputOutputDescInfo: ErrorNo: 145003(Model id invalid.) [GET][DEFAULT][Get][InputOutputDescInfo] failed, model_id:0.
[ERROR] GE(69072,python3):2025-03-18-14:35:48.620.650 [ge_executor.cc:744]69072 GetModelDescInfo: ErrorNo: 145003(Model id invalid.) [GET][DEFAULT][Get][InputOutputDescInfo] failed. ret = 145003, model id:0
[ERROR] ASCENDCL(69072,python3):2025-03-18-14:35:48.620.658 [model.cpp:1386]69072 aclmdlGetDesc: [GET][DEFAULT][Get][ModelDescInfo]get model description failed, ge result[545008], model id[0]
[ERROR] DRV(10686,msame):2025-03-19-15:09:45.659.913 [ascend][curpid: 10686, 10686][drv][devmm][_devmm_mem_remote_map 1888]<errno:22, 8> Mem_remote_map ioctl error. (src_va=0x124080000000; size=4194304; devid=0; ret=8)
[ERROR] RUNTIME(10686,msame):2025-03-19-15:09:45.659.990 [h2d_copy_mgr.cc:360]10686 MallocPcieBarBuffer:Pcie Host Register failed, retCode=0x7020010, size=4194304(Byte), dev_id=0.
[ERROR] RUNTIME(10686,msame):2025-03-19-15:09:45.659.998 [buffer_allocator.cc:52]10686 BufferAllocator:allocFunc failed, init count=1024, item size=4096(Byte)
[ERROR] RUNTIME(10686,msame):2025-03-19-15:09:46.643.861 [hwts_engine.cc:123]10686 ReportExceptProc:[EXEC][DEFAULT]Real task exception! device_id=0, stream_id=5, task_id=2, task_type=0 (KERNEL_AICORE)
[ERROR] RUNTIME(10686,msame):2025-03-19-15:09:46.643.880 [hwts_engine.cc:128]10686 ReportExceptProc:[EXEC][DEFAULT]Task exception! device_id=0, stream_id=3, task_id=1, type=13(MODEL_EXECUTE), failuremode =0, retCode=0x91, [the model stream execute failed]
[ERROR] RUNTIME(10686,msame):2025-03-19-15:09:46.645.768 [device_error_proc.cc:634]10686 PrintCoreErrorInfo:[EXEC][DEFAULT]report error module_type=5, module_name=EZ9999
[ERROR] RUNTIME(10686,msame):2025-03-19-15:09:46.645.779 [device_error_proc.cc:634]10686 PrintCoreErrorInfo:[EXEC][DEFAULT]The error from device(0), serial number is 1, there is an aicore error, core id is 0, error code = 0x800000, dump info: pc start: 0x800124040115000, current: 0x1240401154e4, vec error info: 0x1fe5feff, mte error info: 0x30000c2, ifu error info: 0x3733fa77ff500, ccu error info: 0xffe5ff7b001f1ffb, cube error info: 0xfc, biu error info: 0, aic error mask: 0x65000200d000288, para base: 0x124040114180, errorStr: The DDR address of the MTE instruction is out of range.
[ERROR] RUNTIME(10686,msame):2025-03-19-15:09:46.645.897 [device_error_proc.cc:665]10686 PrintCoreErrorInfo:[EXEC][DEFAULT]report error module_type=5, module_name=EZ9999
[ERROR] RUNTIME(10686,msame):2025-03-19-15:09:46.645.904 [device_error_proc.cc:665]10686 PrintCoreErrorInfo:[EXEC][DEFAULT]The extend info from device(0), serial number is 1, there is aicore error, core id is 0, aicore int: 0x10, aicore error2: 0, axi clamp ctrl: 0, axi clamp state: 0x1717, biu status0: 0x101d14000000000, biu status1: 0x80000201020000, clk gate mask: 0, dbg addr: 0, ecc en: 0, mte ccu ecc 1bit error: 0x2680000000000000, vector cube ecc 1bit error: 0, run stall: 0x1, dbg data0: 0, dbg data1: 0, dbg data2: 0, dbg data3: 0, dfx data: 0xf6
[ERROR] RUNTIME(10686,msame):2025-03-19-15:09:46.645.943 [device_error_proc.cc:634]10686 PrintCoreErrorInfo:[EXEC][DEFAULT]report error module_type=5, module_name=EZ9999
[ERROR] RUNTIME(10686,msame):2025-03-19-15:09:46.645.949 [device_error_proc.cc:634]10686 PrintCoreErrorInfo:[EXEC][DEFAULT]The error from device(0), serial number is 1, there is an aicore error, core id is 1, error code = 0x800000, dump info: pc start: 0x800124040115000, current: 0x1240401154e4, vec error info: 0x1ff7bdd1, mte error info: 0x30000c2, ifu error info: 0xce7fb8f80f80, ccu error info: 0xbf1d825a004a9c77, cube error info: 0xff, biu error info: 0, aic error mask: 0x65000200d000288, para base: 0x124040114180, errorStr: The DDR address of the MTE instruction is out of range.
[ERROR] RUNTIME(10686,msame):2025-03-19-15:09:46.645.983 [device_error_proc.cc:665]10686 PrintCoreErrorInfo:[EXEC][DEFAULT]report error module_type=5, module_name=EZ9999
[ERROR] RUNTIME(10686,msame):2025-03-19-15:09:46.645.989 [device_error_proc.cc:665]10686 PrintCoreErrorInfo:[EXEC][DEFAULT]The extend info from device(0), serial number is 1, there is aicore error, core id is 1, aicore int: 0x10, aicore error2: 0, axi clamp ctrl: 0, axi clamp state: 0x1717, biu status0: 0x101d14000000000, biu status1: 0x80000201020000, clk gate mask: 0, dbg addr: 0, ecc en: 0, mte ccu ecc 1bit error: 0x3580000000000000, vector cube ecc 1bit error: 0, run stall: 0x1, dbg data0: 0, dbg data1: 0, dbg data2: 0, dbg data3: 0, dfx data: 0xd8
[ERROR] RUNTIME(10686,msame):2025-03-19-15:09:46.646.020 [device_error_proc.cc:634]10686 PrintCoreErrorInfo:[EXEC][DEFAULT]report error module_type=5, module_name=EZ9999
[ERROR] RUNTIME(10686,msame):2025-03-19-15:09:46.646.026 [device_error_proc.cc:634]10686 PrintCoreErrorInfo:[EXEC][DEFAULT]The error from device(0), serial number is 1, there is an aicore error, core id is 2, error code = 0x800000, dump info: pc start: 0x800124040115000, current: 0x1240401154e4, vec error info: 0x13d384cd, mte error info: 0x30000c2, ifu error info: 0x2f5b80a07aa00, ccu error info: 0xdffccfc5007fff7f, cube error info: 0xfb, biu error info: 0, aic error mask: 0x65000200d000288, para base: 0x124040114180, errorStr: The DDR address of the MTE instruction is out of range.
[ERROR] RUNTIME(10686,msame):2025-03-19-15:09:46.646.059 [device_error_proc.cc:665]10686 PrintCoreErrorInfo:[EXEC][DEFAULT]report error module_type=5, module_name=EZ9999
[ERROR] RUNTIME(10686,msame):2025-03-19-15:09:46.646.065 [device_error_proc.cc:665]10686 PrintCoreErrorInfo:[EXEC][DEFAULT]The extend info from device(0), serial number is 1, there is aicore error, core id is 2, aicore int: 0x10, aicore error2: 0, axi clamp ctrl: 0, axi clamp state: 0x1717, biu status0: 0x101d14000000000, biu status1: 0x80000201020000, clk gate mask: 0, dbg addr: 0, ecc en: 0, mte ccu ecc 1bit error: 0x780000000000000, vector cube ecc 1bit error: 0, run stall: 0x1, dbg data0: 0, dbg data1: 0, dbg data2: 0, dbg data3: 0, dfx data: 0xbe
[ERROR] RUNTIME(10686,msame):2025-03-19-15:09:46.646.095 [device_error_proc.cc:634]10686 PrintCoreErrorInfo:[EXEC][DEFAULT]report error module_type=5, module_name=EZ9999
[ERROR] RUNTIME(10686,msame):2025-03-19-15:09:46.646.102 [device_error_proc.cc:634]10686 PrintCoreErrorInfo:[EXEC][DEFAULT]The error from device(0), serial number is 1, there is an aicore error, core id is 3, error code = 0x800000, dump info: pc start: 0x800124040115000, current: 0x1240401154e4, vec error info: 0xf5cfe7b, mte error info: 0x30000c2, ifu error info: 0xffb7656fbf00, ccu error info: 0xfffcedff0078fcb7, cube error info: 0x86, biu error info: 0, aic error mask: 0x65000200d000288, para base: 0x124040114180, errorStr: The DDR address of the MTE instruction is out of range.
[ERROR] RUNTIME(10686,msame):2025-03-19-15:09:46.646.130 [device_error_proc.cc:665]10686 PrintCoreErrorInfo:[EXEC][DEFAULT]report error module_type=5, module_name=EZ9999
[ERROR] RUNTIME(10686,msame):2025-03-19-15:09:46.646.136 [device_error_proc.cc:665]10686 PrintCoreErrorInfo:[EXEC][DEFAULT]The extend info from device(0), serial number is 1, there is aicore error, core id is 3, aicore int: 0x10, aicore error2: 0, axi clamp ctrl: 0, axi clamp state: 0x1717, biu status0: 0x101d14000000000, biu status1: 0x80000201020000, clk gate mask: 0, dbg addr: 0, ecc en: 0, mte ccu ecc 1bit error: 0x2780000000000000, vector cube ecc 1bit error: 0, run stall: 0x1, dbg data0: 0, dbg data1: 0, dbg data2: 0, dbg data3: 0, dfx data: 0xfe
[ERROR] RUNTIME(10686,msame):2025-03-19-15:09:46.646.167 [device_error_proc.cc:634]10686 PrintCoreErrorInfo:[EXEC][DEFAULT]report error module_type=5, module_name=EZ9999
[ERROR] RUNTIME(10686,msame):2025-03-19-15:09:46.646.174 [device_error_proc.cc:634]10686 PrintCoreErrorInfo:[EXEC][DEFAULT]The error from device(0), serial number is 1, there is an aicore error, core id is 4, error code = 0x800000, dump info: pc start: 0x800124040115000, current: 0x1240401154e4, vec error info: 0x1ffafabd, mte error info: 0x30000c2, ifu error info: 0x33fb746e47080, ccu error info: 0xd97e9cf70019a4fc, cube error info: 0xdf, biu error info: 0, aic error mask: 0x65000200d000288, para base: 0x124040114180, errorStr: The DDR address of the MTE instruction is out of range.
[ERROR] RUNTIME(10686,msame):2025-03-19-15:09:46.646.207 [device_error_proc.cc:665]10686 PrintCoreErrorInfo:[EXEC][DEFAULT]report error module_type=5, module_name=EZ9999
[ERROR] RUNTIME(10686,msame):2025-03-19-15:09:46.646.213 [device_error_proc.cc:665]10686 PrintCoreErrorInfo:[EXEC][DEFAULT]The extend info from device(0), serial number is 1, there is aicore error, core id is 4, aicore int: 0x10, aicore error2: 0, axi clamp ctrl: 0, axi clamp state: 0x1717, biu status0: 0x101d14000000000, biu status1: 0x80000201020000, clk gate mask: 0, dbg addr: 0, ecc en: 0, mte ccu ecc 1bit error: 0x2580000000000000, vector cube ecc 1bit error: 0, run stall: 0x1, dbg data0: 0, dbg data1: 0, dbg data2: 0, dbg data3: 0, dfx data: 0xf9
[ERROR] RUNTIME(10686,msame):2025-03-19-15:09:46.646.246 [device_error_proc.cc:634]10686 PrintCoreErrorInfo:[EXEC][DEFAULT]report error module_type=5, module_name=EZ9999
[ERROR] RUNTIME(10686,msame):2025-03-19-15:09:46.646.252 [device_error_proc.cc:634]10686 PrintCoreErrorInfo:[EXEC][DEFAULT]The error from device(0), serial number is 1, there is an aicore error, core id is 5, error code = 0x800000, dump info: pc start: 0x800124040115000, current: 0x1240401154e4, vec error info: 0x1a37f6cf, mte error info: 0x30000c2, ifu error info: 0x24ff2cb6f6f80, ccu error info: 0xf1cfe3ff004edf77, cube error info: 0xee, biu error info: 0, aic error mask: 0x65000200d000288, para base: 0x124040114180, errorStr: The DDR address of the MTE instruction is out of range.
[ERROR] RUNTIME(10686,msame):2025-03-19-15:09:46.646.285 [device_error_proc.cc:665]10686 PrintCoreErrorInfo:[EXEC][DEFAULT]report error module_type=5, module_name=EZ9999
[ERROR] RUNTIME(10686,msame):2025-03-19-15:09:46.646.291 [device_error_proc.cc:665]10686 PrintCoreErrorInfo:[EXEC][DEFAULT]The extend info from device(0), serial number is 1, there is aicore error, core id is 5, aicore int: 0x10, aicore error2: 0, axi clamp ctrl: 0, axi clamp state: 0x1717, biu status0: 0x101d14000000000, biu status1: 0x80000201020000, clk gate mask: 0, dbg addr: 0, ecc en: 0, mte ccu ecc 1bit error: 0x2780000000000000, vector cube ecc 1bit error: 0, run stall: 0x1, dbg data0: 0, dbg data1: 0, dbg data2: 0, dbg data3: 0, dfx data: 0xe7
[ERROR] RUNTIME(10686,msame):2025-03-19-15:09:46.646.323 [device_error_proc.cc:634]10686 PrintCoreErrorInfo:[EXEC][DEFAULT]report error module_type=5, module_name=EZ9999
[ERROR] RUNTIME(10686,msame):2025-03-19-15:09:46.646.330 [device_error_proc.cc:634]10686 PrintCoreErrorInfo:[EXEC][DEFAULT]The error from device(0), serial number is 1, there is an aicore error, core id is 6, error code = 0x800000, dump info: pc start: 0x800124040115000, current: 0x1240401154e4, vec error info: 0x7de7dbc, mte error info: 0x30000c2, ifu error info: 0x3fbfffff6f700, ccu error info: 0xfffffffd005dffbb, cube error info: 0xdf, biu error info: 0, aic error mask: 0x65000200d000288, para base: 0x124040114180, errorStr: The DDR address of the MTE instruction is out of range.
[ERROR] RUNTIME(10686,msame):2025-03-19-15:09:46.646.361 [device_error_proc.cc:665]10686 PrintCoreErrorInfo:[EXEC][DEFAULT]report error module_type=5, module_name=EZ9999
[ERROR] RUNTIME(10686,msame):2025-03-19-15:09:46.646.367 [device_error_proc.cc:665]10686 PrintCoreErrorInfo:[EXEC][DEFAULT]The extend info from device(0), serial number is 1, there is aicore error, core id is 6, aicore int: 0x10, aicore error2: 0, axi clamp ctrl: 0, axi clamp state: 0x1717, biu status0: 0x101d14000000000, biu status1: 0x80000201020000, clk gate mask: 0, dbg addr: 0, ecc en: 0, mte ccu ecc 1bit error: 0x3f80000000000000, vector cube ecc 1bit error: 0, run stall: 0x1, dbg data0: 0, dbg data1: 0, dbg data2: 0, dbg data3: 0, dfx data: 0xdf
[ERROR] RUNTIME(10686,msame):2025-03-19-15:09:46.646.398 [device_error_proc.cc:634]10686 PrintCoreErrorInfo:[EXEC][DEFAULT]report error module_type=5, module_name=EZ9999
[ERROR] RUNTIME(10686,msame):2025-03-19-15:09:46.646.404 [device_error_proc.cc:634]10686 PrintCoreErrorInfo:[EXEC][DEFAULT]The error from device(0), serial number is 1, there is an aicore error, core id is 7, error code = 0x800000, dump info: pc start: 0x800124040115000, current: 0x1240401152d8, vec error info: 0x1723591d, mte error info: 0x300003e, ifu error info: 0x2ff3a3fa3fa80, ccu error info: 0xd4c90eef0056b77f, cube error info: 0xff, biu error info: 0, aic error mask: 0x65000200d000288, para base: 0x124040114180, errorStr: The DDR address of the MTE instruction is out of range.
[ERROR] RUNTIME(10686,msame):2025-03-19-15:09:46.646.434 [device_error_proc.cc:665]10686 PrintCoreErrorInfo:[EXEC][DEFAULT]report error module_type=5, module_name=EZ9999
[ERROR] RUNTIME(10686,msame):2025-03-19-15:09:46.646.440 [device_error_proc.cc:665]10686 PrintCoreErrorInfo:[EXEC][DEFAULT]The extend info from device(0), serial number is 1, there is aicore error, core id is 7, aicore int: 0x10, aicore error2: 0, axi clamp ctrl: 0, axi clamp state: 0x1717, biu status0: 0x101d14000000000, biu status1: 0x80000201020000, clk gate mask: 0, dbg addr: 0, ecc en: 0, mte ccu ecc 1bit error: 0x2780000000000000, vector cube ecc 1bit error: 0, run stall: 0x1, dbg data0: 0, dbg data1: 0, dbg data2: 0, dbg data3: 0, dfx data: 0xf8
[ERROR] RUNTIME(10686,msame):2025-03-19-15:09:46.646.471 [device_error_proc.cc:755]10686 ProcessCoreErrorInfo:[EXEC][DEFAULT]report error module_type=5, module_name=EZ9999
[ERROR] RUNTIME(10686,msame):2025-03-19-15:09:46.646.477 [device_error_proc.cc:755]10686 ProcessCoreErrorInfo:[EXEC][DEFAULT]The dha(mata) info from device(0), dha id is 0, dha status 1 info:0x23
[ERROR] RUNTIME(10686,msame):2025-03-19-15:09:46.646.508 [device_error_proc.cc:755]10686 ProcessCoreErrorInfo:[EXEC][DEFAULT]report error module_type=5, module_name=EZ9999
[ERROR] RUNTIME(10686,msame):2025-03-19-15:09:46.646.514 [device_error_proc.cc:755]10686 ProcessCoreErrorInfo:[EXEC][DEFAULT]The dha(mata) info from device(0), dha id is 1, dha status 1 info:0x3
[ERROR] RUNTIME(10686,msame):2025-03-19-15:09:46.649.425 [task_info.cc:523]10686 ReportErrorInfoForModelExecuteTask:[EXEC][DEFAULT]model execute error, retCode=0x91, [the model stream execute failed].
[ERROR] RUNTIME(10686,msame):2025-03-19-15:09:46.649.435 [task_info.cc:496]10686 PrintErrorInfoForModelExecuteTask:[EXEC][DEFAULT]model execute task failed, device_id=0, model stream_id=3, model task_id=1, flip_num=0, model_id=0, first_task_id=65535
[ERROR] RUNTIME(10686,msame):2025-03-19-15:09:46.649.508 [davinci_kernel_task.cc:1204]10686 PrintErrorInfoForDavinciTask:[EXEC][DEFAULT]Aicore kernel execute failed, device_id=0, stream_id=5, report_stream_id=3, task_id=2, flip_num=0, fault kernel_name=te_cast_06ca905b2a08ab8e9303e27ed61c5bddb78b7a7c9bce542cbfe68cef10938a088cbd93f630d5540a52d984102096c11480095129442a02e731d7f3fcdd25b90c_static_bin, fault kernel info ext=te_cast_06ca905b2a08ab8e9303e27ed61c5bddb78b7a7c9bce542cbfe68cef10938a08__kernel0, program id=0, hash=9889620186169183069.
[ERROR] RUNTIME(10686,msame):2025-03-19-15:09:46.649.555 [davinci_kernel_task.cc:1143]10686 GetArgsInfo:[EXEC][DEFAULT][AIC_INFO] args(0 to 2) after execute:0x9cb82200, 0x97908200,
[ERROR] RUNTIME(10686,msame):2025-03-19-15:09:46.649.564 [davinci_kernel_task.cc:1146]10686 GetArgsInfo:[EXEC][DEFAULT]tilingKey = 0, print 1 Times totalLen=(2*8)Bytes, argsSize=16, blockDim=8
[ERROR] RUNTIME(10686,msame):2025-03-19-15:09:46.649.573 [davinci_kernel_task.cc:1208]10686 PrintErrorInfoForDavinciTask:[EXEC][DEFAULT][AIC_INFO] after execute:args print end
[ERROR] GE(10686,msame):2025-03-19-15:09:46.649.604 [error_tracking.cc:110]10686 ErrorTrackingCallback: ErrorNo: 4294967295(failed) [EXEC][DEFAULT]Error happened, origin_op_name [trans_Cast_0], op_name [trans_Cast_0], task_id 2, stream_id 5.
[ERROR] RUNTIME(10686,msame):2025-03-19-15:09:46.649.676 [stream.cc:1081]10686 GetError:[EXEC][DEFAULT]Stream Synchronize failed, stream_id=3, retCode=0x91, [the model stream execute failed].
[ERROR] RUNTIME(10686,msame):2025-03-19-15:09:46.649.719 [stream.cc:1084]10686 GetError:[EXEC][DEFAULT]report error module_type=5, module_name=EZ9999
[ERROR] RUNTIME(10686,msame):2025-03-19-15:09:46.649.724 [stream.cc:1084]10686 GetError:[EXEC][DEFAULT]Aicore kernel execute failed, device_id=0, stream_id=5, report_stream_id=3, task_id=2, flip_num=0, fault kernel_name=te_cast_06ca905b2a08ab8e9303e27ed61c5bddb78b7a7c9bce542cbfe68cef10938a088cbd93f630d5540a52d984102096c11480095129442a02e731d7f3fcdd25b90c_static_bin, fault kernel info ext=te_cast_06ca905b2a08ab8e9303e27ed61c5bddb78b7a7c9bce542cbfe68cef10938a08__kernel0, program id=0, hash=9889620186169183069.
[ERROR] RUNTIME(10686,msame):2025-03-19-15:09:46.649.753 [stream.cc:1084]10686 GetError:[EXEC][DEFAULT]report error module_type=5, module_name=EZ9999
[ERROR] RUNTIME(10686,msame):2025-03-19-15:09:46.649.759 [stream.cc:1084]10686 GetError:[EXEC][DEFAULT][AIC_INFO] after execute:args print end
[ERROR] RUNTIME(10686,msame):2025-03-19-15:09:46.649.783 [model.cc:744]10686 SynchronizeExecute:[EXEC][DEFAULT]Fail to synchronize forbbiden stream_id=3, retCode=0x7150050!
[ERROR] RUNTIME(10686,msame):2025-03-19-15:09:46.649.791 [model.cc:774]10686 GetStreamToSyncExecute:[EXEC][DEFAULT]report error module_type=0, module_name=EE9999
[ERROR] RUNTIME(10686,msame):2025-03-19-15:09:46.649.796 [model.cc:774]10686 GetStreamToSyncExecute:[EXEC][DEFAULT]Model synchronize execute failed, model_id=0!
[ERROR] RUNTIME(10686,msame):2025-03-19-15:09:46.649.823 [api_error.cc:2350]10686 ModelExecute:[EXEC][DEFAULT]Execute model failed.
[ERROR] RUNTIME(10686,msame):2025-03-19-15:09:46.649.836 [api_c.cc:2028]10686 rtModelExecute:[EXEC][DEFAULT]ErrCode=507011, desc=[the model stream execute failed], InnerCode=0x7150050
[ERROR] RUNTIME(10686,msame):2025-03-19-15:09:46.649.842 [error_message_manage.cc:53]10686 FuncErrorReason:[EXEC][DEFAULT]report error module_type=3, module_name=EE8888
[ERROR] RUNTIME(10686,msame):2025-03-19-15:09:46.649.851 [error_message_manage.cc:53]10686 FuncErrorReason:[EXEC][DEFAULT]rtModelExecute execute failed, reason=[the model stream execute failed]
[ERROR] GE(10686,msame):2025-03-19-15:09:46.649.891 [davinci_model.cc:6369]10686 NnExecute: ErrorNo: 1343225859(Failed to call runtime API!) [EXEC][DEFAULT]Call rt api failed, ret: 0x7BC83
[ERROR] GE(10686,msame):2025-03-19-15:09:46.649.904 [graph_loader.cc:231]10686 ExecuteModel: ErrorNo: 507011() [EXEC][DEFAULT][Execute][Model] failed, model_id:1.
[ERROR] ASCENDCL(10686,msame):2025-03-19-15:09:46.649.914 [model.cpp:911]10686 ModelExecute: [EXEC][DEFAULT][Exec][Model]Execute model failed, ge result[507011], modelId[1]
[ERROR] ASCENDCL(10686,msame):2025-03-19-15:09:46.649.947 [model.cpp:2115]10686 aclmdlExecute: [EXEC][DEFAULT][Exec][Model]modelId[1] execute failed, result[507011]
aipp_op {
aipp_mode: static
related_input_rank: 0
input_format: RGB888_U8
src_image_size_w: 640
src_image_size_h: 640
crop: true
load_start_pos_w: 0
load_start_pos_h: 0
crop_size_w: 640
crop_size_h: 624
# RGB888_U8转BGR
csc_switch: false
rbuv_swap_switch: true
# int8->fp16
# 当uint8->fp16时,pixel_out_chx(i) = [pixel_in_chx(i) – mean_chn_i – min_chn_i] * var_reci_chn
# 每个通道的均值
# 类型:uint8
# 取值范围:[0, 255]
mean_chn_0: 0
mean_chn_1: 0
mean_chn_2: 0
# 每个通道的最小值
# 类型:float16
# 取值范围:[0, 255]
min_chn_0: 0.0
min_chn_1: 0.0
min_chn_2: 0.0
# 每个通道方差的倒数
# 类型:float16
# 取值范围:[-65504, 65504]
var_reci_chn_0: 0.00392156862745098
var_reci_chn_1: 0.00392156862745098
var_reci_chn_2: 0.00392156862745098
padding: true
left_padding_size: 0
right_padding_size: 0
top_padding_size: 0
bottom_padding_size: 16
padding_value: 0
}