Defect Detection in Deep Learning Model Compilers for LLM-Generated Computation Graphs

潘丽敏; 赵智洋; 邵思源; 罗森林; 张浩然

doi:10.15918/j.tbit1001-0645.2025.071

Abstract

Defects in deep learning model compilers can easily crash model inference, seriously undermining the usability and security of deployed models. Current defect detection suffers from severely insufficient code-line coverage and a limited range of detectable defect types. Existing methods rely on local, per-operator constraints, which makes defects caused by multi-operator interactions hard to trigger. Semantics-preserving mutation strategies further restrict the operator types available to computation graph nodes, which lowers code-line coverage and markedly reduces the number of defects found. This paper proposes a defect detection method that constructs test cases through multi-round prompting of an LLM: prompts guide the LLM to generate a computation graph, commonly used operators are then masked and replaced with uncommon ones, and the graph is updated over multiple iterations to produce diverse test cases. Experiments on several deep learning model compilers show that, compared with baseline approaches, the method substantially improves code-line coverage and the number of defects detected, indicating high reliability and practical value.
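The generate, mask, and replace loop described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the operator pools, the list-of-pairs graph representation, and the functions `stub_llm_generate_graph` and `stub_llm_fill_mask` are hypothetical stand-ins for the real LLM prompts, which the abstract does not specify.

```python
import random

# Hypothetical operator pools; the abstract does not list which operators
# count as "common" or "uncommon".
COMMON_OPS = {"Conv", "Relu", "Add", "MatMul"}
UNCOMMON_OPS = ["Einsum", "CumSum", "Hardmax", "Celu"]


def stub_llm_generate_graph():
    """Stand-in for the initial prompt: returns a graph as (node_name, op) pairs."""
    return [("n0", "Conv"), ("n1", "Relu"), ("n2", "MatMul"), ("n3", "Add")]


def stub_llm_fill_mask(masked_graph, index, rng):
    """Stand-in for a follow-up prompt that fills a masked node.

    A real system would send the masked graph to an LLM; here an uncommon
    operator is simply drawn at random.
    """
    return rng.choice(UNCOMMON_OPS)


def mutate_once(graph, rng):
    """Mask one node that uses a common operator and replace it."""
    candidates = [i for i, (_, op) in enumerate(graph) if op in COMMON_OPS]
    if not candidates:
        return graph  # nothing left to mask
    i = rng.choice(candidates)
    masked = list(graph)
    masked[i] = (graph[i][0], "<MASK>")
    masked[i] = (graph[i][0], stub_llm_fill_mask(masked, i, rng))
    return masked


def generate_test_cases(rounds, seed=0):
    """Iteratively update the graph, keeping every intermediate graph as a test case."""
    rng = random.Random(seed)
    graph = stub_llm_generate_graph()
    cases = [graph]
    for _ in range(rounds):
        graph = mutate_once(graph, rng)
        cases.append(graph)
    return cases
```

In an actual pipeline, each graph in the returned list would be exported to a model format and fed to the compiler under test, with crashes and inconsistent outputs recorded as candidate defects.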
