Abstract:
Defects in deep learning model compilers risk model inference crashes, compromising deployment security and usability. Current defect detection methods suffer from inadequate code-line coverage and limited diversity in detectable defect types. Existing approaches rely on local operator constraints for detection, failing to trigger defects caused by multi-operator interactions, while semantic-preserving mutation strategies restrict the operator types in computation graph nodes, resulting in insufficient code-line coverage and significantly reducing defect detection rates. In this paper, a defect detection method was proposed, which employs multi-round prompting of LLMs to construct test cases. Prompts were created to guide LLMs in generating computation graphs, after which common operators were masked and substituted with rare ones. The graphs were iteratively updated to produce diverse test cases. Experimental results on multiple deep learning model compilers demonstrate that the proposed method significantly improves code coverage and defect detection rates compared to baseline approaches, exhibiting high reliability and practical value.