Abstract:
To improve the performance of multi-modal side-channel attacks and make fuller use of multi-modal leakage information, a cross-modal side-channel attack method based on a Convolutional Neural Network (CNN) and a cross-attention mechanism is proposed. First, a cross-modal attack network model is designed: a CNN extracts leakage features from the raw data, and a bidirectional cross-attention mechanism fuses the features. Next, an improved Bayesian optimization algorithm tailored to side-channel networks is developed, and the model parameters are tuned with it. Experimental results show that in low Signal-to-Noise Ratio (SNR) scenarios with severe noise interference and temporal misalignment, the proposed method strengthens multi-modal feature extraction and reduces trace consumption. Compared with other methods, the numbers of traces required on the low-SNR dataset to reach a success rate of 1 and a guessing entropy of 0 are reduced by at least 9.8% and 13.6%, respectively.
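As an illustration of the fusion step, the following is a minimal NumPy sketch of bidirectional cross-attention between two feature sequences. It is not the network described above: the modality names (`a`, `b`), the single-head scaled dot-product form, the residual connections, the mean pooling, and the helper names `make_params`, `cross_attention`, and `bidirectional_fusion` are all assumptions made for the example, and the random matrices stand in for learned weights applied to CNN outputs.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def make_params(d, rng):
    # Hypothetical projection weights (query, key, value) for each direction.
    def qkv():
        return tuple(rng.normal(scale=d ** -0.5, size=(d, d)) for _ in range(3))
    return {"a_to_b": qkv(), "b_to_a": qkv()}

def cross_attention(query_feats, context_feats, w_q, w_k, w_v):
    # Queries come from one modality; keys and values come from the other.
    q = query_feats @ w_q            # (T_q, d)
    k = context_feats @ w_k          # (T_c, d)
    v = context_feats @ w_v          # (T_c, d)
    scores = q @ k.T / np.sqrt(q.shape[-1])
    weights = softmax(scores, axis=-1)   # each query row sums to 1
    return weights @ v, weights

def bidirectional_fusion(a_feats, b_feats, params):
    # Direction 1: modality A attends to modality B.
    a_ctx, _ = cross_attention(a_feats, b_feats, *params["a_to_b"])
    # Direction 2: modality B attends to modality A.
    b_ctx, _ = cross_attention(b_feats, a_feats, *params["b_to_a"])
    # Assumed design: residual add, mean-pool over time, then concatenate.
    a_vec = (a_feats + a_ctx).mean(axis=0)
    b_vec = (b_feats + b_ctx).mean(axis=0)
    return np.concatenate([a_vec, b_vec])

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    d = 8
    a_feats = rng.normal(size=(5, d))   # stand-in for CNN features, modality A
    b_feats = rng.normal(size=(7, d))   # stand-in for CNN features, modality B
    fused = bidirectional_fusion(a_feats, b_feats, make_params(d, rng))
    print(fused.shape)
```

Because every query position attends over all positions of the other modality, the two sequences need not have the same length or be aligned sample by sample, which is one plausible reason such a mechanism suits temporally misaligned traces.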