Automatic Generation of New Question Features Based on Bag-of-Words Binding

  • Abstract: Chinese question classification lacks rich syntactic and semantic features. To address this, this paper proposes a method that automatically generates a new type of question feature through bag-of-words binding. Starting from basic features such as bag-of-words (BOW), part of speech (POS), and word sense (WS), the method binds POS, WS, and the other basic features to the bag-of-words one at a time, producing a new class of features called bag-of-words binding features (W/POS, W/WS, etc.). Experiments with an SVM classifier on the Chinese question set provided by Harbin Institute of Technology show that each binding feature (W/POS, W/WS, etc.) achieves significantly higher classification accuracy than the corresponding basic feature alone (POS, WS, etc.). Moreover, after the binding features are heuristically combined, overall classification accuracy over the 77 fine-grained question categories reaches 82.333%. These results indicate that constructing new question features from basic features via bag-of-words binding is simple and effective.
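The binding step can be illustrated with a short sketch. The abstract does not specify the exact form of the binding, so the code below assumes the simplest reading: each word is concatenated with its own tag to form a single `word/tag` feature. The function names, the sample question, the POS tags, and the word-sense codes are all hypothetical and serve only as illustration; the combination shown is a plain union of feature groups, not the heuristic combination used in the paper.

```python
from collections import Counter


def bind_features(tokens, tags):
    """Bind each word to its tag, giving one 'word/tag' feature per token.

    Assumption: binding is plain concatenation of a word with its own tag.
    """
    if len(tokens) != len(tags):
        raise ValueError("tokens and tags must have the same length")
    return [f"{word}/{tag}" for word, tag in zip(tokens, tags)]


def question_features(tokens, pos_tags, sense_tags):
    """Count W/POS and W/WS binding features for one segmented question.

    The two groups are simply merged into one feature bag; each feature is
    prefixed with its group name so the groups cannot collide.
    """
    feats = Counter()
    feats.update(f"W/POS={f}" for f in bind_features(tokens, pos_tags))
    feats.update(f"W/WS={f}" for f in bind_features(tokens, sense_tags))
    return feats


# Hypothetical segmented question ("Where is the capital of China?")
tokens = ["中国", "的", "首都", "是", "哪里"]
pos_tags = ["ns", "u", "n", "v", "r"]                    # illustrative POS tags
sense_tags = ["Di02", "Kd01", "Di02", "Ja01", "Ed61"]    # made-up sense codes

features = question_features(tokens, pos_tags, sense_tags)
```

A feature bag like `features` could then be vectorized and passed to an SVM classifier. Compared with a bare POS or WS tag, a bound feature keeps the tag attached to the specific word that carries it, which is one plausible reason the binding features would discriminate better than the basic ones.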

     
