Improvement of the Feature Selection Algorithm in the Maximum Entropy Method and Disambiguation of Error-Correction Candidates
Abstract: This paper presents an improved feature selection method for building language models under the maximum entropy principle. A candidate feature set is obtained from the training corpus using feature templates, and features are selected from it by combining feature frequency with average mutual information. During selection, candidate features whose frequency exceeds a threshold, or whose average mutual information is very large, are added directly to the effective feature set, and the parameter estimation procedure is not invoked after every single feature selection, which speeds up feature selection. The improved algorithm is applied to disambiguating (ranking) the candidate suggestions in text error correction, and experiments show that it is effective, with higher efficiency and precision.
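The direct-admission step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the thresholds `min_count` and `min_ami`, and the modelling of a feature as a boolean predicate over a context are all assumptions made for illustration. The abstract does not say how frequency and average mutual information are combined beyond admitting a feature when either is large, so the sketch uses a simple "either exceeds its threshold" rule.

```python
import math
from collections import Counter


def average_mutual_information(samples, feature):
    """Mutual information (in bits) between a binary feature and the label.

    samples: list of (context, label) pairs.
    feature: hypothetical predicate, context -> bool.
    """
    n = len(samples)
    joint = Counter((bool(feature(ctx)), label) for ctx, label in samples)
    fired_counts = Counter()
    label_counts = Counter()
    for (fired, label), count in joint.items():
        fired_counts[fired] += count
        label_counts[label] += count
    ami = 0.0
    for (fired, label), count in joint.items():
        p_joint = count / n
        p_indep = (fired_counts[fired] / n) * (label_counts[label] / n)
        ami += p_joint * math.log2(p_joint / p_indep)
    return ami


def preselect(samples, candidates, min_count, min_ami):
    """Split candidates into directly admitted features and the rest.

    A feature is admitted without any parameter re-estimation when its
    frequency or its average mutual information exceeds the (assumed)
    threshold; the remaining features are left for a later, costlier stage.
    """
    admitted, remaining = [], []
    for name, feature in candidates.items():
        count = sum(1 for ctx, _ in samples if feature(ctx))
        ami = average_mutual_information(samples, feature)
        if count > min_count or ami > min_ami:
            admitted.append(name)
        else:
            remaining.append(name)
    return admitted, remaining


# Toy data: the word "a" perfectly predicts label "X"; "v" is uninformative.
samples = [
    ({"w": "a", "v": 0}, "X"),
    ({"w": "a", "v": 1}, "X"),
    ({"w": "b", "v": 0}, "Y"),
    ({"w": "b", "v": 1}, "Y"),
]
candidates = {
    "w_is_a": lambda ctx: ctx["w"] == "a",
    "v_is_1": lambda ctx: ctx["v"] == 1,
}
admitted, remaining = preselect(samples, candidates, min_count=3, min_ami=0.5)
```

In this toy run, `w_is_a` is admitted on the strength of its mutual information alone, while `v_is_1` is left in `remaining`. Under the scheme the abstract describes, only those remaining candidates would need the more expensive evaluation that involves solving for model parameters, and even then not after every single selection.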