Design and Implementation of a Fast Text Categorization Algorithm

Abstract: The text categorization algorithm presented in this paper has three goals. It should be feasible without any dimensionality-reduction technique. It should not waste storage on ultra-sparse document vectors. Its classification accuracy (effectiveness) and speed (efficiency) should both be assured while remaining independent of each other. To meet these goals, the paper combines four techniques to categorize documents at high speed: a category-feature information database, a category-feature weight vector model, a compressed vector representation of the documents to be classified, and an improved Rocchio classifier. Comparative experiments against the CRF algorithm and an improved kNN algorithm were carried out on the same Reuters test corpus. The results show that the proposed algorithm classifies markedly faster than both comparison algorithms with essentially no loss of accuracy.
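The abstract does not give the paper's weighting formula or its changes to Rocchio. What follows is therefore only a minimal Python sketch of the general idea, under several assumptions:

- Documents are stored as sparse ("compressed") {term: weight} dictionaries holding L2-normalised log term frequencies.
- Each category prototype is the standard Rocchio vector beta·(centroid of the category's documents) − gamma·(centroid of all other documents), with negative components clipped to zero. This is not the paper's improved variant.
- The prototypes are held in an inverted index term → {category: weight}, which stands in for the category-feature database.

The names `compress`, `build_category_db` and `classify`, the `beta`/`gamma` defaults and the toy training data are all hypothetical.

```python
import math
from collections import Counter, defaultdict

# Illustrative sketch only: standard Rocchio with clipping, NOT the paper's
# "improved Rocchio". All names and default parameters are hypothetical.


def compress(tokens):
    """Sparse ("compressed") document vector: only terms that occur are stored,
    as {term: weight}, using L2-normalised log-scaled term frequency."""
    tf = Counter(tokens)
    vec = {t: 1.0 + math.log(n) for t, n in tf.items()}
    norm = math.sqrt(sum(w * w for w in vec.values()))
    return {t: w / norm for t, w in vec.items()} if norm else {}


def build_category_db(training, beta=16.0, gamma=4.0):
    """Build one Rocchio prototype per category and store all prototypes as an
    inverted index term -> {category: weight}, plus each prototype's L2 norm.

    training: list of (category, tokens) pairs.
    prototype(c) = beta * mean(docs in c) - gamma * mean(docs not in c),
    keeping only positive components.
    """
    docs = [(c, compress(toks)) for c, toks in training]
    counts = Counter(c for c, _ in docs)
    per_cat = {c: defaultdict(float) for c in counts}
    total = defaultdict(float)  # summed over all docs; negatives = total - positives
    for c, vec in docs:
        for t, w in vec.items():
            per_cat[c][t] += w
            total[t] += w

    index = defaultdict(dict)
    norms = {}
    for c, n_pos in counts.items():
        n_neg = len(docs) - n_pos
        sq = 0.0
        for t, tot in total.items():
            pos = per_cat[c].get(t, 0.0)
            neg = (tot - pos) / n_neg if n_neg else 0.0
            w = beta * pos / n_pos - gamma * neg
            if w > 0:  # clip negative prototype components
                index[t][c] = w
                sq += w * w
        norms[c] = math.sqrt(sq)
    return index, norms


def classify(tokens, index, norms):
    """Return the category whose prototype has the highest cosine similarity
    with the document, or None if no document term is in the index. Only the
    postings of the document's own terms are visited."""
    doc = compress(tokens)
    scores = defaultdict(float)
    for t, w in doc.items():
        for c, pw in index.get(t, {}).items():
            scores[c] += w * pw
    if not scores:
        return None
    # doc is unit length, so dividing by the prototype norm gives the cosine
    return max(scores, key=lambda c: scores[c] / norms[c])


train = [
    ("grain", "wheat corn export tonnes wheat".split()),
    ("grain", "corn harvest wheat prices".split()),
    ("crude", "oil barrel opec prices".split()),
    ("crude", "crude oil output opec".split()),
]
index, norms = build_category_db(train)
print(classify("wheat export tonnes".split(), index, norms))  # grain
print(classify("opec oil barrel".split(), index, norms))      # crude
```

In this sketch, `classify` walks only the postings of terms that appear in the input document. Its cost therefore depends on document length and on how many categories share each term, not on vocabulary size, so no dimensionality reduction is needed at classification time. Training still scales with categories × vocabulary here.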
