Abstract:
The text categorization algorithm presented in this paper pursues three objectives, viz.: a feasible algorithm without dimension reduction, no storage spent on super-sparse vectors, and independence of effectiveness and efficiency. These objectives are realized by means of a category-feature database, a category feature weight vector model, a compressed document vector representation, and an improved Rocchio classifier. Comparative experiments against the CRF and improved kNN algorithms have been carried out on the same Reuters corpus. The results show that the method achieves better efficiency with tolerable effectiveness.