Abstract:
This paper points out the limitations of the general TF-IDF-based text feature extraction method in text classification, and proposes combining term distribution characteristics, term frequency, and document frequency to extract text features, yielding a new method for computing term weights and a new approach to text classification. Experiments show that the method preserves as much of the text's features as possible and effectively avoids the curse of dimensionality in the vector space model (VSM), so it can be applied to large-scale text classification.
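
For context, the TF-IDF baseline discussed in the abstract can be sketched as follows. This is the standard textbook weighting (relative term frequency times the log of inverse document frequency), not the new method proposed in the paper, whose formula is not given in this abstract. The function name `tf_idf` and the toy documents are illustrative assumptions.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Return one {term: weight} dict per tokenized document.

    Standard TF-IDF baseline (an assumption for illustration):
    weight = (count / document length) * log(N / document frequency).
    """
    n_docs = len(docs)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        counts = Counter(doc)
        weights.append({
            term: (count / len(doc)) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return weights


# Hypothetical toy corpus: three short, already tokenized documents.
docs = [
    ["stock", "market", "price"],
    ["stock", "match", "goal"],
    ["team", "match", "goal"],
]
weights = tf_idf(docs)
```

In this baseline a term's weight depends only on its term frequency and document frequency; how the term is distributed is not taken into account, which is the kind of information the abstract proposes to add.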