Abstract:
To achieve a high compression ratio for Chinese text, the proposed compression algorithm, based on variable-length encoding with code-set expansion, encodes the text by matching it against a set of fixed dictionaries. These dictionaries consist of variable-length, high-frequency Chinese character strings selected according to features of the Chinese language. The algorithm treats Chinese character strings as a Markov message source, and it adapts to source data of different lengths and language styles. As a result, it can achieve a higher compression ratio.
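The dictionary-matching step can be illustrated with a minimal sketch. It is not the paper's implementation: the greedy longest-match strategy, the sample entries in `PHRASES`, the `("D", index)` / `("L", char)` token format, and the function names are all assumptions chosen for illustration, and the sketch omits the actual bit-level code assignment.

```python
# Hypothetical fixed dictionary of high-frequency strings (illustrative entries only).
PHRASES = ["中华人民共和国", "计算机", "我们", "的"]


def build_tables(phrases):
    """Map each dictionary string to its index and record the longest entry."""
    encode_table = {phrase: index for index, phrase in enumerate(phrases)}
    max_len = max(len(phrase) for phrase in phrases)
    return encode_table, max_len


def encode(text, encode_table, max_len):
    """Greedy longest-match encoding (an assumed matching strategy).

    Emits ("D", index) for a dictionary hit and ("L", char) for a
    character that no dictionary entry covers.
    """
    tokens = []
    i = 0
    while i < len(text):
        # Try the longest possible candidate first, then shorter ones.
        for length in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + length]
            if candidate in encode_table:
                tokens.append(("D", encode_table[candidate]))
                i += length
                break
        else:
            # No dictionary entry starts here: fall back to a literal character.
            tokens.append(("L", text[i]))
            i += 1
    return tokens


def decode(tokens, phrases):
    """Invert encode(): expand dictionary indices and keep literals."""
    return "".join(phrases[value] if kind == "D" else value for kind, value in tokens)
```

In this sketch, a text made up entirely of dictionary strings is reduced to one token per matched string, which is where a dictionary of long, frequent strings would gain over character-by-character coding. Calling `decode(encode(text, *build_tables(PHRASES)), PHRASES)` returns the original text.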